How to extract records whose 4th field is numeric?

CHoggarth · December 20, 2012, 5:50am

I have a file of comma-separated fields, each surrounded by double quotes. The 4th field contains either a serial number, the text ABC, the text XYZ, or it's blank. I want to extract only the records which have a serial number. Here's some sample data:

> cat myfile
"ABC123","00CJ","SP16","","SP265","","S248","19/05/2003"
"ABC123","00CJ","SP16","00240095028","SP265","","S248","19/05/2003"
"ABC123","00CJ","SP16","ABC","SP265","","S248","19/05/2003"
"ABC123","00CJ","SP16","00240104067","SP265","","S248","19/05/2003"
"ABC123","00CJ","SP16","XYZ","SP265","","S248","19/05/2003"
"ABC123","00CJ","SP16","XYZ","SP265","","S248","19/05/2003"
"ABC123","00CJ","SP16","","SP265","","S248","19/05/2003"
"ABC123","00CJ","SP16","00240025035","SP265","","S248","19/05/2003"
>

I thought I could do this with the command below, but I now realise that each .* can match across quotes and commas, so the pattern succeeds as soon as any later field starts with a digit (here, the date in the last field). This means that all records are selected.

> grep  "\".*\",\".*\",\".*\",\"[0-9].*\"" myfile
"ABC123","00CJ","SP16","","SP265","","S248","19/05/2003"
"ABC123","00CJ","SP16","00240095028","SP265","","S248","19/05/2003"
"ABC123","00CJ","SP16","ABC","SP265","","S248","19/05/2003"
"ABC123","00CJ","SP16","00240104067","SP265","","S248","19/05/2003"
"ABC123","00CJ","SP16","XYZ","SP265","","S248","19/05/2003"
"ABC123","00CJ","SP16","XYZ","SP265","","S248","19/05/2003"
"ABC123","00CJ","SP16","","SP265","","S248","19/05/2003"
"ABC123","00CJ","SP16","00240025035","SP265","","S248","19/05/2003"
>

What I want are just these records:

"ABC123","00CJ","SP16","00240095028","SP265","","S248","19/05/2003"
"ABC123","00CJ","SP16","00240104067","SP265","","S248","19/05/2003"
"ABC123","00CJ","SP16","00240025035","SP265","","S248","19/05/2003"

There must be a simple way to select only records with a 4th field that is a numeric value. Can anyone advise please?

Thanks, Chris

radoulov · December 20, 2012, 5:56am

grep -E '^("[^"]+",){3}"[0-9]+"' infile

If the grep found first in your PATH is not a POSIX implementation, try egrep instead.
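A minimal sketch of how an anchored pattern like the one above behaves, run against a few of the sample records. The temporary file and the variable name numeric_rows are only for illustration, and the last record (with an empty 2nd field) is a made-up case that is not in the original data. The sketch uses [^"]* instead of [^"]+, which is an assumption on my part: it lets empty fields among the first three still count as fields.

```shell
# Recreate some sample records in a temporary file (illustrative only).
tmpfile=$(mktemp)
cat > "$tmpfile" <<'EOF'
"ABC123","00CJ","SP16","","SP265","","S248","19/05/2003"
"ABC123","00CJ","SP16","00240095028","SP265","","S248","19/05/2003"
"ABC123","00CJ","SP16","ABC","SP265","","S248","19/05/2003"
"ABC123","00CJ","SP16","XYZ","SP265","","S248","19/05/2003"
"ABC123","","SP16","00240025035","SP265","","S248","19/05/2003"
EOF

# ^ anchors the match at the start of the line.
# ("[^"]*",){3} consumes exactly three quoted fields, since [^"] cannot cross a quote.
# "[0-9]+" then requires the 4th field to consist of digits only.
numeric_rows=$(grep -E '^("[^"]*",){3}"[0-9]+"' "$tmpfile")
printf '%s\n' "$numeric_rows"

rm -f "$tmpfile"
```

With [^"]+ as originally posted, the hypothetical record with the empty 2nd field would not be selected, so which form fits depends on whether the leading fields can ever be blank.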

Jairaj · December 20, 2012, 6:02am

Try this:

awk -F',"' '{print int($4)}' file

---------- Post updated at 06:02 AM ---------- Previous update was at 06:01 AM ----------

Sorry about my previous command.

awk -F',"' '{ if (int($4) > 0) print $0}' file

pamu · December 20, 2012, 6:02am

With awk:

awk -F '","'  '$4 ~ /^[0-9]+$/' file

Note that 0 is also a numeric value, which an int($4) > 0 test would skip.
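To illustrate the difference between the two awk tests in this thread, here is a small sketch that runs both on the same input. The records with "0" and "12AB" in the 4th field are made-up edge cases, not part of the original data, and the variable names by_int and by_regex are just for this example.

```shell
# Build a small test file (the "0" and "12AB" rows are hypothetical edge cases).
tmpfile=$(mktemp)
cat > "$tmpfile" <<'EOF'
"ABC123","00CJ","SP16","00240095028","SP265","","S248","19/05/2003"
"ABC123","00CJ","SP16","0","SP265","","S248","19/05/2003"
"ABC123","00CJ","SP16","12AB","SP265","","S248","19/05/2003"
"ABC123","00CJ","SP16","ABC","SP265","","S248","19/05/2003"
EOF

# int() test: with -F',"' the 4th field keeps its trailing quote (e.g. 0").
# int() converts only the leading digits, so 0 is rejected and 12AB is accepted.
by_int=$(awk -F',"' 'int($4) > 0' "$tmpfile")

# Regex test: with -F'","' the 4th field carries no quotes,
# and ^[0-9]+$ accepts it only if every character is a digit.
by_regex=$(awk -F'","' '$4 ~ /^[0-9]+$/' "$tmpfile")

echo "int() test selects:"
printf '%s\n' "$by_int"
echo "regex test selects:"
printf '%s\n' "$by_regex"

rm -f "$tmpfile"
```

The regex form treats the field as text rather than as a number, which is usually what is wanted for serial numbers with leading zeros.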

CHoggarth · December 21, 2012, 4:23am

Thanks for the replies. When I started testing more fully, I discovered some other issues with my data, which meant I needed to do something slightly different.

Thanks for your help though.