How to ignore quoted separators

Hi,

I'm trying to parse a text file which uses commas as field separators. Fields are double quoted, and may themselves contain commas, like this:

"1","John Smith","London","123"
"2","Mary Robertson","Horsham, Sussex","456"

This causes problems for the following command

cut -d"," -f4

For the first line, it correctly extracts the fourth field ["123"], but for the second, it is fooled by the comma in Mary's adress, and thinks the fourth field is [, Sussex"].

Happy to use awk or any other standard utility available. Please help!

thanks

The problem is your field separator is not comma if it's in the field.

At least not a proper one.

One way to get around is to use sed to replace "," with "|" and make pipe your separator (if it's not in the fields of course)

One way would be :

sed 's#","#"|"#g' input | cut -d "|" -f 4

Hope that helps
Regards
Peasant.

1 Like

Using awk you can set the separator to include the quotes:

$ echo '"2","Mary Robertson","Horsham, Sussex","456"' | awk -F'","' '{print $3}'
Horsham, Sussex

You'd have to strip the leading/trailing quote tho when accessing the first/last field, respectively.

$ echo '"2","Mary Robertson","Horsham, Sussex","456"' | awk -F'","' '{print $1}'
"2
$ echo '"2","Mary Robertson","Horsham, Sussex","456"' | awk -F'","' '{gsub(/(^"|"$)/,"");print $1}'
2
$ echo '"2","Mary Robertson","Horsham, Sussex","456"' | awk -F'","' '{gsub(/(^"|"$)/,"");print $4}'
456
1 Like

With Perl:

perl -MText::ParseWords -nle'
  print +(parse_line(",",0, $_))[3];
  ' infile  

GNU awk 4+:

awk '{ print $4 }' FPAT='([^,]+)|("[^"]+")' infile
1 Like
awk -F\" '{print $8}' infile

Or in this case:

awk -F\" '{print $(NF-1)}' infile 

including double quotes:

awk -F, '{print $NF}' infile
1 Like

Hi sven44,

One way with perl:

$ cat infile                                                                                                                                                                                                                                 
"1","John Smith","London","123"                                                                                                                                                                                                              
"2","Mary Robertson","Horsham, Sussex","456"                                                                                                                                                                                                 
$ perl -MText::CSV_XS -e 'while ( $colref = Text::CSV_XS->new->getline( ARGV ) ) {  printf qq[%s\n], $colref->[-1] }' infile                                                                                                               
123                                                                                                                                                                                                                                          
456
1 Like
$ awk -F'"' '{print $8}' sample4.txt
123
456


$ awk -F'"' '{print $(NF-1)}' sample4.txt
123
456
1 Like

Many thanks for helpful replies!

The most robust solution (in that it doesn't make assumptions about whether fields are quoted or not, etc.) seems to be radoulov's - except that my version of awk doesn't seem to support this 'FPAT' variable!

For anyone stumbling across this thread, I also found the following fiendish solution which is similarly robust. Just working out what this means really made my head hurt, but it works!

sed 's/,\("[^"]*"\)*/\n\1/g' infile

Many thanks