awk, comma as field separator and text inside double quotes as a field.

Hi, all
I need to split a line into fields that are separated by commas. Some of the fields are enclosed in double quotes, and each of those should be treated as a single field even if there are commas inside the quotes.
sample input:

for this line, 5 fields are supposed to be extracted, they are:

Is there an easy way to achieve this using awk?

If Perl is acceptable:

perl -MText::ParseWords -nle'
  print ++$c, ". ", $_ 
    for parse_line(",", 1, $_);
  ' infile 

Do you need to reset the counter on every row?


As far as CSV parsing with awk is concerned, see lorance.freeshell.org/csv/

Hi, radoulov
Thank you so much, the Perl code is neat, but I have to stick with awk for the moment because I don't know much Perl. I just want to analyze a simple access log file produced by an HTTP server.

Thank you for the link, I'll take a look at it.

You can give this awk script a try:

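awk -F, '{
  for (i=1; i<=NF; i++) {
    if (s) {
      # s holds the start of a quoted field that was split at a comma
      if ($i ~ "\"$") {print s","$i; s=""}   # closing quote found: print the rejoined field
      else s = s","$i                        # still inside the quotes: keep appending
    }
    else {
      if ($i ~ "^\".*\"$") print $i          # complete quoted field with no comma inside
      else if ($i ~ "^\"") s = $i            # opening quote only: start collecting
      else print $i                          # plain unquoted field
    }
  }
}' file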
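
If GNU awk is available, another option is to describe what a field looks like instead of what separates the fields, using its FPAT variable. This is only a sketch and it assumes gawk 4.0 or later. The function name split_fpat is made up for illustration. With this pattern empty fields are skipped, the quotes stay in the output, and doubled quotes inside a field are not handled.

```shell
# split_fpat is a hypothetical helper name; FPAT needs GNU awk 4.0 or later.
split_fpat() {
  gawk 'BEGIN { FPAT = "([^,]+)|(\"[^\"]+\")" }   # a field is a run of non-commas or a quoted string
  {
    # print every field of the record on its own line
    for (i = 1; i <= NF; i++) print $i
  }' "$@"
}
```

For example, echo 'aaa,"hell world, test text",bbb,ccc," test text"' | split_fpat should print five lines, with the two quoted fields kept whole.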

I'd suggest sticking with the CSV parser linked above; it deals with a lot of things that come up in CSV files, like fields with embedded CRs or quotes:

Test,csv,file,"Multi
line field", rest
Also,some,imbedded,"Quoted ""strings""",can exist
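
To show what handling the doubled quotes involves, here is a rough sketch in plain awk that walks each line one character at a time. It is not the linked parser, just an illustration. The name split_csv is made up, and it assumes a record never spans more than one line, so the multi-line case above is not covered. It strips the enclosing quotes and turns "" inside a quoted field into a single ".

```shell
# split_csv is a hypothetical helper name, not something from the linked parser.
# Assumption: every record fits on a single input line.
split_csv() {
  awk '{
    field = ""; inq = 0            # current field text, and whether we are inside quotes
    n = length($0)
    for (i = 1; i <= n; i++) {
      c = substr($0, i, 1)
      if (inq) {
        if (c == "\"") {
          if (substr($0, i + 1, 1) == "\"") { field = field "\""; i++ }  # doubled quote: literal quote
          else inq = 0                                                    # closing quote
        } else field = field c     # anything else inside quotes, commas included
      }
      else if (c == "\"") inq = 1                        # opening quote
      else if (c == ",") { print field; field = "" }     # separator outside quotes ends the field
      else field = field c
    }
    print field                    # last field of the line
  }' "$@"
}
```

For example, echo 'Also,some,imbedded,"Quoted ""strings""",can exist' | split_csv prints five lines, the fourth being Quoted "strings".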

Try (with GNU sed, which accepts \n in the replacement):

sed 's/,\("[^"]*"\)*/\n\1/g'
$ echo 'aaa,"hell world, test text",bbb,ccc," test text"' | sed 's/,\("[^"]*"\)*/\n\1/g'
aaa
"hell world, test text"
bbb
ccc
" test text"

Nope:

$ echo '"hello world, test text", aaa, bbb, ccc' | sed 's/,\("[^"]*"\)*/\n\1/g'

"hello world
 test text"
 aaa
 bbb
 ccc

Right, this ought to do it then..

sed 's/\("[^"]*"\)*,\("[^"]*"\)*/\1\n\2/g'

Nice work Scrutinizer. You have inspired me to try and enhance it to also support embedded quotes.

This solution puts 1 csv field per line:

csv.sed

sed '
: loop
s/\(,"[^"]*\)""\([^"]*\)""/\1_QUOTE_\2_QUOTE_/g;
t loop
s/\("[^"]*"\)*,\("[^"]*"\)*/\1\n\2/g;
s/"//g;
s/_QUOTE_/"/g;'

Example:

$ echo 'A,"""fight"", or ""flight""",C' | ./csv.sed
A
"fight", or "flight"
C