Hi all,
I need to split a line into fields separated by commas. Some of the fields are enclosed in double quotes, and each of those should be treated as a single field even if there are commas inside the quotes.
Sample input:
aaa,"hell world, test text",bbb,ccc," test text"
For this line, 5 fields should be extracted:
aaa
"hell world, test text"
bbb
ccc
" test text"
Is there an easy way to achieve this using awk?
If Perl is acceptable:
perl -MText::ParseWords -nle'
print ++$c, ". ", $_
for parse_line(",", 1, $_);
' infile
Do you need to reset the counter on every row?
As for CSV parsing with awk, see lorance.freeshell.org/csv/
Hi radoulov,
Thank you so much. The Perl code is neat, but I have to stick with awk for the moment because I don't know much Perl. I just want to analyze a simple access log file produced by an HTTP server.
Thank you for the link, I'll take a look at it.
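You can give this awk script a try:
awk -F, '{
  for (i=1; i<=NF; i++) {
    if (s) {                                  # inside a quoted field split by commas
      if ($i ~ "\"$") {print s","$i; s=""}    # closing quote: emit the rejoined field
      else s = s","$i                         # still open: keep accumulating
    }
    else {
      if ($i ~ "^\".*\"$") print $i           # complete quoted field
      else if ($i ~ "^\"") s = $i             # opening quote without a closing one
      else print $i                           # plain unquoted field
    }
  }
}' file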
I'd suggest sticking with the CSV parser linked above. It deals with a lot of things that come up in CSV files, like fields with embedded line breaks or quotes:
Test,csv,file,"Multi
line field", rest
Also,some,imbedded,"Quoted ""strings""",can exist
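If you would rather stay in plain awk than pull in the full parser, here is a minimal sketch of the character-by-character approach. It walks each line with a small in-quotes flag, so commas inside quotes are kept and a doubled quote ("") becomes one literal quote. The function name split_csv is just a name I made up for illustration. Assumptions: it strips the surrounding quotes from each field, and it does not handle fields that span several lines, so the first sample row above would still break it.

```shell
#!/bin/sh
# split_csv: hypothetical helper name; reads CSV lines on stdin and
# prints one field per output line, with surrounding quotes removed.
split_csv() {
    awk '
    {
        field = ""; inq = 0; n = length($0)
        for (i = 1; i <= n; i++) {
            c = substr($0, i, 1)
            if (inq) {
                if (c == "\"") {
                    if (substr($0, i + 1, 1) == "\"") {
                        field = field "\""   # doubled quote: keep one literal quote
                        i++                  # and skip the second one
                    } else {
                        inq = 0              # closing quote
                    }
                } else {
                    field = field c          # anything else inside quotes, commas included
                }
            } else if (c == "\"") {
                inq = 1                      # opening quote
            } else if (c == ",") {
                print field; field = ""      # separator outside quotes ends the field
            } else {
                field = field c
            }
        }
        print field                          # last field on the line
    }'
}

# Usage on the sample line from the first post:
printf '%s\n' 'aaa,"hell world, test text",bbb,ccc," test text"' | split_csv
```

Because it only uses substr and length, this should run on any POSIX awk, not just gawk.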
Try (with GNU sed, which accepts \n in the replacement):
sed 's/,\("[^"]*"\)*/\n\1/g'
$ echo 'aaa,"hell world, test text",bbb,ccc," test text"' | sed 's/,\("[^"]*"\)*/\n\1/g'
aaa
"hell world, test text"
bbb
ccc
" test text"
Nope, it breaks when the quoted field comes first:
$ echo '"hello world, test text", aaa, bbb, ccc' | sed 's/,\("[^"]*"\)*/\n\1/g'
"hello world
test text"
aaa
bbb
ccc
Right, this ought to do it then:
sed 's/\("[^"]*"\)*,\("[^"]*"\)*/\1\n\2/g'
Nice work, Scrutinizer. You have inspired me to try to enhance it to also support embedded quotes.
This solution puts one CSV field per line:
csv.sed
#!/bin/sh
sed '
: loop
s/\(,"[^"]*\)""\([^"]*\)""/\1_QUOTE_\2_QUOTE_/g;
t loop
s/\("[^"]*"\)*,\("[^"]*"\)*/\1\n\2/g;
s/"//g;
s/_QUOTE_/"/g;'
Example:
$ echo 'A,"""fight"", or ""flight""",C' | ./csv.sed
A
"fight", or "flight"
C