Working with CSV files whose values are enclosed in double quotes

santhansk · December 18, 2013, 1:19pm

I have a CSV file as shown below:

"1","SANTHA","KUMAR","SAM,MILLER","DEVELOPER","81,INDIA"
"2","KAPIL","DHAMI","ECO SPORT","DEVELOPER","82,INDIA"

The file is comma-delimited, and all field values are enclosed in double quotes. But when I use awk or cut, a comma inside a quoted text field is treated as a field separator.

For example:

awk -F',' '{print NF}' File

The above command prints 8 fields for the first record and 7 fields for the second, but each record actually has 6 fields.

How can I ignore commas that are enclosed in double quotes?

radoulov · December 18, 2013, 1:39pm

It's easy with Perl:

perl -MText::ParseWords -nle'
  print parse_line(",",0, $_)+0;
  ' infile

And with GNU awk >= 4:

awk '{ print NF }' FPAT='([^,]+)|("[^"]+")' infile

santhansk · December 18, 2013, 1:41pm

Thanks for your response. Could you please explain the awk command?

RudiC · December 18, 2013, 1:53pm

Read the post by RudiC in the thread "Remove the values from a certain column without deleting the Column name in a .CSV file" (Unix Linux Forums, Shell Programming and Scripting) and adapt it to your needs.

Akshay_Hegde · December 18, 2013, 1:56pm

If your awk is older than version 4 (no FPAT), you may try this:

$ cat file
"1","SANTHA","KUMAR","SAM,MILLER","DEVELOPER","81,INDIA"
"2","KAPIL","DHAMI","ECO SPORT","DEVELOPER","82,INDIA"

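awk '{
        column = 0
        $0 = $0 ","        # add a trailing comma so every field, including the last, ends with one
        while ($0) {
            # a quoted field (which may contain commas) or an unquoted one, followed by its comma
            match($0, / *"[^"]*" *,|[^,]*,/)
            field = substr($0, RSTART, RLENGTH)   # current field, trailing comma included
            ++column
            $0 = substr($0, RLENGTH + 1)          # drop the field just consumed
        }
        print column
    }' file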

$ sh tester.sh 
6
6
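The same portable idea can also extract field values. A minimal sketch, swapping the regular expression for a character-by-character scan that toggles an "inside quotes" flag (a different technique from the loop above, chosen here for illustration); it should run in any POSIX awk, strips the enclosing quotes, and treats a doubled `""` inside a quoted value as a literal quote. `csv_get` is a hypothetical name.

```shell
# csv_get N: print field N (1-based) of each CSV line on stdin, quotes removed.
# Hypothetical helper; plain POSIX awk, no FPAT needed.
csv_get() {
  awk -v want="$1" '{
    split("", f)          # clear the field array left over from the previous line
    n = 1; field = ""; inq = 0
    for (i = 1; i <= length($0); i++) {
      c = substr($0, i, 1)
      if (c == "\"") {
        if (inq && substr($0, i + 1, 1) == "\"") { field = field c; i++ }  # "" inside quotes is a literal quote
        else inq = !inq   # opening or closing quote: toggle the flag, do not keep it
      } else if (c == "," && !inq) {
        f[n++] = field; field = ""   # an unquoted comma ends the current field
      } else
        field = field c   # ordinary character, or a comma inside quotes
    }
    f[n] = field          # the last field has no trailing comma
    print (want <= n ? f[want] : "")
  }'
}
```

With the sample file, `csv_get 6 < file` should print `81,INDIA` and `82,INDIA`. Printing `n` instead of `f[want]` gives the field count, 6 for both records.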

ctsgnb · December 18, 2013, 2:05pm

There are also some ideas on page 2 of the thread "Remove the values from a certain column without deleting the Column name in a .CSV file" (Unix Linux Forums, Shell Programming and Scripting).

radoulov · December 18, 2013, 2:18pm

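It's explained in the GNU awk manual; see the section "Defining Fields By Content". In short, FPAT describes what a field looks like instead of what separates fields: here a field is either a run of non-comma characters, ([^,]+), or a double-quoted string, ("[^"]+"), which may itself contain commas. Because awk regular expressions match the leftmost, longest text, "SAM,MILLER" is taken as one field.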