Help -- Modify awk command to handle "," delimiter

lojkyelo · July 30, 2014, 10:34am

Currently I am using this command to split a CSV file based on the distinct values in the 16th (position) column.

awk -F, 'NR==1 { hdr=$0; next } $16 != prev { prev=name=$16; gsub(/[^[:alnum:]_]/,"",name); $0 = hdr "\n" $0 } { print > ("/Directory/File."name".Download.csv") }' /Directory/File.ALL.Download.csv

Below is a sample of the data:

"Column1-Name", "Column2-Amount",... "Column16-ID"
" A, and B", " 100.00", " 000000001"
" CD", " 100.00", " 000000001"
" EF", " 100.00", " 000000001"

I'm unsure how to modify this to handle the introduction of commas within a field in the record (in the first record "A, and B"). I tried the following with no success:

awk -F FS="\",\"" 'NR==1 { hdr=$0; next } $16 != prev { prev=name=$16; gsub(/[^[:alnum:]_]/,"",name); $0 = hdr "\n" $0 } { print > ("/Directory/File."name".Download.csv") }' /Directory/File.ALL.Download.csv

All suggestions welcome, thanks!

RudiC · July 30, 2014, 11:10am

Try sth along this line:

awk     'NR>1   {FS=OFS="\""; $0=$0                                     # recalculate fields based on " delimiter
                 for (i=2; i<=NF; i+=2) gsub (/,/,"\001", $i)           # replace , in " delimited fields with a token
                 FS=OFS=","; $0=$0                                      # recalculate fields based on , delimiter

                 $2 = $1"abcd"                                          # do whatever you need

                 gsub ("\001", ",")                                     # just before printing anything, put back the inner ","
                }
         1
        ' file

balajesuri · July 30, 2014, 11:21am

If you use awk, you've to write a lot of boiler-plate code. Try perl with the Text::CSV module. It will do this job neatly for you.

#! /usr/bin/perl

use warnings;
use strict;
use Text::CSV;

my ($csv, $fh, $row);
my (@fields);

$csv = Text::CSV->new ({ binary => 1, eol => $/ });

open $fh, "< file";

while ($row = $csv->getline($fh)) {
    @fields = @{ $row };
    print "@fields\n";
}

close $fh;

[user@host ~]$ cat file
a,b,"c,d,e",f,g
p,q,"r,s,t",u,v
[user@host ~]$ ./test.pl
a b c,d,e f g
p q r,s,t u v
[user@host ~]$

lojkyelo · July 30, 2014, 1:30pm

Agreed, but unfortunately in this instance Perl is not available.

---------- Post updated at 12:30 PM ---------- Previous update was at 10:43 AM ----------

After some further review, I realized I don't need the FS= declaration. The following commond provides the desired result.

awk -F "\",\"" 'NR==1 { hdr=$0; next } $16 != prev { prev=name=$16; gsub(/[^[:alnum:]_]/,"",name); $0 = hdr "\n" $0 } { print > ("/Directory/File."name".Download.csv") }' /Directory/File.ALL.Download.csv

Thanks all for your input.