Currently I am using this command to split a CSV file based on the distinct values in the 16th (position) column.
awk -F, 'NR==1 { hdr=$0; next } $16 != prev { prev=name=$16; gsub(/[^[:alnum:]_]/,"",name); $0 = hdr "\n" $0 } { print > ("/Directory/File."name".Download.csv") }' /Directory/File.ALL.Download.csv
Below is a sample of the data:
"Column1-Name", "Column2-Amount",... "Column16-ID"
" A, and B", " 100.00", " 000000001"
" CD", " 100.00", " 000000001"
" EF", " 100.00", " 000000001"
I'm unsure how to modify this to handle the introduction of commas within a field in the record (in the first record "A, and B"). I tried the following with no success:
awk -F FS="\",\"" 'NR==1 { hdr=$0; next } $16 != prev { prev=name=$16; gsub(/[^[:alnum:]_]/,"",name); $0 = hdr "\n" $0 } { print > ("/Directory/File."name".Download.csv") }' /Directory/File.ALL.Download.csv
All suggestions welcome, thanks!
RudiC
July 30, 2014, 11:10am
2
Try sth along this line:
awk 'NR>1 {FS=OFS="\""; $0=$0 # recalculate fields based on " delimiter
for (i=2; i<=NF; i+=2) gsub (/,/,"\001", $i) # replace , in " delimited fields with a token
FS=OFS=","; $0=$0 # recalculate fields based on , delimiter
$2 = $1"abcd" # do whatever you need
gsub ("\001", ",") # just before printing anything, put back the inner ","
}
1
' file
If you use awk, you've to write a lot of boiler-plate code. Try perl with the Text::CSV module. It will do this job neatly for you.
#! /usr/bin/perl
use warnings;
use strict;
use Text::CSV;
my ($csv, $fh, $row);
my (@fields);
$csv = Text::CSV->new ({ binary => 1, eol => $/ });
open $fh, "< file";
while ($row = $csv->getline($fh)) {
@fields = @{ $row };
print "@fields\n";
}
close $fh;
[user@host ~]$ cat file
a,b,"c,d,e",f,g
p,q,"r,s,t",u,v
[user@host ~]$ ./test.pl
a b c,d,e f g
p q r,s,t u v
[user@host ~]$
1 Like
Agreed, but unfortunately in this instance Perl is not available.
---------- Post updated at 12:30 PM ---------- Previous update was at 10:43 AM ----------
After some further review, I realized I don't need the FS= declaration. The following commond provides the desired result.
awk -F "\",\"" 'NR==1 { hdr=$0; next } $16 != prev { prev=name=$16; gsub(/[^[:alnum:]_]/,"",name); $0 = hdr "\n" $0 } { print > ("/Directory/File."name".Download.csv") }' /Directory/File.ALL.Download.csv
Thanks all for your input.