Multiple delimiters in awk

baskivs · October 14, 2011, 2:12am

Hi all,

I have a file with 92 columns that uses a combination of double quotes ("") and commas (,) as delimiters, and I need to extract only 32 of those columns. I used an awk command to extract them, but it is not working properly.

Example: "aaaa","10,00.00",work,5555,.............

Command I tried:

awk -F"[ ,,"" ]" -v OFS="," 'FNR>1{print $1,$5,$6,$15,$17,$24,$25,$26,$27,$28,$31,$32,$34,$35,$37,$39,$40,$41,$42,$43,$46,$72,$73,$74,$77,$80,$81,$82,$84,$88,$90,$92}' >> outfile.csv

I would really appreciate it if someone could help as soon as possible.

Thanks,
Baski

jayan_jay · October 14, 2011, 2:17am

$ sed 's,\",,g' infile | awk -F, '{ print $1............$92 }' >> outfile.csv

CarloM · October 14, 2011, 5:21am

Does the output need to be in the same format (i.e. comma-separated and with quotes preserved)?

Note that this will split the single field "10,00.00" into 10 and 00.00.
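To make the point concrete, here is a small sketch using the sample row from the first post, cut down to its four visible fields. It only assumes a POSIX sed and awk; the helper name count_fields is made up for illustration.

```shell
# Count the comma-separated fields awk sees once all double quotes are stripped.
# count_fields is a hypothetical helper name, not part of sed or awk.
count_fields() {
    printf '%s\n' "$1" | sed 's/"//g' | awk -F, '{ print NF }'
}

# Sample row from the question, cut down to its four visible fields.
row='"aaaa","10,00.00",work,5555'

count_fields "$row"    # prints 5, not 4: "10,00.00" has become 10 and 00.00
```

Every field after the quoted number shifts one position to the right, so $3 would no longer be the work column.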

baskivs · October 14, 2011, 5:39am

Thanks a lot, friends.

Yes CarloM, the output should keep the quotes, so that a value like 10,000 stays one field and is not split into 10 and 000.

But sed is not reading multiple files at a time.

I mean:

$ sed 's,\",,g' infile*.csv | awk -F, '{ print $1............$92 }' >> outfile.csv

rikxik · October 14, 2011, 5:43am

The command looks ok. Post your error.
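For reference, sed does accept several file operands and processes them in order, so a glob such as infile*.csv should work. A minimal sketch, assuming two throwaway files created just for the demonstration (the file contents and the helper name strip_quotes are invented here):

```shell
# Work in a throwaway directory so the glob only matches the demo files.
tmpdir=$(mktemp -d)
printf '"a","1,5"\n' > "$tmpdir/infile1.csv"
printf '"b","2,5"\n' > "$tmpdir/infile2.csv"

# strip_quotes is a hypothetical wrapper: it passes every file operand to sed,
# which reads them one after another as a single stream.
strip_quotes() {
    sed 's/"//g' "$@"
}

strip_quotes "$tmpdir"/infile*.csv
# a,1,5
# b,2,5

rm -r "$tmpdir"
```

One thing to keep in mind: awk reading from the pipe sees a single stream, so FNR does not reset per file there, and a test like FNR>1 would only skip the very first line.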

jayan_jay · October 14, 2011, 5:58am

Sorry, I don't have enough time, so I did it the ugly way:

for i in infile*.csv
do
sed 's,[0-9]\,,&%,g;
  s,\,%\",|,g;
  s,\"\,\",|,g;
  s,\"\,,|,g;
  s,\,%,?,g;
  s,\,,|,g;
  s,?,\,,g;
  s,\",,g' "$i" |
awk -F\| '{ print $1............$92 }' >> outfile.csv 
done

CarloM · October 14, 2011, 6:21am

This is probably a deeply inelegant way of doing it, but it's just a quickie from some other code I had lying around (that I probably copied from somewhere else!).

#  cat csv2.awk
BEGIN {
        if (NUMCOLS == "") NUMCOLS=32
        if (DELIM == "") DELIM = "\t"
        if (REPL == "") REPL = "~"
}
{
        gsub(DELIM, REPL)
        $0 = gensub(/([^,])\"\"/, "\\1'", "g")
        out = ""
        n = length($0)
        for (i = 1;  i <= n;  i++) {
                if ((ch = substr($0, i, 1)) == "\"") {
                        inString = (inString) ? 0 : 1
                }
                out = out ((ch == "," && ! inString) ? DELIM : ch)
        }
        nfields=split(out,outfields,DELIM);
        for (i=1;(i<=nfields)&&(i<=NUMCOLS);i++) {
                if (i > 1) {
                        printf (",");
                }
                printf ("%s", outfields[i]);
        }
        printf ("\n");
}

#  cat a.csv
"aaaa","10,00.00",work,5555,"aaaa","10,00.00",work,5555,"aaaa","10,00.00",work,5555,"aaaa","10,00.00",work,5555,"aaaa","10,00.00",work,5555,"aaaa","10,00.
 etc...

# awk -f csv2.awk a.csv
"aaaa","10,00.00",work,5555,"aaaa","10,00.00",work,5555,"aaaa","10,00.00",work,5555,"aaaa","10,00.00",work,5555,"aaaa","10,00.00",work,5555,"aaaa","10,00.00",work,5555,"aaaa","10,00.00",work,5555,"aaaa","10,00.00",work,5555

(it will fail if there are any tabs in the data, but you can change the delimiter to something else if need be)
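Since the original question asks for specific columns rather than the first N, the same character-by-character idea can be adapted to pick arbitrary fields. The following is only a sketch in plain POSIX awk (no gensub), and pick_cols is a made-up helper name. It assumes that quoted fields never contain newlines.

```shell
# pick_cols is a hypothetical helper: $1 is a space-separated list of column
# numbers, the remaining arguments are CSV files (stdin is read if none given).
pick_cols() {
    cols=$1
    shift
    awk -v cols="$cols" '
    BEGIN { n = split(cols, want, " ") }
    {
        nf = 0; field = ""; inq = 0
        len = length($0)
        for (i = 1; i <= len; i++) {
            ch = substr($0, i, 1)
            if (ch == "\"") inq = !inq       # toggle the inside-quotes state
            if (ch == "," && !inq) {         # separator outside quotes
                f[++nf] = field; field = ""
            } else {
                field = field ch             # keep quotes and embedded commas
            }
        }
        f[++nf] = field                      # last field has no trailing comma
        line = ""
        for (k = 1; k <= n; k++) {
            c = want[k] + 0
            line = line (k > 1 ? "," : "") (c <= nf ? f[c] : "")
        }
        print line
    }' "$@"
}

printf '%s\n' '"aaaa","10,00.00",work,5555' | pick_cols "1 2 4"
# "aaaa","10,00.00",5555
```

Because commas inside quotes are never treated as separators, no placeholder delimiter is needed, so tabs in the data are not a problem for this variant.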

binlib · October 14, 2011, 1:41pm

Get gawk 4. Say you want fields 5, 2, 3, 7, and 10:

gawk4 'BEGIN {
  n = split("5 2 3 7 10", f, " ")
  FPAT = "([^,]*)|(\"[^\"]*\")"
}
{
  for (i = 1; i < n; ++i)
    printf("%s,", $f[i])
  print $f[n]
}' file