Extract multiple columns based on double quotes as delimiter

weknowd · August 12, 2015, 10:46am

Hi All,

My data looks like this:

"1","abc,db","hac,aron","4","5"

Now I need to extract the 1st, 2nd and 4th columns.

Output should be like

"1","abc,db","4"

I am trying to use the cut command but am not able to get the results.

Thanks in advance.

RudiC · August 12, 2015, 11:20am

The simplest approach might be doubling the field numbers:

awk '{OFS=FS "," FS;  print FS $2 OFS $4 OFS $8 FS}' FS="\"" file
"1","abc,db","4"

Another would mess around with the individual fields:

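awk '
# Set both FS and OFS to O, then re-split the current record.
function setFS(O)       {OFS=FS=O
                         $0=$0
                        }

        {setFS("\"");
         # Even-numbered fields are the quoted data: hide their commas.
         for (i=2; i<=NF; i+=2) gsub (/,/,"\a",$i)

         # Split on the remaining commas and keep columns 1, 2 and 4.
         setFS(",")
         $0= $1 OFS $2 OFS $4

         # Restore the hidden commas.
         gsub ("\a", ",")
         print
        }
' file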
"1","abc,db","4"

RavinderSingh13 · August 12, 2015, 11:42am

Hello weknowd,

The following may also help you with this.

awk -F'\",\"' '{print $1 FS $2 FS $4 OFS}' OFS="\"" Input_file

Output will be as follows.

"1","abc,db","4"

Thanks,
R. Singh

Aia · August 12, 2015, 12:06pm

perl -aF'(?<="),(?=")' -nle 'print join ",", @F[0,1,3]' test.file

Don_Cragun · August 12, 2015, 12:16pm

You can use cut; you just have to keep track of what cut sees as a field separator and what you see as your field separator.

If your fields 2 and 3 ALWAYS contain quoted strings each containing a single comma (as in your sample), you can use cut to get what you want using:

cut -d, -f1-3,6 file

(realizing that cut sees comma-separated fields 2 and 3 as your field 2, and comma-separated fields 4 and 5 as your field 3).
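
The caveat about every quoted string containing exactly one comma matters in practice. As a rough check, the sketch below runs the same cut on the sample line and on a made-up line whose second field has no comma. The wrapper name by_comma and the second line are assumptions for illustration only.

```shell
# by_comma is a hypothetical wrapper around the cut command above;
# it reads lines on stdin.
by_comma() { cut -d, -f1-3,6; }

sample='"1","abc,db","hac,aron","4","5"'   # sample line from the question
other='"1","abc","hac,aron","4","5"'       # made-up line: no comma in field 2

printf '%s\n' "$sample" | by_comma   # prints "1","abc,db","4"
printf '%s\n' "$other"  | by_comma   # prints "1","abc","hac,"5"
```

On the second line the comma count shifts by one, so cut returns part of your field 3 and your field 5 instead of your field 4.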

Or using double-quote as your field separator and realizing that cut would then see the field before the 1st double-quote and the field after the last double-quote as empty, and that cut would see the commas you're viewing as field separators as data between field separators, you could try:

cut -d'"' -f1-5,8,11 file

where field 1 is the empty field at the start of the line, 2 is your field 1, 3 is the comma separating your fields 1 and 2, 4 is your field 2, 5 is the comma separating your fields 2 and 3, 8 is your field 4, and 11 is the empty field at the end of your line.
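
If the numbering is hard to picture, a small loop can list the fields one per line. This is only a sketch: show_fields is a made-up helper, and it uses awk with the same double-quote separator rather than cut itself, on the assumption that both split this kind of line identically.

```shell
# show_fields is a hypothetical helper: it numbers the pieces of each
# input line that lie between double quotes, as cut -d'"' would count them.
show_fields() {
    awk -F'"' '{ for (i = 1; i <= NF; i++) printf "%d=[%s]\n", i, $i }'
}

printf '%s\n' '"1","abc,db","hac,aron","4","5"' | show_fields
```

For the sample line this lists eleven fields: 1 and 11 are empty, the even-numbered ones (2, 4, 6, 8, 10) hold your data, and the remaining odd ones hold the separating commas.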