awk command to print only selected rows in a particular column specified by column name

ks_reddy · September 1, 2012, 9:55am

Dear All,

I have a data file, input.csv, like the one below. (Only five columns are shown here as an example.)

Data1,StepNo,Data2,Data3,Data4
2,1,3,4,5
3,1,5,6,7
3,2,4,5,6
5,3,5,5,6

From this I want the output below:

Data1,StepNo,Data2,Data3,Data4
2,1,3,4,5
3,1,5,6,7

where the second column, StepNo, contains '1'.
I used the simple script below to get this output.

awk -F, '$2==1' input.csv

But StepNo is not always the second column: there are many versions of the input files, with varying column positions and a varying total number of columns.
So I need a script (with awk) that produces my required output by specifying the column name instead of the column number.

Thanks in advance.
Sidda

jim_mcnamara · September 1, 2012, 10:09am

Try this:

col="Something"
awk -v col="$col" ' NR==1 {for(i=1; i<=NF ; i++){ if($1==col) {break} } ; next}
                          {print $i} '  infile.csv

msabhi · September 1, 2012, 10:16am

awk -F, 'NR==1{for(i=1;i<=NF;i++){if($i=="StepNo")x=i;}} NR>1{if($x=="1")print;}' input_file

ks_reddy · September 1, 2012, 11:08am

Hi Jim,
Something is missing in this code to get my required output.
But the one-liner given by msabhi is perfect for my requirement with a small change (removing NR>1 from the print section so that my headers are also printed in the output).
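
For reference, one way to get the header into the output is to print it explicitly in the NR==1 block, since the header's own StepNo field never equals "1". This is only a sketch based on msabhi's one-liner, using the sample data from my first post:

```shell
# Recreate the sample input.csv from the question
cat > input.csv <<'EOF'
Data1,StepNo,Data2,Data3,Data4
2,1,3,4,5
3,1,5,6,7
3,2,4,5,6
5,3,5,5,6
EOF

# On the header line: remember the position of StepNo in x, print the header,
# and skip to the next line. On every other line: print it if field x is "1".
awk -F, 'NR==1{for(i=1;i<=NF;i++)if($i=="StepNo")x=i; print; next} $x=="1"' input.csv
```

With the sample file this should print the header followed by the two rows whose StepNo is 1.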

Thank you very much anyway for your quick reply.
Sidda

pamu · September 1, 2012, 11:32am

I edited Jim's solution:

col="StepNo"
awk -F , -v col="$col" ' NR==1 {for(i=1; i<=NF ; i++){ if($i==col){ n=i ; print; break; } }} {if ( $n == "1" ) { print } }'  file
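
If the column name might be missing in some versions of the file, the same idea can be wrapped in a small shell function that stops with an error instead of silently printing nothing. This is a rough sketch only: the function name filter_rows and the file input_v2.csv are made up for illustration, and writing to "/dev/stderr" assumes an awk or operating system that supports that path (gawk and most Linux systems do).

```shell
# Hypothetical helper: filter_rows COLUMN VALUE FILE
# Prints the header plus every row whose COLUMN field equals VALUE.
filter_rows() {
    awk -F, -v col="$1" -v val="$2" '
        NR == 1 {
            # Find the position of the requested column in the header
            for (i = 1; i <= NF; i++)
                if ($i == col) n = i
            if (!n) {
                print "column not found: " col > "/dev/stderr"
                exit 1
            }
            print        # keep the header in the output
            next
        }
        $n == val        # default action: print matching rows
    ' "$3"
}

# Made-up file where StepNo is the third column instead of the second
cat > input_v2.csv <<'EOF'
Data1,Data2,StepNo
2,3,1
3,4,2
5,5,1
EOF

filter_rows StepNo 1 input_v2.csv
```

Because the value is passed with -v, awk compares numerically when both sides look like numbers, so a field containing 1.0 would also match a value of 1.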