selecting specific fields in a file (maybe with sed?)

menenuh · January 27, 2011, 5:57am

Hi,

I have a file with the following lines:

chr1    10   AC=2;AF=1.00;AN=2;DP=2;Dels=0.00;HRun=0;HaplotypeScore=0.00;MQ=23.00;MQ0=0;QD=14.33;SB=-10.01
chrX    18   AB=0.52;AC=1;AF=0.50;AN=2;DP=203;DS;Dels=0.00;HRun=0;HaplotypeScore=20.01;MQ=15.63;MQ0=85;QD=12.80;SB=-1289.58

I need to extract 4 fields from these lines: the 1st and 2nd columns, and the AF and DP values. I could have used the cut command if AF and DP always appeared at the same position in the 3rd column, but they do not.

I think splitting the 3rd column into separate fields and removing every field that does not contain AF or DP would be a nice solution, but I am not an expert on sed. I tried a couple of commands, but to no avail.

citaylor · January 27, 2011, 6:12am

How about:

awk '{ AF=""; DP=""; n=split($3,fa,/;/); for(i=1;i<=n;i++) { if(fa[i] ~ /^AF=/) AF=fa[i]; if(fa[i] ~ /^DP=/) DP=fa[i]; } print $1 " " $2 " " AF " " DP; }' infile

Scrutinizer · January 27, 2011, 6:14am

Try:

awk -F '[ \t;]+' '{s=$1 OFS $2;for(i=3;i<=NF;i++)if ($i~/^(DP|AF)=/)s=s OFS $i; print s}' infile
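
If only the bare values are wanted, always in the order AF then DP, something along these lines might work. It is a minimal sketch: the function name extract_af_dp is made up for illustration, and it assumes the key=value list is always the 3rd whitespace-separated column and that entries are separated by ";".

```shell
# extract_af_dp is a hypothetical helper; it reads the files given as
# arguments, or standard input when no file is given.
extract_af_dp() {
  awk '{
    af = dp = ""                     # reset per line so values never carry over
    n = split($3, kv, ";")           # split the 3rd column into key=value items
    for (i = 1; i <= n; i++) {
      if (kv[i] ~ /^AF=/)      af = substr(kv[i], 4)   # text after "AF="
      else if (kv[i] ~ /^DP=/) dp = substr(kv[i], 4)   # text after "DP="
    }
    print $1, $2, af, dp             # fixed output order: col1 col2 AF DP
  }' "$@"
}
```

Called as "extract_af_dp infile", this should print one line per record with the two leading columns followed by the AF and DP values. A line that lacks AF or DP gets an empty value in that position, and flag-only entries such as DS are ignored.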