awk to remove range of fields

cmccabe · July 7, 2016, 4:34pm

I am trying to cut a range of fields in awk. The command below seems to work for removing field 50, but what is the correct syntax for removing a range ($50-$62)? Thank you :).

awk 'BEGIN{FS=OFS="\t"}{$50=""; gsub(/\t\t/,"\t")}1' test.vcf.hg19_multianno.txt > output.csv

Maybe:

awk 'BEGIN{FS=OFS="\t"}{$50:$62=""; gsub(/\t\t/,"\t")}1' test.vcf.hg19_multianno.txt > output.csv

RudiC · July 7, 2016, 5:02pm

awk 'BEGIN{FS=OFS="\t"} {for (i=50; i<=62; i++) $i = ""; gsub(/\t+/,"\t")}1'

(untested)

cmccabe · July 7, 2016, 5:16pm

works great... thank you :).

Don_Cragun · July 7, 2016, 7:08pm

The code that you posted originally and the code suggested by RudiC will remove any empty fields from your input file in addition to the fields you want to remove. The following will only remove fields 50 through 62, inclusive:

awk '
BEGIN {	FS = OFS = "\t"
}
{	# print only the fields outside 50..62; end the line with ORS after the last field
	for(i = 1; i <= NF; i++)
		if(i < 50 || i > 62)
			printf("%s%s", $i, (i == NF) ? ORS : OFS)
}' test.vcf.hg19_multianno.txt > output.csv

The above code should do what you want, assuming that each input line has at least 63 fields. If some lines have fewer than 63 fields, slightly different logic would be needed to ensure that each line is properly terminated and that no unneeded field separators appear in the output. That logic depends on whether empty fields should be added to the ends of lines with a short field count or omitted, so a clear description of the desired behavior is needed first.
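One possible reading of that "slightly different logic" is sketched below, assuming short lines should simply lose whatever fields fall inside the range and should not be padded. The function name drop_fields and the sample lines are made up for illustration. It builds the output line first and then prints it, so every line ends with a newline and no separator is left dangling.

```shell
# Hypothetical helper: drop fields F..T (inclusive) from tab-separated input.
# Assumption: short lines are not padded; they just lose the fields in range.
drop_fields() {
    awk -v F="$1" -v T="$2" '
    BEGIN { FS = OFS = "\t" }
    {
        out = sep = ""
        for (i = 1; i <= NF; i++)
            if (i < F || i > T) {
                out = out sep $i    # separator only between kept fields
                sep = OFS
            }
        print out                   # print always terminates the line
    }'
}

# Small made-up lines, with a small range standing in for 50..62.
printf 'a\tb\tc\td\te\n' | drop_fields 2 3    # keeps fields 1, 4 and 5
printf 'a\tb\n' | drop_fields 2 3             # short line: keeps field 1 only
```

For the file in this thread the call would be drop_fields 50 62 < test.vcf.hg19_multianno.txt > output.csv.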

As always, if you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk or nawk.
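To see the empty-field problem described above on a small scale, here is a sketch using a made-up five-field line whose second field is empty, with field 4 standing in for the range to remove. The function names squeeze and keep are hypothetical, and tr is only there to make the tabs visible.

```shell
# Blank the field, then squeeze runs of tabs (the approach from the first posts).
squeeze() {
    awk 'BEGIN{FS=OFS="\t"} {$4 = ""; gsub(/\t+/, "\t")} 1'
}

# Print only the fields to keep (the approach from this post).
keep() {
    awk 'BEGIN{FS=OFS="\t"} {for (i = 1; i <= NF; i++) if (i != 4) printf("%s%s", $i, (i == NF) ? ORS : OFS)}'
}

# Field 2 is empty on purpose.
printf 'a\t\tc\td\te\n' | squeeze | tr '\t' '|'    # prints a|c|e   (field 2 is lost too)
printf 'a\t\tc\td\te\n' | keep    | tr '\t' '|'    # prints a||c|e  (field 2 survives)
```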

Chubler_XL · July 7, 2016, 7:26pm

This might be a little safer if you have empty fields somewhere on the line, or lines with fewer than 62 fields:

awk -v F=50 -v T=62 '
BEGIN{FS=OFS="\t"}
{ b=T+1
  t=T<NF?T:NF
  for(i=F;i<NF-t+F;i++) $i=$(b++)
  NF=--i}1'
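A quick way to check the shifting logic is to run the same program with a small range on short made-up lines. The wrapper name shift_out is hypothetical, and the sketch assumes an awk that rebuilds the record when NF is assigned (gawk and mawk do).

```shell
# Same program as above, wrapped so F and T can be passed as arguments.
shift_out() {
    awk -v F="$1" -v T="$2" '
    BEGIN{FS=OFS="\t"}
    { b=T+1                                  # first field after the range
      t=T<NF?T:NF                            # clamp the range end to NF
      for(i=F;i<NF-t+F;i++) $i=$(b++)        # shift later fields left
      NF=--i}1'                              # drop the leftover tail fields
}

printf 'a\tb\tc\td\te\n' | shift_out 2 3 | tr '\t' '|'    # prints a|d|e
printf 'a\tb\n' | shift_out 2 3 | tr '\t' '|'             # prints a
```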

Aia · July 7, 2016, 11:15pm

Alternative?

perl -F'\t' -lane '$"="\t"; print "@F[0..48,62..$#F]"' test.vcf.hg19_multianno.txt > output.csv

cmccabe · July 8, 2016, 3:56pm

Thank you all... works great :).