I have an input that looks like this:
chr1 mm9_knownGene utr3 3204563 3206102 0 - . gene_id "Xkr4"; transcript_id "uc007aeu.1";
chr1 mm9_knownGene utr3 4280927 4283061 0 - . gene_id "Rp1"; transcript_id "uc007aew.1";
chr1 mm9_knownGene utr3 4333588 4334680 0 - . gene_id "Rp1"; transcript_id "uc007aex.2";
chr1 mm9_knownGene utr3 4481009 4481796 0 - . gene_id "Sox17"; transcript_id "uc007aey.1";
chr1 mm9_knownGene utr3 4481009 4481796 0 - . gene_id "Sox17"; transcript_id "uc007aez.1";
chr1 mm9_knownGene utr3 4481009 4481796 0 - . gene_id "Sox17"; transcript_id "uc007afa.1";
chr1 mm9_knownGene utr3 4481009 4481796 0 - . gene_id "Sox17"; transcript_id "uc007afb.1";
chr1 mm9_knownGene utr3 4481009 4481796 0 - . gene_id "Sox17"; transcript_id "uc007afc.1";
chr1 mm9_knownGene utr3 4763279 4766544 0 - . gene_id "Mrpl15"; transcript_id "uc007aff.2";
chr1 mm9_knownGene utr3 4763279 4764532 0 - . gene_id "Mrpl15"; transcript_id "uc007afd.2";
I am changing columns 4 and 5 with this awk line:
awk '{FS=OFS="\t"} {if($5-$4<=200) print $0; else if($5-$4>200) print $1,$2,$3,$4,$4+200,$6,$7,$8,$9}'
This gives the following output:
chr1 mm9_knownGene utr3 3204563 3204763 0 - . gene_id
chr1 mm9_knownGene utr3 4280927 4281127 0 - . gene_id "Rp1"; transcript_id "uc007aew.1";
chr1 mm9_knownGene utr3 4333588 4333788 0 - . gene_id "Rp1"; transcript_id "uc007aex.2";
chr1 mm9_knownGene utr3 4481009 4481209 0 - . gene_id "Sox17"; transcript_id "uc007aey.1";
chr1 mm9_knownGene utr3 4481009 4481209 0 - . gene_id "Sox17"; transcript_id "uc007aez.1";
chr1 mm9_knownGene utr3 4481009 4481209 0 - . gene_id "Sox17"; transcript_id "uc007afa.1";
chr1 mm9_knownGene utr3 4481009 4481209 0 - . gene_id "Sox17"; transcript_id "uc007afb.1";
chr1 mm9_knownGene utr3 4481009 4481209 0 - . gene_id "Sox17"; transcript_id "uc007afc.1";
chr1 mm9_knownGene utr3 4763279 4763479 0 - . gene_id "Mrpl15"; transcript_id "uc007aff.2";
chr1 mm9_knownGene utr3 4763279 4763479 0 - . gene_id "Mrpl15"; transcript_id "uc007afd.2";
It handles columns 4 and 5 correctly, but it truncates column 9 on the first line only. If I use this awk line instead, the output is fine:
awk '{FS=OFS="\t"} {print $0}'
However, this awk line reproduces the column 9 truncation:
awk '{FS=OFS="\t"} {print $1,$2,$3,$4,$5,$6,$7,$8,$9}'
I can print more columns and get more of that first line back, which suggests awk is splitting the first line differently from the rest. I have manually edited the input file to ensure that column 9 of the first line does not contain any tabs. I have also moved the first line to the end of the file, and the new first line shows the same truncation.
Any suggestions? What am I doing wrong here?
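One likely explanation, offered as a sketch rather than a confirmed diagnosis: awk splits each record into fields as it is read, and a new value of FS only applies from the next record onward. Because `{FS=OFS="\t"}` sits in a main action, the first line would already have been split on the default whitespace separator by the time FS changes, so its `$9` is just `gene_id`. The example below assumes the real file is tab-delimited and that the awk in use follows the POSIX behaviour; the two-line `sample` and the variable names `broken`, `fixed`, and `trimmed` are made up for illustration.

```shell
# Hypothetical two-line sample, tab-delimited, in the same layout as the input above.
sample=$(printf 'chr1\tmm9_knownGene\tutr3\t3204563\t3206102\t0\t-\t.\tgene_id "Xkr4"; transcript_id "uc007aeu.1";\nchr1\tmm9_knownGene\tutr3\t4280927\t4283061\t0\t-\t.\tgene_id "Rp1"; transcript_id "uc007aew.1";\n')

# FS set in a main action: record 1 was already split on whitespace,
# so its ninth field is only `gene_id`.
broken=$(printf '%s\n' "$sample" | awk '{FS=OFS="\t"} {print $1,$2,$3,$4,$5,$6,$7,$8,$9}')

# FS set in BEGIN: the separator is in place before any record is read.
fixed=$(printf '%s\n' "$sample" | awk 'BEGIN{FS=OFS="\t"} {print $1,$2,$3,$4,$5,$6,$7,$8,$9}')

# The column 4/5 edit with the same BEGIN block: cap the interval at 200
# by assigning to $5, then print every record (the trailing `1`).
trimmed=$(printf '%s\n' "$sample" | awk 'BEGIN{FS=OFS="\t"} $5-$4>200 {$5=$4+200} 1')

printf '%s\n' "$broken" | head -n 1
printf '%s\n' "$fixed" | head -n 1
printf '%s\n' "$trimmed" | head -n 1
```

This would also account for `print $0` looking fine (the record is printed without being rebuilt from its fields) and for the truncation following whichever line happens to be first in the file.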