awk printing leading tab in output

cmccabe · July 29, 2019, 6:49pm

The awk below executes and produces the current output. It skips the header in row 1, prints $4, $5, $6, and then adds the header row back.
The problem is that it keeps the trailing tab and prints it in front of $1. I could add a pipe to remove the tab, but is there a better way to do it with one command? Thank you :).

File (tab-delimited)

Total_Targets	Targets_less_than250x	Percent_more_than250x
chr11	533813	3633	215009	2077	99.034
chr11	533814	3637
chr11	533815	3637
chr11	533816	3639
chr11	533817	3640
chr11	533818	3639
chr11	533819	3643

Current output

Total_Targets	Targets_less_than250x	Percent_more_than250x
   215009 2077 99.034

Desired output

Total_Targets	Targets_less_than250x	Percent_more_than250x
215009	2077	99.034

awk

 awk 'NR>1{$1=$2=$3=""; print $0}FNR==1' file
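For anyone following along, here is a small sketch of where the leading whitespace comes from. Assigning to any field makes awk rebuild $0 by joining all fields with OFS (a single space by default), and the emptied fields 1-3 still contribute their separators. The one-line input and the variable name out are only for illustration.

```shell
# Blanking $1..$3 leaves three empty fields in place; when $0 is rebuilt,
# each of them is still followed by OFS (a single space by default).
out=$(printf 'chr11\t533813\t3633\t215009\t2077\t99.034\n' |
  awk '{$1=$2=$3=""; print}')

# Brackets make the leading blanks visible.
printf '[%s]\n' "$out"
```

So the leading characters are three spaces (one OFS per emptied field) rather than a leftover tab from the input.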

rdrtx1 · July 29, 2019, 7:17pm

awk 'NR>1 && NF > 3 {for (i=4; i<=NF; i++) $(i-3)=$i; NF-=3; print }FNR==1' OFS="\t" File

Don_Cragun · July 29, 2019, 8:28pm

Wouldn't:

awk -F'\t' 'NF>3 {print $4 FS $5 FS $6} FNR==1' File

be easier?

cmccabe · July 29, 2019, 9:04pm

awk -F'\t' 'NF>3 {print $4 FS $5 FS $6} FNR==1' File

So if I understand this, the first three fields are skipped and fields 4-6 are printed separated by tabs, and since the first three fields are skipped, so is the trailing tab after the 3rd field?

awk 'NR>1 && NF > 3 {for (i=4; i<=NF; i++) $(i-3)=$i; NF-=3; print }FNR==1' OFS="\t" File

This skips row 1, then loops through fields 4-6 and prints them. I think $(i-3)=$i tells awk to loop through each line, the NF-=3 prints fields 4-6, and then the header is added?

I guess my question is: in the first awk, how does the header row get added?

Both commands work great, just trying to understand. Thank you :).

rdrtx1 · July 29, 2019, 9:24pm

No statements are skipped (there is no next), so every line is evaluated against every pattern, including FNR==1.

--- Post updated at 09:24 PM ---

NF > 6 ?  0 : 1      # (probably not)

Don_Cragun · July 29, 2019, 10:58pm

cmccabe:

awk -F'\t' 'NF>3 {print $4 FS $5 FS $6} FNR==1' File
So if I understand this, the first three fields are skipped and fields 4-6 are printed separated by tabs, and since the first three fields are skipped, so is the trailing tab after the 3rd field?

The awk print statement says exactly what is to be printed. Since fields 1, 2, and 3 (and the field separators following them) are not mentioned, they are not printed. The FNR==1 (with no specified action) performs the default action ( print $0 ) when that condition evaluates to true. That condition will evaluate to true when the first line of an input file is read (i.e., when the header line of an input file is read). This was taken directly from your code.
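A quick way to see both rules at work is to run the command on a shortened copy of the sample file. This is only a sketch: the temporary file and the variable names file and out are made up for the demonstration.

```shell
# Recreate a shortened, tab-delimited copy of the sample file.
file=$(mktemp)
printf 'Total_Targets\tTargets_less_than250x\tPercent_more_than250x\n' > "$file"
printf 'chr11\t533813\t3633\t215009\t2077\t99.034\n' >> "$file"
printf 'chr11\t533814\t3637\n' >> "$file"

# Rule 1: lines with more than 3 fields print only fields 4-6, joined by FS (a tab).
# Rule 2: the bare FNR==1 pattern prints the first line (the header) unchanged.
out=$(awk -F'\t' 'NF>3 {print $4 FS $5 FS $6} FNR==1' "$file")
printf '%s\n' "$out"
rm -f "$file"
```

The header has only three fields, so only the second rule matches it. The three-field data line matches neither rule and is dropped.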

awk 'NR>1 && NF > 3 {for (i=4; i<=NF; i++) $(i-3)=$i; NF-=3; print }FNR==1' OFS="\t" File
This skips row 1, then loops through fields 4-6 and prints them. I think $(i-3)=$i tells awk to loop through each line, the NF-=3 prints fields 4-6, and then the header is added?

I guess my question is: in the first awk, how does the header row get added?

The condition and action pair:

NR>1 && NF > 3 {for (i=4; i<=NF; i++) $(i-3)=$i; NF-=3; print }

says that for each input line that has a record number (in all input files) greater than 1 AND has more than three fields, do the following. First copy each field with a field number greater than or equal to 4 to the field with field number 3 less than the field being copied. Then reduce the number of fields in the current record by 3. And, then print the updated record.
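The shift-and-truncate steps described above can be tried on a single line. This sketch assumes an awk that rebuilds the record when NF is assigned (gawk and mawk do; some older awks may not), and it sets FS to a tab with -F, which the original command leaves at the default. The variable name out is illustrative.

```shell
out=$(printf 'chr11\t533813\t3633\t215009\t2077\t99.034\n' |
  awk -F'\t' -v OFS='\t' '{
    for (i = 4; i <= NF; i++) $(i-3) = $i   # copy fields 4..NF down into 1..NF-3
    NF -= 3                                 # drop the last three (now duplicated) fields
    print                                   # $0 was rebuilt using OFS
  }')
printf '%s\n' "$out"
```

Because the record is rebuilt from the remaining three fields joined by OFS, there is no empty leading field and so no leading separator.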

Note that since your sample input only has 3 fields on the first line, the NR>1 is a no-op and can be omitted.

The header is printed by both of our scripts exactly the same way it was printed by your script. As explained above, the FNR==1 in all of our scripts causes the first line of each input file read to be printed unchanged.

I hope this helps you understand.

Don_Cragun · July 29, 2019, 11:09pm

Hi rdrtx1,
I guess I don't understand why you posted this code. With the sample input provided, the above code copies the input file to the output (since no input line has more than 6 fields).

The command:

awk 'NF > 3 || NR == 1' File

would print just the requested lines but does not discard the first three fields on lines with 6 fields.
Note well: This post has been updated. I got the logic backwards when I first read the script. The first paragraph now correctly specifies how the quoted code behaves. I apologize for any confusion this may have caused.
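The difference between the two patterns discussed in this post can be checked on a shortened copy of the sample file. This is a sketch only: the temporary file and the variable names file, all, and some are made up for the demonstration.

```shell
# Recreate a shortened, tab-delimited copy of the sample file.
file=$(mktemp)
printf 'Total_Targets\tTargets_less_than250x\tPercent_more_than250x\n' > "$file"
printf 'chr11\t533813\t3633\t215009\t2077\t99.034\n' >> "$file"
printf 'chr11\t533814\t3637\n' >> "$file"

# The pattern evaluates to 1 (true) whenever NF <= 6, so every sample line is printed.
all=$(awk 'NF > 6 ? 0 : 1' "$file")

# Prints the header (NR == 1) and any line with more than 3 fields, unchanged.
some=$(awk 'NF > 3 || NR == 1' "$file")

printf '%s\n---\n%s\n' "$all" "$some"
rm -f "$file"
```

The first command reproduces the whole file. The second keeps the header and the six-field line but leaves all six fields in place.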

cmccabe · July 30, 2019, 8:21am

Thank you very much for your help and explanations