awk to change value of field using multiple conditions

In the awk below, the first step defaults Classification (NF) to VUS. Next, I am trying to change the value of Classification (NF) to whatever CLINSIG (NF-1) is. With only one condition everything works, but with two conditions it does not. Is the syntax incorrect, and how do I fix the awk? Thank you :).

input

Chr    Start    End    Ref    Alt    Func.refGene    PopFreqMax    CLINSIG    Classification
chr1    43395635    43395635    C    T    exonic    0.12    Benign    VUS
chr1    43396414    43396414    G    A    exonic    0.14    Benign    VUS
chr1    172410967    172410967    G    A    exonic    0.66    VUS
chr1    172411496    172411496    A    G    exonic    1    VUS
chr2    51254901    51254901    G    A    exonic    0.48    Likely Benign    VUS
chr2    51254914    51254914    C    T    exonic    0.0023    VUS

awk for step 1

awk 'BEGIN{OFS="\t"} NR>1{$(NF+1)="VUS"} 1' input > out

awk for step 2

awk -v OFS='\t' '$(NF-1)=="Benign" || $(NF-1)=="Likely Benign" {$(NF)=$(NF-1)} {print $0 }' out > final

desired output

Chr    Start    End    Ref    Alt    Func.refGene    PopFreqMax    CLINSIG    Classification
chr1    43395635    43395635    C    T    exonic    0.12    Benign    VUS     Benign
chr1    43396414    43396414    G    A    exonic    0.14    Benign    VUS     Benign
chr1    172410967    172410967    G    A    exonic    0.66    VUS
chr1    172411496    172411496    A    G    exonic    1    VUS
chr2    51254901    51254901    G    A    exonic    0.48    Likely Benign    VUS     Likely Benign
chr2    51254914    51254914    C    T    exonic    0.0023    VUS

Hello cmccabe,

Could you please try the following and let me know if it helps?

awk -v OFS='\t' '$(NF-1)=="Benign" || ($(NF-2) OFS $(NF-1))=="Likely Benign" {$(NF+1)=$(NF-2) OFS $(NF-1)} {print $0 }'  Input_file

The problem in your code is that $(NF-1) will never hold Likely Benign. By default awk splits fields on whitespace, so Likely and Benign land in two separate fields, $(NF-2) and $(NF-1), and that condition is never true. The value Likely Benign has to be put back together from those two fields.
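To see the splitting behaviour described above, here is a minimal sketch. It assumes the data is tab-delimited (the sample row is copied from the input in the question, with tabs written explicitly), and the variable names are only for illustration.

```shell
# One sample row, tab-separated, with "Likely Benign" as a single CLINSIG value.
row=$(printf 'chr2\t51254901\t51254901\tG\tA\texonic\t0.48\tLikely Benign\tVUS')

# Default FS: any run of blanks or tabs separates fields,
# so "Likely" and "Benign" are counted as two fields.
default_nf=$(printf '%s\n' "$row" | awk '{print NF}')

# FS set to a tab: "Likely Benign" stays together as one field.
tab_nf=$(printf '%s\n' "$row" | awk -F'\t' '{print NF}')
tab_clinsig=$(printf '%s\n' "$row" | awk -F'\t' '{print $(NF-1)}')

echo "default FS: $default_nf fields"                         # default FS: 10 fields
echo "tab FS: $tab_nf fields, CLINSIG=$tab_clinsig"           # tab FS: 9 fields, CLINSIG=Likely Benign
```

With -F'\t' the original test $(NF-1)=="Likely Benign" can match, because the space inside the value is no longer treated as a separator.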

Thanks,
R. Singh

awk -F"\t" 'NR>1{$0=$0 FS "VUS" (($NF=="Benign" || $NF=="Likely Benign") ? (FS $NF) : "")} 1' file

Thank you both :)

---------- Post updated 08-03-16 at 10:06 AM ---------- Previous update was 08-02-16 at 03:22 PM ----------

I guess I do not understand how NF works. My actual data file is attached and is much larger (file.txt).

awk to get NF

awk 'NR==1{for(i=1;i<=NF;i++){print "Number of field in terms of NF is--> NF-" NF-i", value is-->" $i}}' file.txt

Number of field in terms of NF is--> NF-55, value is-->Chr
Number of field in terms of NF is--> NF-54, value is-->Start
Number of field in terms of NF is--> NF-53, value is-->End
Number of field in terms of NF is--> NF-52, value is-->Ref
Number of field in terms of NF is--> NF-51, value is-->Alt
Number of field in terms of NF is--> NF-50, value is-->Func.refGene
Number of field in terms of NF is--> NF-49, value is-->Gene.refGene
Number of field in terms of NF is--> NF-48, value is-->GeneDetail.refGene
Number of field in terms of NF is--> NF-47, value is-->ExonicFunc.refGene
Number of field in terms of NF is--> NF-46, value is-->AAChange.refGene
Number of field in terms of NF is--> NF-45, value is-->avsnp147
Number of field in terms of NF is--> NF-44, value is-->PopFreqMax
Number of field in terms of NF is--> NF-43, value is-->1000G_ALL
Number of field in terms of NF is--> NF-42, value is-->1000G_AFR
Number of field in terms of NF is--> NF-41, value is-->1000G_AMR
Number of field in terms of NF is--> NF-40, value is-->1000G_EAS
Number of field in terms of NF is--> NF-39, value is-->1000G_EUR
Number of field in terms of NF is--> NF-38, value is-->1000G_SAS
Number of field in terms of NF is--> NF-37, value is-->ExAC_ALL
Number of field in terms of NF is--> NF-36, value is-->ExAC_AFR
Number of field in terms of NF is--> NF-35, value is-->ExAC_AMR
Number of field in terms of NF is--> NF-34, value is-->ExAC_EAS
Number of field in terms of NF is--> NF-33, value is-->ExAC_FIN
Number of field in terms of NF is--> NF-32, value is-->ExAC_NFE
Number of field in terms of NF is--> NF-31, value is-->ExAC_OTH
Number of field in terms of NF is--> NF-30, value is-->ExAC_SAS
Number of field in terms of NF is--> NF-29, value is-->ESP6500siv2_ALL
Number of field in terms of NF is--> NF-28, value is-->ESP6500siv2_AA
Number of field in terms of NF is--> NF-27, value is-->ESP6500siv2_EA
Number of field in terms of NF is--> NF-26, value is-->CG46
Number of field in terms of NF is--> NF-25, value is-->dpsi_max_tissue
Number of field in terms of NF is--> NF-24, value is-->dpsi_zscore
Number of field in terms of NF is--> NF-23, value is-->SIFT_score
Number of field in terms of NF is--> NF-22, value is-->SIFT_pred
Number of field in terms of NF is--> NF-21, value is-->Polyphen2_HDIV_score
Number of field in terms of NF is--> NF-20, value is-->Polyphen2_HDIV_pred
Number of field in terms of NF is--> NF-19, value is-->Polyphen2_HVAR_score
Number of field in terms of NF is--> NF-18, value is-->Polyphen2_HVAR_pred
Number of field in terms of NF is--> NF-17, value is-->LRT_score
Number of field in terms of NF is--> NF-16, value is-->LRT_pred
Number of field in terms of NF is--> NF-15, value is-->MutationTaster_score
Number of field in terms of NF is--> NF-14, value is-->MutationTaster_pred
Number of field in terms of NF is--> NF-13, value is-->MutationAssessor_score
Number of field in terms of NF is--> NF-12, value is-->MutationAssessor_pred
Number of field in terms of NF is--> NF-11, value is-->CLINSIG
Number of field in terms of NF is--> NF-10, value is-->CLNDBN
Number of field in terms of NF is--> NF-9, value is-->CLNACC
Number of field in terms of NF is--> NF-8, value is-->CLNDSDB
Number of field in terms of NF is--> NF-7, value is-->CLNDSDBID
Number of field in terms of NF is--> NF-6, value is-->Quality
Number of field in terms of NF is--> NF-5, value is-->Reads
Number of field in terms of NF is--> NF-4, value is-->Zygosity
Number of field in terms of NF is--> NF-3, value is-->Phred
Number of field in terms of NF is--> NF-2, value is-->Classification
Number of field in terms of NF is--> NF-1, value is-->HGMD
Number of field in terms of NF is--> NF-0, value is-->Sanger

I tried the awk below to produce the attached desired output, which is just "VUS" in the Classification (NF-2) field. Currently I get a result with the data all out of order (attached current.txt). Thank you :).

awk 'BEGIN{OFS="\t"} NR>1{$(NF-2)="VUS"} 1' file.txt > VUS
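
One possible cause of the scrambled columns, assuming file.txt is tab-delimited: without -F'\t' awk splits each line on runs of whitespace, so empty fields disappear and values that contain spaces become several fields. NF then differs from row to row, NF-2 no longer points at Classification, and assigning to a field rebuilds the whole record with OFS. A minimal sketch that sets FS to a tab and looks the column up by its header name instead of counting back from NF (the helper name set_classification_vus and the sample rows are made up for illustration):

```shell
# Hypothetical helper: put VUS in the "Classification" column of every data row.
# Assumes tab-delimited input with the header on line 1.
set_classification_vus() {
    awk 'BEGIN { FS = OFS = "\t" }
         # Header row: remember which column is named Classification.
         NR == 1 { for (i = 1; i <= NF; i++) if ($i == "Classification") col = i }
         # Data rows: overwrite that column only if it was found.
         NR > 1 && col { $col = "VUS" }
         { print }' "$@"
}

# Made-up sample: one row with an empty CLINSIG, one with a value containing a space.
sample=$(printf 'Chr\tCLINSIG\tClassification\tHGMD\tSanger\nchr1\t\t\t\t\nchr2\tLikely Benign\t\t\t')

printf '%s\n' "$sample" | set_classification_vus
```

On the real file this would be called as set_classification_vus file.txt > VUS. If the file is not tab-delimited, FS would need to be changed to match.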