The first awk command below runs and updates the desired field ($10, Inheritence). However, the space inside nonsynonymous SNV in $11 (ExonicFunc.refGene) is turned into a tab. My attempt to fix this (the second command) no longer updates the field, and it only works again once I remove that change. I am not sure what I am doing wrong. Thank you :)
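The symptom can be reproduced on a shortened line. This is a minimal sketch that assumes a three-column tab-delimited sample rather than the full file1 record. With awk's default FS, any run of blanks or tabs separates fields, so nonsynonymous SNV counts as two fields. Assigning to any field makes awk rebuild $0 with OFS between every field, and that rebuild is where the extra tab comes from. The helper name rebuild_with_default_fs is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: overwrite field 2 using the default FS (blanks/tabs)
# with OFS set to a tab, then print the rebuilt record.
rebuild_with_default_fs() {
  awk 'BEGIN { OFS = "\t" } { $2 = "unknown" } 1'
}

# Shortened sample line (assumed): three tab-separated columns, and the
# last one contains a space.
printf 'SLC4A3\t.\tnonsynonymous SNV\n' | rebuild_with_default_fs
# The rebuilt line has four tab-separated fields: the space became a tab.
```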
file1
R_Index Chr Start End Ref Alt Func.refGene Gene.refGene GeneDetail.refGene Inheritence ExonicFunc.refGene AAChange.refGene avsnp147 PopFreqMax 1000G_ALL 1000G_AFR 1000G_AMR 1000G_EAS 1000G_EUR 1000G_SAS ExAC_ALL ExAC_AFR ExAC_AMR ExAC_EAS ExAC_FIN ExAC_NFE ExAC_OTH ExAC_SAS ESP6500siv2_ALL ESP6500siv2_AA ESP6500siv2_EA CG46 SIFT_score SIFT_pred Polyphen2_HDIV_score Polyphen2_HDIV_pred Polyphen2_HVAR_score Polyphen2_HVAR_pred LRT_score LRT_pred MutationTaster_score MutationTaster_pred MutationAssessor_score MutationAssessor_pred dpsi_max_tissue dpsi_zscore CLINSIG CLNDBN CLNACC CLNDSDB CLNDSDBID Quality Reads Zygosity Score Classification Rank HGMD Sanger
11 chr2 220494118 220494118 A C exonic SLC4A3 . . nonsynonymous SNV SLC4A3:NM_001326559:exon4:c.470A>C:p.H157P,SLC4A3:NM_005070:exon4:c.470A>C:p.H157P,SLC4A3:NM_201574:exon4:c.470A>C:p.H157P rs597306 1. 0.95 0.84 0.98 1. 1. 1. 0.98 0.84 0.99 1. 1. 1. 0.99 1. 0.95 0.85 1. 0.84 1.0 T 0.0 B 0.0 B 0.013 N 1 P -1.545 N -0.0806 -0.387 . . . . . GOOD 78 hom 22
file2
SLC4A3 unknown
Current output (the space in nonsynonymous SNV has become a tab, so that value now spans two fields):
R_Index Chr Start End Ref Alt Func.refGene Gene.refGene GeneDetail.refGene Inheritence ExonicFunc.refGene AAChange.refGene avsnp147 PopFreqMax 1000G_ALL 1000G_AFR 1000G_AMR 1000G_EAS 1000G_EUR 1000G_SAS ExAC_ALL ExAC_AFR ExAC_AMR ExAC_EAS ExAC_FIN ExAC_NFE ExAC_OTH ExAC_SAS ESP6500siv2_ALL ESP6500siv2_AA ESP6500siv2_EA CG46 SIFT_score SIFT_pred Polyphen2_HDIV_score Polyphen2_HDIV_pred Polyphen2_HVAR_score Polyphen2_HVAR_pred LRT_score LRT_pred MutationTaster_score MutationTaster_pred MutationAssessor_score MutationAssessor_pred dpsi_max_tissue dpsi_zscore CLINSIG CLNDBN CLNACC CLNDSDB CLNDSDBID Quality Reads Zygosity Score Classification Rank HGMD Sanger
11 chr2 220494118 220494118 A C exonic SLC4A3 . unknown nonsynonymous SNV SLC4A3:NM_001326559:exon4:c.470A>C:p.H157P,SLC4A3:NM_005070:exon4:c.470A>C:p.H157P,SLC4A3:NM_201574:exon4:c.470A>C:p.H157P rs597306 1. 0.95 0.84 0.98 1. 1. 1. 0.98 0.84 0.99 1. 1. 1. 0.99 1. 0.95 0.85 1. 0.84 1.0 T 0.0 B 0.0 B 0.013 N 1 P -1.545 N -0.0806 -0.387 . . . . . GOOD 78 hom 22
Desired output (Inheritence updated to unknown, and nonsynonymous SNV kept as a single field):
R_Index Chr Start End Ref Alt Func.refGene Gene.refGene GeneDetail.refGene Inheritence ExonicFunc.refGene AAChange.refGene avsnp147 PopFreqMax 1000G_ALL 1000G_AFR 1000G_AMR 1000G_EAS 1000G_EUR 1000G_SAS ExAC_ALL ExAC_AFR ExAC_AMR ExAC_EAS ExAC_FIN ExAC_NFE ExAC_OTH ExAC_SAS ESP6500siv2_ALL ESP6500siv2_AA ESP6500siv2_EA CG46 SIFT_score SIFT_pred Polyphen2_HDIV_score Polyphen2_HDIV_pred Polyphen2_HVAR_score Polyphen2_HVAR_pred LRT_score LRT_pred MutationTaster_score MutationTaster_pred MutationAssessor_score MutationAssessor_pred dpsi_max_tissue dpsi_zscore CLINSIG CLNDBN CLNACC CLNDSDB CLNDSDBID Quality Reads Zygosity Score Classification Rank HGMD Sanger
11 chr2 220494118 220494118 A C exonic SLC4A3 . unknown nonsynonymous SNV SLC4A3:NM_001326559:exon4:c.470A>C:p.H157P,SLC4A3:NM_005070:exon4:c.470A>C:p.H157P,SLC4A3:NM_201574:exon4:c.470A>C:p.H157P rs597306 1. 0.95 0.84 0.98 1. 1. 1. 0.98 0.84 0.99 1. 1. 1. 0.99 1. 0.95 0.85 1. 0.84 1.0 T 0.0 B 0.0 B 0.013 N 1 P -1.545 N -0.0806 -0.387 . . . . . GOOD 78 hom 22
awk
awk 'FNR==NR {a[$1]=$2; next} a[$8]{$10=a[$8]}1' OFS="\t" file2 file1 > output
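One way to avoid the re-splitting is sketched below. It assumes that file1 is tab-delimited, as the desired output suggests, and that file2 may use either spaces or tabs. The default FS is kept while reading file2, and FS is switched to a tab just before file1 by placing an FS='\t' assignment operand between the two file names. The function name update_inheritence is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: update_inheritence LOOKUP DATA
#   LOOKUP: whitespace-delimited "gene value" pairs (like file2).
#   DATA:   tab-delimited table with Gene.refGene in $8 and Inheritence
#           in $10 (layout assumed from the file1 header).
update_inheritence() {
  awk -v OFS='\t' '
    FNR == NR { a[$1] = $2; next }   # LOOKUP: default FS splits on blanks/tabs
    ($8 in a) { $10 = a[$8] }        # DATA: known gene -> overwrite Inheritence
    1                                # print every DATA line, changed or not
  ' "$1" FS='\t' "$2"                # FS becomes a tab before DATA is read
}

# Usage with the files from the question:
# update_inheritence file2 file1 > output
```

Testing ($8 in a) instead of a[$8] avoids creating empty array entries for unmatched genes. It also still matches when the looked-up value happens to be 0 or empty.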
To stop the whitespace from being split, I tried:
awk -F '' 'FNR==NR {a[$1]=$2; next} a[$8]{$10=a[$8]}1' OFS="\t" file2 file1 > output
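When a separator change stops the lookup from matching, it helps to check how awk numbers the columns under that separator. POSIX leaves an empty FS unspecified. gawk, for example, treats it as putting each character in its own field, so $8 becomes a single character and never equals a gene name. The hypothetical helper below lists each header column with its field number under a tab separator, assuming file1 is tab-delimited. With the header above, ExonicFunc.refGene comes out as field 11.

```shell
#!/bin/sh
# Hypothetical helper: print "number<TAB>name" for every column of the
# first line of a tab-delimited file, to confirm which $N holds which field.
list_columns() {
  awk -F'\t' 'NR == 1 { for (i = 1; i <= NF; i++) print i "\t" $i; exit }' "$1"
}

# Usage: list_columns file1
```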