awk to update value in field based on another field

In the tab-delimited input file below I am trying to use awk to update the value in $2 when TYPE=ins, by adding the value of
HRUN= to it. In the example below, line 1 has TYPE=ins, so the 117282541 value in $2 has 6 added to it because that is the value of HRUN=.
Hopefully the awk below is a start, but I am not sure how to add the number in HRUN=, as this will always be different. If TYPE= is anything other than ins,
the line is printed as is. Thank you :).

input

chr7	117282541	.	C	CT	19.2911	PASS	AF=0.219512;AO=30;DP=130;FAO=27;FDP=123;FR=.;FRO=96;FSAF=17;FSAR=10;FSRF=49;FSRR=47;FWDB=-0.0766145;FXX=0.00806387;HRUN=6;LEN=1;MLLD=7.86443;OALT=T;OID=.;OMAPALT=CT;OPOS=117282542;OREF=-;QD=0.627353;RBI=0.20132;REFB=-0.0755381;REVB=0.186172;RO=74;SAF=22;SAR=8;SRF=38;SRR=36;SSEN=0;SSEP=0;SSSB=0.282205;STB=0.594772;STBP=0.284;TYPE=ins;VARB=-0.0654345;FUNC=[{'origPos':'117282541','origRef':'C','normalizedRef':'C','gene':'CFTR','normalizedPos':'117282541','normalizedAlt':'CT','gt':'pos','codon':'TTT','coding':'c.3767_3768insT','transcript':'NM_000492.3','function':'frameshiftInsertion','protein':'p.Leu1258fs','location':'exonic','origAlt':'CT','exon':'23'}]	GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR	0/1:19:130:123:74:96:30:27:0.219512:8:22:38:36:10:17:49:47
chr7	117417559	.	A	G	173.255	PASS	AF=0.415254;AO=49;DP=118;FAO=49;FDP=118;FR=.,REALIGNEDx0.4322;FRO=69;FSAF=24;FSAR=25;FSRF=35;FSRR=34;FWDB=0.00735029;FXX=0;HRUN=1;LEN=1;MLLD=114.133;OALT=G;OID=.;OMAPALT=G;OPOS=117417559;OREF=A;QD=5.87305;RBI=0.0298232;REFB=0.00339785;REVB=0.0289033;RO=69;SAF=24;SAR=25;SRF=35;SRR=34;SSEN=0;SSEP=0;SSSB=-0.0181233;STB=0.5102;STBP=0.892;TYPE=snp;VARB=-0.00586008;FUNC=[{'transcript':'NM_033427.2','gene':'CTTNBP2','location':'splicesite_3','exon':'8'}]	GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR	0/1:173:118:118:69:69:49:49:0.415254:25:24:35:34:25:24:35:34

desired output (tab-delimited)

chr7	117282547	.	C	CT	19.2911	PASS	AF=0.219512;AO=30;DP=130;FAO=27;FDP=123;FR=.;FRO=96;FSAF=17;FSAR=10;FSRF=49;FSRR=47;FWDB=-0.0766145;FXX=0.00806387;HRUN=6;LEN=1;MLLD=7.86443;OALT=T;OID=.;OMAPALT=CT;OPOS=117282542;OREF=-;QD=0.627353;RBI=0.20132;REFB=-0.0755381;REVB=0.186172;RO=74;SAF=22;SAR=8;SRF=38;SRR=36;SSEN=0;SSEP=0;SSSB=0.282205;STB=0.594772;STBP=0.284;TYPE=ins;VARB=-0.0654345;FUNC=[{'origPos':'117282541','origRef':'C','normalizedRef':'C','gene':'CFTR','normalizedPos':'117282541','normalizedAlt':'CT','gt':'pos','codon':'TTT','coding':'c.3767_3768insT','transcript':'NM_000492.3','function':'frameshiftInsertion','protein':'p.Leu1258fs','location':'exonic','origAlt':'CT','exon':'23'}]	GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR	0/1:19:130:123:74:96:30:27:0.219512:8:22:38:36:10:17:49:47
chr7	117417559	.	A	G	173.255	PASS	AF=0.415254;AO=49;DP=118;FAO=49;FDP=118;FR=.,REALIGNEDx0.4322;FRO=69;FSAF=24;FSAR=25;FSRF=35;FSRR=34;FWDB=0.00735029;FXX=0;HRUN=1;LEN=1;MLLD=114.133;OALT=G;OID=.;OMAPALT=G;OPOS=117417559;OREF=A;QD=5.87305;RBI=0.0298232;REFB=0.00339785;REVB=0.0289033;RO=69;SAF=24;SAR=25;SRF=35;SRR=34;SSEN=0;SSEP=0;SSSB=-0.0181233;STB=0.5102;STBP=0.892;TYPE=snp;VARB=-0.00586008;FUNC=[{'transcript':'NM_033427.2','gene':'CTTNBP2','location':'splicesite_3','exon':'8'}]	GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR	0/1:173:118:118:69:69:49:49:0.415254:25:24:35:34:25:24:35:34

awk

awk -F'\t' -v OFS='\t' '{ if (TYPE == "ins") $2=$2+HRUN={x}; print $0 }' file

Hello cmccabe,

Could you please try the following and let us know how it goes.

awk -F"\t" 'NF{split($(NF-2), A, ";"); split(A[36], B, "="); if (B[2]=="ins") {split(A[14], C, "="); $2+=C[2]}} 1' OFS="\t" Input_file
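
The command above assumes that TYPE= is always the 36th and HRUN= always the 14th ;-separated entry of the third-from-last field, which holds for the sample lines but may not hold for every record. As a rough sketch only (not part of the original suggestion), the same idea can look the keys up by name instead. It assumes the KEY=VALUE list is always in $8, as in the sample, and the function name shift_ins is made up for illustration:

```shell
# Hypothetical helper: add HRUN to column 2 when column 8 contains TYPE=ins.
# Assumption: column 8 is the ;-separated KEY=VALUE list, as in the sample input.
shift_ins() {
  awk -F'\t' -v OFS='\t' '
    {
      type = ""; hrun = 0
      n = split($8, kv, ";")               # one KEY=VALUE entry per element
      for (i = 1; i <= n; i++) {
        if (kv[i] ~ /^TYPE=/)      type = substr(kv[i], 6)   # text after "TYPE="
        else if (kv[i] ~ /^HRUN=/) hrun = substr(kv[i], 6)   # text after "HRUN="
      }
      if (type == "ins") $2 += hrun        # all other lines are printed as is
      print
    }
  ' "$@"
}

# Usage: shift_ins Input_file
```

Because the entries are matched by name, the order and number of entries in $8 no longer matter. A line without TYPE=ins, or without any HRUN= entry, is printed unchanged.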

Thanks,
R. Singh

Thank you very much :).