Trying to use awk to format the input based on the field count being 5. Most lines are fine using the awk below, except the first two lines. I know the reason is the -1 in c.151-1G>A and the -2 in c.151-2A>G, but I cannot figure out how to not split on the - if it is followed by a digit then a letter. Thank you :).
input
chr9:21971208 CDKN2A c.151-1G>A p.?
chr9:21971209 CDKN2A c.151-2A>G p.?
chr5:112175216 APC c.3925G>T p.E1309*
chr5:112175363 APC c.4072G>A p.A1358T
chr5:112175390 APC c.4099C>T p.Q1367*
EGFR,,EGFR c.2240_2257delTAAGAGAAGCAACATCTC p.L747_P753delinsS,chr7:55242470_55242487delTAAGAGAAGCAACATCTC,3470,,,,
current
chr9 21971208 CDKN2A c.151 1G>A p.?
chr9 21971209 CDKN2A c.151 2A>G p.?
chr5 112175216 112175216 APC p.E1309*
chr5 112175363 112175363 APC p.A1358T
chr5 112175390 112175390 APC p.Q1367*
chr7 55242470 55242487 EGFR c.2240_2257delTAAGAGAAGCAACATCTC p.L747_P753delinsS
desired
chr9 21971208 21971208 CDKN2A c.151-1G>A p.?
chr9 21971209 21971209 CDKN2A c.151-2A>G p.?
chr5 112175216 112175216 APC p.E1309*
chr5 112175363 112175363 APC p.A1358T
chr5 112175390 112175390 APC p.Q1367*
chr7 55242470 55242487 EGFR c.2240_2257delTAAGAGAAGCAACATCTC p.L747_P753delinsS
awk
awk -F'[ :_-]' 'NF==5{$4=$3;$3=$2} {$1=$1} 1' input
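One possible direction, as a rough sketch only: POSIX regular expressions have no lookahead, so "split on - unless followed by a digit then a letter" cannot be written directly as a field separator. If none of the lines actually need - as a delimiter (an assumption based on the sample input above), the simplest workaround may be to drop it from the bracket expression so c.151-1G>A stays a single field. The function name reformat_variants is hypothetical, and the explicit print is my own substitution for the `$4=$3;$3=$2` assignment, which would overwrite the c. field.

```shell
# Sketch: '-' removed from the separator set, so c.151-1G>A is one field.
# Assumption: no input line relies on '-' as a delimiter.
# reformat_variants is a hypothetical helper name, not part of the original post.
reformat_variants() {
  # Lines with exactly 5 fields: repeat the position ($2) as start and end,
  # then print gene, c. change and p. change unchanged.
  awk -F'[ :_]' 'NF==5 { print $1, $2, $2, $3, $4, $5 }' "$@"
}

printf '%s\n' \
  'chr9:21971208 CDKN2A c.151-1G>A p.?' \
  'chr9:21971209 CDKN2A c.151-2A>G p.?' | reformat_variants
```

On the two CDKN2A lines this should give the desired rows. Note that it keeps the c. field for every 5-field line, so the APC rows would also carry their c. value, unlike the desired output as posted. The EGFR line has a different layout and is not handled by this sketch.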