The awk below produces the current output, which adds 1 to $3. However, I am trying to add the length of the matching characters between $5 and $6 to $3. I have tried using sub as a variable to store the length but have not been able to do so correctly. I added comments to each line; the description gives the rule for each line, and the math is zero-based. Thank you :).
description
since line 1 has 4 matching characters between $5 and $6 (GAAA), 4 is added to $3 (zero-based, so 116268178 becomes 116268181)
since line 2 has 5 matching characters between $5 and $6 (GAAAA), 5 is added to $3 (zero-based, so 116268200 becomes 116268204)
file tab-delimited
id1 1 116268178 GAAA GAAAA
id2 1 116268200 GAAAA GAAAAA
current output tab-delimited
id1 1 116268179 116268179 GAAA GAAAA
id2 1 116268201 116268201 GAAAA GAAAAA
desired output tab-delimited
id1 1 116268181 116268181 GAAA GAAAA
id2 1 116268204 116268204 GAAAA GAAAAA
awk
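awk 'BEGIN{FS=OFS="\t"} # use tab as the input and output field separator
FNR==NR{ # true for every line when only one file is given
if(length($5) < length($6)) { # condition 2: $5 is shorter than $6
sub($5,"",$6) && sub($6,"",$5) # remove the matching text; sub returns a substitution count, not a length
print $1,$2,$3+1,$3+1,"-",$6 # print the fields with 1 added to $3 (current output)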
next
}
}' file > output
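
One possible way to get the desired output is sketched below. It is not a definitive fix. In awk, sub returns the number of substitutions made (0 or 1), not the length of the matched text, so the sketch counts the shared leading characters with substr and length instead. Two assumptions are made here: the two sequences are the last two fields of each line (the sample file shows five columns while the script refers to $5 and $6), and "matching characters" means the common leading run. The names a, b, n and pos are only illustrative.

```shell
# Sample input copied from the question (tab-delimited).
printf 'id1\t1\t116268178\tGAAA\tGAAAA\nid2\t1\t116268200\tGAAAA\tGAAAAA\n' > file

awk 'BEGIN { FS = OFS = "\t" }
{
    a = $(NF-1); b = $NF            # assumption: the sequences are the last two fields
    n = 0                           # number of shared leading characters
    while (n < length(a) && n < length(b) && substr(a, n+1, 1) == substr(b, n+1, 1))
        n++
    pos = $3 + (n > 0 ? n - 1 : 0)  # zero-based: 4 matching characters move $3 by 3
    print $1, $2, pos, pos, a, b
}' file > output
```

The length($5) < length($6) condition from the original script is left out so that the length calculation stays in focus; it could be added back as a pattern in front of the main block.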