Hi,
I noticed a weird behaviour with awk.
input:
A|B|1-100|blabla_35_40_blabla;blabla_53_60_blabla;blabla_90_110_blabla
Objective:
For each string separated by ';' in $4, if the first and second numbers are included in the interval in $3, then print "TRUE". Otherwise print "FALSE".
In order to get this output:
A|B|1-100|blabla_35_40_blabla|TRUE
A|B|1-100|blabla_53_60_blabla|TRUE
A|B|1-100|blabla_90_110_blabla|FALSE
My code:
awk '
BEGIN{FS=OFS="|"}
{
  START=FINISH=$3
  gsub(/-.+$/,"",START) # isolate the first number in the interval in $3
  gsub(/^.+-/,"",FINISH) # isolate the second number in the interval in $3
  a=split($4,b,";")
  for(i=1; i<=a; i++){
    beg=gensub(/(^[^_]+_)([0-9]+)(_.+$)/,"\\2","g",b[i]) # isolate first number in $4
    end=gensub(/(^[^_]+_[0-9]+_)([0-9]+)(_.+$)/,"\\2","g",b[i]) # isolate second number in $4
    if(beg > START && end < FINISH){
      print $1 FS $2 FS $3 FS b[i] FS "TRUE"
    }
    else{
      print $1 FS $2 FS $3 FS b[i] FS "FALSE"
    }
  }
}' input
But I get:
A|B|1-100|blabla_35_40_blabla|FALSE
A|B|1-100|blabla_53_60_blabla|FALSE
A|B|1-100|blabla_90_110_blabla|FALSE
---------- Post updated at 08:23 AM ---------- Previous update was at 07:57 AM ----------
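It actually works when I use arrays (split) instead of gsub/gensub. So I assume awk compares values that come from split() as numbers, but values returned by gensub (or modified by gsub) as text. That would explain the result: compared as text, "40" < "100" is false.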
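For reference, here is a minimal sketch of the split-based version, assuming each item in $4 always has the form text_number_number_text. The function name check_intervals and the array names range, items and part are made up for illustration. Adding 0 to each value forces a numeric comparison regardless of where the value came from, and the strict > and < tests are kept from the code above.

```shell
check_intervals() {
  awk '
  BEGIN { FS = OFS = "|" }
  {
    split($3, range, "-")            # range[1] = start, range[2] = finish
    n = split($4, items, ";")        # one item per ";"-separated string
    for (i = 1; i <= n; i++) {
      split(items[i], part, "_")     # part[2] and part[3] are the two numbers
      # "+ 0" forces numeric comparison instead of string comparison
      ok = (part[2] + 0 > range[1] + 0 && part[3] + 0 < range[2] + 0)
      print $1, $2, $3, items[i], (ok ? "TRUE" : "FALSE")
    }
  }'
}

printf '%s\n' 'A|B|1-100|blabla_35_40_blabla;blabla_53_60_blabla;blabla_90_110_blabla' | check_intervals
```

The same "+ 0" trick should also work with the gensub version, e.g. if(beg+0 > START+0 && end+0 < FINISH+0).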