I am trying to run the awk below. When I split the input and then run a second awk that performs a calculation on the split output, there are no issues. When I try to combine the two into a single awk, the output is not correct. Is the split not working, or did I do something wrong? Thank you :).
input
chr1 955543 955763 chr1:955543-955763 AGRN-6|gc=75 1 15
chr1 955543 955763 chr1:955543-955763 AGRN-6|gc=75 2 16
chr1 955543 955763 chr1:955543-955763 AGRN-6|gc=75 3 16
chr1 955543 955763 chr1:955543-955763 AGRN-6|gc=75 4 14
chr1 955543 955763 chr1:955543-955763 AGRN-6|gc=75 5 17
the split
awk '{split($5,a,"-"); print $1,$2,$3,$4,a[1]}' input > split
split (the resulting file: $5 is split on the - and $1, $2, $3, $4, and a[1] are printed)
chr1 955543 955763 chr1:955543-955763 AGRN
chr1 955543 955763 chr1:955543-955763 AGRN
chr1 955543 955763 chr1:955543-955763 AGRN
chr1 955543 955763 chr1:955543-955763 AGRN
chr1 955543 955763 chr1:955543-955763 AGRN
If I use that file (split) as the input to the calculation awk, the output is correct.
output ($5, the count of lines with the same $5, and the sum of $3-$2)
AGRN 5 1100
If I try to perform the split and run the calculation in the same awk, I get the below output:
awk '{split($5,a,"-"); print $1,$2,$3,$4,a[1]} {c1[a1]++; c2[a1]+=($3-$2)}
END{for (e in c1) print e, c1[e], c2[e]}' split
output
chr1 955543 955763 chr1:955543-955763 AGRN
chr1 955543 955763 chr1:955543-955763 AGRN
chr1 955543 955763 chr1:955543-955763 AGRN
chr1 955543 955763 chr1:955543-955763 AGRN
chr1 955543 955763 chr1:955543-955763 AGRN
5 1100
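
One likely cause, offered as a sketch rather than a confirmed diagnosis: the combined command indexes the arrays with a1, which awk treats as a separate, uninitialized variable (an empty string), not as a[1], the first piece produced by split. That would explain the empty first column in the last output line. Assuming the goal is to group on the text before the first - in $5, a version keyed on a[1] might look like this. The function name summarize is hypothetical, and the per-line print is left out on the assumption that only the summary is wanted.

```shell
# Hypothetical helper: wraps the combined awk so it can read a file or stdin.
summarize() {
    awk '{
        split($5, a, "-")        # a[1] is the text before the first "-" in $5
        c1[a[1]]++               # count lines per key; note a[1], not a1
        c2[a[1]] += ($3 - $2)    # running sum of $3-$2 per key
    }
    END { for (e in c1) print e, c1[e], c2[e] }' "$@"
}

# Usage with the first two lines of the sample input on stdin:
printf '%s\n' \
  'chr1 955543 955763 chr1:955543-955763 AGRN-6|gc=75 1 15' \
  'chr1 955543 955763 chr1:955543-955763 AGRN-6|gc=75 2 16' | summarize
```

Run against the original file with summarize input, this reads the unsplit data directly, so the intermediate split file is not needed.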