comp51820_c1_seq1 42 N 0:0:0:0:0:0 1:0:0:0:0:0 0:0:0:0:0:0 3:0:0:0:0:0 0:0:0:0:0:0
comp51820_c1_seq1 43 N 0:0:0:0:0:0 0:1:0:0:0:0 0:0:0:0:0:0 0:3:0:0:0:0 0:0:0:0:0:0
comp51820_c1_seq1 44 N 0:0:4:0:3:1 0:0:1:9:0:0 10:0:0:0:0:0 0:3:3:2:2:6 2:2:2:5:60:3
comp51820_c1_seq1 45 N 0:4:0:0:5:0 0:1:8:0:0:0 0:0:0:0:9:0 0:3:0:0:6:0 0:0:0:0:13:0
I would like to compare the sums of the colon-separated values in columns 4-8. Specifically, I would like to print the lines for which the sum of each column (4 through 8) is at least 8. Caveat: I need to ignore the fifth value in each column (this excludes line four in file.txt).
Desired output:
comp51820_c1_seq1 44 N 0:0:4:0:3:1 0:0:1:9:0:0 10:0:0:0:0:0 0:3:3:2:2:6 2:2:2:5:60:3
So far I have tried to use awk to sum columns, but I'm not sure how to compare multiple sums per line. Plus it seems there must be a better way but google hasn't given me an answer so far.
Thank you, this works. Would you mind also showing how I would do it if I did not want to exclude the fifth value? So just print lines where the sum of ALL values in each of columns 4-8 is at least 8? I can't tell which part of the code addresses this part of it.
This works to exclude the correct lines, but it also excludes a line that should be accepted (since the five sums (one from each column) are all >= 8). Desired output is:
comp51820_c1_seq1 407 N 0:0:0:31:0:0 0:0:0:48:0:0 0:0:0:59:0:0 0:0:0:8:0:0 0:0:0:45:0:0
comp51820_c1_seq1 408 N 0:31:0:0:0:0 0:51:0:0:0:0 0:60:0:0:0:0 0:9:0:0:0:0 0:48:0:0:0:0
I'm trying to understand why the first line above is excluded from the output...
PLease don't mind me asking about the "n" in this split operation, I would like to learn it,
n=split($i,a,":")
i is 4,5,6,7,8 in our case (first case i used),
for $4, $5,...$8 you split and set the sub fields to array "a" ,
what would be the values of n?
n= a[1], a[2],...,a[6] ???
For $4, when i = 4, i used 3rd Line and 4th Column then is n like the following?
A[1]= A[2]=A[4]=0
A[3]=4
A[5]=3
A[6]=1
if so how you can compare j and n in
for(j=1;j<n;j++)
this part confused me, I appreciate if you could explain it a bit,
n=split($i,a,":") for field 0:29:0:0:0:0 results in
n=6 # 6 elements from string '0:29:0:0:0:0' separated by ':'
a[1]=0
a[2]=29
a[3]=0
a[4]=0
a[5]=0
a[6]=0
'j' is just a temporary iterator that goes from 1 to 'n' (the number of elements in the split string saved in array 'a').
Not sure if addressed the question though ....