Hi Friends,
My input file
Gene1 10 20 0
Gene2 5 0 15
Gene3 10 10 10
Gene4 5 0 0
If there is a zero for any gene in any column, I don't want that column to be considered which reduces the denominator value during average.
Here is my output
Gene1 10 20 0 10
Gene2 5 0 15 10
Gene3 10 10 10 10
Gene4 5 0 0 5
The last column is my average. I didn't try anything for this solution because I didn't know where to start. Sorry.
This approach utilizes bash arrays:
while read gname val1 val2 val3
do
arr=()
[ $val1 -ne 0 ] && arr=("${arr[@]}" "$val1")
[ $val2 -ne 0 ] && arr=("${arr[@]}" "$val2")
[ $val3 -ne 0 ] && arr=("${arr[@]}" "$val3")
nbrofvals=${#arr[@]}
tot=0
for arrval in "${arr[@]}"
do
tot=$((tot + arrval))
done
avg=$((tot/nbrofvals))
echo "$gname $val1 $val2 $val3 $avg"
done <input
1 Like
junior-helper:
This approach utilizes bash arrays:
while read gname val1 val2 val3
do
arr=()
[ $val1 -ne 0 ] && arr=("${arr[@]}" "$val1")
[ $val2 -ne 0 ] && arr=("${arr[@]}" "$val2")
[ $val3 -ne 0 ] && arr=("${arr[@]}" "$val3")
nbrofvals=${#arr[@]}
tot=0
for arrval in "${arr[@]}"
do
tot=$((tot + arrval))
done
avg=$((tot/nbrofvals))
echo "$gname $val1 $val2 $val3 $avg"
done <input
Thanks for your time. But I see the following errors
line 4: [: 1415.37: integer expression expected
line 5: [: 0.05248: integer expression expected
line 6: [: 2.20281: integer expression expected
line 13: tot/nbrofvals: division by 0 (error token is "s")
FYI, the zeros in the input files can even occur in second column which is your val1. Thanks for your time.
---------- Post updated at 04:37 PM ---------- Previous update was at 04:06 PM ----------
I tried all possible zeros occurrence like this
gene1 1 2 3
gene2 0 0 3
gene3 0 9 0
gene4 8 0 0
gene5 5 5 0
gene6 0 6 6
gene7 9 9 0
And then wrote if else statements for all combinations
awk '{if($2!=0 && $3!=0 && $4!=0) {v=($2+$3+$4)/3; print $0"\t"v} else if($2==0 && $3==0 && $4!=0) {v=($2+$3+$4)/1; print $0"\t"v} else if($2==0 && $3!=0 && $4==0) {v=($2+$3+$4)/1; print $0"\t"v} else if($2!=0 && $3==0 && $4==0) {v=($2+$3+$4)/1; print $0"\t"v} else if($2!=0 && $3!=0 && $4==0) {v=($2+$3+$4)/2; print $0"\t"v} else if($2==0 && $3!=0 && $4!=0) {v=($2+$3+$4)/2; print $0"\t"v} else if($2!=0 && $3!=0 && $4==0) {v=($2+$3+$4)/2; print $0"\t"v}}' input
output
gene1 1 2 3 2
gene2 0 0 3 3
gene3 0 9 0 9
gene4 8 0 0 8
gene5 5 5 0 5
gene6 0 6 6 6
gene7 9 9 0 9
You're welcome.
OK, obviously you want to calculate average of floating point numbers, right?
I saw your input sample with whole numbers only and scripted something for that.
>the zeros in the input files can even occur in second column which is your val1
If you think of gname as val0, then val1 *is* the second column, thus the script would also handle a zero in the second column.
However, right now, it would fail if all three columns (col2, 3 and 4) were all zeros.
I suggest that you provide another few "real world" input lines and the expected output.
I'm not sure I can fix my code though :rolleyes:, but it would be very useful for others.
1 Like
Yoda
September 17, 2014, 4:38pm
5
Try this:-
awk '
{
for ( i = 2; i <= NF; i++ )
{
V[$1] = $0
A[$1] += $i
T[$1] += ( $i == 0 ? 0 : 1 )
}
}
END {
for ( k in A )
print V[k], T[k] ? A[k] / T[k] : 0
}
' file | sort
2 Likes
Posted by jacobs.smith:
Hi Friends,
My input file
Code:
Gene1 10 20 0Gene2 5 0 15Gene3 10 10 10Gene4 5 0 0
If there is a zero for any gene in any column, I don't want that column to be considered which reduces the denominator value during average.
Here is my output
Code:
Gene1 10 20 0 10Gene2 5 0 15 10Gene3 10 10 10 10Gene4 5 0 0 5
The last column is my average. I didn't try anything for this solution because I didn't know where to start. Sorry.
Hello jacobs.smith,
Following may help also.
awk '{for(i=2;i<=NF;i++){if($i){A[NR]=A[NR]?A[NR]+$i:$i;B[NR]=B[NR]?B[NR]+1:1;}}} {p=f=NR;{for(r=NR;r<=NR;r++){c[p--]=A[r]/B[r]}}{for(t=f;t<=NR;t++){$0=$0 OFS c[t];print $0}}}' Input_file
Output will be as follows.
Gene1 10 20 0 15
Gene2 5 0 15 10
Gene3 10 10 10 10
Gene4 5 0 0 5
EDIT: Adding a clear view of code for same.
awk '{
for(i=2;i<=NF;i++)
{if($i)
{A[NR]=A[NR]?A[NR]+$i:$i;B[NR]=B[NR]?B[NR]+1:1;}
}
}
{p=f=NR;
{for(r=NR;r<=NR;r++)
{c[p--]=A[r]/B[r]}
}
{for(t=f;t<=NR;t++)
{$0=$0 OFS c[t];print $0}
}
}' Input_file
Thanks,
R. Singh
1 Like