Averaging each row with null values

Hi all,

I want to compute for the average of a file with null values (NaN) for each row. any help on how to do it. the sample file looks like this.

1.4 1.2 1.5 NaN 1.6
1.3 1.1 NaN 1.3 NaN
2.4 1.3 1.5 NaN 1.5 
NaN 1.2 NaN 1.4 NaN

I need to do a row-wise averaging such that it will sum only all the valid values/ divided by the number of valid values. Null values are interspersed within rows and I want to disregard these values when computing for the average like the ones below:

(1.4+1.2+1.5+1.6)/4
(1.3+1.1+1.3)/3
(2.4+1.3+1.5+1.5)/4
(1.2+1.4)/2

Thank you very much in advance for any help and insight on this.

Try...

awk '{a=b=0;for(i=1;i<=NF;i++)if($i+0==$i){a+=$i;b++};print a/b}' file1
2 Likes

Thank you very much for the code its just what I needed, thanks for saving my night,:):slight_smile:

---------- Post updated at 09:42 PM ---------- Previous update was at 08:53 PM ----------

There is one more glitch, am sorry for missing this out. I have some rows with entirely null values and as such using the previous code I ended up with getting fatal error (attempted division by zero). how can it be resolved? how to tell the script that when all values in the row is null (NaN), put NaN on that corresponding row average.

1.4 1.2 1.5 NaN 1.6 = (1.4+1.2+1.5+1.6)/4
1.3 1.1 NaN 1.3 NaN = (1.3+1.1+1.3)/3 
2.4 1.3 1.5 NaN 1.5 = (2.4+1.3+1.5+1.5)/4
NaN 1.2 NaN 1.4 NaN = (1.2+1.4)/2
NaN NaN NaN NaN NaN = NaN
NaN NaN NaN NaN NaN = NaN

Many thanks.

perl -lne 'if ($_ !~ /\d/) { print "NaN"; next}
$c=0; $x=0; while (/(\d+\.?\d+)/g) {$x+=$1; $c++}; printf "%.4f\n", $x/$c' inputfile

Hi balajesuri,

Thanks for the code firstly. i tried running it using my actual data and I'm getting erroneous values in the computation of the mean. here's the part of actual data I've been working on: the file is 10000 lines with 6 columns and columns are spaced-separated.

NaN NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN NaN
18.0972824097 20.2922077179 20.7254753113 15.4400615692 9.52701663971 6.10305690765
18.2239189148 20.7757129669 21.2065505981 15.8027992249 9.77007102966 6.23900651932
17.9261703491 20.5239772797 20.9382820129 15.6156806946 9.65097522736 6.15498161316
16.0799484253 17.7822532654 18.1771831512 13.5831022263 8.32423019409 5.3343873024
13.5023841858 13.697104454 14.0042314529 10.525557518 6.38268995285 4.15324020386
10.5770702362 8.93392467499 8.88776779175 6.87884998322 4.28267669678 2.9421877861
8.19773292542 4.39066982269 3.65492200851 3.2539999485 2.55352306366 2.10937023163
8.2882642746 2.93434286118 1.46687304974 1.83887648582 2.41666197777 2.36591768265
NaN NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN NaN
12.4762687683 6.42608261108 4.62352132797 4.02927875519 4.84602165222 4.74294662476
12.6041641235 7.9089307785 6.68155002594 5.02241182327 5.18983364105 5.45300674438
11.5241088867 8.65106964111 8.1567325592 5.48382425308 4.93830966949 5.75111293793
8.8475933075 7.91144323349 8.00369262695 4.8858923912 3.93336176872 5.43343114853
5.36913251877 6.08526515961 6.57769536972 3.54072499275 2.55421519279 4.78161048889
2.62924838066 4.02733469009 4.65754842758 2.14654231071 1.49035787582 4.13778400421
0.869714140892 1.89960038662 2.38566756248 0.849869251251 0.867563068867 3.59587478638
0.642251551151 0.681076049805 0.944821119308 0.245550200343 0.886436700821 3.47824835777
0.94763982296 0.349150121212 0.488753944635 0.175734773278 1.11330735683 3.5775718689
1.03893077374 0.26232483983 0.338562995195 0.209268793464 1.22743105888 3.58259248734
1.09674620628 0.225983262062 0.226195812225 0.251310884953 1.2021933794 3.26848888397
1.14800679684 0.295170128345 0.211536288261 0.306422531605 1.04613471031 2.6469912529
1.15812194347 0.478137284517 0.354398548603 0.366809636354 0.876217782497 2.05036878586
1.14090454578 0.649072349072 0.532046616077 0.396946698427 0.740904033184 1.56694638729
1.08868527412 0.764490664005 0.682165384293 0.391505628824 0.638526678085 1.1559252739
1.07790124416 0.883833229542 0.858095645905 0.436854451895 0.658724308014 0.953302264214
1.13274395466 1.02168250084 1.07939708233 0.566049337387 0.787775695324 0.894795715809
1.20699775219 1.16161489487 1.32476341724 0.755846261978 0.977323055267 0.929328680038
NaN NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN NaN

Many thanks

This is the result I got. Appears to be alright.

[root@hostname dir]# perl -lne 'if ($_ !~ /\d/) { print "NaN"; next}
$c=0; $x=0; while (/(\d+\.?\d+)/g) {$x+=$1; $c++}; printf "%.4f\n", $x/$c' input
NaN
NaN
NaN
NaN
NaN
NaN
15.0309
15.3363
15.1350
13.2135
10.3775
7.0837
4.0267
3.2185
NaN
NaN
6.1907
7.1433
7.4175
6.5026
4.8181
3.1815
1.7447
1.1464
1.1087
1.1099
1.0452
0.9424
0.8807
0.8378
0.7869
0.8115
0.9137
1.0593
NaN
NaN
1 Like

Hi balajesuri, thanks much its working now. it was my mistake I appended the output to an existing file. thanks much again.

cheers.:slight_smile:

1 Like

Small tweek to Ygor's original awk which prevents issues when full rows of non-values are present:

awk '/[0-9]/ {a=b=0;for(i=1;i<=NF;i++)if($i+0==$i){a+=$i;b++};print a/b; next;} {print "NaN"; }' file1

Keeping with balajesuri's solution, the mean for a 'bad' row is NaN.

I also confirmed that balajesuri's perl is doing the right thing.

1 Like

thanks agama, for the other option. :slight_smile:

Hi I'm sorry again, I have come across another problem with the computation of the mean for my data. On top of having entire rows equal null values (NaN), the data has also entire rows with valid values equal to 0 and so I'm having fatal error on division by zero. How to revise the code such that if entire row has values of NaN and zero, it will print NaN and 0, respectively.

Many many thanks.

Small conditional before a/b in the print statement:

awk '/[0-9]/ {a=b=0;for(i=1;i<=NF;i++)if($i+0==$i){a+=$i;b++};print  b== 0 ? 0 :  a/b; next;} {print "NaN"; }' file1
1 Like

NaN (="Not a Number") can be a special value POSIX Floating Point Problems - The GNU Awk User's Guide , so this adaptation may be necessary:

awk '{a=b=0; for(i=1;i<=NF;i++)if($i!="NaN"){a+=$i;b++}; print (b==0)?$1:a/b}' infile

See also:

http://pubs.opengroup.org/onlinepubs/9699919799/utilities/awk.html\#tag\_20\_06_18

3 Likes

Thank you very much for all the help on this.
@ agama: what does the / [0-9] / do in the awk expression?
@ scrutinizer: thanks for the useful information of awk capabilities.

[0-9] -> Is a character class which refers to any single character which is a digit in range 0 - 9.
So, /[0-9]/ is a pattern which means, if the input text which is being read contains a number, then match is a success.

2 Likes

Thanks balajesuri for this explanation.:slight_smile: