I want to compute for the average of a file with null values (NaN) for each row. any help on how to do it. the sample file looks like this.
1.4 1.2 1.5 NaN 1.6
1.3 1.1 NaN 1.3 NaN
2.4 1.3 1.5 NaN 1.5
NaN 1.2 NaN 1.4 NaN
I need to do a row-wise averaging such that it will sum only all the valid values/ divided by the number of valid values. Null values are interspersed within rows and I want to disregard these values when computing for the average like the ones below:
Thank you very much for the code its just what I needed, thanks for saving my night,:)
---------- Post updated at 09:42 PM ---------- Previous update was at 08:53 PM ----------
There is one more glitch, am sorry for missing this out. I have some rows with entirely null values and as such using the previous code I ended up with getting fatal error (attempted division by zero). how can it be resolved? how to tell the script that when all values in the row is null (NaN), put NaN on that corresponding row average.
1.4 1.2 1.5 NaN 1.6 = (1.4+1.2+1.5+1.6)/4
1.3 1.1 NaN 1.3 NaN = (1.3+1.1+1.3)/3
2.4 1.3 1.5 NaN 1.5 = (2.4+1.3+1.5+1.5)/4
NaN 1.2 NaN 1.4 NaN = (1.2+1.4)/2
NaN NaN NaN NaN NaN = NaN
NaN NaN NaN NaN NaN = NaN
Thanks for the code firstly. i tried running it using my actual data and I'm getting erroneous values in the computation of the mean. here's the part of actual data I've been working on: the file is 10000 lines with 6 columns and columns are spaced-separated.
NaN NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN NaN
18.0972824097 20.2922077179 20.7254753113 15.4400615692 9.52701663971 6.10305690765
18.2239189148 20.7757129669 21.2065505981 15.8027992249 9.77007102966 6.23900651932
17.9261703491 20.5239772797 20.9382820129 15.6156806946 9.65097522736 6.15498161316
16.0799484253 17.7822532654 18.1771831512 13.5831022263 8.32423019409 5.3343873024
13.5023841858 13.697104454 14.0042314529 10.525557518 6.38268995285 4.15324020386
10.5770702362 8.93392467499 8.88776779175 6.87884998322 4.28267669678 2.9421877861
8.19773292542 4.39066982269 3.65492200851 3.2539999485 2.55352306366 2.10937023163
8.2882642746 2.93434286118 1.46687304974 1.83887648582 2.41666197777 2.36591768265
NaN NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN NaN
12.4762687683 6.42608261108 4.62352132797 4.02927875519 4.84602165222 4.74294662476
12.6041641235 7.9089307785 6.68155002594 5.02241182327 5.18983364105 5.45300674438
11.5241088867 8.65106964111 8.1567325592 5.48382425308 4.93830966949 5.75111293793
8.8475933075 7.91144323349 8.00369262695 4.8858923912 3.93336176872 5.43343114853
5.36913251877 6.08526515961 6.57769536972 3.54072499275 2.55421519279 4.78161048889
2.62924838066 4.02733469009 4.65754842758 2.14654231071 1.49035787582 4.13778400421
0.869714140892 1.89960038662 2.38566756248 0.849869251251 0.867563068867 3.59587478638
0.642251551151 0.681076049805 0.944821119308 0.245550200343 0.886436700821 3.47824835777
0.94763982296 0.349150121212 0.488753944635 0.175734773278 1.11330735683 3.5775718689
1.03893077374 0.26232483983 0.338562995195 0.209268793464 1.22743105888 3.58259248734
1.09674620628 0.225983262062 0.226195812225 0.251310884953 1.2021933794 3.26848888397
1.14800679684 0.295170128345 0.211536288261 0.306422531605 1.04613471031 2.6469912529
1.15812194347 0.478137284517 0.354398548603 0.366809636354 0.876217782497 2.05036878586
1.14090454578 0.649072349072 0.532046616077 0.396946698427 0.740904033184 1.56694638729
1.08868527412 0.764490664005 0.682165384293 0.391505628824 0.638526678085 1.1559252739
1.07790124416 0.883833229542 0.858095645905 0.436854451895 0.658724308014 0.953302264214
1.13274395466 1.02168250084 1.07939708233 0.566049337387 0.787775695324 0.894795715809
1.20699775219 1.16161489487 1.32476341724 0.755846261978 0.977323055267 0.929328680038
NaN NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN NaN
Hi I'm sorry again, I have come across another problem with the computation of the mean for my data. On top of having entire rows equal null values (NaN), the data has also entire rows with valid values equal to 0 and so I'm having fatal error on division by zero. How to revise the code such that if entire row has values of NaN and zero, it will print NaN and 0, respectively.
Thank you very much for all the help on this.
@ agama: what does the / [0-9] / do in the awk expression?
@ scrutinizer: thanks for the useful information of awk capabilities.
[0-9] -> Is a character class which refers to any single character which is a digit in range 0 - 9.
So, /[0-9]/ is a pattern which means, if the input text which is being read contains a number, then match is a success.