Number of lines smaller than specified value

senayasma · December 22, 2011, 11:06am

Hi All,

I am having trouble finding the number of lines per column that are smaller than the values given in a different file. For example, compare the 1st column of file1 with the 1st line of file2, the 2nd column of file1 with the 2nd line of file2, and so on.

 
cat file1
 
0.2 0.9 0.8 0.5 ...
0.6 0.5 0.7 0.9 ...
0.1 0.6 0.4 0.8 ...

 
cat file2
 
0.1
0.7
0.6
0.95
...

The output should look like:

 
cat output
0
2
1
3
...
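A minimal single-pass sketch of the column-wise count described above, assuming a POSIX awk and whitespace-separated numeric fields. It keeps one counter per column instead of the whole of file1, which matters with thousands of columns. The function name `count_below` is hypothetical, not something from this thread.

```shell
# count_below FILE1 FILE2 (hypothetical helper name)
# For each column i of FILE1, print how many rows hold a value strictly
# smaller than line i of FILE2. FILE1 is read once, one counter per column.
count_below() {
  awk '
    # First input (FILE2): remember the threshold for column FNR.
    NR == FNR { t[FNR] = $1 + 0; next }
    # Second input (FILE1): test every field against its column threshold.
    {
      for (i = 1; i <= NF; i++)
        if ($i + 0 < t[i]) c[i]++
      if (NF > maxnf) maxnf = NF
    }
    # One line per column; "+ 0" prints 0 for columns with no matches.
    END { for (i = 1; i <= maxnf; i++) print c[i] + 0 }
  ' "$2" "$1"
}
```

Usage: `count_below file1 file2 > output`. The `+ 0` forces numeric comparison so that values such as `0.53` and `0.4` are compared as numbers rather than as strings.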

Thanks in advance,

itkamaraj · December 22, 2011, 11:23am

$ count=0;while read a; do count=$(expr $count + 1); awk -v b="$a" -v count="$count" 'NR==count{c=0;for(i=1;i<=NF;i++){if($i<b){c++}}print c}' file1; done < file2

ahamed101 · December 22, 2011, 11:28am

Try this...

awk 'NR==FNR{a[++j]=$0;next} {c=0;++k;for(i=1;i<=NF;i++){$i<a[k]?++c:NULL} print c}' file2 file1

Homework?

--ahamed

senayasma · December 22, 2011, 11:57am

Hi Ahamed,

Thanks for helping me. No, it is not homework. These are two files used to get the permuted p-values in a GWAS.

That script is pretty fast; however, it gives the number of columns in each row that are smaller than the value. What I need is the number of rows per column that are smaller than the value given in the corresponding row of file2. So, if file1 has 5000 columns and 100 rows, the output should have 5000 values.
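Since every column of file1 needs exactly one threshold line in file2, a quick dimension check can catch mismatched inputs before running any of the counting scripts. This is a sketch assuming a POSIX shell and awk; `check_dims` is a hypothetical helper name.

```shell
# check_dims FILE1 FILE2 (hypothetical helper name)
# Compares the number of columns in FILE1 with the number of lines in FILE2
# and returns non-zero when they differ.
check_dims() {
  # Fields on the first line of FILE1; exit still runs END (0 if empty).
  cols=$(awk '{ n = NF; exit } END { print n + 0 }' "$1")
  # Lines in FILE2.
  lines=$(awk 'END { print NR }' "$2")
  echo "columns in $1: $cols, thresholds in $2: $lines"
  # Succeed only when every column has exactly one threshold.
  [ "$cols" -eq "$lines" ]
}
```

With 5000 columns and 100 rows in file1, file2 should report 5000 lines, and so should the output.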

The question is: how many values in the 1st column of file1 are smaller than the 1st value given in file2, and so on?

For example, file1 has 5 columns and 3 rows:

 
cat file1
 
0.6 0.4 0.9 0.8 0.53 
0.7 0.3 0.4 0.3 0.1
0.9 0.6 0.2 0.1 0.84

file2 will have 5 rows:

 
cat file2
0.5
0.7
0.3
0.7
0.4

There are no values smaller than 0.5 in the 1st column of file1, so the first row of the output will be 0.

 
cat output
 
0
3
1
2
1

ahamed101 · December 22, 2011, 12:00pm

I didn't get you; can you explain with an example?

--ahamed

senayasma · December 22, 2011, 12:08pm

The question is: how many values in the 1st column of file1 are smaller than the 1st value given in file2, and so on?

For example, file1 has 5 columns and 3 rows:

 
cat file1
 
0.6 0.4 0.9 0.8 0.53 
0.7 0.3 0.4 0.3 0.1
0.9 0.6 0.2 0.1 0.84

file2 will have 5 rows:

 
cat file2
0.5
0.7
0.3
0.7
0.4

There are no values smaller than 0.5 in the 1st column of file1, so the first row of the output will be 0.

 
cat output
 
0
3
1
2
1
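To check an expected output like this by hand, a variant of the column-wise count can print each column's values next to its threshold. This is a sketch assuming a POSIX awk; `explain_counts` is a hypothetical helper name, and the output format is an arbitrary choice.

```shell
# explain_counts FILE1 FILE2 (hypothetical helper name)
# Prints, per column of FILE1, the threshold from FILE2, the column values,
# and how many of those values are strictly smaller than the threshold.
explain_counts() {
  awk '
    # FILE2: threshold for column FNR.
    NR == FNR { t[FNR] = $1 + 0; next }
    # FILE1: collect the values of each column and count the small ones.
    {
      for (i = 1; i <= NF; i++) {
        vals[i] = vals[i] " " $i
        if ($i + 0 < t[i]) c[i]++
      }
      if (NF > maxnf) maxnf = NF
    }
    END {
      for (i = 1; i <= maxnf; i++)
        printf "column %d (threshold %s):%s -> %d smaller\n", i, t[i], vals[i], c[i]
    }
  ' "$2" "$1"
}
```

For the example above, the first line is `column 1 (threshold 0.5): 0.6 0.7 0.9 -> 0 smaller`, and the five counts come out as 0, 3, 1, 2 and 1.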

itkamaraj · December 22, 2011, 12:21pm

The output should be:

0
3
1
3

OK, I got it... we need to compare column-wise.

$ rm -f file3; count=0; while read a; do count=$((count + 1)); awk -v count="$count" '{printf("%s ",$count)}END{printf("\n")}' file1 >> file3 && awk -v b="$a" -v count="$count" 'NR==count{c=0;for(i=1;i<=NF;i++){if($i<b){c++}}print c}' file3; done < file2
0
3
1
2
1
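The loop above effectively transposes file1 into file3, one column per line, and then does a row-wise count. A sketch of the same idea without the temporary file, assuming a POSIX awk that accepts `-` as a file operand for standard input; `transpose` and `column_counts` are hypothetical helper names.

```shell
# transpose FILE (hypothetical helper): print column i of FILE as line i.
transpose() {
  awk '
    {
      for (i = 1; i <= NF; i++)
        col[i] = (NR == 1 ? $i : col[i] " " $i)
      if (NF > maxnf) maxnf = NF
    }
    END { for (i = 1; i <= maxnf; i++) print col[i] }
  ' "$1"
}

# column_counts FILE1 FILE2 (hypothetical helper): transpose FILE1, then
# compare line k of the transposed data with line k of FILE2.
column_counts() {
  transpose "$1" |
    awk '
      NR == FNR { t[FNR] = $1 + 0; next }   # FILE2: thresholds
      {                                     # stdin: one former column per line
        c = 0
        for (i = 1; i <= NF; i++) if ($i + 0 < t[FNR]) c++
        print c
      }
    ' "$2" -
}
```

Usage: `column_counts file1 file2`. Note that `transpose` holds all of file1 in memory as strings, so a per-column counter is lighter when file1 is large.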

ahamed101 · December 22, 2011, 12:39pm

awk 'NR==FNR{ j=1; for(i=1;i<=NF;i++){a[j]=a[j]" "$i;j++}next}
{ c=0; t=split(a[++g],arr," ") for(k=1;k<=t;k++) { if(arr[k]<$1) { c++ } } print c; }' file1 file2

--ahamed

senayasma · December 22, 2011, 1:02pm

Hi Ahamed,

I do not know why this code gives me a syntax error.
Could you please help me figure it out?

Thanks for your time,

ahamed101 · December 22, 2011, 9:02pm

Oops, there was a small mistake: a semicolon was missing after the split() call.

awk 'NR==FNR{ j=1; for(i=1;i<=NF;i++){a[j]=a[j]" "$i;j++}next}
{ c=0; t=split(a[++g],arr," "); for(k=1;k<=t;k++) { if(arr[k]<$1) { c++ } } print c; }' file1 file2
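The corrected one-liner, spread over several lines with comments, may be easier to follow. This is a reformatting sketch wrapped in a hypothetical function `column_counts_ahamed`; the awk logic is the same, except that the separate counter `j` (which always equals `i`) is folded into `i`.

```shell
# column_counts_ahamed FILE1 FILE2 (hypothetical wrapper around the
# corrected one-liner; same logic, different layout).
column_counts_ahamed() {
  awk '
    # First file (file1): append each field to the string for its column,
    # so a[i] ends up as " v1 v2 v3 ..." for column i.
    NR == FNR { for (i = 1; i <= NF; i++) a[i] = a[i] " " $i; next }
    # Second file (file2): line g holds the threshold for column g.
    {
      c = 0
      # split() returns the number of values; a ";" or a newline must
      # separate this statement from the for loop that follows.
      t = split(a[++g], arr, " ")
      for (k = 1; k <= t; k++) if (arr[k] < $1) c++
      print c
    }
  ' "$1" "$2"
}
```

Usage: `column_counts_ahamed file1 file2`, with file1 first, as in the one-liner.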

If solaris, use nawk!
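In a script that may run on Solaris as well as elsewhere, one way to follow this advice is to pick the awk binary once. A sketch assuming a POSIX shell with `command -v`; the variable name `AWK` is an arbitrary choice. On Solaris, /usr/bin/awk is the old awk, which lacks many features of the newer awk language.

```shell
# Prefer nawk when it is installed; otherwise fall back to plain awk.
if command -v nawk > /dev/null 2>&1; then
  AWK=nawk
else
  AWK=awk
fi
echo "using $AWK"
```

The scripts above can then be run as `$AWK '...' file1 file2`.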

--ahamed