Help: Counting values less than a number

So I have several files (35000, to be exact) in the format rmsd_protein_*.dat each with 2 columns and 35000 rows.

I would like to count how many values in the second column are less than 3 for each file, and output it into a new file so that it ultimately appears as:

1           14057
2           20598

.....

where the left column is the * value from the data file name, and the number on the right is the number of values from that file less than three.

Unfortunately, I'm a n00b as far as shell scripts go, and even though this would be the easiest thing in the world in Java, I just can't figure it out in a shell script. Thoughts?

#!/bin/sh

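# List only names of the form rmsd_protein_<id>.dat
ls -1 | grep '^rmsd_protein_.*\.dat$' | while read -r file
do
   f=${file##*_}     # drop everything up to the last underscore
   f=${f%%[.]*}      # drop the extension, leaving the id
   # c+0 prints 0 instead of an empty field when nothing is below 3
   awk '$2 < 3 {c++} END {print f, c+0}' f="$f" "$file"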
done > new_file.dat
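
For what it's worth, here is a minimal sketch of the same idea for sh, bash, or ksh that loops over a glob instead of parsing ls output. The function names count_below and summarize are made up for illustration, and the threshold of 3 comes from the question; treat it as one possible variant rather than the definitive way to do it.

```shell
#!/bin/sh

# Hypothetical helper: print how many values in column 2 of a file are below a limit.
count_below() {
    # $1 = data file, $2 = threshold (3 in the original question)
    awk -v limit="$2" '$2 < limit { c++ } END { print c + 0 }' "$1"
}

# Hypothetical helper: print "<id><TAB><count>" for every matching file
# in the current directory.
summarize() {
    for file in rmsd_protein_*.dat
    do
        [ -e "$file" ] || continue   # the glob matched nothing
        id=${file##*_}               # strip everything through the last "_"
        id=${id%%.*}                 # strip the extension
        printf '%s\t%s\n' "$id" "$(count_below "$file" 3)"
    done
}
```

Run from the data directory, `summarize > new_file.dat` would then write one line per file.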

Unfortunately I get the error "while: Expression Syntax." when I try to run that...?

That error message comes from csh. Try running the script with sh, bash, or ksh.

I'm using the C shell (on Red Hat Enterprise Linux Workstation release 6.5 (Santiago), if that matters in any way).

For csh:

#!/bin/csh

foreach file ( "`ls -1 | grep 'rmsd_protein_.*.dat' `" )
   awk '$2 < 3 {c++} END {sub(".*_", "", f); sub("[.].*", "", f); print f, c+0}' f=$file $file
end

I am unsure what you mean exactly, but I take it you want one line for each file with the number of lines where $2 was < 3, plus the total number of lines for that file. If so, this should be fairly quick:

find . -name 'rmsd_protein_*.dat' -exec awk 'FNR==1{if(NR>1) print c,t; c=0} $2<3{c++}{t=FNR} END{print c,t}' OFS='\t' {} + > newfile.dat

Ah ha, this runs and reports the results as desired -- however, I would like it in a dat file rather than just printed to the screen, if possible. The only way I've done an awk to a file before is

awk 'insertawkcommandshere' > newfile.dat

but that enters an infinite loop with this; how do I fix that?

The loop is not infinite. It ends when the last file is processed.
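
Another possible explanation for the apparent hang, assuming the awk command was typed with no file name after the program: awk then reads standard input and simply waits for the keyboard, which can look like an infinite loop. A small sketch in sh syntax, where count_to_file and both of its arguments are placeholder names, not anything from the posts above:

```shell
# With no file operand, awk reads standard input and waits for typing.
# Naming the input file lets it run to the end and write the result.
count_to_file() {
    # $1 = input data file, $2 = output file (both are placeholders)
    awk '$2 < 3 { c++ } END { print c + 0 }' "$1" > "$2"
}
```

With 35000 files of 35000 rows each, a correct run can also just take a while, so it is worth checking which of the two is happening.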

#!/bin/csh

rm -f new_file.dat

foreach file ( "`ls -1 | grep 'rmsd_protein_.*.dat' `" )
   awk '$2 < 3 {c++} END {sub(".*_", "", f); sub("[.].*", "", f); print f, c+0}' f=$file $file >> new_file.dat
end

It certainly looks like it should work to me, but this creates an empty file and still prints the results to the screen instead..?


(Additional note: I tried putting the "> data.dat" after the "f=$file $file", and then it at least filled in the data file rather than printing to the screen, but every new line overwrites the last line...)

See the corrected script above, which appends to the file with >>.
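
To illustrate why each new line replaced the previous one: ">" truncates the target file every time it is opened, while ">>" appends to it. A tiny sketch in sh syntax (demo_redirect and the two file names are invented for the demonstration):

```shell
# ">" truncates the target each time it is opened; ">>" appends to it.
demo_redirect() {
    # $1 = scratch directory (hypothetical; any writable directory works)
    for n in 1 2 3
    do
        echo "$n" >  "$1/overwrite.txt"   # ends up holding only the last line
        echo "$n" >> "$1/append.txt"      # accumulates all three lines
    done
}
```

Because ">>" keeps whatever is already in the file, the old output has to be removed before each run, which is what the rm -f line in the csh script is for.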

Eyyyy, perfect! Thanks!

OK, with these requirements this is a version without a shell loop, which should run fairly quickly:

find . -name 'rmsd_protein_*.dat' -exec awk 'FNR==1{if(NR>1) print F[2],c; split(FILENAME,F,/.*_|[.]/); c=0} $2<3{c++}END{print F[2],c}' OFS='\t' {} + > newfile.dat
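
One caveat: find returns files in no particular order, so the lines will probably not come out as 1, 2, 3, and so on. If the numeric order from the original request matters, piping through sort -n should take care of it. A sketch in sh syntax, where count_sorted and its two arguments are hypothetical names wrapped around the same one-liner:

```shell
# Hypothetical wrapper around the find/awk one-liner, with a numeric sort
# on the id column added at the end.
count_sorted() {
    # $1 = directory to search, $2 = output file
    find "$1" -name 'rmsd_protein_*.dat' -exec awk '
        # First line of each file: report the previous file, then reset.
        FNR == 1 { if (NR > 1) print F[2], c; split(FILENAME, F, /.*_|[.]/); c = 0 }
        $2 < 3   { c++ }
        END      { print F[2], c }
    ' OFS='\t' {} + | sort -n > "$2"
}
```

For example, `count_sorted . newfile.dat` would search the current directory and write the sorted table to newfile.dat.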