multiple files: counting

asanjuan · September 23, 2010, 6:42am

I have a directory of 5000 files. Each file has around 4000 rows of 10 columns, and some rows contain the string 'AT' in the 4th column.

OM   3328   O     BT   268       5.800      7.500      4.700      0.000     1.400
OM   3329   O     BT   723       8.500      8.900      3.600      8.500     1.400
OM   3330   O     AT   231       6.700      5.500      7.600      0.000     1.400
OM   3331   O     AT   234       1.200      7.700      5.500      8.500     1.400
OM   3332   O     AT   256       3.800      5.800      5.200      0.000     1.400

I want a script that counts how many times 'AT' is found in each file and prints that count in the second column of a new file. In the new file, the first column is the file number and the second column is the count. Since I have 5000 files, I expect the first column to run from 1 to 5000.

Thank you for helping.

-A

kurumi · September 23, 2010, 6:51am

ruby -e 'Dir["*"].each_with_index{|x,y| print "#{y+1} #{File.read(x).split.count("AT")}\n"}'
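The one-liner above counts every whitespace-separated token equal to AT, wherever it sits in a row. If only the 4th column should be matched, as the question describes, a minimal sketch could look like the following. The function name count_at is hypothetical and not from the thread; the sketch assumes columns are separated by whitespace.

```shell
# count_at FILE: print how many rows of FILE have exactly "AT" in column 4.
# count_at is a hypothetical helper name used only for illustration.
count_at() {
    # n + 0 forces a numeric 0 when no row matched (n would otherwise be empty)
    awk '$4 == "AT" { n++ } END { print n + 0 }' "$1"
}
```

Run against the five sample rows in the question, count_at prints 3.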

asanjuan · September 23, 2010, 7:10am

Thanks kurumi.

The script works and thanks for the help.

The files in the directory are numbered, and the counts should be printed into the new file in that same order: file 1 is read first, then file 2, and so on up to file 5000. How can I make it work that way?

The small problem is that the files are being read in what looks like random order, which is not what the counting needs. For example:

Directory:

1.txt
2.txt
3.txt
.
.
.
5000.txt

Newfile:

Row 1 of column 1 is the first file (1.txt), row 2 is the second file (2.txt), and so on until the last file, 5000.txt.

-A
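The seemingly random order most likely comes from the directory listing or from string sorting, neither of which is numeric: sorted as strings, 10.txt comes before 2.txt. A minimal sketch of listing the files in numeric order, assuming every file of interest is named with digits followed by .txt (list_numeric is a hypothetical helper name):

```shell
# list_numeric DIR: print the names matching <digits>.txt in DIR, one per
# line, in numeric order. list_numeric is a hypothetical helper name.
list_numeric() {
    # The subshell keeps the cd from affecting the caller.
    # sort -n compares the leading number of each name, so 2.txt sorts before 10.txt.
    ( cd "$1" && ls | grep -E '^[0-9]+\.txt$' | sort -n )
}
```

The grep filter also keeps unrelated files, such as the output file itself, out of the list.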

pravin27 · September 23, 2010, 7:48am

How about this,

ls [0-9]*.txt| xargs grep -wc "AT" | sort -t"." -n > newfile

asanjuan · September 23, 2010, 8:13am

The bottleneck is telling the script to read the files in the directory in numerical order and then print the results into the new file in that same order.

---------- Post updated at 08:03 PM ---------- Previous update was at 07:50 PM ----------

Hi pravin27, thanks.

The script gave the output:

1.txt  :5
2.txt  :0
3.txt  :8

The ':' symbol is not needed, because with it the file can't be read in to make a 2D graph. The file extension should also not appear in the first column.

The output expected is:

1  5
2  0
3  8

Thanks again for the time helping me.

-A

---------- Post updated at 08:13 PM ---------- Previous update was at 08:03 PM ----------

Since the new file should be printed without the .txt suffix in the 1st column, I changed the file names in the directory such that:

Directory:

Then I issued this command at the console:

ls * | xargs grep -wc "AT" | sort -t"." -n > newfile

The output looks like:

1:4
2:5
3:0

How do I tell the script not to put the ':' symbol between columns 1 and 2? I want a space there instead.

-A
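The name:count lines that grep -c prints can be turned into the two-column form with a small filter. This is a sketch under the assumption that file names contain no colon; format_counts is a hypothetical name, and it handles names both with and without the .txt suffix.

```shell
# format_counts: read "name:count" lines on stdin (the shape of grep -c
# output for several files) and print "name count", sorted numerically.
# format_counts is a hypothetical helper name.
format_counts() {
    # -F: splits on the colon; sub() drops a trailing .txt from the name if present
    awk -F: '{ sub(/\.txt$/, "", $1); print $1, $2 }' | sort -n
}
```

It would sit at the end of the pipeline, for example: grep -wc "AT" [0-9]*.txt | format_counts > newfile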

kurumi · September 23, 2010, 8:15am

ruby -e 'Dir["*"].sort_by{|x| x.to_i}.each_with_index{|x,y| print "#{y+1} #{File.read(x).split.count("AT")}\n"}'

pravin27 · September 23, 2010, 8:16am

Try this:

ls [0-9]*.txt| xargs grep -wc "AT" | awk -F":" '{gsub(/\.txt/,"");printf "%-4d %d\n", $1,$2}' | sort -n > newfile
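As an alternative to post-processing grep's output, the whole job can be sketched as a single awk pass that matches only the 4th column and prints the file number and count directly. The name count_all is hypothetical, and the sketch assumes files are named with digits followed by .txt. One known limitation: a completely empty file produces no line at all, because awk never sees a first record for it.

```shell
# count_all DIR: for every <digits>.txt file in DIR, in numeric order,
# print "<number> <rows with AT in column 4>".
# count_all is a hypothetical helper name.
count_all() {
    (
        cd "$1" || exit 1
        ls | grep -E '^[0-9]+\.txt$' | sort -n | xargs awk '
            FNR == 1 {                          # first row of a new input file
                if (name != "") print name, n   # emit the previous file
                name = FILENAME
                sub(/\.txt$/, "", name)         # strip the extension
                n = 0
            }
            $4 == "AT" { n++ }
            END { if (name != "") print name, n }   # emit the last file
        '
    )
}
```

Usage would be along the lines of: count_all . > newfile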

asanjuan · September 23, 2010, 8:30am

Maybe I should use a replace command to change the colon (:) into a space. The output should look like:

1 4
2 5
3 0

---------- Post updated at 08:30 PM ---------- Previous update was at 08:18 PM ----------

THANKS so much Kurumi and Pravin27. Both work perfectly.

I am so delighted. Thanks again.

-A