multiple files: counting

I have a directory of 5000 files. Each file has around 4000 rows and 10 columns, and some rows contain the string 'AT' in the 4th column.

OM   3328   O     BT   268       5.800      7.500      4.700      0.000     1.400
OM   3329   O     BT   723       8.500      8.900      3.600      8.500     1.400
OM   3330   O     AT   231       6.700      5.500      7.600      0.000     1.400
OM   3331   O     AT   234       1.200      7.700      5.500      8.500     1.400
OM   3332   O     AT   256       3.800      5.800      5.200      0.000     1.400

I want a script that counts how many times 'AT' occurs in each file and prints that count in the second column of a new file. In the example below, the first column is the file number and the second column is the count. Since I have 5000 files, I expect the first column to run from 1 to 5000.

1    5
2    0
3    8
4    2
5    2
6    0
7    3
8    5
9    0
10   1

Thank you for helping.

-A
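
The sample rows put the string in the 4th column, so one way to count it is to test that field directly instead of matching AT anywhere on the line. This is only a minimal sketch, assuming the columns are separated by whitespace; the function name count_at is made up for illustration.

```shell
# count_at FILE: print how many rows of FILE have exactly "AT" in column 4.
# Assumes whitespace-separated columns, as in the sample rows.
count_at() {
    # n + 0 forces a numeric 0 when no row matched
    awk '$4 == "AT" { n++ } END { print n + 0 }' "$1"
}
```

For the five sample rows saved in a file, count_at would print 3, since three of them have AT in the 4th column.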

ruby -e 'Dir["*"].each_with_index{|x,y| print "#{y+1} #{File.read(x).split.count("AT")}\n"}' 

Thanks kurumi.

The script works and thanks for the help.

The files in the directory are numbered, and the results should be written to the new file in the same order: file 1 first, then file 2, and so on up to file 5000. How can I do that?

The one small problem is that the files are read in what looks like random order, which is not what I need for the counting. For example:

Directory:

1.txt
2.txt
3.txt
.
.
.
5000.txt

Newfile:

1    5
2    0
3    8
4    2
5    2
6    0
7    3
8    5
9    0
10   1

Row 1 corresponds to the first file (1.txt), row 2 to the second file (2.txt), and so on up to the last file, 5000.txt.

-A
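
A note on the ordering: names like 10.txt sort before 2.txt when compared as text, so the list of file names has to be sorted numerically before the files are read. Here is a rough sketch of one way to do that in plain shell, assuming the files are named 1.txt, 2.txt and so on, and that the columns are whitespace-separated. The function name count_in_order is hypothetical.

```shell
# count_in_order DIR: for every N.txt in DIR, in numeric order of N,
# print "N COUNT", where COUNT is the number of rows with "AT" in column 4.
count_in_order() {
    dir=$1
    # keep only names of the form <digits>.txt, then sort them as numbers
    ls "$dir" | grep -E '^[0-9]+\.txt$' | sort -n |
    while read -r name; do
        count=$(awk '$4 == "AT" { n++ } END { print n + 0 }' "$dir/$name")
        # ${name%.txt} drops the extension so only the file number is printed
        printf '%s %s\n' "${name%.txt}" "$count"
    done
}
```

It could be called as count_in_order . > newfile from inside the directory. The newfile name does not match the N.txt pattern, so it is not counted itself.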

How about this,

ls [0-9]*.txt| xargs grep -wc "AT" | sort -t"." -n > newfile

The bottleneck is telling the script to read the files in the directory in numerical order and then print the results into the new file in that same order.

---------- Post updated at 08:03 PM ---------- Previous update was at 07:50 PM ----------

Hi pravin27, thanks.

The script gave the output:

1.txt  :5
2.txt  :0
3.txt  :8

The ':' symbol is not needed, because with it the file can't be read in to make a 2D graph. Also, the file extension should not appear in the first column.

The output expected is:

1  5
2  0
3  8

Thanks again for the time helping me.

-A

---------- Post updated at 08:13 PM ---------- Previous update was at 08:03 PM ----------

Since the new file should not show the .txt suffix in the first column, I renamed the files in the directory as follows:

Directory:

1
2
3
.
.
.
5000

Then I ran this command at the console:

ls * | xargs grep -wc "AT" | sort -t"." -n > newfile

The output looks like:

1:4
2:5
3:0

How do I tell the script not to put the ':' symbol between columns 1 and 2? I want a space between the two columns instead.

-A
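
When grep -c is given several files it prints name:count, which is where the colon comes from. One possible way to turn such lines into two space-separated columns is a small sed filter placed after grep. This is just a sketch under the assumption that the lines look like 1.txt:5 or 1:4; the function name to_columns is made up.

```shell
# to_columns: read "N.txt:COUNT" or "N:COUNT" lines on stdin
# and write them as "N COUNT".
to_columns() {
    # first expression handles names with a .txt suffix,
    # second handles names that have no extension
    sed -e 's/\.txt:/ /' -e 's/:/ /'
}
```

It would slot into the pipeline between grep and sort, for example: ls * | xargs grep -wc "AT" | to_columns | sort -n > newfile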

ruby -e 'Dir["*"].sort_by{|x| x.to_i}.each_with_index{|x,y| print "#{y+1} #{File.read(x).split.count("AT")}\n"}'

try this,

ls [0-9]*.txt| xargs grep -wc "AT" | awk -F":" '{gsub(/\.txt/,"");printf "%-4d %d\n", $1,$2}' | sort -n > newfile

Maybe I should use a replace command to change the colon (:) into a space. The output should look like:

1 4
2 5
3 0

---------- Post updated at 08:30 PM ---------- Previous update was at 08:18 PM ----------

THANKS so much Kurumi and Pravin27. Both work perfectly.

I am so delighted. Thanks again.

-A