How to find a count of a word within a file

bd_joy · May 2, 2008, 2:51pm

Hello,
I'm looking for a way to count the number of occurrences of a certain string of characters within a file. The file that I'm trying to parse is made up of segments, each with a header and a footer. I want to count the header strings and compare that to a count of the footer strings to verify the integrity of the file.

Example file:
HEADER misc text and numeric values FOOTER HEADER more misc info FOOTER HEADER etc etc FOOTER

There are no carriage returns within the file, and it's a text file of about 50 to 60 MB, so the process needs to be reasonably efficient because only a short processing window is available.

I've done several searches and tried using wc, tr, and sort in a variety of different ways, but I'm no closer to finding a solution. I'm a novice with utilities like sed and awk, but ideas using them are welcome.

Other general info: I use ksh on AIX 5.3. Thanks for any help!

Franklin52 · May 2, 2008, 3:28pm

You can try something like this; it prints how many times the words HEADER and FOOTER occur:

awk 'BEGIN{RS=" "}/HEADER/{h++}/FOOTER/{f++}END{print h, f}' file
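
To turn the two counts into the pass/fail check the original question asks for, the same awk program can compare them and set an exit status. This is only a sketch: the function name `check_segments` is made up for illustration, and it assumes HEADER and FOOTER are always separated from neighbouring text by spaces.

```shell
# check_segments FILE: hypothetical helper, not from the thread.
# Splits the input on spaces (as in the one-liner above) and compares counts.
check_segments() {
    awk 'BEGIN { RS = " " }
         /HEADER/ { h++ }
         /FOOTER/ { f++ }
         END {
             # h + 0 and f + 0 force a numeric value even when a count is unset
             printf "%d %d\n", h + 0, f + 0
             # exit status 0 when the counts match, 1 otherwise
             exit ((h == f) ? 0 : 1)
         }' "$1"
}
```

Calling `check_segments file` prints the header and footer counts, and the exit status can be tested with `if` in a ksh script.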

Regards

bd_joy · May 2, 2008, 3:37pm

Thanks for the reply! I'll test and post the results.

in2nix4life · May 2, 2008, 4:17pm

Here's another way:

#!/bin/ksh
# grep -c prints the number of matching lines directly, so no wc or sed cleanup is needed
str1=`tr ' ' '\n' < file.txt | grep -c "HEADER"`
str2=`tr ' ' '\n' < file.txt | grep -c "FOOTER"`

print "$str1 instances of 'HEADER' found!"
print "$str2 instances of 'FOOTER' found!"

exit 0

Hope this helps.

vgersh99 · May 2, 2008, 6:43pm

$ echo 'HEADER misc text and numeric values FOOTER HEADER more misc info FOOTER HEADER etc etc FOOTER' | nawk -F'(HEADER)|(FOOTER)' '{print "header+footer-> " NF-1}'
header+footer-> 6

If headers and footers come in pairs, an odd total would indicate a mismatch.
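
An even total does not by itself prove that every HEADER has a FOOTER (two extra headers also give an even number), so it can be safer to count each marker separately. A minimal sketch, assuming the markers are the literal strings HEADER and FOOTER; the function name `count_marker` is invented here. It relies on awk's gsub returning the number of substitutions it made.

```shell
# count_marker PATTERN FILE: hypothetical helper, not from the thread.
# gsub() returns how many matches it replaced in the current record;
# replacing with "&" puts the matched text back unchanged.
count_marker() {
    awk -v pat="$1" '{ n += gsub(pat, "&") } END { print n + 0 }' "$2"
}
```

Comparing the output of `count_marker HEADER file` with that of `count_marker FOOTER file` then gives the integrity check, and it does not depend on the markers being surrounded by spaces.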

DrZoidberg · July 14, 2008, 8:54am

I've got a follow up bump to add to this. I need to count the number of rows that start with the number 10. I can't count the instances because 10 often appears in the rows as well.

I'd be very grateful for some help on this.

joeyg · July 14, 2008, 9:06am

cat file1 | grep "^10 " | wc -l

This will return the number of lines that begin with a 10 (and not 100).

DrZoidberg · July 14, 2008, 9:17am

It's returning: 0

The list looks like this (I've removed the private personal data):

101149025 551105931219941213P1
101159450 481231331219941218P1
101306874 651101521319950620P1

So, it should be working shouldn't it?

joeyg · July 14, 2008, 9:24am

cat file1 | grep "^10" | wc -l

Try it without the extra space after the 10. From your original description, I thought you meant a 10 and then a space.
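
The difference between the two patterns can be seen on the sample rows posted above. This sketch uses `grep -c`, which prints the number of matching lines directly and so replaces the `cat ... | wc -l` pipeline. The function name `count_prefix` is made up for illustration, and the prefix is treated as a regular expression, not a fixed string.

```shell
# count_prefix PREFIX FILE: hypothetical helper, not from the thread.
# Counts lines that start with PREFIX; grep -c prints the number of matching lines.
count_prefix() {
    grep -c "^$1" "$2"
}
```

With the three sample rows, `count_prefix 10 file1` prints 3, while `count_prefix '10 ' file1` prints 0, because no row has a space right after the leading 10.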

DrZoidberg · July 14, 2008, 9:29am

Thanks, Joeyg. That did it. Thanks for taking your time to help. I'm such a Unix idiot.