Total count in each category for given file list

I have list of file names in filename.txt
below is file format
>>File1
_________________________

01~12345~Y~YES~aaaaa~can
02~23456~N~NO~bbbbb~can
.
.
.
99~23

__________________________

Need to find total count from each file depending on specific string and add them to have total count in each category.

1. directory/${LINE}|tail -1|grep -i "99"|cut -d  '~' -f 2
2. directory/${LINE}|grep '~YES~'| wc -l
3. directory/${LINE}|grep '~No~'| wc -l

I need output as :

1.total count : 23  #addition of trailer values for each file '99' suggests it's a trailer
2. YES : 10  # total occurrences of '~YES~' 
3. NO : 5     # total ocurrences of '~NO~'  

Not sure what you want your output to look like but this might get you started:

while read file
do
  awk -F~ '
    {if($1=="99")T=$2; else T=x}
    /~YES~/{Y++}
    /~NO~/{N++}
    END{
       print FILENAME
       print "\t99:" T
       print "\tYES:" Y
       print "\tNO:" N
    }' $file
done< filename.txt
1 Like

Thanks Chubler_XL,
Can i call two different loops inside one ksh script one for total count and other for YES /No?

Yes but it's less efficient as filename.txt and the individual files are read twice:

while read file
do
    printf "$file\n\t99: %s\n" $(sed -n '$s/^99~//p' $file)
done < filename.txt

printf "\n\n"

while read file
do
  awk -F~ '
    {if($1=="99")T=$2; else T=x}
    /~YES~/ {Y++}
    /~NO~/{N++}
    END{
       print FILENAME
       print "\tYES:" Y
       print "\tNO:" N
    }' $file
done< filename.txt
1 Like

Thanks dear,

Actually efficiency will not be considered in this case as filename.txt will have maximum 10 record(file names).

Can you please tell me if second loop can find YES / No as addition from all files in filename.txt?

Yes 2nd loop is counting records with ~YES~ and ~NO~

I left a line from the previous solution in that can be removed:

while read file
do
    printf "$file\n\t99: %s\n" $(sed -n '$s/^99~//p' $file)
done < filename.txt

printf "\n\n"

while read file
do
  awk -F~ '
    /~YES~/ {Y++}
    /~NO~/{N++}
    END{
       print FILENAME
       print "\tYES:" Y
       print "\tNO:" N
    }' $file
done< filename.txt
1 Like

what will be the scripts if we go for merging files and then counting individual count?
I need output in below format

Total Records : 263
YES : 143
NO : 8
No Response : 102

It could look like this:

awk '
  /~YES~/ {Y++}
  /~NO~/{N++}
  END{
    print "Total Records : " NR
    print "YES : " Y
    print "NO : " N
    print "No Response : " NR - Y - N
  }' $(cat filename.txt)
1 Like

Thanks Buddy.

can I store this output of awk to other file?

---------- Post updated 09-13-14 at 12:50 AM ---------- Previous update was 09-12-14 at 08:06 AM ----------

I have achieved the required output and saved in file as

Total Records : 263
YES : 143
NO : 8
No Response : 102

How to do mathematical operations using awk ?

For above output i need one more additional field Junk which needs to be calculated from above results as
Junk : Total Records -(YES+NO+No Response)
for above case :
Junk :10 with values 263 -(143+8+102)

And, finally i want to print this output in next line in the same file which have above result

Total Records : 263
YES : 143
NO : 8
No Response : 102
Junk : 10

Please show us your code. If you had used what Chubler_XL suggested in message #8 in this thread, "Junk" should always be zero. The math from his suggestion when expanded to what you're asking for would be:

Junk = Total Records - Yes - No - No Response

which expands to:

Junk = NR - Y - N - (NR - Y - N)

which is zero.

1 Like

Hi Don,

I don't have exact code here with me but i can tell you the process

  1. I did merge of all the files using cat
  2. then found total count using awk on merged file and put all results into new file
    awk -F'~' '($1=="99") { print $2 }' file >>new file

3.found Yes, No , No response using cat |grep |wc -l >>new file
eg .YES: cat file|grep '~YES~' |wc -l

You have shown us code that adds data to an existing file (with no indication of what was in the file to begin with). You have shown us code that would generate syntax errors.

Please wait until you can show us your EXACT code and post it so we can help you debug it. And post sample data that shows the problems you are facing. (All using CODE tags.)

1 Like

Hi Don,

Below is the code :

 
while read LINE
do
cat ${LINE}>>count.xls
done < file.txt
print "Total Records : $(awk -F'~' '($1=="99") { sum+=$2 } END { print sum+0 }' count.xls)">>final_count.xls
print "\n">>final_count.xls
print "YES : $(cat count.xls|grep -i '~Y or YES~' | wc -l)">>final_count.xls
print "NO : $(cat count.xls|grep -i '~N or NO~' |wc -l)">>final_count.xls
print "No Response : $(cat count.xls|grep -i '~No Response~' |wc -l)">>final_count.xls
if [ $? -ne 0 ]
then
echo failed
exit 12
fi
 
file.txt :
contains list of files.

this line doesn't work:

print "YES : $(cat count.xls|grep -i '~Y or YES~' | wc -l)

try replace by:

printf "YES:%s\n" "$(cat count.xls | grep "YES~\|~Y " | wc -l)"

Your line under doesn't work so

The line
print "YES : $(cat count.xls|grep -i '~Y or YES~' | wc -l)">>final_count.xls

is working correctly for me as i have data delimited by ~ and I am looking for occurances of field Y or YES .

No, you are not. This grep command will only match lines that contain the exact string ~Y or YES~ . If you want to match any line that contains ~Y , ~y , YES~ , YEs~ , YeS~ , Yes~ , yES~ , yEs~ , yeS~ , or yes~ you need something much more like what protocomm suggested (but for EREs with alternation, you need egrep instead of grep ).

Please describe in English exactly what you are hoping to match with your grep commands. Note also since you apparently aren't matching complete fields, your search for NO will also match the lines containing No Resonse . I.e., the lines containing ~NO~ and the lines containing ~No Response~ will both be counted as matching ~NO when doing case insensitive matches. Is that really what you want?

1 Like

Hi Don,

I need 4 different counts with their match for exact values as :

YES : ~Y or YES~
No : ~N or NO~
No Response : ~No Response~
Junk : Other than above 3

OK.

So this is a completely different problem than what you described in the 1st post in this thread. All of the lines you showed us in that post:

01~12345~Y~YES~aaaaa~can
02~23456~N~NO~bbbbb~can
.
.
.
99~23

are now to be counted as Junk and the counts of all of the other three categories (using your sample input) are to be zero.

So, the following totally untested code (since we have no samples to test) might do what you want:

awk -F'~' '
$1 == 99 { tr += $2 }
index("~Y or YES~") { y++;next }
index("~N or NO~") { n++;next }
index("~No Response~") { nr++;next }
{j++}
END {   printf("Total Records : %d\n\n", tr)
        printf("YES : %d\n\nNo : %d\n\nNo Response : %d\n\nJunk : %d\n", y, n, nr, j)
}
' $(cat file.txt) > final_count.txt
if [ $? -ne 0 ]
then    echo failed >&2
        exit 12
fi

Notes:

  1. The code in red adds the Total Records printout that the script you showed us in post #13 in this thread produced. If you no longer want that to be included in your output (as shown in post #17 in this thread), remove the code shown in red.
  2. This code assumes that no more than one of the strings you're looking for will occur on any single line.
  3. If the files named in file.txt are Excel spreadsheet files (as implied by your name for your temporary file), this awk script (and your series of grep commands) will not work.
  4. I changed the name of your output file from final_count.xls to final_count.txt because your requested output is a text file; not an Excel spreadsheet file.
  5. I redirected your script's diagnostic message to stderr instead of stdout.
  6. I create (or overwrite) your output file instead of appending to the current contents. If you really want the output from multiple runs of your script to append, change the > in the last line of the awk command to >> .
  7. If you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk , /usr/xpg6/bin/awk , or nawk .
1 Like