Total count in each category for given file list

santoshdrkr · September 10, 2014, 3:07pm

I have list of file names in filename.txt
below is file format
>>File1
_________________________

01~12345~Y~YES~aaaaa~can
02~23456~N~NO~bbbbb~can
.
.
.
99~23

__________________________

Need to find total count from each file depending on specific string and add them to have total count in each category.

1. directory/${LINE}|tail -1|grep -i "99"|cut -d  '~' -f 2
2. directory/${LINE}|grep '~YES~'| wc -l
3. directory/${LINE}|grep '~No~'| wc -l

I need output as :

1.total count : 23  #addition of trailer values for each file '99' suggests it's a trailer
2. YES : 10  # total occurrences of '~YES~' 
3. NO : 5     # total ocurrences of '~NO~'

Chubler_XL · September 10, 2014, 7:39pm

Not sure what you want your output to look like but this might get you started:

while read file
do
  awk -F~ '
    {if($1=="99")T=$2; else T=x}
    /~YES~/{Y++}
    /~NO~/{N++}
    END{
       print FILENAME
       print "\t99:" T
       print "\tYES:" Y
       print "\tNO:" N
    }' $file
done< filename.txt

santoshdrkr · September 10, 2014, 10:53pm

chubler_xl:

Not sure what you want your output to look like but this might get you started:

while read file
do
  awk -F~ '
   {if($1=="99")T=$2; else T=x}
   /~YES~/{Y++}
   /~NO~/{N++}
   END{
   print FILENAME
   print "\t99:" T
   print "\tYES:" Y
   print "\tNO:" N
   }' $file
done< filename.txt

Thanks Chubler_XL,
Can i call two different loops inside one ksh script one for total count and other for YES /No?

Chubler_XL · September 10, 2014, 11:06pm

Yes but it's less efficient as filename.txt and the individual files are read twice:

while read file
do
    printf "$file\n\t99: %s\n" $(sed -n '$s/^99~//p' $file)
done < filename.txt

printf "\n\n"

while read file
do
  awk -F~ '
    {if($1=="99")T=$2; else T=x}
    /~YES~/ {Y++}
    /~NO~/{N++}
    END{
       print FILENAME
       print "\tYES:" Y
       print "\tNO:" N
    }' $file
done< filename.txt

santoshdrkr · September 10, 2014, 11:19pm

Thanks dear,

Actually efficiency will not be considered in this case as filename.txt will have maximum 10 record(file names).

Can you please tell me if second loop can find YES / No as addition from all files in filename.txt?

Chubler_XL · September 10, 2014, 11:27pm

Yes 2nd loop is counting records with ~YES~ and ~NO~

I left a line from the previous solution in that can be removed:

while read file
do
    printf "$file\n\t99: %s\n" $(sed -n '$s/^99~//p' $file)
done < filename.txt

printf "\n\n"

while read file
do
  awk -F~ '
    /~YES~/ {Y++}
    /~NO~/{N++}
    END{
       print FILENAME
       print "\tYES:" Y
       print "\tNO:" N
    }' $file
done< filename.txt

santoshdrkr · September 11, 2014, 11:04am

what will be the scripts if we go for merging files and then counting individual count?
I need output in below format

Total Records : 263
YES : 143
NO : 8
No Response : 102

Chubler_XL · September 11, 2014, 4:14pm

It could look like this:

awk '
  /~YES~/ {Y++}
  /~NO~/{N++}
  END{
    print "Total Records : " NR
    print "YES : " Y
    print "NO : " N
    print "No Response : " NR - Y - N
  }' $(cat filename.txt)

santoshdrkr · September 12, 2014, 3:20pm

Thanks Buddy.

can I store this output of awk to other file?

---------- Post updated 09-13-14 at 12:50 AM ---------- Previous update was 09-12-14 at 08:06 AM ----------

I have achieved the required output and saved in file as

Total Records : 263
YES : 143
NO : 8
No Response : 102

How to do mathematical operations using awk ?

For above output i need one more additional field Junk which needs to be calculated from above results as
Junk : Total Records -(YES+NO+No Response)
for above case :
Junk :10 with values 263 -(143+8+102)

And, finally i want to print this output in next line in the same file which have above result

Total Records : 263
YES : 143
NO : 8
No Response : 102
Junk : 10

Don_Cragun · September 12, 2014, 4:31pm

Please show us your code. If you had used what Chubler_XL suggested in message #8 in this thread, "Junk" should always be zero. The math from his suggestion when expanded to what you're asking for would be:

Junk = Total Records - Yes - No - No Response

which expands to:

Junk = NR - Y - N - (NR - Y - N)

which is zero.

santoshdrkr · September 13, 2014, 1:59pm

Hi Don,

I don't have exact code here with me but i can tell you the process

I did merge of all the files using cat
then found total count using awk on merged file and put all results into new file
awk -F'~' '($1=="99") { print $2 }' file >>new file

3.found Yes, No , No response using cat |grep |wc -l >>new file
eg .YES: cat file|grep '~YES~' |wc -l

Don_Cragun · September 13, 2014, 2:29pm

You have shown us code that adds data to an existing file (with no indication of what was in the file to begin with). You have shown us code that would generate syntax errors.

Please wait until you can show us your EXACT code and post it so we can help you debug it. And post sample data that shows the problems you are facing. (All using CODE tags.)

santoshdrkr · September 15, 2014, 5:49am

Hi Don,

Below is the code :

 
while read LINE
do
cat ${LINE}>>count.xls
done < file.txt
print "Total Records : $(awk -F'~' '($1=="99") { sum+=$2 } END { print sum+0 }' count.xls)">>final_count.xls
print "\n">>final_count.xls
print "YES : $(cat count.xls|grep -i '~Y or YES~' | wc -l)">>final_count.xls
print "NO : $(cat count.xls|grep -i '~N or NO~' |wc -l)">>final_count.xls
print "No Response : $(cat count.xls|grep -i '~No Response~' |wc -l)">>final_count.xls
if [ $? -ne 0 ]
then
echo failed
exit 12
fi

 
file.txt :
contains list of files.

protocomm · September 15, 2014, 7:12am

this line doesn't work:

print "YES : $(cat count.xls|grep -i '~Y or YES~' | wc -l)

try replace by:

printf "YES:%s\n" "$(cat count.xls | grep "YES~\|~Y " | wc -l)"

Your line under doesn't work so

santoshdrkr · September 15, 2014, 9:44am

The line
print "YES : $(cat count.xls|grep -i '~Y or YES~' | wc -l)">>final_count.xls

is working correctly for me as i have data delimited by ~ and I am looking for occurances of field Y or YES .

Don_Cragun · September 15, 2014, 4:32pm

No, you are not. This grep command will only match lines that contain the exact string ~Y or YES~ . If you want to match any line that contains ~Y , ~y , YES~ , YEs~ , YeS~ , Yes~ , yES~ , yEs~ , yeS~ , or yes~ you need something much more like what protocomm suggested (but for EREs with alternation, you need egrep instead of grep ).

Please describe in English exactly what you are hoping to match with your grep commands. Note also since you apparently aren't matching complete fields, your search for NO will also match the lines containing No Resonse . I.e., the lines containing ~NO~ and the lines containing ~No Response~ will both be counted as matching ~NO when doing case insensitive matches. Is that really what you want?

santoshdrkr · September 15, 2014, 10:40pm

Hi Don,

I need 4 different counts with their match for exact values as :

YES : ~Y or YES~
No : ~N or NO~
No Response : ~No Response~
Junk : Other than above 3

Don_Cragun · September 16, 2014, 1:24am

OK.

So this is a completely different problem than what you described in the 1st post in this thread. All of the lines you showed us in that post:

01~12345~Y~YES~aaaaa~can
02~23456~N~NO~bbbbb~can
.
.
.
99~23

are now to be counted as Junk and the counts of all of the other three categories (using your sample input) are to be zero.

So, the following totally untested code (since we have no samples to test) might do what you want:

awk -F'~' '
$1 == 99 { tr += $2 }
index("~Y or YES~") { y++;next }
index("~N or NO~") { n++;next }
index("~No Response~") { nr++;next }
{j++}
END {   printf("Total Records : %d\n\n", tr)
        printf("YES : %d\n\nNo : %d\n\nNo Response : %d\n\nJunk : %d\n", y, n, nr, j)
}
' $(cat file.txt) > final_count.txt
if [ $? -ne 0 ]
then    echo failed >&2
        exit 12
fi

Notes:

The code in red adds the Total Records printout that the script you showed us in post #13 in this thread produced. If you no longer want that to be included in your output (as shown in post #17 in this thread), remove the code shown in red.
This code assumes that no more than one of the strings you're looking for will occur on any single line.
If the files named in file.txt are Excel spreadsheet files (as implied by your name for your temporary file), this awk script (and your series of grep commands) will not work.
I changed the name of your output file from final_count.xls to final_count.txt because your requested output is a text file; not an Excel spreadsheet file.
I redirected your script's diagnostic message to stderr instead of stdout.
I create (or overwrite) your output file instead of appending to the current contents. If you really want the output from multiple runs of your script to append, change the > in the last line of the awk command to >> .
If you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk , /usr/xpg6/bin/awk , or nawk .