Performance of counting matching records across multiple files

Hello Friends,

I've been trying to count the total number of records matching certain criteria across multiple data record (DR) files.

Let's say I have one folder per day, created daily since the beginning of July, like the following:

drwxrwxrwx 2 mmsuper med 65536 Jul  1 23:59 20150701
drwxrwxrwx 2 mmsuper med 65536 Jul  2 23:59 20150702
drwxrwxrwx 2 mmsuper med 65536 Jul  3 23:59 20150703
drwxrwxrwx 2 mmsuper med 65536 Jul  4 23:59 20150704
drwxrwxrwx 2 mmsuper med 65536 Jul  5 23:59 20150705
drwxrwxrwx 2 mmsuper med 65536 Jul  6 23:59 20150706
drwxrwxrwx 2 mmsuper med 65536 Jul  7 23:59 20150707
drwxrwxrwx 2 mmsuper med 65536 Jul  8 23:59 20150708
drwxrwxrwx 2 mmsuper med 65536 Jul  9 23:59 20150709
drwxrwxrwx 2 mmsuper med 65536 Jul 10 23:59 20150710
drwxrwxrwx 2 mmsuper med 65536 Jul 11 23:59 20150711
drwxrwxrwx 2 mmsuper med 65536 Jul 12 23:59 20150712
drwxrwxrwx 2 mmsuper med 65536 Jul 13 23:59 20150713
.
.

Each folder has thousands of files such as:

-rw-r--r-- 1 mmsuper med  7691 Jul  1 15:30 cdr_2015070103_30306.txt
-rw-r--r-- 1 mmsuper med  2276 Jul  1 15:30 cdr_2015070103_30307.txt
-rw-r--r-- 1 mmsuper med  2633 Jul  1 15:30 cdr_2015070103_30308.txt
-rw-r--r-- 1 mmsuper med  2682 Jul  1 15:31 cdr_2015070103_30309.txt
-rw-r--r-- 1 mmsuper med  2622 Jul  1 15:31 cdr_2015070103_30310.txt
-rw-r--r-- 1 mmsuper med  5592 Jul  1 15:31 cdr_2015070103_30311.txt
-rw-r--r-- 1 mmsuper med  3029 Jul  1 15:31 cdr_2015070103_30313.txt
-rw-r--r-- 1 mmsuper med  6940 Jul  1 15:31 cdr_2015070103_30312.txt
-rw-r--r-- 1 mmsuper med  2610 Jul  1 15:31 cdr_2015070103_30314.txt
-rw-r--r-- 1 mmsuper med  5350 Jul  1 15:32 cdr_2015070103_30315.txt
-rw-r--r-- 1 mmsuper med  2949 Jul  1 15:32 cdr_2015070103_30316.txt

Unfortunately, each file has hundreds or several thousand rows (data records). The field separator is a comma, and one of the fields contains a whole SOAP request. For example, the following is one charging request record (I have shortened it):

charging,s:so1-751297106414366943416671001:ws1-7512971064-14366943426191000-0001-01,s:so1-751297106414366943416671001,1,6,1,1,0,9647512971064,9647512971064,SMS,2,,,5400291,BasraOffer,,5400259,
20150712124542,20150712124542,,20150430000000,,20150712124542,0,,,1,<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"><soap:Body><ns2:chargeSubscription xmlns:ns2=
"http://flows.vrc.esdp.rmea.ericsson.com/"><subscription><name>QUERY_TARIFF_PLAN</name><value>false</value></attribute></attributes><currentSubscriptionInterval><attributes><attribute>
<name>INITIAL_CHARGE_OPTION</name><value>0</value></attribute><attribute><name>RULE_ID</name><value>121</value></attribute><attribute><name>FULFILL_ON_RESERVE</name><value>1</value>
<faultString>Sending exception</faultString><msisdn>9647512971064</msisdn><paymentMethod>2</paymentMethod><accountId>23399568</accountId><customerId>23669092</customerId><imsi>418400202171510</imsi>
</receiverSubscriber><subscriptionId>0</subscriptionId><subscriptionStartDate>20150712124542</subscriptionStartDate><billCycleStartDate xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:nil="true"/>,
414241,20150712124542,187,1,1,64751297106400249868

I need to count the records that match specific conditions, i.e. $1 = charging, $9 = the subscriber number, and so on. I have calculated it, but only within one folder:

nawk -F\, '{if ($1=="charging" && $9=="9647512971064" && $29~/<faultString>[Ss]ending [Ee]xception<\/faultString>/) c++}END{print c}' cdr_201507*txt

Here is my question:

As there are hundreds of thousands of files under a few directories, what is the fastest way to calculate the total number of matches with a single one-liner command or script?

Should I first find the files with a find command in a for loop and run nawk on each one afterwards, like the following?

for j in `find . -type f -name "cdr_201507*txt" 2>/dev/null`; 
do nawk ...
done;

I would appreciate your suggestions. I checked but could not find a specific answer on our forum or elsewhere.

Kind Regards

Your suggested method would start a new nawk process for every file, so it would probably be very slow.

Using the + terminator instead of ; with find's -exec passes many files to each command invocation. With this many files it may still take several invocations of nawk because of argument-length limits, in which case each invocation prints its own count and we would need to somehow add the partial sums together. A simpler solution is to have find just cat all the files and pipe the combined output to a single nawk.

find . -type f -name 'cdr_201507*txt' -exec cat {} + | nawk -F, '
  $1=="charging" && $9=="9647512971064" && $29~/<faultString>[Ss]ending [Ee]xception<\/faultString>/ {c++}
  END{print c}'
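
For comparison, here is a rough sketch of the other route mentioned above: let find start awk directly with + and add up the partial counts afterwards. The function name count_matches is made up for illustration, and plain awk is used here on the assumption that it behaves like nawk for this program (substitute nawk on Solaris). It has not been timed against the cat version.

```shell
#!/bin/sh
# count_matches DIR  (hypothetical helper, not from the original thread)
# find may start awk several times when the file list exceeds the
# argument-length limit. Each awk run prints its own partial count,
# and the second awk adds those partial counts into one total.
count_matches() {
  find "$1" -type f -name 'cdr_201507*txt' -exec awk -F, '
    $1=="charging" && $9=="9647512971064" && $29~/<faultString>[Ss]ending [Ee]xception<\/faultString>/ {c++}
    END {print c+0}' {} + |
  awk '{total += $1} END {print total+0}'
}
```

Calling count_matches . from the directory that holds the daily folders should print a single total. Printing c+0 and total+0 means a run with no matches prints 0 rather than an empty line.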