Total record count of all the files present in a directory

Hi All,
We need help with the requirement below. We have multiple pipe-delimited .txt files (around 100) in one directory. We need the total record count across all the files in that directory, excluding the header line of each file. The file format is as below:

 COUPON_CODE|FRIENDLY_CODE|BRAND|EVENT_NAME|EVENT_ID|MASTER_CAMPAIGN_NAME|CAMPAIGN_COMMON_NAME|WEB_PROMO_ID|POS_PROMO_ID|MAX_STORE_USAGE_LIMIT|MAX_ECOMMERCE_USAGE_LIMIT|IS_REDEEMED|IS_EXPIRED|BARCODE_URL|WEB_REDEMPTION_COUNT|STORE_REDEMPTION_COUNT|CREATED_DATE|UPDATED_AT|START_DATE|EXPIRY_DATE|DO_NOT_EXPIRE|SUBCUSTOMER_ID|SUBCUSTOMER_SC_ID|IS_TEST_CODE|IS_CSR_CODE|REDEMPTION_AMOUNT_THRESHOLD|COUPON_AMOUNT|JOB_ID|IS_REWARDS_CERT|REWARDS_CERT_POINTS_APPLIED|REWARDS_CERT_AMOUNT_ISSUED|ISSUANCE_DATE|WEB_PROMOTION_NAME|STORE_PROMOTION_NAME|EVENT_CODE
12W51PS2PK98387|GWIN17DOL25|Gymboree|Gymboree Holiday 2 Winter DM $25 off $100|||Gym Winter 2 DM|||1|1|N|N||0|0|09/26/2017|09/26/2017|11/20/2017|12/24/2017|N||||N||||N|||11/13/2017|||98387
12W51QRB2N98387|GWIN17DOL25|Gymboree|Gymboree Holiday 2 Winter DM $25 off $100|||Gym Winter 2 DM|||1|1|N|N||0|0|09/26/2017|09/26/2017|11/20/2017|12/24/2017|N||||N||||N|||11/13/2017|||98387
12W51QZV5T98387|GWIN17DOL25|Gymboree|Gymboree Holiday 2 Winter DM $25 off $100|||Gym Winter 2 DM|||1|1|N|N||0|0|09/26/2017|09/26/2017|11/20/2017|12/24/2017|N||||N||||N|||11/13/2017|||98387

 

For one file, we can get the record count like this:

 wc -l file1.txt
 

Can anyone kindly show me how to get the total record count of all the .txt files in that directory, excluding the headers? Any help in this regard will be appreciated. Thanks!

Hello STCET22,

Could you please try the following and let me know if it helps (not tested, though).

awk 'END{print NR+1-length(ARGV)}' *.txt

Thanks,
R. Singh


Would wc -l *.txt work? It will give you output for each file then a total.

If you just want the total just do this:-

total=$(wc -l *.txt|tail -1)                # Keep only the last line of the wc output (the grand total when several files match).  Adjust the wild-card to match all relevant files
read total junk < <(echo $total)            # Parse the line to get the value only
echo "The total number of lines is ${total}"

Another way might be simply:-

total=$(cat *.txt | wc -l)                  # All in one line

I hope that these help,
Robin

Hello rbatte1,

Apologies if I have missed anything here, but I think the OP asked to exclude the first line (the header) of each file from the count. That is why I subtract it in the END section of my code.

Thanks,
R. Singh

@RaviderSingh13: Nice approach! Why not

awk 'END{print NR+1-ARGC}' *.txt
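
To see why this arithmetic works, here is a minimal sketch on two small sample files. The scratch directory and the file names are hypothetical, created only for the demonstration, and mktemp -d is assumed to be available on your system.

```shell
# Hypothetical scratch directory and file names, used only for this demonstration.
demo_dir=$(mktemp -d)
printf 'H1|H2\na|b\nc|d\n' > "$demo_dir/one.txt"   # header + 2 records
printf 'H1|H2\ne|f\n'      > "$demo_dir/two.txt"   # header + 1 record

# ARGC counts the command name plus the two file arguments, so it is 3 here.
# NR ends at 5 (every line read), so NR+1-ARGC is 5 + 1 - 3 = 3 data records.
total=$(awk 'END{print NR+1-ARGC}' "$demo_dir"/*.txt)
echo "Data records: $total"

rm -rf "$demo_dir"
```

In other words, NR+1-ARGC is just "all lines minus one header per file", which assumes every file really has a header line.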

Yes, I missed the extra requirement to ignore the header record.

Can we be sure that either:-

  • the files all have the same header?
  • no files will be zero bytes?

If the headers are all the same, we could:-

total=$(grep -v "^COUPON_CODE" *.txt | wc -l)                  # All in one line

If the headers are different, but no file is empty, we could perhaps:-

((total=$(cat *.txt | wc -l) - $(ls -1 *.txt|wc -l)))          # All lines minus one header per .txt file

Would either of these work? I'm not sure of their performance versus awk though.
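
The zero-byte question matters because subtracting one header per file assumes every file has one. A small sketch of the pitfall, using hypothetical sample files in a scratch directory (mktemp -d is assumed to be available):

```shell
# Hypothetical scratch directory and file names, used only for this demonstration.
demo_dir=$(mktemp -d)
printf 'H1|H2\na|b\nc|d\n' > "$demo_dir/one.txt"   # header + 2 records
: > "$demo_dir/empty.txt"                           # zero bytes, so no header at all

lines=$(cat "$demo_dir"/*.txt | wc -l)              # 3 lines in total
files=$(ls -1 "$demo_dir"/*.txt | wc -l)            # 2 files
naive=$((lines - files))                            # 3 - 2 = 1, but there are 2 data records
echo "Subtracting one header per file gives: $naive"

rm -rf "$demo_dir"
```

The empty file has no header to remove, so the subtraction takes away one line too many.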

Robin


Hi All,
Thank you all for your help.
@Ravinder,
As I am new to awk, could you kindly explain the code so that I can understand it?

Hello STCET22,

Please go through the following and let me know if it helps. (Run the code as given in post #2; the version below is for explanation only.)

awk '
END{                    ### The END block runs once, after awk has finished reading ALL the Input_file(s).
print NR+1-length(ARGV) ### NR is the total number of lines read across all Input_file(s). ARGV is the built-in array of
                        ### command-line arguments: the command name plus one entry per Input_file, so its length is files+1.
}                       ### With 3 files, length(ARGV) is 4 and NR+1-4 equals NR-3, so one heading line is removed per file.
' *.txt                 ### Mentioning all the .txt Input_file(s) here.

Thanks,
R. Singh

length(array) gives the number of array elements, but it is a GNU extension; the standard only knows length(string). And the standard already provides the argument count: ARGC!
--
Another solution in standard awk:

awk 'FNR>1 {n++} END {print n+0}' *.txt

This increments n whenever the record number within the current file (FNR) is greater than 1.
At the END it prints n; the n+0 forces a numeric 0 in case n is unset (no data records at all).
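
As a quick check of this version, here is a sketch that runs it on hypothetical sample files, including a zero-byte one. The scratch directory and file names are made up for the demonstration, and mktemp -d is assumed to be available.

```shell
# Hypothetical scratch directory and file names, used only for this demonstration.
demo_dir=$(mktemp -d)
printf 'H1|H2\na|b\nc|d\n' > "$demo_dir/one.txt"   # header + 2 records
printf 'X1|X2\ne|f\n'      > "$demo_dir/two.txt"   # different header + 1 record
: > "$demo_dir/empty.txt"                           # zero bytes: contributes nothing

# FNR restarts at 1 for each file, so line 1 of every file is skipped.
total=$(awk 'FNR>1 {n++} END {print n+0}' "$demo_dir"/*.txt)
echo "Data records: $total"

rm -rf "$demo_dir"
```

Because it skips the first line per file instead of subtracting a file count afterwards, this version does not depend on the headers being identical or on every file being non-empty.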