How to find repeated string in a text file

I have a text file in which I need to find the string ST*850*
This string is repeated several times in the file, and I need to know how many times it appears. This is the text file:

ISA*00* *00* *08*925485USNR *ZZ*IMSALADDERSP *110824*1631*:*00501*850001355*0*P*>~GS*PO*925485USNR*IMSALADDERSP*20110824*1631*850001355*X*005010~ST*850*2262~BEG*00*SA*31016446**20110824~CUR*BY*USD~REF*IA*541177~REF*19*01~REF*MR*ZUS1~REF*AFN*ZZ~PER*BD*Aleshia Simrell*TE*4792041336~ITD*08*3*1**40~ITD*01*3~DTM*996*20110919~N1*BT*WalMart Stores Inc.*UL*0078742061078~N1*ST*Hayes Retail Services TX 7862*UL*0078742066493~N1*SU*LOUISVILLE LADDER INC~PO1*00010*5*EA*71.01*LE*IN*100016092~PID*F****4 STEP FIBERGLASS LADDER PLATFORM~SDQ*EA*UL*0078742000589*5~N9*L1*SPECIAL INSTRUCTIONS~MTX**Color Length 0.000 Width 0.000 Height~MTX**0.000 Unit of Dim. Size Unit of Mea.EA Make ModelSA~AMT*1*355.05~PO1*00020*6*EA*90.79*LE*IN*100016093~PID*F****6 STEP FIBERGLASS LADDER PLATFORM~SDQ*EA*UL*0078742000589*6~N9*L1*SPECIAL INSTRUCTIONS~AMT*1*544.74~CTT*2~AMT*TT*899.79~SE*29*2262~ST*850*2263~BEG*00*SA*31016447**20110824~CUR*BY*USD~REF*IA*541177~REF*19*01~REF*MR*ZUS1~REF*AFN*ZZ~PER*BD*Aleshia Simrell*TE*4792041336~ITD*08*3*1**40~ITD*01*3~DTM*996*20110919~N1*BT*WalMart Stores Inc.*UL*0078742061078~N1*ST*Hayes Retail Services TX 7862*UL*0078742066493~N1*SU*LOUISVILLE LADDER INC~PO1*00010*1*EA*127.06*LE*IN*100016094~PID*F****8 STEP FIBERGLASS LADDER~SDQ*EA*UL*0078742000589*1~N9*L1*SPECIAL INSTRUCTIONS~MTX**Color Length 0.000 Width 0.000 Height~MTX**0.000 Unit of Dim. Size Unit of Mea.EA Make ModelSA~AMT*1*127.06~CTT*1~AMT*TT*127.06~SE*24*2263~ST*850*2264~BEG*00*SA*31016448**20110824~CUR*BY*USD~REF*IA*541177~REF*19*01~REF*MR*ZUS1~REF*AFN*ZZ~PER*BD*Aleshia Simrell*TE*4792041336~ITD*08*3*1**40~ITD*01*3~DTM*996*20110919~N1*BT*WalMart Stores Inc.*UL*0078742061078~N1*ST*Hayes Retail Services TX 7862*UL*0078742066493~N1*SU*LOUISVILLE LADDER INC~PO1*00010*2*EA*90.79*LE*IN*100016093~PID*F****6 STEP FIBERGLASS LADDER PLATFORM~SDQ*EA*UL*0078742000589*2~N9*L1*SPECIAL INSTRUCTIONS~MTX**Color Length 0.000 Width 0.000 Height~MTX**0.000 Unit of Dim. 
Size Unit of Mea.EA Make ModelSA~AMT*1*181.58~PO1*00020*1*EA*127.06*LE*IN*100016094~PID*F****8 STEP FIBERGLASS LADDER~SDQ*EA*UL*0078742000589*1~N9*L1*SPECIAL INSTRUCTIONS~AMT*1*127.06~PO1*00030*1*EA*191.72*LE*IN*100016096~PID*F****TWELVE STEP FIBERGLASS LADDER~SDQ*EA*UL*0078742000589*1~N9*L1*SPECIAL INSTRUCTIONS~AMT*1*191.72~PO1*00040*10*EA*71.01*LE*IN*100016092~PID*F****4 STEP FIBERGLASS LADDER PLATFORM~SDQ*EA*UL*0078742000589*10~N9*L1*SPECIAL INSTRUCTIONS~AMT*1*710.1~PO1*00050*5*EA*55*LE*IN*100016091~PID*F****2 STEP FIBERGLASS LADDER PLATFORM~SDQ*EA*UL*0078742000589*5~N9*L1*SPECIAL INSTRUCTIONS~AMT*1*275~CTT*5~AMT*TT*1485.46~SE*44*2264~ST*850*2265~BEG*00*SA*31016449**20110824~CUR*BY*USD~REF*IA*541177~REF*19*01~REF*MR*ZUS1~REF*AFN*ZZ~PER*BD*Linda Cheek*TE*4792042014~ITD*08*3*1**40~ITD*01*3~DTM*996*20110829~N1*BT*WalMart Stores Inc.*UL*0078742061078~N1*ST*Hayes Retail Services TX 7862*UL*0078742066493~N1*SU*LOUISVILLE LADDER INC~PO1*00010*4*EA*71.01*LE*IN*100016092~PID*F****4 STEP FIBERGLASS LADDER PLATFORM~SDQ*EA*UL*0078742067391*4~N9*L1*SPECIAL INSTRUCTIONS~MTX**Color Length 0.000 Width 0.000 Height~MTX**0.000 Unit of Dim. Size Unit of Mea.EA Make ModelSA~AMT*1*284.04~PO1*00020*6*EA*90.79*LE*IN*100016093~PID*F****6 STEP FIBERGLASS LADDER PLATFORM~SDQ*EA*UL*0078742067391*6~N9*L1*SPECIAL INSTRUCTIONS~AMT*1*544.74~CTT*2~AMT*TT*828.78~SE*29*2265~ST*850*2266~BEG*00*SA*31016450**20110824~CUR*BY*USD~REF*IA*541177~REF*19*01~REF*MR*ZUS1~REF*AFN*ZZ~PER*BD*Linda Cheek*TE*4792042014~ITD*08*3*1**40~ITD*01*3~DTM*996*20110829~N1*BT*WalMart Stores Inc.*UL*0078742061078~N1*ST*Hayes Retail Services TX 7862*UL*0078742066493~N1*SU*LOUISVILLE LADDER INC~PO1*00010*2*EA*90.79*LE*IN*100016093~PID*F****6 STEP FIBERGLASS LADDER PLATFORM~SDQ*EA*UL*0078742067391*2~N9*L1*SPECIAL INSTRUCTIONS~MTX**Color Length 0.000 Width 0.000 Height~MTX**0.000 Unit of Dim. 
Size Unit of Mea.EA Make ModelSA~AMT*1*181.58~PO1*00020*1*EA*127.06*LE*IN*100016094~PID*F****8 STEP FIBERGLASS LADDER~SDQ*EA*UL*0078742067391*1~N9*L1*SPECIAL INSTRUCTIONS~AMT*1*127.06~PO1*00030*1*EA*191.72*LE*IN*100016096~PID*F****TWELVE STEP FIBERGLASS LADDER~SDQ*EA*UL*0078742067391*1~N9*L1*SPECIAL INSTRUCTIONS~AMT*1*191.72~PO1*00040*10*EA*71.01*LE*IN*100016092~PID*F****4 STEP FIBERGLASS LADDER PLATFORM~SDQ*EA*UL*0078742067391*10~N9*L1*SPECIAL INSTRUCTIONS~AMT*1*710.1~PO1*00050*5*EA*55*LE*IN*100016091~PID*F****2 STEP FIBERGLASS LADDER PLATFORM~SDQ*EA*UL*0078742067391*5~N9*L1*SPECIAL INSTRUCTIONS~AMT*1*275~CTT*5~AMT*TT*1485.46~SE*44*2266~GE*5*850001355~IEA*1*850001355~

Please encode data with code tags

awk '{
         tmp=$0
         i=index(tmp, "ST*850*")
         while(i>0)
         {
                cnt++
                tmp=substr(tmp, i+1)    # move past this match, otherwise the loop never ends
                i=index(tmp, "ST*850*")
         }
     }
     END {print "Found ", cnt+0, " Times" } '  inputfilename

grep -o "ST\*850\*" infile |wc -l

Create a file listing the strings you are looking for, for example Strings.txt, then paste the script below into a second file named String_Freq.awk and run it against your input file:

awk -f String_Freq.awk Strings.txt infile

Strings.txt
ST*850*
ISA*00*
BEG*00*

String_Freq.awk

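NR==FNR {if ($0!="") words[++nwords]=$0;next}
{for(w=1;w<=nwords;w++) {tmp=$0
while((i=index(tmp,words[w]))>0) {freq[words[w]]++;tmp=substr(tmp,i+length(words[w]))}}}
END {for(w=1;w<=nwords;w++)
{if (freq[words[w]]+0>0) print "Instances of " words[w] " : " freq[words[w]]+0}}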

Another way, using gawk:

gawk -F "ST\\\*850\\\*" '{print NF-1}' infile
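A small sketch of why NF-1 gives the count: with the search string as the field separator, a line containing n matches is split into n+1 fields. The sample lines are made up, plain awk is assumed to accept a regular expression as the separator, and the bracket form [*] is used instead of backslashes only to keep the quoting simple.

```shell
# Made-up sample: two matches on the first line, one on the second.
sample=$(mktemp)
printf '%s\n' 'ISA*00*~ST*850*2262~SE*29*2262~ST*850*2263~' 'SE*24*2263~ST*850*2264~' > "$sample"

# Each match splits the line once more, so fields = matches + 1.
per_line=$(awk -F 'ST[*]850[*]' '{print NF-1}' "$sample")
echo "$per_line"
rm -f "$sample"
```

This prints one count per input line (2, then 1), so a total still needs the per-line values added up.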

jum mcnamara, your code gives errors:
Syntax Error The source line is 10.
The error context is
>>> END <<< {print "Found ", cnt, " Times" }
awk: 0602-502 The statement cannot be correctly parsed. The source line is 10.
awk: 0602-540 There is a missing } character.

rdcwayx,
my grep command does not have the -o option.

Check my other reply; if your awk is gawk, it should work for you.

nawk '{c+=gsub("ST[*]850[*]", "&")}END{print c}' myFile

rdcwayx,
It works fine, but now the problem is more complex: I have a directory with 70,000 files with different dates, and the goal is to get how many times the string is repeated in all the files from a given period (for example, the month of September).
Thanks for your help.

Take a look at my previous suggestion: substitute 'myFile' with the wild-carded file names for the month of September. If we don't know how your files are named, it's hard to provide a more detailed hint.
The command below will 'grab' all the files and provide the total. Start with that.

nawk '{c+=gsub("ST[*]850[*]", "&")}END{print c}' /path/2/dir/with/files/*
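To illustrate the wildcard suggestion, here is a minimal sketch. It assumes the file names contain the date, which may not match the real naming scheme; the directory, the file names, and their contents are all made up. Plain awk is used in place of nawk only so the sketch runs on systems without nawk.

```shell
# Made-up directory and file names; the real naming scheme is unknown.
dir=$(mktemp -d)
printf 'ST*850*1~ST*850*2~\n' > "$dir/po_20110905.txt"
printf 'ST*850*3~\n' > "$dir/po_20110917.txt"
printf 'ST*850*4~\n' > "$dir/po_20111002.txt"

# The glob selects only the September 2011 files; awk reads them one after
# another and c keeps accumulating, so END prints a single grand total.
sept_total=$(awk '{c+=gsub("ST[*]850[*]", "&")}END{print c+0}' "$dir"/po_201109*.txt)
echo "$sept_total"
rm -rf "$dir"
```

Here the October file is skipped by the glob, so the total is 3.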

Here is the update for your new request

awk -F "ST\\\*850\\\*" 'NF{sum+=NF-1}END{print sum+0}' *sept*

OK guys, I'm trying this script:

clear
echo "Please enter the start date in the format YYYYMMDD, example: 20110625"
read strtdt
echo "Please enter the end date in the format YYYYMMDD, example: 20110826"
read enddt
touch -t ${strtdt}0000 /gentran/SI51/install/EDIS_Inbound/datefrom
touch -t ${enddt}2359 /gentran/SI51/install/EDIS_Inbound/dateto
find /gentran/SI51/install/EDIS_Inbound \( -newer /gentran/SI51/install/EDIS_Inbound/datefrom ! -newer /gentran/SI51/install/EDIS_Inbound/dateto \) -print | xargs awk '{c+=gsub("ST[*]850[*]", "&")}END{print c}'

When the date range is small, for instance 20111024 to 20111025, it appears to work fine, but when the range is bigger, for instance 20111001 to 20111025, I get several lines like:
1031
691
463
98
132
148
16
Do you know why?

xargs causes awk to run several times. Change "xargs awk" to "xargs cat |awk". This way xargs invokes cat as many times as needed then all the concatenated data is fed into one awk.
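A minimal sketch of the difference, using a made-up temporary directory with three small files. The -n 1 option is only there to force xargs to start a separate command for each file, imitating what happens on its own when the file list is too long for one command line.

```shell
# Made-up sample directory with three small files.
dir=$(mktemp -d)
printf 'ST*850*1~SE*1*1~ST*850*2~\n' > "$dir/a.edi"
printf 'ST*850*3~\n' > "$dir/b.edi"
printf 'ST*850*4~ST*850*5~\n' > "$dir/c.edi"

# One awk per batch of files: each run has its own END block and prints its own count.
split=$(find "$dir" -type f -print | xargs -n 1 awk '{c+=gsub("ST[*]850[*]", "&")}END{print c+0}')

# With cat in between, awk runs once and sees all the data as one stream.
total=$(find "$dir" -type f -print | xargs -n 1 cat | awk '{c+=gsub("ST[*]850[*]", "&")}END{print c+0}')

echo "separate awk runs print:"
echo "$split"
echo "single awk run prints: $total"
rm -rf "$dir"
```

The first pipeline prints three partial counts, one per awk run, while the second prints the single total of 5.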


All,
Thanks so much for your help. I got the result I wanted. binlib, your help was very useful too.

Refugio.