String compare in multiple files

Hi

I have a requirement to process a number of files matching some criteria. Each matching file should be processed individually: look for a particular string, then keep reading until another string is found some lines later. Within that range, count the occurrences of a third string and display/return the result.

Continue the process until EOF is reached, then take the next file and do the same, until there are no more files to process.

Example:

File1.log

######### AUD57#########
<:_:REQUESTSTRUCTURE>
<MESSAGE>.........
.............
.........
........
</ITEM>
.........
........
</ITEM>
.........
........
</ITEM>
.........
........
</ITEM>
</MESSAGE>
</:_:REQUESTSTRUCTURE>
................
.............
..............
..............

######### AUD57#########
<:_:REQUESTSTRUCTURE>
<MESSAGE>.........
.............
.........
........
</ITEM>
.........
........
</ITEM>
.........
........
</ITEM>
.........
........
</ITEM>
</MESSAGE>
</:_:REQUESTSTRUCTURE>
...........
EOF

So all I want to know is how many occurrences of </ITEM> there are in this log file after "######### AUD57#########" and before "</:_:REQUESTSTRUCTURE>".

I have a script of sorts that does this, but it takes ages to produce a result (it processes the files line by line), so I would like to see a simpler and much faster solution. Help please, urgent.

cheers
Arun

Is this what you are looking for?

nawk '
        /######### AUD57#########/{a++}
        /<\/ITEM>/{items[a]++}
        END{
                for( i in items ){
                        print "No of items in AUD57("i") is - "items[i]
                        t += items[i]
                }
                print "Total items in file = "t
        }
' infile

It almost does it, but the problem I have is slightly different.

I need to find ITEMS between ##### AUD57##
and
</:_:REQUESTSTRUCTURE>

The reason is that there are other ITEMS in the log that I am not interested in at all (between one AUD57 and the next AUD57).

I need to find how many items there are from AUD57 up to the </:_:REQUESTSTRUCTURE>.

Hope that makes it a little clearer.

But thanks a lot for the response, it is much appreciated. Hope to see a reply soon.

cheers
arun

Try this:

awk '
/AUD57/{f=1} 
/<\/ITEM>/ && f {c++}
/<\/:_:REQUESTSTRUCTURE>/{print "Number of occurrence: "c; exit}
' file

I tried it, but it gives wrong results. Here is my test log file; I would expect 9 items between "...AUD57..." and </_:RequestStructure>.

#####====> AUD57 - ACTION
<_:RequestStructure>
        </item>
        </item>
        </item>
</_:RequestStructure>

</item>

#####====> AUD57 - ACTION
<_:RequestStructure>
        </item>
        </item>
        </item>
</_:RequestStructure>

</item>
</item>
</item>

#####====> AUD57 - ACTION
        </item>
        </item>
        </item>
</_:RequestStructure>

<!-- out side AUD -->
<_:RequestStructure>
        </item>
        </item>
        </item>
</_:RequestStructure>

Thanks for the time spent on this.

9 items??

Can you be more specific?
Do you want the output between the first "#####====> AUD57" and the first "</_:RequestStructure>"?

yes 9 items.

#####====> AUD57 - ACTION 
<_:RequestStructure>
        </item> ****
        </item> ****
        </item> ****
</_:RequestStructure>  ======== 3

then ....
#####====> AUD57 - ACTION
<_:RequestStructure>
        </item>
        </item>
        </item>
 </_:RequestStructure> ======== 3

then...
....
#####====> AUD57 - ACTION
        </item>
        </item>
        </item>
</_:RequestStructure> ========= 3

So, 9 should be final result.

Please provide a better example of the file.
How do you recognize the last item? There are a lot of tags with "</_:RequestStructure>".

Looking at your last example where you expect 9 items it would seem you could simply look for any item end tags between: -

<_:RequestStructure>

and

</_:RequestStructure>

Of course you could do that, but there are other audit entries like
#####====> AUD99 - ACTION
<_:RequestStructure>
</item>
</item>
</item>
</_:RequestStructure>

but I am interested only in

#####====> AUD57 - ACTION
<_:RequestStructure>
</item>
</item>
</item>
</_:RequestStructure>

Try this:

awk '
/AUD57/{f=1} 
/<\/ITEM>/ && f {c++}
/AUD/ && !/AUD57/{print "Number of occurrence: "c; exit}
' file

Output:

$ cat file
#####====> AUD57 - ACTION
<_:RequestStructure>
        </ITEM>
        </ITEM>
        </ITEM>
</_:RequestStructure>

#####====> AUD57 - ACTION
<_:RequestStructure>
        </ITEM>
        </ITEM>
        </ITEM>
</_:RequestStructure>

#####====> AUD57 - ACTION
        </ITEM>
        </ITEM>
        </ITEM>
</_:RequestStructure>

#####====> AUD58 - ACTION
<_:RequestStructure>
        </ITEM>
        </ITEM>
        </ITEM>
</_:RequestStructure>
#####====> AUD67 - ACTION
<_:RequestStructure>
        </ITEM>
        </ITEM>
        </ITEM>
</_:RequestStructure>
$
$ awk '
/AUD57/{f=1} 
/<\/ITEM>/ && f {c++}
/AUD/ && !/AUD57/{print "Number of occurrence: "c; exit}
' file
Number of occurrence: 9
$
$

Frank

Thanks a lot for the time and effort. That would work exactly if the input log file looked like that, but as I said earlier, the <item> entries can appear at random places in between, and some are not even inside an "AUDxx" block.

In the example log, if there is a floating <item> (without a proper AUD and RequestStructure around it), the script would report 10, wouldn't it? That is the problem for me. Sorry, I know this is getting long.
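
To illustrate the concern, here is a minimal sketch using a made-up log (the sample content and the temporary file are assumptions for illustration, not taken from the real logs). It runs the script from the previous post on one AUD57 block with 3 items, followed by one stray item and an AUD58 block:

```shell
# Made-up sample log (an assumption for illustration): one AUD57 block with
# 3 items, one stray item outside any block, then an AUD58 block.
log=$(mktemp)
cat > "$log" <<'EOF'
#####====> AUD57 - ACTION
<_:RequestStructure>
        </ITEM>
        </ITEM>
        </ITEM>
</_:RequestStructure>

</ITEM>

#####====> AUD58 - ACTION
<_:RequestStructure>
        </ITEM>
</_:RequestStructure>
EOF

# The script from the previous post, unchanged apart from the file name.
result=$(awk '
/AUD57/{f=1}
/<\/ITEM>/ && f {c++}
/AUD/ && !/AUD57/{print "Number of occurrence: "c; exit}
' "$log")

echo "$result"    # the stray item is counted too, so this reports 4 rather than 3
rm -f "$log"
```

The stray item is counted because the flag f stays set after the closing </_:RequestStructure> tag.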

I am starting to think about writing a Java program to sort this out, as I have very little time left on this.

much appreciated - great help
-arun

Ok, one more try:

awk '
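/AUD57/{f=1}                          # start counting at an AUD57 header
/<\/item>/ && f {c++}                 # count items only while inside the block
f && /<\/_:RequestStructure>/{f=0}    # stop counting at the closing tag
END{print "Number of occurrence: "(c+0)}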
' file
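
If the count is needed for several files in one pass, as in the original question, the same idea could be extended roughly as below. This is only a sketch: the function name count_aud57_items and the *.log file names are assumptions, and the match on </item> is case-sensitive, so it would need adjusting for logs that use </ITEM>.

```shell
# Sketch: count </item> lines inside AUD57 request blocks, per file and in total.
# The function name and the file names passed to it are hypothetical.
count_aud57_items() {
    awk '
    FNR == 1 {                        # first line of each input file
        if (prev != "") print prev ": " (c + 0)
        total += c; c = 0; f = 0      # reset the per-file counter and the flag
        prev = FILENAME
    }
    /AUD57/                       { f = 1 }   # an AUD57 block starts
    f && /<\/item>/               { c++ }     # count items only while inside the block
    f && /<\/_:RequestStructure>/ { f = 0 }   # the block ends at the closing tag
    END {
        if (prev != "") print prev ": " (c + 0)
        print "Total: " (total + c)
    }
    ' "$@"
}
```

It would be called as, for example, count_aud57_items *.log. Each file is read once, so there is no separate loop over the files in the shell. Empty files produce no per-file line.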

That seems to be working, great. Thanks a lot for the help; your time on this is much appreciated.

Great.

-arun