Replacing lines matching a multi-line pattern (sed/perl/awk)

Dear Unix Forums,

I am hoping you can help me with a pattern matching problem.

What am I trying to do?
I want to replace multiple lines of a text file (that match a multi-line pattern) with a single line of text. These patterns can span several lines and do not always have the same number of line breaks in between.

Example input file

@LIB ADVAPI32.dll @CAL RtlInitAnsiString @PA1 0x0012f740 @PA2 "CriticalSectionTimeout" @RET0 
@LIB ADVAPI32.dll @CAL memmove @PA1 0x0012f740 @PA2 0x0012f68c @PA3 4 @RET 0x0012f8bc 
@LIB ADVAPI32.dll @CAL RtlInitAnsiString @PA1 0x0012f740 @PA2 "CriticalSectionTimeout" @RET0 
@LIB ADVAPI32.dll @CAL BlaCall @PA1 0x0012f741 @PA2 0x0012f68c @PA3 4 @RET 0x0012f8bc 
@LIB ADVAPI32.dll @CAL memmove @PA1 0x0012f740 @PA2 0x0012f68c @PA3 4 @RET 0x0012f8bc 
@LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS 

One of the simpler patterns may look like this (pseudocode; meaning that only parts of the line are relevant and that the number of line breaks can vary):

^ * @CAL RtlInitAnsiString @PA1 0x0012f740 * ((1 to 3 line breaks)) * @CAL memmove @PA1 0x0012f740 * $

The matching text should be replaced with a string (e.g. "@MATCH").
In the end, the file should look like this:

Desired output

@MATCH //replaced lines 1 and 2
@MATCH //replaced lines 3 to 5, including the irrelevant BlaCall
@LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS 

Current solution for adjacent lines only:

sed 'N;s/\@LIB.*\@CAL RtlInitAnsiString .*\@PA1 0x0012f740.*\n.*\@CAL memmove \@PA1 0x0012f740.*/\@MATCH/' inputfile

Unfortunately, this does not seem to work across several line breaks (i.e. when there is a "gap" between the lines containing RtlInitAnsiString and memmove).
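
One likely reason: with `N`, sed fills its pattern space with lines in fixed pairs (1+2, 3+4, ...), so a block that starts on line 3 and ends on line 5 is never in the pattern space all at once. The Python sketch below only emulates that pairing to make the effect visible. The names `call`, `PAIR` and `sed_n_pairs` are made up for this illustration, and the sample lines are shortened stand-ins for the input above.

```python
import re

def call(name, pa1):
    # Shortened stand-in for one log line (assumption: the trailing fields do not matter here).
    return "@LIB ADVAPI32.dll @CAL %s @PA1 %s @RET 0" % (name, pa1)

# A start line directly followed by an end line, as in the sed expression.
PAIR = re.compile(
    r"^.*@CAL RtlInitAnsiString @PA1 0x0012f740.*\n"
    r".*@CAL memmove @PA1 0x0012f740.*$"
)

def sed_n_pairs(lines):
    """Emulate sed 'N;s/.../@MATCH/': substitute within fixed pairs of lines."""
    out = []
    for i in range(0, len(lines), 2):
        # The "pattern space" holds at most two lines; a leftover odd
        # line is passed through on its own.
        space = "\n".join(lines[i:i + 2])
        out.append(PAIR.sub("@MATCH", space))
    return "\n".join(out).split("\n")

if __name__ == "__main__":
    sample = [
        call("RtlInitAnsiString", "0x0012f740"),
        call("memmove", "0x0012f740"),
        call("RtlInitAnsiString", "0x0012f740"),
        call("BlaCall", "0x0012f741"),
        call("memmove", "0x0012f740"),
        call("RtlAnsiStringToUnicodeString", "0x7ffdfbf8"),
    ]
    print("\n".join(sed_n_pairs(sample)))
```

Only the first pair is replaced; the second block is split across the pairs (3+4) and (5+6), so nothing else matches.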

Stuff I tried that didn't match anything:

perl -pne 'BEGIN {undef $/} s/\@LIB.*\@CAL RtlInitAnsiString \@PA1 0x0012f740.*\@CAL memmove \@PA1 0x0012f740.*/\@MATCH/' inputfile
perl -0pe 's/^\@LIB.*\@CAL RtlInitAnsiString .*\@PA1 0x0012f740.*\@CAL memmove \@PA1 0x0012f740.*$/\@MATCH/gm' inputfile
perl -0pe 's/^\@LIB*\@CAL RtlInitAnsiString *\@PA1 0x0012f740*.*\@CAL memmove \@PA1 0x0012f740*$/\@MATCH/s' inputfile

Any ideas how to get this kind of multi-line pattern matching to work? I'd prefer sed or perl, but awk is fine too :wink:

Thanks in advance!

From python shell:

>>> import re
>>> text = '''
... @LIB ADVAPI32.dll @CAL RtlInitAnsiString @PA1 0x0012f740 @PA2 "CriticalSectionTimeout" @RET0 
... @LIB ADVAPI32.dll @CAL memmove @PA1 0x0012f740 @PA2 0x0012f68c @PA3 4 @RET 0x0012f8bc 
... @LIB ADVAPI32.dll @CAL RtlInitAnsiString @PA1 0x0012f740 @PA2 "CriticalSectionTimeout" @RET0 
... @LIB ADVAPI32.dll @CAL BlaCall @PA1 0x0012f741 @PA2 0x0012f68c @PA3 4 @RET 0x0012f8bc 
... @LIB ADVAPI32.dll @CAL memmove @PA1 0x0012f740 @PA2 0x0012f68c @PA3 4 @RET 0x0012f8bc 
... @LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
... '''
>>> 
>>> pattern = re.compile('^.*?@CAL RtlInitAnsiString @PA1 0x0012f740.*?@CAL memmove @PA1 0x0012f740.*?$',re.MULTILINE|re.DOTALL)
>>> 
>>> print(re.sub(pattern, '@MATCH', text))
@MATCH
@MATCH
@LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS

I am not working with Perl right now, but I'm sure the translation should be pretty straightforward.

What should the output be for the input:

@LIB ADVAPI32.dll @CAL RtlInitAnsiString @PA1 0x0012f740 @PA2 "CriticalSectionTimeout" @RET0 
@LIB ADVAPI32.dll @CAL RtlInitAnsiString @PA1 0x0012f740 @PA2 "CriticalSectionTimeout" @RET0 
@LIB ADVAPI32.dll @CAL BlaCall @PA1 0x0012f741 @PA2 0x0012f68c @PA3 4 @RET 0x0012f8bc 
@LIB ADVAPI32.dll @CAL memmove @PA1 0x0012f740 @PA2 0x0012f68c @PA3 4 @RET 0x0012f8bc 
@LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS

(which is your sample input file with line 2 removed)? Is line 1 kept in the output as is, or should lines 1 through 4 be changed to a single:

@MATCH

output line?

What should happen if there are more than 3 newlines between @CAL RtlInitAnsiString and @CAL memmove, with no other occurrences of @CAL RtlInitAnsiString between them?

We need a response to Don's questions, as it looks like there can be a lot of varying cases.

Requirement Analysis not complete :wink:

With the current data you have provided, try this

awk '
  /@CAL RtlInitAnsiString @PA1 0x0012f740/{s=1}
  s && /@CAL memmove @PA1 0x0012f740/{ print "@MATCH"; s=0; next }
  !s' infile

--ahamed

Thank you all for your replies!

In response to Don's questions: This input file...

@LIB ADVAPI32.dll @CAL RtlInitAnsiString @PA1 0x0012f740 @PA2 "CriticalSectionTimeout" @RET0 
@LIB ADVAPI32.dll @CAL RtlInitAnsiString @PA1 0x0012f740 @PA2 "CriticalSectionTimeout" @RET0 
@LIB ADVAPI32.dll @CAL BlaCall @PA1 0x0012f741 @PA2 0x0012f68c @PA3 4 @RET 0x0012f8bc 
@LIB ADVAPI32.dll @CAL memmove @PA1 0x0012f740 @PA2 0x0012f68c @PA3 4 @RET 0x0012f8bc 
@LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS

...should turn into this (minimal "destruction"):

@LIB ADVAPI32.dll @CAL RtlInitAnsiString @PA1 0x0012f740 @PA2 "CriticalSectionTimeout" @RET0 
@MATCH
@LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS

If there are more than 3 newlines in between the first and second part of the pattern, nothing should happen. The replacement should only be executed as long as the "maximum gap" is not exceeded (in this case 3). So if the input file would look like this:

@LIB ADVAPI32.dll @CAL RtlInitAnsiString @PA1 0x0012f740 @PA2 "CriticalSectionTimeout" @RET0 
@LIB ADVAPI32.dll @CAL memmove @PA1 0x0012f740 @PA2 0x0012f68c @PA3 4 @RET 0x0012f8bc 
@LIB ADVAPI32.dll @CAL RtlInitAnsiString @PA1 0x0012f740 @PA2 "CriticalSectionTimeout" @RET0 
@LIB ADVAPI32.dll @CAL BlaCall @PA1 0x0012f741 @PA2 0x0012f68c @PA3 4 @RET 0x0012f8bc 
@LIB ADVAPI32.dll @CAL memmove @PA1 0x0012f740 @PA2 0x0012f68c @PA3 4 @RET 0x0012f8bc 
@LIB ADVAPI32.dll @CAL RtlInitAnsiString @PA1 0x0012f740 @PA2 "CriticalSectionTimeout" @RET0 
@LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
@LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
@LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
@LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
@LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
@LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
@LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
@LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
@LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
@LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
@LIB ADVAPI32.dll @CAL memmove @PA1 0x0012f740 @PA2 0x0012f68c @PA3 4 @RET 0x0012f8bc

...the script should NOT replace the large block of "RtlAnsiStringToUnicodeString".

Thanks again!

Interesting regex (python shell again):

>>> text = '''
... @LIB ADVAPI32.dll @CAL RtlInitAnsiString @PA1 0x0012f740 @PA2 "CriticalSectionTimeout" @RET0 
... @LIB ADVAPI32.dll @CAL RtlInitAnsiString @PA1 0x0012f740 @PA2 "CriticalSectionTimeout" @RET0 
... @LIB ADVAPI32.dll @CAL BlaCall @PA1 0x0012f741 @PA2 0x0012f68c @PA3 4 @RET 0x0012f8bc 
... @LIB ADVAPI32.dll @CAL memmove @PA1 0x0012f740 @PA2 0x0012f68c @PA3 4 @RET 0x0012f8bc 
... @LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
... '''
>>>
>>> pattern = re.compile(r'''
... ^[^\n]+@CAL\sRtlInitAnsiString\s@PA1\s0x0012f740[^\n]+\n
... (?:(?!^[^\n]+RtlInitAnsiString)[^\n]+\n){1,3}
... ^[^\n]+@CAL\smemmove\s@PA1\s0x0012f740[^\n]+
... ''', re.X|re.M|re.S)
>>> 
>>> print(re.sub(pattern, '@MATCH', text))
@LIB ADVAPI32.dll @CAL RtlInitAnsiString @PA1 0x0012f740 @PA2 "CriticalSectionTimeout" @RET0 
@MATCH
@LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
>>> 
>>> text2 = '''
... @LIB ADVAPI32.dll @CAL RtlInitAnsiString @PA1 0x0012f740 @PA2 "CriticalSectionTimeout" @RET0 
... @LIB ADVAPI32.dll @CAL memmove @PA1 0x0012f740 @PA2 0x0012f68c @PA3 4 @RET 0x0012f8bc 
... @LIB ADVAPI32.dll @CAL RtlInitAnsiString @PA1 0x0012f740 @PA2 "CriticalSectionTimeout" @RET0 
... @LIB ADVAPI32.dll @CAL BlaCall @PA1 0x0012f741 @PA2 0x0012f68c @PA3 4 @RET 0x0012f8bc 
... @LIB ADVAPI32.dll @CAL memmove @PA1 0x0012f740 @PA2 0x0012f68c @PA3 4 @RET 0x0012f8bc 
... @LIB ADVAPI32.dll @CAL RtlInitAnsiString @PA1 0x0012f740 @PA2 "CriticalSectionTimeout" @RET0 
... @LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
... @LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
... @LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
... @LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
... @LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
... @LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
... @LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
... @LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
... @LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
... @LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
... @LIB ADVAPI32.dll @CAL memmove @PA1 0x0012f740 @PA2 0x0012f68c @PA3 4 @RET 0x0012f8bc
... '''
>>> print(re.sub(pattern, '@MATCH', text2))

@LIB ADVAPI32.dll @CAL RtlInitAnsiString @PA1 0x0012f740 @PA2 "CriticalSectionTimeout" @RET0 
@LIB ADVAPI32.dll @CAL memmove @PA1 0x0012f740 @PA2 0x0012f68c @PA3 4 @RET 0x0012f8bc 
@MATCH
@LIB ADVAPI32.dll @CAL RtlInitAnsiString @PA1 0x0012f740 @PA2 "CriticalSectionTimeout" @RET0 
@LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
@LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
@LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
@LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
@LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
@LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
@LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
@LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
@LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
@LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
@LIB ADVAPI32.dll @CAL memmove @PA1 0x0012f740 @PA2 0x0012f68c @PA3 4 @RET 0x0012f8bc

Thanks Klashxx,

your python script works!
I changed the pattern to...

>>> pattern = re.compile(r'''
... ^[^\n]+@CAL\sRtlInitAnsiString\s@PA1\s0x0012f740[^\n]+\n
... (?:(?!^[^\n]+RtlInitAnsiString)[^\n]+\n){0,3}
... ^[^\n]+@CAL\smemmove\s@PA1\s0x0012f740[^\n]+
... ''', re.X|re.M|re.S)

...so it also matches adjacent lines. Now I have to figure out two things: how to turn this into a "one-liner" (I currently use "eval" to loop through a file of pattern-matching commands, mostly "sed"), and what each part of the expression does. Up until now, my scripting endeavors were limited to rather basic stuff :wink:
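
For the "what does each part do" question, the same expression can be annotated in place, because `re.X` (verbose mode) ignores whitespace and allows `#` comments inside the pattern. This is a sketch of one reading of the pattern, not an authoritative explanation. `re.S` is left out because the pattern contains no `.`, so it should make no difference. The helper `call` and the shortened sample lines are stand-ins invented for the example.

```python
import re

pattern = re.compile(r'''
    # Line 1: any prefix, then the start call with the wanted first parameter,
    # then the rest of that line and its newline.
    ^[^\n]+@CAL\sRtlInitAnsiString\s@PA1\s0x0012f740[^\n]+\n

    # Gap: zero to three whole lines, none of which is another start line.
    (?:
        (?!^[^\n]+RtlInitAnsiString)   # look ahead: this line must not restart the block
        [^\n]+\n                       # consume one complete line
    ){0,3}

    # Last line: the end call; its trailing newline is left in place.
    ^[^\n]+@CAL\smemmove\s@PA1\s0x0012f740[^\n]+
''', re.X | re.M)

def call(name, pa1):
    # Shortened stand-in for one log line (hypothetical helper).
    return "@LIB ADVAPI32.dll @CAL %s @PA1 %s @RET 0" % (name, pa1)

START = call("RtlInitAnsiString", "0x0012f740")
BLA = call("BlaCall", "0x0012f741")
END = call("memmove", "0x0012f740")
LAST = call("RtlAnsiStringToUnicodeString", "0x7ffdfbf8")

if __name__ == "__main__":
    text = "\n".join([START, START, BLA, END, LAST]) + "\n"
    # The first start line is kept; the second one opens the replaced block.
    print(pattern.sub("@MATCH", text), end="")
```

The negative lookahead is what gives the "minimal destruction" behaviour: a second RtlInitAnsiString line cannot be swallowed as a gap line, so the match has to begin at the later start line.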

Does anyone know how python compares to other approaches (awk, etc.) in terms of performance? The files I plan to analyze have upwards of 50,000 lines each and are matched against hundreds of single-line and multi-line patterns.
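
Nobody in the thread gives numbers, so measuring on representative data is probably the most reliable answer. Below is a minimal timing sketch for the Python side only, assuming a synthetic file built by repeating a six-line sample block. `make_text`, `timed_sub` and the shortened sample lines are assumptions made for the example, and the result will depend on the machine and the real patterns.

```python
import re
import time

def call(name, pa1):
    # Shortened stand-in for one log line (hypothetical helper).
    return "@LIB ADVAPI32.dll @CAL %s @PA1 %s @RET 0" % (name, pa1)

SAMPLE_BLOCK = [
    call("RtlInitAnsiString", "0x0012f740"),
    call("memmove", "0x0012f740"),
    call("RtlInitAnsiString", "0x0012f740"),
    call("BlaCall", "0x0012f741"),
    call("memmove", "0x0012f740"),
    call("RtlAnsiStringToUnicodeString", "0x7ffdfbf8"),
]

# Compile once and reuse the compiled object for every file.
PATTERN = re.compile(
    r'^[^\n]+@CAL\sRtlInitAnsiString\s@PA1\s0x0012f740[^\n]+\n'
    r'(?:(?!^[^\n]+RtlInitAnsiString)[^\n]+\n){0,3}'
    r'^[^\n]+@CAL\smemmove\s@PA1\s0x0012f740[^\n]+',
    re.M,
)

def make_text(n_lines):
    """Build a synthetic input of roughly n_lines lines."""
    reps = n_lines // len(SAMPLE_BLOCK)
    return "\n".join(SAMPLE_BLOCK * reps) + "\n"

def timed_sub(pattern, text, repl="@MATCH"):
    """Return (elapsed seconds, number of replacements, new text)."""
    start = time.perf_counter()
    result, count = pattern.subn(repl, text)
    return time.perf_counter() - start, count, result

if __name__ == "__main__":
    elapsed, count, _ = timed_sub(PATTERN, make_text(50000))
    print("%d replacements in %.3f s" % (count, elapsed))
```

Running the awk and perl versions under `time` on the same generated file would give a like-for-like comparison.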

Cheers

$
$
$ cat input1
@LIB ADVAPI32.dll @CAL RtlInitAnsiString @PA1 0x0012f740 @PA2 "CriticalSectionTimeout" @RET0
@LIB ADVAPI32.dll @CAL memmove @PA1 0x0012f740 @PA2 0x0012f68c @PA3 4 @RET 0x0012f8bc
@LIB ADVAPI32.dll @CAL RtlInitAnsiString @PA1 0x0012f740 @PA2 "CriticalSectionTimeout" @RET0
@LIB ADVAPI32.dll @CAL BlaCall @PA1 0x0012f741 @PA2 0x0012f68c @PA3 4 @RET 0x0012f8bc
@LIB ADVAPI32.dll @CAL memmove @PA1 0x0012f740 @PA2 0x0012f68c @PA3 4 @RET 0x0012f8bc
@LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
$
$ perl -lne 'BEGIN {$in = 0; $count = 0}
             if (/\@CAL RtlInitAnsiString/) {
                 if ($x[0] =~ /\@CAL RtlInitAnsiString/) {print foreach (@x); @x = (); $count=0}
                 push @x, $_; $in = 1; $count++
             }
             elsif (/\@CAL memmove/ and $x[0] =~ /\@CAL RtlInitAnsiString/ and $in) {
                 print "\@MATCH"; @x = (); $in = 0; $count = 0
             }
             elsif ($in) {
                 $count++;
                 if ($count > 3) {print foreach (@x); print; @x=(); $count=0; $in=0}
                 else { push @x, $_ }
             }
             else {print}
            ' input1
@MATCH
@MATCH
@LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
$
$
$ cat input2
@LIB ADVAPI32.dll @CAL RtlInitAnsiString @PA1 0x0012f740 @PA2 "CriticalSectionTimeout" @RET0
@LIB ADVAPI32.dll @CAL RtlInitAnsiString @PA1 0x0012f740 @PA2 "CriticalSectionTimeout" @RET0
@LIB ADVAPI32.dll @CAL BlaCall @PA1 0x0012f741 @PA2 0x0012f68c @PA3 4 @RET 0x0012f8bc
@LIB ADVAPI32.dll @CAL memmove @PA1 0x0012f740 @PA2 0x0012f68c @PA3 4 @RET 0x0012f8bc
@LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
$
$ perl -lne 'BEGIN {$in = 0; $count = 0}
             if (/\@CAL RtlInitAnsiString/) {
                 if ($x[0] =~ /\@CAL RtlInitAnsiString/) {print foreach (@x); @x = (); $count=0}
                 push @x, $_; $in = 1; $count++
             }
             elsif (/\@CAL memmove/ and $x[0] =~ /\@CAL RtlInitAnsiString/ and $in) {
                 print "\@MATCH"; @x = (); $in = 0; $count = 0
             }
             elsif ($in) {
                 $count++;
                 if ($count > 3) {print foreach (@x); print; @x=(); $count=0; $in=0}
                 else { push @x, $_ }
             }
             else {print}
            ' input2
@LIB ADVAPI32.dll @CAL RtlInitAnsiString @PA1 0x0012f740 @PA2 "CriticalSectionTimeout" @RET0
@MATCH
@LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
$
$
$ cat input3
@LIB ADVAPI32.dll @CAL RtlInitAnsiString @PA1 0x0012f740 @PA2 "CriticalSectionTimeout" @RET0
@LIB ADVAPI32.dll @CAL BlaCall @PA1 0x0012f741 @PA2 0x0012f68c @PA3 4 @RET 0x0012f8bc
@LIB ADVAPI32.dll @CAL RtlInitAnsiString @PA1 0x0012f740 @PA2 "CriticalSectionTimeout" @RET0
@LIB ADVAPI32.dll @CAL BlaCall @PA1 0x0012f741 @PA2 0x0012f68c @PA3 4 @RET 0x0012f8bc
@LIB ADVAPI32.dll @CAL memmove @PA1 0x0012f740 @PA2 0x0012f68c @PA3 4 @RET 0x0012f8bc
@LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
$
$ perl -lne 'BEGIN {$in = 0; $count = 0}
             if (/\@CAL RtlInitAnsiString/) {
                 if ($x[0] =~ /\@CAL RtlInitAnsiString/) {print foreach (@x); @x = (); $count=0}
                 push @x, $_; $in = 1; $count++
             }
             elsif (/\@CAL memmove/ and $x[0] =~ /\@CAL RtlInitAnsiString/ and $in) {
                 print "\@MATCH"; @x = (); $in = 0; $count = 0
             }
             elsif ($in) {
                 $count++;
                 if ($count > 3) {print foreach (@x); print; @x=(); $count=0; $in=0}
                 else { push @x, $_ }
             }
             else {print}
            ' input3
@LIB ADVAPI32.dll @CAL RtlInitAnsiString @PA1 0x0012f740 @PA2 "CriticalSectionTimeout" @RET0
@LIB ADVAPI32.dll @CAL BlaCall @PA1 0x0012f741 @PA2 0x0012f68c @PA3 4 @RET 0x0012f8bc
@MATCH
@LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
$
$
$ cat input4
@LIB ADVAPI32.dll @CAL RtlInitAnsiString @PA1 0x0012f740 @PA2 "CriticalSectionTimeout" @RET0
@LIB ADVAPI32.dll @CAL memmove @PA1 0x0012f740 @PA2 0x0012f68c @PA3 4 @RET 0x0012f8bc
@LIB ADVAPI32.dll @CAL RtlInitAnsiString @PA1 0x0012f740 @PA2 "CriticalSectionTimeout" @RET0
@LIB ADVAPI32.dll @CAL BlaCall @PA1 0x0012f741 @PA2 0x0012f68c @PA3 4 @RET 0x0012f8bc
@LIB ADVAPI32.dll @CAL memmove @PA1 0x0012f740 @PA2 0x0012f68c @PA3 4 @RET 0x0012f8bc
@LIB ADVAPI32.dll @CAL RtlInitAnsiString @PA1 0x0012f740 @PA2 "CriticalSectionTimeout" @RET0
@LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
@LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
@LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
@LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
@LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
@LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
@LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
@LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
@LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
@LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
@LIB ADVAPI32.dll @CAL memmove @PA1 0x0012f740 @PA2 0x0012f68c @PA3 4 @RET 0x0012f8bc
$
$
$ perl -lne 'BEGIN {$in = 0; $count = 0}
             if (/\@CAL RtlInitAnsiString/) {
                 if ($x[0] =~ /\@CAL RtlInitAnsiString/) {print foreach (@x); @x = (); $count=0}
                 push @x, $_; $in = 1; $count++
             }
             elsif (/\@CAL memmove/ and $x[0] =~ /\@CAL RtlInitAnsiString/ and $in) {
                 print "\@MATCH"; @x = (); $in = 0; $count = 0
             }
             elsif ($in) {
                 $count++;
                 if ($count > 3) {print foreach (@x); print; @x=(); $count=0; $in=0}
                 else { push @x, $_ }
             }
             else {print}
            ' input4
@MATCH
@MATCH
@LIB ADVAPI32.dll @CAL RtlInitAnsiString @PA1 0x0012f740 @PA2 "CriticalSectionTimeout" @RET0
@LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
@LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
@LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
@LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
@LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
@LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
@LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
@LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
@LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
@LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
@LIB ADVAPI32.dll @CAL memmove @PA1 0x0012f740 @PA2 0x0012f68c @PA3 4 @RET 0x0012f8bc
$
$
$
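
For comparison, the same buffer-and-flush idea as the Perl one-liner can be written as a Python generator that reads one line at a time, so the whole file never has to be held in memory. This is only a sketch under stated assumptions: "gap" is taken to mean at most three lines between the start and end line (as in the `{0,3}` version of the regex), matching is by plain substring, and the names `collapse_stream`, `START` and `END` are invented for the example.

```python
START = "@CAL RtlInitAnsiString @PA1 0x0012f740"
END = "@CAL memmove @PA1 0x0012f740"

def collapse_stream(lines, start=START, end=END, max_gap=3, marker="@MATCH"):
    """Yield lines, replacing start..end blocks with marker.

    A block is replaced only if at most max_gap lines lie between the
    start line and the end line; otherwise every line is passed through.
    """
    buf = []  # pending start line plus the gap lines seen so far
    for line in lines:
        if start in line:
            yield from buf          # an earlier start never found its end
            buf = [line]
        elif buf and end in line:
            yield marker            # complete block: emit one marker line
            buf = []
        elif buf:
            buf.append(line)
            if len(buf) - 1 > max_gap:
                yield from buf      # gap too large: give the lines back
                buf = []
        else:
            yield line
    yield from buf                  # end of input: flush what is still pending

if __name__ == "__main__":
    import sys
    with open(sys.argv[1]) as f:
        for out in collapse_stream(line.rstrip("\n") for line in f):
            print(out)
```

Because lines are buffered rather than dropped, an unfinished block at the end of the file is printed unchanged.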

You should pick one tool and stick with it.
A starting point to give python a try:

#!/usr/bin/env python
import re
import sys

pattern = re.compile(r'''
    ^[^\n]+@CAL\sRtlInitAnsiString\s@PA1\s0x0012f740[^\n]+\n
    (?:(?!^[^\n]+RtlInitAnsiString)[^\n]+\n){0,3}
    ^[^\n]+@CAL\smemmove\s@PA1\s0x0012f740[^\n]+
    ''', re.X|re.M|re.S)

with open(sys.argv[1], 'r') as f:
    text = f.read()

tfilter = re.sub(pattern, '@MATCH', text)

# Write the result as-is; print would append an extra trailing newline.
sys.stdout.write(tfilter)

sys.exit(0)
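
Since the files are to be matched against many single-line and multi-line patterns, one possible extension is to keep the rules in a list and apply them in order to the text read from the file. This is a sketch only: `RULES` and `apply_rules` are hypothetical names, the second rule (`@BLA`) is invented purely to show a single-line pattern next to the multi-line one, and the sample lines are shortened stand-ins.

```python
import re

# Ordered list of (compiled pattern, replacement) pairs.
RULES = [
    # Multi-line rule from this thread: start line, up to three gap lines, end line.
    (re.compile(
        r'^[^\n]+@CAL\sRtlInitAnsiString\s@PA1\s0x0012f740[^\n]+\n'
        r'(?:(?!^[^\n]+RtlInitAnsiString)[^\n]+\n){0,3}'
        r'^[^\n]+@CAL\smemmove\s@PA1\s0x0012f740[^\n]+',
        re.M), "@MATCH"),
    # Hypothetical single-line rule, for illustration only.
    (re.compile(r'^[^\n]*@CAL\sBlaCall\s[^\n]*$', re.M), "@BLA"),
]

def apply_rules(text, rules=RULES):
    """Apply every rule in order; later rules see the output of earlier ones."""
    for pattern, replacement in rules:
        text = pattern.sub(replacement, text)
    return text

def call(name, pa1):
    # Shortened stand-in for one log line (hypothetical helper).
    return "@LIB ADVAPI32.dll @CAL %s @PA1 %s @RET 0" % (name, pa1)

if __name__ == "__main__":
    sample = "\n".join([
        call("RtlInitAnsiString", "0x0012f740"),
        call("memmove", "0x0012f740"),
        call("RtlInitAnsiString", "0x0012f740"),
        call("BlaCall", "0x0012f741"),
        call("memmove", "0x0012f740"),
        call("RtlAnsiStringToUnicodeString", "0x7ffdfbf8"),
        call("BlaCall", "0x0012f741"),
    ]) + "\n"
    print(apply_rules(sample), end="")
```

The order matters: here the multi-line rule runs first, so the BlaCall line inside a block is absorbed by `@MATCH` and only the stand-alone one becomes `@BLA`.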

Also a starting point to give awk a try:

awk \
  -v S="@CAL RtlInitAnsiString @PA1 0x0012f740" \
  -v E="@CAL memmove @PA1 0x0012f740" \
  -v M="@MATCH" \
  -v L=3 '
$0~S{if(R)print V;V=$0;R=FNR+L;next}
FNR==R{print V; R=V=x}
R&&$0~E {$0=M; R=V=x}
R{V=V"\n"$0;next}
1
END{if(V)print V}' "$1"

Wow, three very nice approaches using three powerful tools (awk integrates best into the existing analysis scripts but the python and perl solutions look very promising too) - I will now test them on different and (much) larger sets of input files/patterns and then probably stick to the fastest one.

Thanks again, your input was extremely helpful and sure motivates me to get deeper into this kind of scripting.

Cheers!