Search 3 words

Hi All,
I have 1000+ files and I want to search them for a specific pattern. Looking forward to your input. Please note that words between /* */ need to be ignored.
Search for: "insert into xyz" (which procedures contain all 3 words).
Expected output:
procedure test1
procedure test2
procedure test3

File contains:
procedure test1
insert into xyz;
end;
procedure
test2
insert
into
xyz
end;
procedure test3
insert /* asas*/
into xyz
end;
procedure test4
inserting into xyz
end;

Try the Perl below:

local $/="end;";
while(<DATA>){
	print "procedure $1\n" if /procedure\s*(\S*).*insert\s*(\/\*.*\*\/\s*)*\s*into\s*xyz/s;
}
__DATA__
File Contain:
procedure test1 
insert into xyz;
end;
procedure 
test2 
insert
into
xyz
end;
procedure test3
insert /* asas*/  
into xyz
end;
procedure test4
inserting into xyz
end;
procedure test5
inserting aa into xyz
end;

Thanks for the reply, Cherry, but I don't want to run this via Perl. I'm looking for a Unix command.

Two solutions, one using awk and one using gawk.
With awk:

awk '
{ $0 = tolower($0) }
/^[[:space:]]*procedure/ {
   check_procedure();
   procedure = ""
}
{
   procedure = procedure " " $0;
}
END {
   check_procedure();
}
function check_procedure() {
   gsub(/\/\*[^*]*\*\//, "", procedure);
   gsub(/[[:space:]]+/, " ", procedure);
   if (procedure ~/insert into xyz/) {
      split(procedure, f);
      print "procedure",f[2];
   }
}
' inputfile

With gawk (which accepts a multi-character RS):

gawk -v RS=procedure '
{
   gsub(/\n/, " ");
   gsub(/\/\*[^*]*\*\//, "");
   gsub(/[[:space:]]+/, " ")
   $0 = tolower($0);
}
/insert into xyz/ {
   print "procedure",$1
}
' inputfile

Inputfile:

procedure test1
insert into xyz;
end;
procedure
test2
insert
into
xyz
end;
procedure test3
insert /* asas*/
into xyz
end;
procedure test4
inserting into xyz
end;
procedure test5
insert into abcdef
end;

Output:

procedure test1
procedure test2
procedure test3

Jean-Pierre.
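
A side note on the comment pattern used above: /\/\*[^*]*\*\// stops at the first * inside a comment, so a comment such as /* a*b */ is left in place. Below is a minimal sketch of one way around that, assuming a POSIX sh and awk. The function name strip_comments is made up for this example and is not part of the solutions above. It scans each line with index() instead of a regular expression and remembers across lines whether a comment is still open.

```shell
# strip_comments: hypothetical helper. Reads SQL text on stdin and writes it
# back with /* ... */ comments removed; a blank is left where each comment
# started so that neighbouring words stay separate.
strip_comments() {
    awk '
    {
        line = $0
        out = ""
        while (line != "") {
            if (in_comment) {
                # Inside a comment: drop text up to and including "*/".
                p = index(line, "*/")
                if (p == 0) {
                    line = ""
                } else {
                    line = substr(line, p + 2)
                    in_comment = 0
                }
            } else {
                # Outside a comment: keep text up to the next "/*".
                p = index(line, "/*")
                if (p == 0) {
                    out = out line
                    line = ""
                } else {
                    out = out substr(line, 1, p - 1) " "
                    line = substr(line, p + 2)
                    in_comment = 1
                }
            }
        }
        print out
    }'
}
```

It could be placed in front of either command, for example: strip_comments < inputfile | awk '...'. The gsub call that removes comments then simply finds nothing left to remove.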

Thanks for the wonderful response. This works fine for me, but I forgot to mention one thing: the keyword can be "procedure" or "function". I need to handle both, but the code above only handles "procedure". Please find the latest file and expected output below.

Expected output:
procedure test1
function test2
procedure test3

File contains:
procedure test1
insert into xyz;
end;

function

test2

insert

into
xyz
end;

procedure test3
insert /* asas*/
into xyz
end;

procedure test4
inserting into xyz
end;

Could you please update the command as soon as possible?

A new version of the awk solution:

awk '
{ $0 = tolower($0) }
/^[[:space:]]*(procedure|function)/ {
   check_procedure();
   procedure = ""
}
{
   procedure = procedure " " $0;
}
END {
   check_procedure();
}
function check_procedure() {
   gsub(/\/\*[^*]*\*\//, "", procedure);
   gsub(/[[:space:]]+/, " ", procedure);
   if (procedure ~/insert into xyz/) {
      split(procedure, f);
      print f[1],f[2];
   }
}
' inputfile

Inputfile:

procedure test1
insert into xyz;
end;
procedure
test2
insert
into
xyz
end;
procedure test3
insert /* asas*/
into xyz
end;
procedure test4
inserting into xyz
end;
procedure test5
insert into abcdef
end;
function ftest
insert into
xyz
end;

Output:

procedure test1
procedure test2
procedure test3
function ftest

Jean-Pierre.

Thanks for your reply. Can you modify the gawk command as well?

---------- Post updated at 09:07 AM ---------- Previous update was at 08:00 AM ----------

I am getting a "Memory fault(coredump)" error. I guess gawk should work fine. Can you please modify the gawk command?

I have no system available with gawk.
Try this new version of the awk script:

awk '
{ $0 = tolower($0) }
/^[[:space:]]*(procedure|function)/ {
   check_procedure();
   memorize  = 1;
}
memorize {
   procedure = procedure " " $0;
   gsub(/\/\*[^*]*\*\//, "", procedure);
   if (procedure ~ /end[[:space:]]*;/) check_procedure();
}
END {
   check_procedure();
}
function check_procedure() {
   gsub(/[[:space:]]+/, " ", procedure);
   if (procedure ~ /insert into xyz/) {
      split(procedure, f);
      print f[1],f[2];
   }
   procedure = "";
   memorize  = 0;
}
' in.sql

Jean-Pierre.

I tried this, but I am getting the error below. I am attaching the file test.txt for your reference.

awk: The result procedure load_file of the gsub function
cannot be longer than 3,000 bytes.

Last attempt, another way that does not use the gsub function:

awk -v RS='[[:space:]();\n]' -v Text='insert into xyz' '
BEGIN {
   Words_count = split(tolower(Text), Word);
   In_proc        = 0;
   Skip_comment   = 0;
   Wait_proc_name = 0;
}

!NF { next }
{ $0 = tolower($0) }

/^\/\*/ {
   # A comment may open and close within the same word, e.g. /*note*/
   if ($0 !~ /\*\/$/) Skip_comment = 1;
   next;
}

Skip_comment {
   if ($0 !~ /\*\/$/) next;
   Skip_comment = 0;
   next;
}

Wait_proc_name {
   Proc_name      = $0;
   Wait_proc_name = 0;
   next;
}

/^(procedure|function)$/ {
   Proc_type      = $0;
   Wait_proc_name = 1;
   Index_word     = 1;
   In_proc        = 1;
   next;
}

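In_proc {
   if ($0 == Word[Index_word]) {
      if (Index_word == Words_count) {
         print Proc_type, Proc_name;
         In_proc = 0;
      } else {
         Index_word++;
      }
   } else {
      # The words must be consecutive: any other word restarts the match.
      Index_word = ($0 == Word[1]) ? 2 : 1;
   }
}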

' inputfile

Jean-Pierre.

Well, thanks for bearing with me, aigles. The latest command is also not working, as I am getting "awk: Input line procedure test1 insert cannot be longer than 3,000 bytes." errors. Anyway, if you have time in the future, please reply about that. I have already attached the file 'test.txt', which you can use for reference.

It sounds as if an input line is longer than 3000 bytes.
Instead of:

awk ..... inputfile

Try:

fold -s -w 300 inputfile | awk ...

Jean-Pierre.

I am still getting the same error.

The final command is...

fold -s -w 300 test.txt | awk -v RS='[[:space:]()\n]' -v Text='insert into xyz' '
BEGIN {
   Words_count = split(tolower(Text), Word);
   In_proc        = 0;
   Skip_comment   = 0;
   Wait_proc_name = 0;
}
!NF { next }
{ $0 = tolower($0) }
/^\/\*/ {
   Skip_comment = 1;
   next;
}
Skip_comment {
   if ($0 !~ /\*\/$/) next;
   Skip_comment = 0;
   next;
}
Wait_proc_name {
   Proc_name      = $0;
   Wait_proc_name = 0;
   next;
}
/(procedure|function)$/ {
   Proc_type      = $0;
   Wait_proc_name = 1;
   Index_word     = 1;
   In_proc        = 1;
   next;
}
In_proc {
   if ($0 != Word[Index_word]) next;
   if (Index_word == Words_count) {
      print Proc_type, Proc_name;
      Index_word = 0;
      In_proc    = 0;
   }
   Index_word++;
}'
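
A possible explanation, not confirmed in the thread: a regular expression in RS is an extension (POSIX leaves a multi-character RS unspecified), so an awk without it may read far more than one word per record, and fold cannot change that. The sketch below avoids the problem by keeping the default record separator and walking through the fields of each line, so no long string is ever built. It assumes a POSIX sh, sed, tr and awk. The name find_procs and its argument order are made up for this example.

```shell
# find_procs PHRASE [FILE...]: hypothetical helper that prints
# "procedure NAME" or "function NAME" for each unit whose body contains
# PHRASE as consecutive words, ignoring case, line breaks and
# /* ... */ comments. Reads stdin when no FILE is given.
find_procs() {
    phrase=$1
    shift
    # Pad comment delimiters with blanks and turn ( ) ; into blanks, so
    # that every token reaches awk as a separate field.
    sed -e 's|/\*| /* |g' -e 's|\*/| */ |g' "$@" |
    tr '();' '   ' |
    awk -v text="$phrase" '
    BEGIN { n = split(tolower(text), want, " ") }
    {
        for (i = 1; i <= NF; i++) {
            w = tolower($i)
            if (in_comment) {
                if (w == "*/") in_comment = 0
                continue
            }
            if (w == "/*") { in_comment = 1; continue }
            if (need_name) { name = w; need_name = 0; continue }
            if (w == "procedure" || w == "function") {
                kind = w; need_name = 1; idx = 1; active = 1
                continue
            }
            if (!active) continue
            if (w == want[idx]) {
                if (idx == n) {
                    print kind, name
                    active = 0
                } else {
                    idx++
                }
            } else {
                # The words must be consecutive, so start over.
                idx = (w == want[1]) ? 2 : 1
            }
        }
    }'
}
```

Usage would be find_procs 'insert into xyz' test.txt, or one call per file in a loop when there are many files. If single input lines are themselves longer than the awk limit, fold -s -w 300 test.txt | find_procs 'insert into xyz' should still apply, since fold -s breaks lines at blanks where it can.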