Hi all,
I have more than 1000 files and I want to search them for a specific pattern. Looking forward to your input. Please note that anything between /* and */ must be ignored.
Search for: "insert into xyz" (i.e. which procedures contain all three words).
Expected output:
procedure test1
procedure test2
procedure test3
File contents:
procedure test1
insert into xyz;
end;
procedure
test2
insert
into
xyz
end;
procedure test3
insert /* asas*/
into xyz
end;
procedure test4
inserting into xyz
end;
Try the Perl below:
local $/="end;";
while(<DATA>){
print "procedure $1\n" if /procedure\s*(\S*).*insert\s*(\/\*.*\*\/\s*)*\s*into\s*xyz/s;
}
__DATA__
File Contain:
procedure test1
insert into xyz;
end;
procedure
test2
insert
into
xyz
end;
procedure test3
insert /* asas*/
into xyz
end;
procedure test4
inserting into xyz
end;
procedure test5
inserting aa into xyz
end;
Thanks for the reply, Cherry, but I don't want to run this through Perl. I am looking for a Unix command.
aigles
July 30, 2009, 10:01am
4
Two solutions using awk and gawk
With awk:
awk '
{ $0 = tolower($0) }
/^[[:space:]]*procedure/ {
    check_procedure();
    procedure = ""
}
{
    procedure = procedure " " $0;
}
END {
    check_procedure();
}
function check_procedure() {
    gsub(/\/\*[^*]*\*\//, "", procedure);
    gsub(/[[:space:]]+/, " ", procedure);
    if (procedure ~ /insert into xyz/) {
        split(procedure, f);
        print "procedure", f[2];
    }
}
' inputfile
With gawk:
gawk -v RS=procedure '
{
    gsub(/\n/, " ");
    gsub(/\/\*[^*]*\*\//, "");
    gsub(/[[:space:]]+/, " ");
    $0 = tolower($0);
}
/insert into xyz/ {
    print "procedure", $1
}
' inputfile
Inputfile:
procedure test1
insert into xyz;
end;
procedure
test2
insert
into
xyz
end;
procedure test3
insert /* asas*/
into xyz
end;
procedure test4
inserting into xyz
end;
procedure test5
insert into abcdef
end;
Output:
procedure test1
procedure test2
procedure test3
Jean-Pierre.
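One limitation worth noting: the comment pattern /\/\*[^*]*\*\// used above only matches comments that contain no "*" in their body, so a comment such as /* a * b */ is left in place. A minimal sketch of an alternative that scans with index() and substr() instead of gsub, assuming comments never nest and "/*" never appears inside a string literal (the name strip_comments is hypothetical, not from the thread):

```shell
# strip_comments: hypothetical helper; reads text on stdin and prints it
# with every /* ... */ comment replaced by a single space.
strip_comments() {
    awk '
    {
        rest = $0
        out = ""
        while (rest != "") {
            if (in_comment) {
                stop = index(rest, "*/")
                if (stop == 0) break          # comment continues on the next line
                rest = substr(rest, stop + 2)
                in_comment = 0
            } else {
                start = index(rest, "/*")
                if (start == 0) { out = out rest; break }
                out = out substr(rest, 1, start - 1) " "
                rest = substr(rest, start + 2)
                in_comment = 1
            }
        }
        print out
    }'
}
```

It could be placed in front of either command, for example: strip_comments < inputfile | awk '...'. Because it works one line at a time and carries only the in_comment flag between lines, it never builds a long string.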
Thanks for the wonderful response. This works fine for me, but I forgot to mention one thing: the keyword could be "procedure" or "function". I need to handle both, whereas the code above only handles "procedure". Please find the latest file and expected output below.
Expected output:
procedure test1
function test2
procedure test3
File contents:
procedure test1
insert into xyz;
end;
function
test2
insert
into
xyz
end;
procedure test3
insert /* asas*/
into xyz
end;
procedure test4
inserting into xyz
end;
Could you please update the command as soon as possible?
aigles
July 31, 2009, 2:06am
6
A new version of the awk solution:
awk '
{ $0 = tolower($0) }
/^[[:space:]]*(procedure|function)/ {
    check_procedure();
    procedure = ""
}
{
    procedure = procedure " " $0;
}
END {
    check_procedure();
}
function check_procedure() {
    gsub(/\/\*[^*]*\*\//, "", procedure);
    gsub(/[[:space:]]+/, " ", procedure);
    if (procedure ~ /insert into xyz/) {
        split(procedure, f);
        print f[1], f[2];
    }
}
' inputfile
Inputfile:
procedure test1
insert into xyz;
end;
procedure
test2
insert
into
xyz
end;
procedure test3
insert /* asas*/
into xyz
end;
procedure test4
inserting into xyz
end;
procedure test5
insert into abcdef
end;
function ftest
insert into
xyz
end;
Output:
procedure test1
procedure test2
procedure test3
function ftest
Jean-Pierre.
Thanks for your reply. Can you modify the gawk command as well?
---------- Post updated at 09:07 AM ---------- Previous update was at 08:00 AM ----------
I am getting a "Memory fault(coredump)" error. I guess gawk should work fine. Can you please modify the gawk command?
aigles
July 31, 2009, 10:25am
8
I have no system available with gawk.
Try this new version of the awk script:
awk '
{ $0 = tolower($0) }
/^[[:space:]]*(procedure|function)/ {
    check_procedure();
    memorize = 1;
}
memorize {
    procedure = procedure " " $0;
    gsub(/\/\*[^*]*\*\//, "", procedure);
    if (procedure ~ /end[[:space:]]*;/) check_procedure();
}
END {
    check_procedure();
}
function check_procedure() {
    gsub(/[[:space:]]+/, " ", procedure);
    if (procedure ~ /insert into xyz/) {
        split(procedure, f);
        print f[1], f[2];
    }
    procedure = "";
    memorize = 0;
}
' in.sql
Jean-Pierre.
I tried this, but I am getting the error below. I am attaching the file test.txt for your reference.
awk: The result procedure load_file of the gsub function
cannot be longer than 3,000 bytes.
aigles
July 31, 2009, 3:33pm
10
Last attempt: another approach that does not use the gsub function:
awk -v RS='[[:space:]()\n]' -v Text='insert into xyz' '
BEGIN {
    Words_count = split(tolower(Text), Word);
    In_proc = 0;
    Skip_comment = 0;
    Wait_proc_name = 0;
}
!NF { next }
{ $0 = tolower($0) }
/^\/\*/ {
    Skip_comment = 1;
    next;
}
Skip_comment {
    if ($0 !~ /\*\/$/) next;
    Skip_comment = 0;
    next;
}
Wait_proc_name {
    Proc_name = $0;
    Wait_proc_name = 0;
    next;
}
/(procedure|function)$/ {
    Proc_type = $0;
    Wait_proc_name = 1;
    Index_word = 1;
    In_proc = 1;
    next;
}
In_proc {
    if ($0 != Word[Index_word]) next;
    if (Index_word == Words_count) {
        print Proc_type, Proc_name;
        Index_word = 0;
        In_proc = 0;
    }
    Index_word++;
}
' inputfile
Jean-Pierre.
Well, thanks for bearing with me, aigles. The latest command is not working either; I am getting the error "awk: Input line procedure test1 insert cannot be longer than 3,000 bytes.". If you have time in the future, please reply about that. I have already attached the file test.txt, which you can use for reference.
aigles
August 1, 2009, 4:12am
12
It sounds as if an input line is longer than 3000 bytes.
Instead of:
awk ..... inputfile
Try:
fold -s -w 300 inputfile | awk ...
Jean-Pierre.
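To check whether line length is really the culprit, one could measure the longest line before and after folding. A small sketch, assuming plain single-byte text so that awk's length() equals the byte count (the name longest_line is hypothetical). Note that when awk runs with a custom RS, what it calls an "input line" is a record, which may span several text lines, so folding the text does not necessarily shorten it.

```shell
# longest_line: hypothetical helper; prints the length of the longest
# line found in the named files, or on stdin when no file is given.
longest_line() {
    awk 'length($0) > max { max = length($0) } END { print max + 0 }' "$@"
}
```

For example: longest_line inputfile, then fold -s -w 300 inputfile | longest_line to confirm that fold keeps every line within 300 characters.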
I am still getting the same error.
The final command is:
fold -s -w 300 test.txt | awk -v RS='[[:space:]()\n]' -v Text='insert into xyz' '
BEGIN {
    Words_count = split(tolower(Text), Word);
    In_proc = 0;
    Skip_comment = 0;
    Wait_proc_name = 0;
}
!NF { next }
{ $0 = tolower($0) }
/^\/\*/ {
    Skip_comment = 1;
    next;
}
Skip_comment {
    if ($0 !~ /\*\/$/) next;
    Skip_comment = 0;
    next;
}
Wait_proc_name {
    Proc_name = $0;
    Wait_proc_name = 0;
    next;
}
/(procedure|function)$/ {
    Proc_type = $0;
    Wait_proc_name = 1;
    Index_word = 1;
    In_proc = 1;
    next;
}
In_proc {
    if ($0 != Word[Index_word]) next;
    if (Index_word == Words_count) {
        print Proc_type, Proc_name;
        Index_word = 0;
        In_proc = 0;
    }
    Index_word++;
}'
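A possible reason the error persists: POSIX leaves the behaviour unspecified when RS is longer than one character, and some older awk implementations use only its first character, so the whole file may still be read as a handful of very long records whatever fold does. The sketch below is one way to get the same word-by-word matching with the default record separator, so that no record is longer than one text line. It is not aigles's script. The name find_procs is hypothetical, and it assumes comments do not nest, that "(", ")", ";" and "," can be treated as blanks, and that the word after procedure or function is the name.

```shell
# find_procs TEXT [FILE...]: hypothetical helper, not from the thread.
# The body runs in a subshell, so "text" does not leak into the caller.
find_procs() (
    text=$1
    shift
    awk -v text="$text" '
    BEGIN { nwords = split(tolower(text), word, " ") }
    FNR == 1 { in_comment = 0; want_name = 0; name = "" }   # reset per file
    {
        line = tolower($0)
        gsub(/\/\*/, " /* ", line)      # make comment markers separate words
        gsub(/\*\//, " */ ", line)
        gsub(/[();,]/, " ", line)       # assumption: punctuation never matters
        n = split(line, tok, " ")
        for (i = 1; i <= n; i++) {
            t = tok[i]
            if (in_comment) { if (t == "*/") in_comment = 0; continue }
            if (t == "/*") { in_comment = 1; continue }
            if (want_name) {
                name = t; want_name = 0; matched = 0; reported = 0
                continue
            }
            if (t == "procedure" || t == "function") {
                kind = t; want_name = 1
                continue
            }
            if (name == "" || reported) continue
            if (t == word[matched + 1]) {
                matched++
            } else {
                matched = (t == word[1]) ? 1 : 0
            }
            if (matched == nwords) { print kind, name; reported = 1 }
        }
    }' "$@"
)
```

Usage would be along the lines of find_procs 'insert into xyz' test.txt, and several files can be passed at once. The gsub calls here work on a single line at a time, so they should stay under the 3,000-byte limit as long as no individual line in the input is that long.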