Hi,
I need some guidance understanding the Perl script below. I am not the author of the script, and the author has not left any documentation. I suppose it is meant to be 'easy' if you're a Perl or regex guru. I am having a problem understanding what regex to use. The script does warn about tweaking the regex to suit the ever-changing string.
This is the script:
[host01]$ cat x.pl
#!/usr/bin/perl
#
# ./logparse.pl <logfile> <service_name_to_search> | sort | uniq
#
$log = $ARGV[0];
$service_name = $ARGV[1];
$found = 0;
open LOG, $log || die "cannot open logfile $!";
while ($line = <LOG>){
if ( $line =~ /\(SERVICE_NAME=$service_name\).*\(HOST=([\d.\w]+)\)\(USER=(\w+)\)/ ) {
print $service_name . "\t" . $1 . "\t" . $2 . "\t" . $3 . "\n";
$found = 1;
}
elsif ( $line =~ /\(USER=(\w+)\).*\(SERVICE_NAME=$service_name.*\).*\(HOST=([\d.\w]+)\)*/ ) {
print $service_name . "\t" . $1 . "\t" . $2 . "\t" . $3 . "\n";
$found = 1;
}
elsif ( $line =~ /\(CONNECT_DATA=\((\w+).*\(SERVICE_NAME=$service_name.*\).*\(HOST=([\d.\w]+)\)*/ ) {
print $service_name . "\t" . $1 . "\t" . $2 . "\t" . $3 . "\n";
$found = 1;
}
}
close LOG;
if ( $found == "0" ) {
print "\n" ;
print "There is no nothing found for " . $service_name . "\n" ;
print "Maybe the regex needs changing " . "\n" ;
print "The string format has been known to change " . "\n" ;
print "\n" ;
}
Here are some sample files to run against this script.
#==> test1.log <==
#2018-07-23 13:19:38 * (CONNECT_DATA=(CID=(PROGRAM=JDBC Thin Client)(HOST=__jdbc__)(USER=mickey))(SERVER=DEDICATED)(SERVICE_NAME=work_app.com.ph)) * (ADDRESS=(PROTOCOL=tcp)(HOST=12.123.11.123)(PORT=53102)) * establish * work_app.com.ph * 0
#2018-07-23 09:12:12 * (CONNECT_DATA=(CID=(PROGRAM=SQL Developer)(HOST=__jdbc__)(USER=minnie))(SERVICE_NAME=work_app.com.ph)(SERVER=dedicated)(INSTANCE_NAME=testp11)) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.214.14.29)(PORT=53548)) * establish * work_app.com.ph * 0
#
#==> test2.log <==
#2019-05-12 04:17:10 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=fail_app.com.ph)(CID=(PROGRAM=C:\Windows\system32\exec01.exe)(HOST=MNLAPP01)(USER=!sysadmin01))) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.11.11.123)(PORT=62625)) * establish * fail_app.com.ph * 0
#2019-05-12 04:17:10 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=fail_app.com.ph)(CID=(PROGRAM=C:\Windows\system32\exec02.exe)(HOST=MNLAPP01)(USER=!sysadmin01))) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.11.11.123)(PORT=62627)) * establish * fail_app.com.ph * 0
#2019-05-12 04:17:10 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=fail_app.com.ph)(CID=(PROGRAM=C:\Windows\system32\exec03.exe)(HOST=MNLAPP01)(USER=!sysadmin01))(INSTANCE_NAME=xxxt23)) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.11.11.123)(PORT=62626)) * establish * fail_app.com.ph * 0
#2019-05-12 04:17:11 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=fail_app.com.ph)(CID=(PROGRAM=C:\Windows\system32\exec01.exe)(HOST=MNLAPP01)(USER=!sysadmin01))) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.11.11.123)(PORT=62629)) * establish * fail_app.com.ph * 0
A sample run of the script is below:
[host01]$ ./x.pl test1.log work_app
work_app mickey 12.123.11.123
work_app minnie 10.214.14.29
[host01]$ ./x.pl test2.log fail_app
fail_app SERVER 10.11.11.123
fail_app SERVER 10.11.11.123
fail_app SERVER 10.11.11.123
fail_app SERVER 10.11.11.123
Using awk and paste, this is what I am hoping to get with the Perl script:
awk '{ print $4 }' test2.log | awk -F"(" '{ print $6 }' | awk -F")" '{ print $1 }' > program.tmp.99
awk '{ print $4 }' test2.log | awk -F"(" '{ print $7 }' | awk -F")" '{ print $1 }' > host.tmp.99
awk '{ print $4 }' test2.log | awk -F"(" '{ print $8 }' | awk -F")" '{ print $1 }' > user.tmp.99
awk '{ print $6 }' test2.log | awk -F"(" '{ print $4 }' | awk -F")" '{ print $1 }' > host_ip.tmp.99
paste program.tmp.99 host.tmp.99 user.tmp.99 host_ip.tmp.99 | sort | uniq
[host01]$ paste program.tmp.99 host.tmp.99 user.tmp.99 host_ip.tmp.99 | sort | uniq
PROGRAM=C:\Windows\system32\exec01.exe HOST=MNLAPP01 USER=!sysadmin01 HOST=10.11.11.123
PROGRAM=C:\Windows\system32\exec02.exe HOST=MNLAPP01 USER=!sysadmin01 HOST=10.11.11.123
PROGRAM=C:\Windows\system32\exec03.exe HOST=MNLAPP01 USER=!sysadmin01 HOST=10.11.11.123
May I please ask someone to kindly explain how the regex is parsing the string? I've been pulling out whatever is left of my hair all day and still can't figure out how it is doing what it is meant to be doing. At the moment, I use awk to write tmp files and paste to combine them into what I want. I know it is not the best solution, sorry.
For the first run of x.pl, the output looks alright, but I am hoping to get the PROGRAM value as well. I am hoping it should be $1.
[host01]$ ./x.pl test1.log work_app
work_app mickey 12.123.11.123
work_app minnie 10.214.14.29
For the second run of x.pl, I was hoping to get the output from using awk+paste.
[host01]$ ./x.pl test2.log fail_app
fail_app SERVER 10.11.11.123
fail_app SERVER 10.11.11.123
fail_app SERVER 10.11.11.123
fail_app SERVER 10.11.11.123
I believe the answer to my problem lies in figuring out how the Perl regex dissects the string into several fields. I can understand that the line below does the search/match for the search string, but how does it break the string down into several fields?
$line =~ /\(SERVICE_NAME=$service_name\).*\(HOST=([\d.\w]+)\)\(USER=(\w+)\)/
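A minimal sketch of how the capture groups appear to work, for illustration only. In a Perl regex, `\(` and `\)` match literal parentheses, while bare `(` and `)` capture whatever they enclose into $1, $2, and so on, numbered by opening parenthesis. The pattern on the line above requires `)` immediately after the service name, so with `work_app` as the argument it cannot match `(SERVICE_NAME=work_app.com.ph)`, and the script falls through to its second pattern. The sketch below applies that second pattern to the first line of test1.log. Assumptions not in the original script: `use strict`, the `$user` and `$host` variable names, `\Q...\E` around the service name so its dots are taken literally, and the trailing `*` after the last `\)` is dropped.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample line taken from test1.log above.
my $line = '2018-07-23 13:19:38 * (CONNECT_DATA=(CID=(PROGRAM=JDBC Thin Client)(HOST=__jdbc__)(USER=mickey))(SERVER=DEDICATED)(SERVICE_NAME=work_app.com.ph)) * (ADDRESS=(PROTOCOL=tcp)(HOST=12.123.11.123)(PORT=53102)) * establish * work_app.com.ph * 0';
my $service_name = 'work_app';

my ($user, $host);

# Group 1 is (\w+) after "USER="; group 2 is ([\d.\w]+) after the first
# "(HOST=" that can be found somewhere after SERVICE_NAME.
if ( $line =~ /\(USER=(\w+)\).*\(SERVICE_NAME=\Q$service_name\E.*\).*\(HOST=([\d.\w]+)\)/ ) {
    ($user, $host) = ($1, $2);
    print "user=$user host=$host\n";   # prints: user=mickey host=12.123.11.123
}
```

Each pattern in the script has only two capture groups, so $3 is never set by them, and PROGRAM is not captured anywhere. On the test2.log lines, `(USER=(\w+))` cannot match `USER=!sysadmin01` because `\w` does not match `!`, so the third pattern is the one that fires, and its first group, `(\w+)` right after `(CONNECT_DATA=(`, captures the word `SERVER`.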
The connection string also changes depending on the program, so sometimes I need information before SERVICE_NAME, sometimes after it, and sometimes both.
A regex tutorial would be much appreciated.
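One possible way to avoid depending on field order, sketched here under assumptions rather than as the original author's method: split the line on " * " into its sections, then pull each `(KEY=value)` pair out with its own small regex. The helper name `field` is hypothetical, and the sketch assumes a value never contains parentheses and that the CONNECT_DATA and ADDRESS sections are always the second and third " * "-separated fields, as they are in the sample lines.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the value of the first "(KEY=value)" pair in
# $text, or an empty string if the key is absent.
sub field {
    my ($text, $key) = @_;
    return $text =~ /\(\Q$key\E=([^()]*)\)/ ? $1 : '';
}

# Sample line taken from test2.log above.
my $line = '2019-05-12 04:17:10 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=fail_app.com.ph)(CID=(PROGRAM=C:\Windows\system32\exec01.exe)(HOST=MNLAPP01)(USER=!sysadmin01))) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.11.11.123)(PORT=62625)) * establish * fail_app.com.ph * 0';

# Sections are separated by " * ": index 1 is CONNECT_DATA, index 2 is ADDRESS.
my ($connect, $address) = (split / \* /, $line)[1, 2];

# Each field is looked up on its own, so its position relative to
# SERVICE_NAME no longer matters.
my @row = (
    field($connect, 'SERVICE_NAME'),
    field($connect, 'PROGRAM'),
    field($connect, 'HOST'),
    field($connect, 'USER'),
    field($address, 'HOST'),
);
print join("\t", @row), "\n";
```

Because `[^()]*` accepts any character except parentheses, values such as `!sysadmin01` and `JDBC Thin Client` are captured whole, which `\w+` would not do.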
Please advise. Thanks.