Apache2 logs analysis

Hi there,

I need some help improving this script. Thanks.

The purpose is to:

Identify illegitimate access attempts in the Apache2 logs, such as system commands, shell hacks, malware, bots and other hacking attempts. Most of these share the goal of getting in through the weak parts of the web side. I got a pretty interesting set of results, mostly from India, China, Italy, South America and central Africa (somebody trying to hack while sitting on safari), and of course the USA as well.

1) Re-engineer the analysis of Apache2's other_vhosts_access.log; it could also be adapted to analyse other log formats.
2) I need to make the counters smarter: at the moment I have to create and initialise every counter in the BEGIN block, and I'm interested in something smarter that needs less code.
3) Is there any variable in AWK/GAWK that contains the "searched string", let's call it a search-pattern placeholder?
4) The END block contains an individual statement for every counter; I need to improve that as well. Thanks.

Regards,
Nasir Mahmood

#!/usr/bin/awk -f
#
#
# version 1: Counters added to show count of matches at the end.
# 1.2:     changed and displayed the resulting match at the end of every line. Added color code to string matched,

BEGIN { FS="\""; SHOWLOG=1; IGNORECASE=1; CurlynumberNF=0; azAZ09NF=0; UnameNF=0; ExprNF=0; WgetNF=0; DecodeNF=0; EvalNF=0; Base64NF=0; DisconnectNF=0; ConnectNF=0; FunctionNF=0; ExitNF=0; DocRootNF=0; chrNF=0; DelayNF=0; WaitforNF=0;  PrintNF=0; CgiBinNF=0; PasswdNF=0; BinShNF=0; PerlNF=0; BashNF=0; SelectNF=0; zhCNNF=0; Hexa=0; WordPress=0; WpCron=0; WpAdmin=0; CgiBin=0; Passwd=0; WpLogin=0; Echo2=0; Eval2=0; Base64=0; DOCROOT=0; SetTimeLimit=0; SetMagicQuotes=0; FilePutContent=0; Magento=0; PhpAdmin=0; PhpMyAdmin=0; FCKEditor=0; System2=0; Sqlite=0; SQLManager=0; WebEdit=0; WpContent=0; WebSQL=0; MySQLDumper=0; webdb=0; WebConsole=0; Digit200=0; azAZ300=0; WebManage=0; }

$2 ~ /webmanage/ { WebManage++; split($1,a," "); x[a[2]]++; if ( SHOWLOG ) {  $(NF-1)=$(NF-1)"\t\033[1;31mwebmanage\033[0m"  }; printf("%s\t\033[1;32m%s\033[0m\t\t%s\n",a[2],$2,$(NF-1)); }
$2 ~ /[a-zA-Z_-]{300,}/ && $3 !~ /200/ { azAZ300++; split($1,a," "); x[a[2]]++; if ( SHOWLOG ) {  $(NF-1)=$(NF-1)"\t\033[1;31m[a-zA-Z_-]{300,} !200\033[0m"  }; printf("%s\t\033[1;32m%s\033[0m\t\t%s\n",a[2],$2,$(NF-1)); }
$2 ~ /[0-9]{200,}/ && $3 !~ /200/ { Digit200++; split($1,a," "); x[a[2]]++; if ( SHOWLOG ) {  $(NF-1)=$(NF-1)"\t\033[1;31m[0-9]{200,} !200\033[0m"  }; printf("%s\t\033[1;32m%s\033[0m\t\t%s\n",a[2],$2,$(NF-1)); }
$2 ~ /web-console/ { WebConsole++; split($1,a," "); x[a[2]]++; if ( SHOWLOG ) {  $(NF-1)=$(NF-1)"\t\033[1;31mweb-console\033[0m"  }; printf("%s\t\033[1;32m%s\033[0m\t%s\n",a[2],$2,$(NF-1)); }
$2 ~ /webdb/ { webdb++; split($1,a," "); x[a[2]]++; if ( SHOWLOG ) {  $(NF-1)=$(NF-1)"\t\033[1;31mwebdb\033[0m"  }; printf("%s\t\033[1;32m%s\033[0m\t%s\n",a[2],$2,$(NF-1)); }
$2 ~ /mysqldumper/ { MySQLDumper++; split($1,a," "); x[a[2]]++; if ( SHOWLOG ) {  $(NF-1)=$(NF-1)"\t\033[1;31mmysqldumper\033[0m"  }; printf("%s\t\033[1;32m%s\033[0m\t%s\n",a[2],$2,$(NF-1)); }
$2 ~ /websql/ { WebSQL++; split($1,a," "); x[a[2]]++; if ( SHOWLOG ) {  $(NF-1)=$(NF-1)"\t\033[1;31mwebsql\033[0m"  }; printf("%s\t\033[1;32m%s\033[0m\t%s\n",a[2],$2,$(NF-1)); }
$2 ~ /wp-content/ { WpContent++; split($1,a," "); x[a[2]]++; if ( SHOWLOG ) {  $(NF-1)=$(NF-1)"\t\033[1;31mwp-content\033[0m"  }; printf("%s\t\033[1;32m%s\033[0m\t%s\n",a[2],$2,$(NF-1)); }
$2 ~ /webedit/ { WebEdit++; split($1,a," "); x[a[2]]++; if ( SHOWLOG ) {  $(NF-1)=$(NF-1)"\t\033[1;31mwebedit\033[0m"  }; printf("%s\t\033[1;32m%s\033[0m\t%s\n",a[2],$2,$(NF-1)); }
$2 ~ /sqlmanager/ { SQLManager++; split($1,a," "); x[a[2]]++; if ( SHOWLOG ) {  $(NF-1)=$(NF-1)"\t\033[1;31msqlmanager\033[0m"  }; printf("%s\t\033[1;32m%s\033[0m\t%s\n",a[2],$2,$(NF-1)); }
$2 ~ /sqlite/ { Sqlite++; split($1,a," "); x[a[2]]++; if ( SHOWLOG ) {  $(NF-1)=$(NF-1)"\t\033[1;31msqlite\033[0m"  }; printf("%s\t\033[1;32m%s\033[0m\t%s\n",a[2],$2,$(NF-1)); }
$2 ~ /system/ { System2++; split($1,a," "); x[a[2]]++; if ( SHOWLOG ) {  $(NF-1)=$(NF-1)"\t\033[1;31msystem\033[0m"  }; printf("%s\t\033[1;32m%s\033[0m\t%s\n",a[2],$2,$(NF-1)); }
$2 ~ /fckeditor/ { FCKEditor++; split($1,a," "); x[a[2]]++; if ( SHOWLOG ) {  $(NF-1)=$(NF-1)"\t\033[1;31mfckeditor\033[0m"  }; printf("%s\t\033[1;32m%s\033[0m\t%s\n",a[2],$2,$(NF-1)); }
$2 ~ /phpmyadmin/ { PhpMyAdmin++; split($1,a," "); x[a[2]]++; if ( SHOWLOG ) {  $(NF-1)=$(NF-1)"\t\033[1;31mphpmyadmin\033[0m"  }; printf("%s\t\033[1;32m%s\033[0m\t%s\n",a[2],$2,$(NF-1)); }
$2 ~ /phpadmin/ { PhpAdmin++; split($1,a," "); x[a[2]]++; if ( SHOWLOG ) {  $(NF-1)=$(NF-1)"\t\033[1;31mphpadmin\033[0m"  }; printf("%s\t\033[1;32m%s\033[0m\t%s\n",a[2],$2,$(NF-1)); }
$2 ~ /magento/ { Magento++; split($1,a," "); x[a[2]]++; if ( SHOWLOG ) {  $(NF-1)=$(NF-1)"\t\033[1;31mmagento\033[0m"  }; printf("%s\t\033[1;32m%s\033[0m\t%s\n",a[2],$2,$(NF-1)); }
$2 ~ /"file_put_content"/ { FilePutContent++; split($1,a," "); x[a[2]]++; if ( SHOWLOG ) {  $(NF-1)=$(NF-1)"\t\033[1;31mFilePutContent\033[0m"  }; printf("%s\t\033[1;32m%s\033[0m\t%s\n",a[2],$2,$(NF-1)); }
$2 ~ /"set_magic_quotes"/ { SetMagicQuotes++; split($1,a," "); x[a[2]]++; if ( SHOWLOG ) {  $(NF-1)=$(NF-1)"\t\033[1;31mSetMagicQuotes\033[0m"  }; printf("%s\t\033[1;32m%s\033[0m\t%s\n",a[2],$2,$(NF-1)); }
$2 ~ /"set_time_limit"/ { SetTimeLimit++; split($1,a," "); x[a[2]]++; if ( SHOWLOG ) {  $(NF-1)=$(NF-1)"\t\033[1;31mSetTimeLimit\033[0m"  }; printf("%s\t\033[1;32m%s\033[0m\t%s\n",a[2],$2,$(NF-1)); }
$2 ~ /"DOCUMENT_ROOT"/ { DOCROOT++; split($1,a," "); x[a[2]]++; if ( SHOWLOG ) {  $(NF-1)=$(NF-1)"\t\033[1;31mDOCROOT\033[0m"  }; printf("%s\t\033[1;32m%s\033[0m\t%s\n",a[2],$2,$(NF-1)); }
$2 ~ /"base64"/ { Base64++; split($1,a," "); x[a[2]]++; if ( SHOWLOG ) {  $(NF-1)=$(NF-1)"\t\033[1;31mbase64\033[0m"  }; printf("%s\t\033[1;32m%s\033[0m\t%s\n",a[2],$2,$(NF-1)); }
$2 ~ /"eval"/ { Eval2++; split($1,a," "); x[a[2]]++; if ( SHOWLOG ) {  $(NF-1)=$(NF-1)"\t\033[1;31meval\033[0m"  }; printf("%s\t\033[1;32m%s\033[0m\t%s\n",a[2],$2,$(NF-1)); }
$2 ~ /"echo"/ { Echo2++; split($1,a," "); x[a[2]]++; if ( SHOWLOG ) {  $(NF-1)=$(NF-1)"\t\033[1;31mecho\033[0m"  }; printf("%s\t\033[1;32m%s\033[0m\t%s\n",a[2],$2,$(NF-1)); }
$2 ~ /\/wp-login/ { WpLogin++; split($1,a," "); x[a[2]]++;if ( SHOWLOG ) {  $(NF-1)=$(NF-1)"\t\033[1;31mwp-login\033[0m"  }; printf("%s\t\033[1;32m%s\033[0m\t%s\n",a[2],$2,$(NF-1)); }
$2 ~ /passwd/ { Passwd++; split($1,a," "); x[a[2]]++;if ( SHOWLOG ) {  $(NF-1)=$(NF-1)"\t\033[1;31mpasswd\033[0m"  }; printf("%s\t\033[1;32m%s\033[0m\t%s\n",a[2],$2,$(NF-1)); }
$2 ~ /"cgi-bin"/ { CgiBin++; split($1,a," "); x[a[2]]++; if ( SHOWLOG ) {  $(NF-1)=$(NF-1)"\t\033[1;31mcgi-bin\033[0m"  };  printf("%s\t\033[1;32m%s\033[0m\t%s\n",a[2],$2,$(NF-1)); }
$2 ~ /"wp-admin"/ { WpAdmin++ ;split($1,a," "); x[a[2]]++; if ( SHOWLOG ) {  $(NF-1)=$(NF-1)"\t\033[1;31mwp-admin\033[0m"  };  printf("%s\t\033[1;32m%s\033[0m\t%s\n",a[2],$2,$(NF-1)); }
$2 ~ /"wp-cron"/ { WpCron++;  split($1,a," "); x[a[2]]++; if ( SHOWLOG ) {  $(NF-1)=$(NF-1)"\t\033[1;31mwp-cron\033[0m"  }; printf("%s\t\033[1;32m%s\033[0m\t%s\n",a[2],$2,$(NF-1)); }
$2 ~ /"wordpress"/ { WordPress++; split($1,a," "); x[a[2]]++; if ( SHOWLOG ) {  $(NF-1)=$(NF-1)"\t\033[1;31mwordpress\033[0m"  };  printf("%s\t\033[1;32m%s\033[0m\t%s\n",a[2],$2,$(NF-1)); }
$(NF-1)  ~ /"zh_CN"/ { zhCNNF++; split($1,a," "); x[a[2]]++; if ( SHOWLOG ) {  $(NF-1)=$(NF-1)"\t\033[1;31mzh_CN\033[0m"  }; printf("%s\t\033[1;32m%s\033[0m\t%s\n",a[2],$2,$(NF-1)); }
( $(NF-1) !~ /Mozilla/ && $(NF-1) ~ /\\x[0-9a-fA-F]+/ ) { Hexa++; split($1,a," "); x[a[2]]++; if ( SHOWLOG ) {  $(NF-1)=$(NF-1)"\t\033[1;31mx[a-f0-9]\033[0m"  }; printf("%s\t%s\t\t\033[1;32m%s\033[0m\n",a[2],$2,$(NF-1));}
$(NF-1) ~ /"select"/ { SelectNF++; split($1,a," "); x[a[2]]++; if ( SHOWLOG ) {  $(NF-1)=$(NF-1)"\t\033[1;31mselect\033[0m"  }; printf("%s\t%s\t\033[1;32m%s\033[0m\n",a[2],$2,$(NF-1));}
$(NF-1) ~ /"bash"/ { BashNF++;  split($1,a," "); x[a[2]]++; if ( SHOWLOG ) {  $(NF-1)=$(NF-1)"\t\033[1;31mbash\033[0m"  }; printf("%s\t%s\t\033[1;32m%s\033[0m\n",a[2],$2,$(NF-1));}
$(NF-1) ~ /"perl"/ { PerlNF++;  split($1,a," "); x[a[2]]++; if ( SHOWLOG ) {  $(NF-1)=$(NF-1)"\t\033[1;31mperl\033[0m"  }; printf("%s\t%s\t\033[1;32m%s\033[0m\n",a[2],$2,$(NF-1));}
$(NF-1) ~ /bin\/sh/ { BinShNF++; split($1,a," "); x[a[2]]++; if ( SHOWLOG ) {  $(NF-1)=$(NF-1)"\t\033[1;31mbin/sh\033[0m"  }; printf("%s\t%s\t\033[1;32m%s\033[0m\n",a[2],$2,$(NF-1));}
$(NF-1) ~ /"passwd"/ { PasswdNF++; split($1,a," "); x[a[2]]++; if ( SHOWLOG ) {  $(NF-1)=$(NF-1)"\t\033[1;31mpasswdNF\033[0m"  }; printf("%s\t%s\t\033[1;32m%s\033[0m\n",a[2],$2,$(NF-1));}
$(NF-1) ~ /"cgi-bin"/ { CgiBinNF++; split($1,a," "); x[a[2]]++; if ( SHOWLOG ) {  $(NF-1)=$(NF-1)"\t\033[1;31mcgi-binNF\033[0m"  }; printf("%s\t%s\t\033[1;32m%s\033[0m\n",a[2],$2,$(NF-1));}
$(NF-1) ~ /"print"/ { PrintNF++; split($1,a," "); x[a[2]]++; if ( SHOWLOG ) {  $(NF-1)=$(NF-1)"\t\033[1;31mprintNF\033[0m"  };  printf("%s\t%s\t\033[1;32m%s\033[0m\n",a[2],$2,$(NF-1));}
$(NF-1) ~ /"waitfor"/ { WaitforNF++; split($1,a," "); x[a[2]]++; if ( SHOWLOG ) {  $(NF-1)=$(NF-1)"\t\033[1;31mwaitforNF\033[0m"  }; printf("%s\t%s\t\033[1;32m%s\033[0m\n",a[2],$2,$(NF-1));}
$(NF-1) ~ /"delay"/ { DelayNF++; split($1,a," "); x[a[2]]++; if ( SHOWLOG ) {  $(NF-1)=$(NF-1)"\t\033[1;31mdelay\033[0m"  };  printf("%s\t%s\t\033[1;32m%s\033[0m\n",a[2],$2,$(NF-1));}
$(NF-1) ~ /\<chr\([0-9a-zA-Z]+\)\>/ { chrNF++; split($1,a," "); x[a[2]]++; if ( SHOWLOG ) {  $(NF-1)=$(NF-1)"\t\033[1;31mchrNF\033[0m"  }; printf("%s\t%s\t\033[1;32m%s\033[0m\n",a[2],$2,$(NF-1));}
$(NF-1) ~ /"DOCUMENT_ROOT"/ { DocRootNF++; split($1,a," "); if ( SHOWLOG ) {  $(NF-1)=$(NF-1)"\t\033[1;31mDOCUMENT_ROOT\033[0m"  }; x[a[2]]++; printf("%s\t%s\t\033[1;32m%s\033[0m\n",a[2],$2,$(NF-1));}
$(NF-1) ~ /"exit"/ { ExitNF++; split($1,a," "); x[a[2]]++; if ( SHOWLOG ) {  $(NF-1)=$(NF-1)"\t\033[1;31mexitNF\033[0m"  }; printf("%s\t%s\t\033[1;32m%s\033[0m\n",a[2],$2,$(NF-1));}
$(NF-1 )  ~ /"function"/ { FunctionNF++; split($1,a," "); x[a[2]]++; if ( SHOWLOG ) {  $(NF-1)=$(NF-1)"\t\033[1;31mfunctionNF\033[0m"  }; printf("%s\t%s\t\033[1;32m%s\033[0m\n",a[2],$2,$(NF-1));}
( $(NF-1) !~ /Mozilla/ &&  $(NF-1) !~ /Outlook/ && $(NF-1) !~ /internal dummy connection/ && $3 !~ /200/ && $(NF-1) ~ /connect/ )  { ConnectNF++; split($1,a," "); x[a[2]]++; if ( SHOWLOG ) {  $(NF-1)=$(NF-1)"\t\033[1;31mconnectNF\033[0m"  }; printf("%s\t%s\t\033[1;32m%s\033[0m\n",a[2],$2,$(NF-1));}
$(NF-1) ~ /"disconnect"/ { DisconnectNF++; split($1,a," "); x[a[2]]++; if ( SHOWLOG ) {  $(NF-1)=$(NF-1)"\t\033[1;31mdisconnectNF\033[0m"  }; printf("%s\t%s\t\033[1;32m%s\033[0m\n",a[2],$2,$(NF-1));}
$(NF-1) ~ /[0-9a-zA-Z]{300,}/ { azAZ09NF++;  split($1,a," "); x[a[2]]++; if ( SHOWLOG ) {  $(NF-1)=$(NF-1)"\t\033[1;31ma-zA-Z0-9-300\033[0m"  }; printf("%s\t%s\t\033[1;32m%s\033[0m\n",a[2],$2,$(NF-1));}
$(NF-1) ~ /"base64"/ { Base64NF++;  split($1,a," "); x[a[2]]++; if ( SHOWLOG ) {  $(NF-1)=$(NF-1)"\t\033[1;31mbase64NF\033[0m"  }; printf("%s\t%s\t\033[1;32m%s\033[0m\n",a[2],$2,$(NF-1));}
$(NF-1) ~ /\<eval\>/ { EvalNF++; split($1,a," "); x[a[2]]++; if ( SHOWLOG ) {  $(NF-1)=$(NF-1)"\t\033[1;31mevalNF\033[0m"  }; printf("%s\t%s\t\033[1;32m%s\033[0m\n",a[2],$2,$(NF-1));}
$(NF-1) ~ /"decode"/ { DecodeNF++; split($1,a," "); x[a[2]]++; if ( SHOWLOG ) {  $(NF-1)=$(NF-1)"\t\033[1;31mdecodeNF\033[0m"  }; printf("%s\t%s\t\033[1;32m%s\033[0m\n",a[2],$2,$(NF-1));}
$(NF-1) ~ /"wget([0-9]+)"/ { WgetNF++; split($1,a," "); x[a[2]]++; if ( SHOWLOG ) {  $(NF-1)=$(NF-1)"\t\033[1;31mwgetNF\033[0m"  }; printf("%s\t%s\t\033[1;32m%s\033[0m\n",a[2],$2,$(NF-1));}
$(NF-1) ~ /"expr"/ { ExprNF++; split($1,a," "); x[a[2]]++; if ( SHOWLOG ) {  $(NF-1)=$(NF-1)"\t\033[1;31mexprNF\033[0m"  }; printf("%s\t%s\t\033[1;32m%s\033[0m\n",a[2],$2,$(NF-1));}
$(NF-1) ~ /"uname"/ { UnameNF++; split($1,a," "); x[a[2]]++; if ( SHOWLOG ) {  $(NF-1)=$(NF-1)"\t\033[1;31munameNF\033[0m"  }; printf("%s\t%s\t\033[1;32m%s\033[0m\n",a[2],$2,$(NF-1));}
$(NF-1) ~ /\$\([a-zA-Z0-9]+\)/ { azAZ09NF++; split($1,a," "); x[a[2]]++; if ( SHOWLOG ) {  $(NF-1)=$(NF-1)"\t\033[1;31m$(a-zA-Z0-9)\033[0m"  }; printf("%s\t%s\t\033[1;32m%s\033[0m\n",a[2],$2,$(NF-1));}
$(NF-1) ~ /\$\{[0-9]+\}/    { CurlynumberNF++; split($1,a," "); x[a[2]]++; if ( SHOWLOG ) {  $(NF-1)=$(NF-1)"\t\033[1;31m$(0-9)\033[0m"  }; printf("%s\t%s\t\033[1;32m%s\033[0m\n",a[2],$2,$(NF-1));}
END {
printf("%-20s\t%d\n","CurlynumberNF",CurlynumberNF);
printf("%-20s\t%d\n","azAZ09NF",azAZ09NF);
printf("%-20s\t%d\n","UnameNF",UnameNF);
printf("%-20s\t%d\n","ExprNF",ExprNF);
printf("%-20s\t%d\n","WgetNF",WgetNF);
printf("%-20s\t%d\n","DecodeNF",DecodeNF);
printf("%-20s\t%d\n","EvalNF",EvalNF);
printf("%-20s\t%d\n","Base64NF",Base64NF);
printf("%-20s\t%d\n","Hexa",Hexa);
printf("%-20s\t%d\n","DisconnectNF",DisconnectNF);
printf("%-20s\t%d\n","ConnectNF",ConnectNF);
printf("%-20s\t%d\n","FunctionNF",FunctionNF);
printf("%-20s\t%d\n","ExitNF",ExitNF);
printf("%-20s\t%d\n","DocRootNF",DocRootNF);
printf("%-20s\t%d\n","chrNF",chrNF);
printf("%-20s\t%d\n","DelayNF",DelayNF);
printf("%-20s\t%d\n","WaitforNF",WaitforNF);
printf("%-20s\t%d\n","PrintNF",PrintNF);
printf("%-20s\t%d\n","CgiBinNF",CgiBinNF);
printf("%-20s\t%d\n","PasswdNF",PasswdNF);
printf("%-20s\t%d\n","BinShNF",BinShNF);
printf("%-20s\t%d\n","PerlNF",PerlNF);
printf("%-20s\t%d\n","BashNF",BashNF);
printf("%-20s\t%d\n","SelectNF",SelectNF);
printf("%-20s\t%d\n","zhCNNF",zhCNNF);
printf("%-20s\t%d\n","WordPress",WordPress);
printf("%-20s\t%d\n","WpCron",WpCron);
printf("%-20s\t%d\n","WpAdmin",WpAdmin);
printf("%-20s\t%d\n","CgiBin",CgiBin);
printf("%-20s\t%d\n","Passwd",Passwd);
printf("%-20s\t%d\n","WpLogin",WpLogin);
printf("%-20s\t%d\n","Echo2",Echo2);
printf("%-20s\t%d\n","Eval2",Eval2);
printf("%-20s\t%d\n","Base64",Base64);
printf("%-20s\t%d\n","DOCROOT",DOCROOT);
printf("%-20s\t%d\n","SetTimeLimit",SetTimeLimit);
printf("%-20s\t%d\n","SetMagicQuotes",SetMagicQuotes);
printf("%-20s\t%d\n","FilePutContent",FilePutContent);
printf("%-20s\t%d\n","Magento",Magento);
printf("%-20s\t%d\n","PhpAdmin",PhpAdmin);
printf("%-20s\t%d\n","PhpMyAdmin",PhpMyAdmin);
printf("%-20s\t%d\n","FCKEditor",FCKEditor);
printf("%-20s\t%d\n","System2",System2);
printf("%-20s\t%d\n","Sqlite",Sqlite);
printf("%-20s\t%d\n","SQLManager",SQLManager);
printf("%-20s\t%d\n","WebEdit",WebEdit);
printf("%-20s\t%d\n","WpContent",WpContent);
printf("%-20s\t%d\n","WebSQL",WebSQL);
printf("%-20s\t%d\n","MySQLDumper",MySQLDumper);
printf("%-20s\t%d\n","webdb",webdb);
printf("%-20s\t%d\n","WebConsole",WebConsole);
printf("%-20s\t%d\n","Digit200",Digit200);
printf("%-20s\t%d\n","azAZ300",azAZ300);
printf("%-20s\t%d\n","WebManage",WebManage);

        for ( j in x )  {
                print j
                        }
    }


Sorry, but this code is as unreadable as probably possible. You might want to start by bringing it into a form a human can actually understand.

Inotherwordsitisquitehardtounderstandwhatyouaremeaning
ifthereisnostructureinyourcodeonecanrecognizeandyoumight
wanttostarttherebeforeevenattemptingtochangeyourcode.

I hope this helps.

bakunin


Fully supporting what bakunin says: at first glance one can see that there are many, many repeated (almost) identical operations, so by using adequate data structures you could dramatically simplify the entire script and make it far more maintainable at the same time.
On top, a few lines of sample input data would help as well...
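
To make "adequate data structures" a bit more concrete, here is a minimal sketch of a table-driven version. It is not a drop-in replacement: the three patterns, the array names pat, count and seen, and the two log lines are made up for illustration, and tolower() stands in for gawk's IGNORECASE so that it runs on any POSIX awk.

```shell
#!/bin/sh
# Two made-up log lines in the same layout as the vhost access log (illustration only).
logs='example.com:80 192.0.2.10 - - [07/Aug/2016:02:03:42 +0100] "GET /wp-login.php HTTP/1.1" 404 512 "-" "curl/7.47.0"
example.com:80 192.0.2.11 - - [07/Aug/2016:02:03:43 +0100] "GET /phpMyAdmin/../etc/passwd HTTP/1.1" 404 512 "-" "curl/7.47.0"'

summary=$(printf '%s\n' "$logs" | awk -F'"' '
BEGIN {
    # Hypothetical pattern table: counter name -> lowercase regex for the request field.
    pat["WpLogin"]    = "/wp-login"
    pat["PhpMyAdmin"] = "phpmyadmin"
    pat["Passwd"]     = "passwd"
}
{
    split($1, a, " ")
    req = tolower($2)                  # portable stand-in for IGNORECASE=1
    for (name in pat)
        if (req ~ pat[name]) {
            count[name]++              # created on first use, no initialisation needed
            seen[a[2]]++               # remember the client address
        }
}
END {
    for (name in pat)                  # one loop instead of one printf per counter
        printf("%-20s\t%d\n", name, count[name])
    for (ip in seen)
        print ip
}' | sort)                             # "for (k in array)" order is unspecified, so sort
printf '%s\n' "$summary"
```

Adding a test then means adding one line to the table; the matching rule and the END block stay as they are.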

my apologies:

I only have production logs from my own boxes available, and those are what I ran the script above against.

Reverse Engineering the code is not a big problem for those who dare.

for everyone else, here is the login

input is something like below:

example.com:80 IP_ADDRESS - - [07/Aug/2016:02:03:42 +0100] "GET /extracted-Request HTTP/1.1" 200 9638 "-" "Mozilla/5.0 (compatible; trovitBot 1.0; +http://www.trovit.com/bot.html)"
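
For reference, a small sketch of how this line splits once FS is a double quote, as in the script above. The field labels in the output are only descriptive names chosen here.

```shell
#!/bin/sh
# Sample line copied from the post (IP_ADDRESS is the placeholder used there).
line='example.com:80 IP_ADDRESS - - [07/Aug/2016:02:03:42 +0100] "GET /extracted-Request HTTP/1.1" 200 9638 "-" "Mozilla/5.0 (compatible; trovitBot 1.0; +http://www.trovit.com/bot.html)"'

fields=$(printf '%s\n' "$line" | awk -F'"' '
{
    split($1, a, " ")              # $1 = vhost, client address, idents, timestamp
    print "NF      = " NF
    print "client  = " a[2]
    print "request = " $2          # method, path, protocol
    print "status  = " $3          # status AND size, with surrounding blanks
    print "agent   = " $(NF-1)     # user agent
    # A field can never contain the separator itself, so a pattern that
    # includes a literal double quote cannot match any field.
    print "quoted  = " ($2 ~ /"GET/ ? "match" : "no match")
}')
printf '%s\n' "$fields"
```

Two things follow from this layout: $3 holds the byte count as well as the status, so a test such as $3 !~ /200/ also looks at the size, and any pattern written with double quotes inside the slashes stays at zero hits.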

Repeating log lines like the one above will give the script something to run on.

the resulting output will be something like :

WpLogin                 784
Echo2                   0
Eval2                   0
Base64                  0
DOCROOT                 0
SetTimeLimit            0
SetMagicQuotes          0
FilePutContent          0
Magento                 0
PhpAdmin                0
PhpMyAdmin              1
FCKEditor               283
System2                 46
Sqlite                  0
SQLManager              0
WebEdit                 2
WpContent               2850
WebSQL                  0
MySQLDumper             0
webdb                   0
WebConsole              0
Digit200                0
azAZ300                 0
WebManage               4

<IP LIST>

It is not about "reverse engineering": if you are not the person who wrote this code, I'd say throw it away, carefully analyse what you need and then implement that. Personally I think the script you have shown us is beyond repair.

I hope this helps.

bakunin

We can all analyze what each of the lines in that script is doing if we spend the time to make it readable by humans as well as by awk. But without a clear specification of what you are trying to do, and without a representative sample of the data being processed, we have no way of knowing which parts of the code work correctly by design, which parts work correctly by accident, and which parts are broken. And showing us output without the input from which it was derived is of very little help.

With what you have given us, there is no reason for us to waste time trying to guess at what might be done better (other than to make the code much easier to read).

I have had a quick try at simplifying this script for you.

I managed to identify 3 different kinds of test you are doing and created a check() function that covers these cases. It checks for a match and returns 0 if there is no match. Otherwise it logs the hit when SHOWLOG is set and returns 1. The return value is added to each of your counters.

I'm sure there could be much more simplification if you specified your expressions and counter names in a separate config file. But you would still need to edit the config file to change the tests, so I doubt much more would be gained by going that way.
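
For what it's worth, a rough sketch of what the config-file route could look like. The file names rules.conf and access.log, the "CounterName regex" line format and the array names are all assumptions made up for this example, and it only tests the request field.

```shell
#!/bin/sh
# Hypothetical file names and rule format, chosen only for this sketch.
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT

# rules.conf: one "CounterName regex" pair per line; a leading # marks a comment.
cat > "$tmp/rules.conf" <<'EOF'
# name      lowercase regex applied to the request
WpLogin     /wp-login
FCKEditor   fckeditor
EOF

cat > "$tmp/access.log" <<'EOF'
example.com:80 192.0.2.10 - - [07/Aug/2016:02:03:42 +0100] "GET /wp-login.php HTTP/1.1" 404 512 "-" "curl/7.47.0"
example.com:80 192.0.2.10 - - [07/Aug/2016:02:03:44 +0100] "POST /wp-login.php HTTP/1.1" 200 512 "-" "curl/7.47.0"
EOF

report=$(awk -F'"' '
NR == FNR {                            # first file: load the rules
    if ($0 ~ /^[ \t]*(#|$)/) next      # skip comments and blank lines
    split($0, r, " ")
    order[++n] = r[1]                  # keep the order given in the file
    pat[r[1]]  = r[2]
    next
}
{                                      # second file: the access log
    req = tolower($2)
    for (i = 1; i <= n; i++)
        if (req ~ pat[order[i]]) count[order[i]]++
}
END {
    for (i = 1; i <= n; i++)
        printf("%-20s\t%d\n", order[i], count[order[i]])
}' "$tmp/rules.conf" "$tmp/access.log")
printf '%s\n' "$report"
```

The awk program itself no longer changes when a test is added, but as said, the rule file still has to be maintained by hand.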

Below, I use the check() function to increment counters for your 3 different test cases; your job is to extend this to the full set of tests. Note that there is no need to initialise the counters: awk treats an uninitialised variable as zero in a numeric context.

#!/usr/bin/awk -f
function check(Fld, mtch, ex) {
   # ex will always be null (false) if it is not passed in,
   # otherwise it must equate to true to continue
   if(!ex && (Fld !~ mtch)) return 0

   x[IP]++
   if (SHOWLOG) printf("%s\t\033[1;32m%s\033[0m\t\t\033[1;32m%s\033[0m\n", IP, $2, mtch)
   return 1
}

BEGIN { FS="\""; SHOWLOG=1; IGNORECASE=1 }

{
  split($1,a," ")
  IP = a[2]

  # Case 1 - match to $2
  WebManage += check($2, "webmanage")
  WebSQL    += check($2, "websql")
  Digit200  += check($2, "[0-9]{200,}")

  # Case 2 - match to $(NF - 1)
  PrintNF   += check($(NF -1), "print")
  BinShNF   += check($(NF -1), "bin/sh")

  # Case 3 - complex expression
  Hexa      += check("", "x[0-9a-f]", ( $(NF-1) !~ /Mozilla/ && $(NF-1) ~ /\\x[0-9a-fA-F]+/ ))
  ConnectNF += check("", "connect", ( $(NF-1) !~ /Mozilla/ &&  $(NF-1) !~ /Outlook/ && $(NF-1) !~ /internal dummy connection/ && $3 !~ /200/ && $(NF-1) ~ /connect/))
}

END {
  printf("%-20s\t%d\n","WebManage", WebManage);
  printf("%-20s\t%d\n","WebSQL", WebSQL);
  printf("%-20s\t%d\n","Digit200", Digit200);
  printf("%-20s\t%d\n","PrintNF", PrintNF);
  printf("%-20s\t%d\n","BinShNF", BinShNF);
  printf("%-20s\t%d\n","Hexa", Hexa);
  printf("%-20s\t%d\n","ConnectNF", ConnectNF);

  for ( j in x )  {
      print j
  }
}
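
On question 3 from the first post: as far as I know there is no variable that holds the text matched by a ~ test, but match() sets RSTART and RLENGTH, and substr() can then pull out what actually matched. A minimal sketch follows; the log line, the pattern and the hits array are made up for illustration.

```shell
#!/bin/sh
# Made-up request line; the pattern and counter name are only for illustration.
out=$(printf '%s\n' 'example.com:80 192.0.2.10 - - [07/Aug/2016:02:03:42 +0100] "GET /a/FCKeditor/upload.php HTTP/1.1" 404 512 "-" "curl/7.47.0"' |
awk -F'"' '
{
    lc = tolower($2)                   # case-insensitive without IGNORECASE
    if (match(lc, "fckeditor")) {      # sets RSTART and RLENGTH on success
        hits["FCKEditor"]++
        # Offsets from the lowercased copy are reused on the original text
        # (safe for plain ASCII requests, where tolower keeps the length).
        print "matched: " substr($2, RSTART, RLENGTH)
    }
}
END { for (k in hits) printf("%-20s\t%d\n", k, hits[k]) }')
printf '%s\n' "$out"
```

In gawk, match() also accepts an array as a third argument that receives the matched text, but that is a gawk extension and will not work in other awks.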