Help to parse syslog with perl

[quote=arm;303037305]
logver=56 idseq=63256900099118326 itime=1563205190 devid=FG-5KDTB18800138 devname=LAL-C1-FGT-03 vd=USER date=2019-07-15 time=18:39:49 logid="0000000013" type="traffic" subtype="forward" level="notice" eventtime=1563205189 srcip=11.3.3.17 srcport=50544 srcintf="SGI-CORE.123" srcintfrole="undefined" dstip=12.0.1.1 dstport=443 dstintf="FA-SPI.100" dstintfrole="undefined" poluuid="230d4d26-AAAA-51e9-b9d1-7bf4c828f000" sessionid=20639817 proto=6 action="server-rst" policyid=10 policytype="policy" service="HTTPS" dstcountry="United State " srccountry="Reserved" trandisp="snat" transip=11.1.1.1 transport=5092 duration=71 sentbyte=093 rcvdbyte=213 sentpkt=11 rcvdpkt=16 appcat="unscanned"

I used the script below to parse 1,000,000 records:

#!/usr/bin/env perl
use strict;
use warnings;
while( <> ) {
    if ( /^(?=.*eventtime=(\S+))(?=.*srcip=(\S+))(?=.*srcport=(\S+))(?=.*dstip=(\S+))(?=.*dstport=(\S+))(?=.*sessionid=(\S+))(?=.*action=(\S+))(?=.*policyid=(\S+))(?=.*service=(\S+))(?=.*dstcountry=(\S+))(?=.*transip=(\S+))(?=.*transport=(\S+))(?=.*duration=(\S+)).*$/ ) {
        print "$1|$2|$3|$4|$5|$6|$7|$8|$9|$10|$11|$12|$13\n";
    }
}

The problem is that I couldn't find the right regular expression to match dstcountry. I need it to return "United State" rather than the truncated "United that appears in the output below:

1563205189|11.3.3.17|50544|12.0.1.1 |443|20585519|"server-rst"|10|"HTTPS"|"United|11.1.1.1|5092|71

Try this:

#!/usr/bin/env perl
use strict;
use warnings;
while( <> ) {
    if ( /^(?=.*eventtime=(\S+))(?=.*srcip=(\S+))(?=.*srcport=(\S+))
           (?=.*dstip=(\S+))(?=.*dstport=(\S+))(?=.*sessionid=(\S+))
           (?=.*action=(\S+))(?=.*policyid=(\S+))(?=.*service=(\S+))
           (?=.*dstcountry=("[^"]+"|\S+))(?=.*transip=(\S+))
           (?=.*transport=(\S+))(?=.*duration=(\S+)).*$/x ) {
        print "$1|$2|$3|$4|$5|$6|$7|$8|$9|$10|$11|$12|$13\n";
    }
}

--- Post updated at 07:17 AM ---

Or, if the quotes are always there, you can drop the |\S+ alternative. If you also want to strip the quotes, use (?=.*dstcountry="([^"]+)")


@Chubler_XL
Thanks a billion for the prompt response. I have no Perl background, so I would really appreciate it if you could answer the questions below:

  1. Could you tell me where I can learn regular expressions like these? Is there a link or a book I should go through?
  2. What is the equivalent of these two regexes:
(?=.*dstcountry=("[^"]+"|\S+))  and  (?=.*dstcountry="([^"]+)")

if I want to do it with the stream editor (sed) in bash:

sed 's/?/?/g'

Many modern sed implementations support extended regular expressions (POSIX ERE) via the -E option. ERE is not as powerful as Perl's regex engine and does not support named capture groups, but it should be sufficient for your requirement.

Using ERE together with the sed -n option (which suppresses automatic printing of the pattern space), we can do:

sed -En 's/.*dstcountry=("[^"]+"|\S+).*/\1/p'
sed -En 's/.*dstcountry="([^"]+)".*/\1/p'

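\1 in the replacement part of the substitution refers to the first capture group, the same as $1 in Perl. Note that \S is not part of POSIX ERE; it is an extension that GNU sed accepts. If your sed rejects it, use [^[:space:]] instead.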
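
As a quick check, the two commands can be run against a sample line like this. This is only a sketch: the line is a made-up, shortened record, the variable names are hypothetical, and I have written [^[:space:]] in place of \S so it does not depend on GNU sed:

```shell
#!/bin/sh
# Hypothetical shortened log line, just enough fields for the demo.
line='service="HTTPS" dstcountry="United State " srccountry="Reserved" transip=11.1.1.1'

# Keep the quotes: capture group 1 is the quoted string (or a bare word).
with_quotes=$(printf '%s\n' "$line" | sed -En 's/.*dstcountry=("[^"]+"|[^[:space:]]+).*/\1/p')

# Strip the quotes: capture group 1 is only the text between them.
without_quotes=$(printf '%s\n' "$line" | sed -En 's/.*dstcountry="([^"]+)".*/\1/p')

# Brackets make any trailing space in the captured value visible.
printf '[%s]\n' "$with_quotes" "$without_quotes"
```

Because the substitution replaces the whole line with \1 and the p flag prints only lines where it succeeded, lines without a dstcountry field produce no output at all.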

Sorry, I don't know of a good source for learning all this; I just picked it up over years of using these tools. Perhaps try searching for "learning regex" or "regular expression examples" with your favorite search engine.

I did a quick search myself and came across this quite nice regex cheat sheet. It seems to cover a lot of features and is fairly easy to use: Regex Cheat Sheet
