Help needed with log conversion script.

Hi All,

I have a log file with several entries which need to be converted to a different format:

A)
log "tcp://1.2.3.4:80"
should be translated to --> Logged this from host 1.2.3.4 port 80

B)
log "tcp://1.2.3.4:*" --> Logged this from host 1.2.3.4

C)
log "tcp://1.2.3.4:80,8080" -->
Logged this from host 1.2.3.4:80 port 80
Logged this from host 1.2.3.4:80 port 8080

D)
log "tcp://1.2.3.4:80-101" --> Logged this from host 1.2.3.4 range 80 101
.................................................................................................

Could you please guide me on this?

Best Regards.

This should respond exactly to the A, B, C and D requirements:

S=${1:6}
IP="Logged this from host ${S%:*}"
P=${S##*:}
if [ "$P" = "*" ]
then # case B
    echo "$IP"
elif I=$(expr index $P '-')
then # case D
    echo "$IP range ${P:0:$((I-1))} ${P:$I}"
elif I=$(expr index $P ',')
then # case C
    echo -e "$IP port ${P:0:$((I-1))}\n$IP port ${P:$I}"
else # case A
    echo "$IP port $P"
fi

Call this with your log line as the argument.

Thanks for sharing this so quickly, frans.

I know basic scripting only. Honestly, I am looking at each line right now to understand what it means. If possible, could you comment on the regex used here?

Also, I checked this script. There are some points I observed during testing:

a. When I execute the script with the parameter 'log tcp://1.2.3.4:80', it throws a syntax error.
b. There's no check on the IP octets / values; i.e. if we give an input of 1.2.3.4.5.6.7.8.9.0, it will be printed as is.
c. When there are more than 2 ports, the second line puts p2, p3 and so on together, instead of on separate lines.
d. There are 100s of entries in a log file, so instead of feeding an IP:port one by one, I want to automate it.
e. How can I add more conditional checks in the script? For example, if I see a record -> log tcp://1.2.3.4:3389 -> then I'd want to put it as -> RDP from home system 1.2.3.4 port 3389.
f. Is there a way without regex? :stuck_out_tongue:
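
On point e, one possible approach is a case statement keyed on the port. This is only a sketch and not part of the script above: describe is a hypothetical helper name, and the RDP wording is copied from the example in point e.

```shell
# describe is a hypothetical helper: $1 = host, $2 = port.
# Ports with a special meaning get their own message; everything else
# falls through to the default wording.
describe() {
    case $2 in
        3389) echo "RDP from home system $1 port $2" ;;
        *)    echo "Logged this from host $1 port $2" ;;
    esac
}

describe 1.2.3.4 3389
describe 1.2.3.4 80
```

More ports can be handled by adding one line per port above the default `*)` branch.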

Update:
I read the regex and just wanted to add what I understood from the script components above. Running the script for 2 ports -> log tcp://1.2.3.4:100-110:

S=${1:6}	                            <-- initializing S as an array / what is 1:6?
IP="Logged this from host ${S%:*}"    <-- implies take any value before :
P=${S##*:}	                            <-- implies take any value after :
if [ "$P" = "*" ]	                            <-- check if value of P is eq to *
then # case B
    echo "$IP"
elif I=$(expr index $P '-')	            <-- locates the char '-' in P and returns index value (?) to I (why?) / how is I = 3 here?
then # case D
    echo "$IP range ${P:0:$((I-1))} ${P:$I}"	<-- $(P:0:2) is 100 (how?) / $(P:3) is port 110
elif I=$(expr index $P ',')	           <-- checks if there is a (,) in P
then # case C
    echo -e "$IP port ${P:0:$((I-1))}\n$IP port ${P:$I}" <-- same check as above
else # case A
    echo "$IP port $P"
fi
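
For what it's worth, none of these lines use a regex or an array; they are bash parameter expansions plus one call to expr. A small walk-through with the same sample value may help. The variable names line, host, first and last are only for illustration, and `expr index` is assumed to be the GNU coreutils version (it is not available in every expr).

```shell
#!/bin/bash
# Walk-through of the expansions used in the script, on one sample value.
line='tcp://1.2.3.4:100-110'   # stands in for $1, the first script argument

S=${line:6}     # substring starting at offset 6: drops "tcp://" -> 1.2.3.4:100-110
host=${S%:*}    # strip the shortest suffix matching ":*"        -> 1.2.3.4
P=${S##*:}      # strip the longest prefix matching "*:"         -> 100-110

# expr index prints the 1-based position of the first '-' in P, or 0 if there
# is none. It also exits non-zero when it prints 0, which is why it can be
# used directly as the condition of an elif.
I=$(expr index "$P" '-')    # -> 4 here (it would be 3 for 80-101)

first=${P:0:$((I-1))}   # I-1 = 3 characters from offset 0 -> 100
last=${P:$I}            # everything from offset 4 onward  -> 110

echo "S=$S host=$host P=$P I=$I first=$first last=$last"
```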

Best Regards.

$ cat logfile
tcp://1.2.3.4:80
tcp://1.2.3.4:*
tcp://1.2.3.4:80,8080
tcp://1.2.3.4:80-101

$ awk -F ":|/" '/\*$/ {printf "Logged this from host %s\n",$4;next}
              /,/ {n=split($5,a,",") ; for (i=1;i<=n;i++) printf "Logged this from host %s port %s\n", $4, a[i];next}
              /-/ {sub(/-/," ",$5) ; printf "Logged this from host %s range %s\n",$4,$5;next}
              {printf "Logged this from host %s port %s\n",$4,$5}' logfile

Logged this from host 1.2.3.4 port 80
Logged this from host 1.2.3.4
Logged this from host 1.2.3.4 port 80
Logged this from host 1.2.3.4 port 8080
Logged this from host 1.2.3.4 range 80 101

To scan a file

while read L
do
# Put the above code here but modify the first line
# S=${1:6}
S=${L:6}
# (... rest of the code)
done < logfile

Thanks, frans. When there are more than 2 ports separated by commas, the logic does not work. For example, for tcp://1.2.3.4:10,20,30, this is returned:

Logged this from host 1.2.3.4 port 10
Logged this from host 1.2.3.4 port 20,30

Secondly, is there a way I can capture 'tcp' in a variable instead of removing it? Then I think I will be able to write -> if [ "$var" = "tcp" ]; then do flow1; else do flow2...

Best Regards.

Revised script with important modifications:

#!/bin/bash
Split() {    # Function to make an array with a variable
    eval "$1=( \"$(echo "${!1}" | sed "s/$2/\" \"/g")\" )"
}
while read L
do
    Split L ':' # Makes an array ( 0:Protocol 1:IP 2:Port(s) )
    if [ ${L[0]} = tcp ]
    then
        IP="Logged this from host ${L[1]:2}" # :2 to remove the 2 slashes from host IP
        P=${L[2]}
        Split P ',' # makes an array with port numbers
        if [ "$P" = "*" ]    # case B
        then echo "$IP"
        elif I=$(expr index $P '-') # case D
        then    echo "$IP range ${P:0:$((I-1))} ${P:$I}"
        else # case A & C
            for ((i=0; i<${#P[@]}; i++))
            do
                echo "$IP port ${P[$i]}"
            done
        fi
    else
        echo "Other protocol : ${L[0]}"
    fi
done < infile
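
As a side note, the same splitting can probably be done without eval, which matters if the log can contain unexpected characters. The following is only a sketch under that assumption: convert_line is a hypothetical name, and it keeps the messages of the script above.

```shell
# convert_line is a hypothetical helper: $1 = one log entry, e.g. tcp://1.2.3.4:80
# The body runs in a subshell ( ... ) so the IFS and set -f changes do not leak.
convert_line() (
    line=$1
    proto=${line%%://*}     # text before "://"            -> tcp
    rest=${line#*://}       # text after "://"             -> 1.2.3.4:80
    host=${rest%:*}         # text before the last ':'     -> 1.2.3.4
    ports=${rest##*:}       # text after the last ':'      -> 80
    if [ "$proto" = tcp ]; then
        msg="Logged this from host $host"
        case $ports in
            '*') echo "$msg" ;;                                   # case B
            *-*) echo "$msg range ${ports%-*} ${ports#*-}" ;;     # case D
            *)   set -f; IFS=,                                    # case A & C
                 for p in $ports; do echo "$msg port $p"; done ;;
        esac
    else
        echo "Other protocol : $proto"
    fi
)

convert_line 'tcp://1.2.3.4:10,20,30'
# To process a whole file:
# while IFS= read -r line; do convert_line "$line"; done < infile
```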

Hi rdcwayx, Thnx.

I tried adding this:

for i in $(cat c2.txt)
do
        #proto="$i" | cut -d ':' -f 1;
        proto=${i:0:3}
        echo $proto

<then above script here>

done

This is to take the first 3 chars, tcp in this case, in a variable - proto - and use this to customize the statements as -> Logged $proto request from host ...

My script snippet doesn't seem right though.

Update:
I know how to do it now. I used $1 to put the value 'tcp':

$ awk -F ":|/" '/\*$/ {printf "Logged %s request from host %s\n",$1,$4;next}
              /,/ {n=split($5,a,",") ; for (i=1;i<=n;i++) printf "Logged %s request from host %s port %s\n", $1, $4, a[i];next}
              /-/ {sub(/-/," ",$5) ; printf "Logged %s request from host %s range %s\n",$1,$4,$5;next}
              {printf "Logged %s request from host %s port %s\n",$1,$4,$5}' logfile

Logged tcp request from host 1.2.3.4 port 80
Logged tcp request from host 1.2.3.4
Logged tcp request from host 1.2.3.4 port 80
Logged tcp request from host 1.2.3.4 port 8080
Logged tcp request from host 1.2.3.4 range 80 101

Best Regards.

---------- Post updated at 09:37 AM ---------- Previous update was at 05:50 AM ----------

Hi,

In this script:

$ awk -F ":|/" '/\*$/ {printf "Logged %s request from host %s\n",$1,$4;next}
A
/,/ {n=split($5,a,",") ; for (i=1;i<=n;i++) printf "Logged %s request from host %s port %s\n", $1, $4, a[i];next}

B
/-/ {sub(/-/," ",$5) ; printf "Logged %s request from host %s range %s\n",$1,$4,$5;next}

              {printf "Logged %s request from host %s port %s\n",$1,$4,$5}' logfile

When there are several ports separated by comma (,), they are put as separate lines which is fine.

But I am now trying to check on A if the ports are in sequential order or not. For example,

tcp://1.2.3.4:20,21,22,23,100

In this record, ports 20 to 23 are in sequence and should be written as a range, with the last port on a separate line, i.e.

Logged tcp request from host 1.2.3.4 range 20 23
Logged tcp request from host 1.2.3.4 port 100

Can I use an if statement in line A in awk, or is there a better way to do this?

Best Regards..

---------- Post updated at 04:38 PM ---------- Previous update was at 04:36 PM ----------

Superb! :slight_smile:

rdcwayx, earlier issues are resolved now. I made one change to the script. This will put appropriate protocol type tcp / udp as per the log:

#!/bin/bash
Split() {    # Function to make an array with a variable
    eval "$1=( \"$(echo "${!1}" | sed "s/$2/\" \"/g")\" )"
}
while read L
do
    Split L ':' # Makes an array ( 0:Protocol 1:IP 2:Port(s) )
#    if [ ${L[0]} = tcp ]
#    then
        IP="Logged ${L[0]} request from host ${L[1]:2}" # :2 to remove the 2 slashes from host IP
        P=${L[2]}
        Split P ',' # makes an array with port numbers
        if [ "$P" = "*" ]    # case B
        then echo "$IP"
        elif I=$(expr index $P '-') # case D
        then    echo "$IP range ${P:0:$((I-1))} ${P:$I}"
        else # case A & C
            for ((i=0; i<${#P[@]}; i++))
            do
                echo "$IP port ${P[$i]}"
            done
        fi
#    else
#        echo "Other protocol : ${L[0]}"
#    fi
done < c2.txt

I noticed that when the ports are in sequence, for example -

tcp://1.2.3.4:20,21,22

or
in sequence but randomly placed, like -

tcp://1.2.3.4:20,22,21

these could be placed in a range statement, i.e.
right now they will be printed as

Logged from host 1.2.3.4 port 20
Logged from host 1.2.3.4 port 21
Logged from host 1.2.3.4 port 22

but how can we print them as -

Logged from host 1.2.3.4 range 20 22

In other words, if the comma-separated ports form a numerical sequence, whether or not they appear in order in the log entry, they should be written as a 'range' instead of on separate lines.
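
One way this might be approached is to sort the ports numerically and then walk through them, closing a run whenever the next port is not the previous one plus 1. The sketch below rests on several assumptions: emit_run and collapse_ports are hypothetical helper names, the port list contains plain numbers only (no * and no 80-101 style entries), duplicates are dropped, and two consecutive ports already count as a range.

```shell
# emit_run: $1 = message prefix, $2 = first port of a run, $3 = last port of it.
emit_run() {
    if [ "$2" -eq "$3" ]; then
        echo "$1 port $2"
    else
        echo "$1 range $2 $3"
    fi
}

# collapse_ports: $1 = message prefix, $2 = comma-separated ports (any order).
collapse_ports() {
    printf '%s\n' "$2" | tr ',' '\n' | sort -n -u | {
        start='' prev=''
        while read -r p; do
            if [ -z "$start" ]; then
                start=$p                          # first port opens a run
            elif [ "$p" -ne $((prev + 1)) ]; then
                emit_run "$1" "$start" "$prev"    # gap found: close the run
                start=$p
            fi
            prev=$p
        done
        if [ -n "$start" ]; then
            emit_run "$1" "$start" "$prev"        # close the final run
        fi
    }
}

collapse_ports "Logged tcp request from host 1.2.3.4" "20,22,21,23,100"
```

In the script above, this would replace the for loop in the "case A & C" branch, with "$IP" as the first argument and the original comma-separated port string as the second.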

Best Regards..

:smiley: That would be nice !! :rolleyes:
Congratulations to whoever finds a simple way to do that!

Good night, sweet dreams !

:smiley: I understand n should take a break now, especially if this is coming from ya. It'd be hard to beat for sure.

Gracias frans for your time n sharing knowledge.