I have a log file with several entries that need to be converted to a different format:
A)
log "tcp://1.2.3.4:80"
should be translated to --> Logged this from host 1.2.3.4 port 80
B)
log "tcp://1.2.3.4:*" --> Logged this from host 1.2.3.4
C)
log "tcp://1.2.3.4:80,8080" -->
Logged this from host 1.2.3.4 port 80
Logged this from host 1.2.3.4 port 8080
D)
log "tcp://1.2.3.4:80-101" --> Logged this from host 1.2.3.4 range 80 101
.................................................................................................
This should meet requirements A, B, C and D exactly:
S=${1:6}
IP="Logged this from host ${S%:*}"
P=${S##*:}
if [ "$P" = "*" ]
then # case B
echo "$IP"
elif I=$(expr index $P '-')
then # case D
echo "$IP range ${P:0:$((I-1))} ${P:$I}"
elif I=$(expr index $P ',')
then # case C
echo -e "$IP port ${P:0:$((I-1))}\n$IP port ${P:$I}"
else # case A
echo "$IP port $P"
fi
I know only basic scripting. Honestly, I am going through each line right now to understand what it means. If possible, could you comment on the regex used here?
Also, I tested this script. Here are some points I observed during testing:
a. When I execute the script with the parameter 'log tcp://1.2.3.4:80', it throws a syntax error.
b. There is no check on the IP octets / values; i.e. if we give an input of 1.2.3.4.5.6.7.8.9.0, it is printed as is.
c. When there are more than 2 ports, the second line puts p2, p3 and so on together instead of on separate lines.
d. There are hundreds of entries in the log file, so instead of feeding in IP:port pairs one by one, I want to automate it.
e. How can I add more conditional checks to the script? For example, if I see a record -> log tcp://1.2.3.4:3389 -> I'd want to print it as -> RDP from home system 1.2.3.4 port 3389.
f. Is there a way to do this without regex?
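On points (b) and (e), here is one possible approach that uses no regex, offered as a rough sketch rather than a drop-in fix. The function names valid_ipv4 and describe_port are made up for illustration, and the 3389 label is simply the one from the example in (e). It assumes bash (arrays and here-strings).

```shell
#!/bin/bash
# valid_ipv4 (hypothetical helper): succeed only for four dot-separated
# decimal numbers, each in the range 0..255.
valid_ipv4() {
    local IFS=. octets o
    case $1 in
        *.) return 1 ;;                 # reject a trailing dot
    esac
    read -r -a octets <<< "$1"          # split the argument on "."
    [ "${#octets[@]}" -eq 4 ] || return 1
    for o in "${octets[@]}"; do
        case $o in
            ''|*[!0-9]*) return 1 ;;    # empty or contains a non-digit
        esac
        [ "$o" -le 255 ] || return 1
    done
    return 0
}

# describe_port (hypothetical helper): pick the message from the port number.
# $1 = host, $2 = port. Add more "case" branches for other special ports.
describe_port() {
    case $2 in
        3389) echo "RDP from home system $1 port $2" ;;
        *)    echo "Logged this from host $1 port $2" ;;
    esac
}

if valid_ipv4 "1.2.3.4"; then
    describe_port "1.2.3.4" 3389
fi
```

The case statement matches shell glob patterns, not regular expressions, so this also touches on (f).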
Update:
I read the regex and just wanted to add what I understood from the script components above. Running the script for 2 ports -> log tcp://1.2.3.4:100-110:
S=${1:6} <-- initializing S as an array / what is 1:6?
IP="Logged this from host ${S%:*}" <-- implies take any value before :
P=${S##*:} <-- implies take any value after :
if [ "$P" = "*" ] <-- check if value of P is eq to *
then # case B
echo "$IP"
elif I=$(expr index $P '-') <-- locates the char '-' in P and returns index value (?) to I (why?) / how is I = 3 here?
then # case D
echo "$IP range ${P:0:$((I-1))} ${P:$I}" <-- $(P:0:2) is 100 (how?) / $(P:3) is port 110
elif I=$(expr index $P ',') <-- checks if there is a (,) in P
then # case C
echo -e "$IP port ${P:0:$((I-1))}\n$IP port ${P:$I}" <-- same check as above
else # case A
echo "$IP port $P"
fi
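To help with the questions in the annotations above, here is a small sketch that runs each expansion on the same sample value and prints the intermediate results. The function name explain is made up for illustration. It assumes bash and an expr that supports "index" (GNU coreutils does; it is not required by POSIX). Note that none of these are regexes: ${1:6} is a substring of the first script argument $1 starting at offset 6 (not an array), and the % and ## forms strip a shell glob pattern from the end or the start of the value.

```shell
#!/bin/bash
# explain (hypothetical helper): show what each expansion in the script yields.
explain() {
    local arg=$1 S host P I first last
    S=${arg:6}                 # substring from offset 6: drops the 6 chars "tcp://"
    host=${S%:*}               # strip the shortest suffix matching ":*" (the port part)
    P=${S##*:}                 # strip the longest prefix matching "*:" (the host part)
    I=$(expr index "$P" '-')   # 1-based position of "-" in P (prints 0 if absent)
    first=${P:0:$((I-1))}      # I-1 characters starting at offset 0
    last=${P:$I}               # everything from offset I to the end
    echo "S=$S host=$host P=$P I=$I first=$first last=$last"
}

explain 'tcp://1.2.3.4:100-110'
# prints: S=1.2.3.4:100-110 host=1.2.3.4 P=100-110 I=4 first=100 last=110
```

So for 100-110 the index is 4, not 3: the "-" is the fourth character, ${P:0:3} gives the three characters before it, and ${P:4} gives what follows it.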
$ cat logfile
tcp://1.2.3.4:80
tcp://1.2.3.4:*
tcp://1.2.3.4:80,8080
tcp://1.2.3.4:80-101
$ awk -F ":|/" '/\*$/ {printf "Logged this from host %s\n",$4;next}
/,/ {n=split($5,a,",") ; for (i=1;i<=n;i++) printf "Logged this from host %s port %s\n", $4, a[i];next}
/-/ {sub(/-/," ",$5) ; printf "Logged this from host %s range %s\n",$4,$5;next}
{printf "Logged this from host %s port %s\n",$4,$5}' logfile
Logged this from host 1.2.3.4 port 80
Logged this from host 1.2.3.4
Logged this from host 1.2.3.4 port 80
Logged this from host 1.2.3.4 port 8080
Logged this from host 1.2.3.4 range 80 101
When there are more than 2 ports separated by commas, the logic does not work. For example, for tcp://1.2.3.4:10,20,30, this is returned:
Logged this from host 1.2.3.4 port 10
Logged this from host 1.2.3.4 port 20,30
Secondly, is there a way to keep 'tcp' in a variable instead of removing it? Then I think I could write -> if [ "$var" = "tcp" ]; then do flow1; else do flow2 ...
#!/bin/bash
Split() { # Function to make an array with a variable
eval "$1=( \"$(echo "${!1}" | sed "s/$2/\" \"/g")\" )"
}
while read L
do
Split L ':' # Makes an array ( 0:Protocol 1:IP 2:Port(s) )
if [ ${L[0]} = tcp ]
then
IP="Logged this from host ${L[1]:2}" # :2 to remove the 2 slashes from host IP
P=${L[2]}
Split P ',' # makes an array with port numbers
if [ "$P" = "*" ] # case B
then echo "$IP"
elif I=$(expr index $P '-') # case D
then echo "$IP range ${P:0:$((I-1))} ${P:$I}"
else # case A & C
for ((i=0; i<${#P[@]}; i++))
do
echo "$IP port ${P[$i]}"
done
fi
else
echo "Other protocol : ${L[0]}"
fi
done < infile
for i in $(cat c2.txt)
do
#proto="$i" | cut -d ':' -f 1;
proto=${i:0:3}
echo $proto
<then above script here>
done
This is to take the first 3 chars, tcp in this case, in a variable - proto - and use this to customize the statements as -> Logged $proto request from host ...
My script snippet doesn't seem right though.
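For the snippet above, one way to pull out the protocol without hard-coding 3 characters is to strip everything from "://" onwards. This is only a sketch: the helper name describe_entry is made up, the udp line is invented sample input, and the here-document stands in for the real log file.

```shell
#!/bin/bash
# describe_entry (hypothetical helper): split "proto://host:ports" with
# parameter expansion only.
describe_entry() {
    local line=$1 proto rest host
    proto=${line%%://*}    # text before the first "://"
    rest=${line#*://}      # text after the first "://"
    host=${rest%%:*}       # text before the first ":" in what remains
    echo "Logged $proto request from host $host"
}

# Read the input line by line; "for i in $(cat file)" would split on any
# whitespace instead of on line boundaries.
while IFS= read -r line; do
    describe_entry "$line"
done <<'EOF'
tcp://1.2.3.4:80
udp://1.2.3.4:53
EOF
```

To run it against the real file, replace the here-document with "done < c2.txt". The commented-out cut attempt would also work if written as proto=$(echo "$i" | cut -d ':' -f 1).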
Update:
I know how to do it now. I used $1 to put the value 'tcp':
$ awk -F ":|/" '/\*$/ {printf "Logged %s request from host %s\n",$1,$4;next}
/,/ {n=split($5,a,",") ; for (i=1;i<=n;i++) printf "Logged %s request from host %s port %s\n", $1, $4, a[i];next}
/-/ {sub(/-/," ",$5) ; printf "Logged %s request from host %s range %s\n",$1,$4,$5;next}
{printf "Logged %s request from host %s port %s\n",$1,$4,$5}' logfile
Logged tcp request from host 1.2.3.4 port 80
Logged tcp request from host 1.2.3.4
Logged tcp request from host 1.2.3.4 port 80
Logged tcp request from host 1.2.3.4 port 8080
Logged tcp request from host 1.2.3.4 range 80 101
Best Regards.
---------- Post updated at 09:37 AM ---------- Previous update was at 05:50 AM ----------
Hi,
In this script:
$ awk -F ":|/" '/\*$/ {printf "Logged %s request from host %s\n",$1,$4;next}
# A
/,/ {n=split($5,a,",") ; for (i=1;i<=n;i++) printf "Logged %s request from host %s port %s\n", $1, $4, a[i];next}
# B
/-/ {sub(/-/," ",$5) ; printf "Logged %s request from host %s range %s\n",$1,$4,$5;next}
{printf "Logged %s request from host %s port %s\n",$1,$4,$5}' logfile
When there are several ports separated by comma (,), they are put as separate lines which is fine.
But I am now trying to check at A whether the ports are in sequential order. For example:
tcp://1.2.3.4:20,21,22,23,100
In this record, ports 20 to 23 are in sequence and should be printed as a range, with the last port on a separate line, i.e.
Logged tcp request from host 1.2.3.4 range 20 23
Logged tcp request from host 1.2.3.4 port 100
Can I use an if statement at line A in awk, or is there a better way to do this?
Best Regards..
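Yes, awk accepts if/else inside an action. As a rough sketch of the idea (not a tested replacement for the whole script), the comma rule could walk the port list and emit a range whenever consecutive values run together. The names collapse_ports_awk and emit are made up for illustration; the field numbers assume the same -F ":|/" split as above, and only lines containing a comma are handled here.

```shell
#!/bin/bash
# collapse_ports_awk (hypothetical helper): read "proto://host:p1,p2,..." lines
# on stdin and print runs of consecutive ports as ranges.
collapse_ports_awk() {
    awk -F ':|/' '
    # Print one run: a single port, or a range when lo and hi differ.
    function emit(proto, host, lo, hi) {
        if (lo == hi)
            printf "Logged %s request from host %s port %s\n", proto, host, lo
        else
            printf "Logged %s request from host %s range %s %s\n", proto, host, lo, hi
    }
    /,/ {
        n = split($5, p, ",")
        lo = hi = p[1] + 0
        for (i = 2; i <= n; i++) {
            if (p[i] + 0 == hi + 1) {      # next port continues the current run
                hi = p[i] + 0
            } else {                       # gap: flush the run and start a new one
                emit($1, $4, lo, hi)
                lo = hi = p[i] + 0
            }
        }
        emit($1, $4, lo, hi)               # flush the final run
    }'
}

echo 'tcp://1.2.3.4:20,21,22,23,100' | collapse_ports_awk
# prints:
# Logged tcp request from host 1.2.3.4 range 20 23
# Logged tcp request from host 1.2.3.4 port 100
```

This version only merges ports that are already listed in ascending order; it does not sort them first.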
---------- Post updated at 04:38 PM ---------- Previous update was at 04:36 PM ----------
Superb!
rdcwayx, earlier issues are resolved now. I made one change to the script. This will put appropriate protocol type tcp / udp as per the log:
#!/bin/bash
Split() { # Function to make an array with a variable
eval "$1=( \"$(echo "${!1}" | sed "s/$2/\" \"/g")\" )"
}
while read L
do
Split L ':' # Makes an array ( 0:Protocol 1:IP 2:Port(s) )
# if [ ${L[0]} = tcp ]
# then
IP="Logged ${L[0]} request from host ${L[1]:2}" # :2 to remove the 2 slashes from host IP
P=${L[2]}
Split P ',' # makes an array with port numbers
if [ "$P" = "*" ] # case B
then echo "$IP"
elif I=$(expr index $P '-') # case D
then echo "$IP range ${P:0:$((I-1))} ${P:$I}"
else # case A & C
for ((i=0; i<${#P[@]}; i++))
do
echo "$IP port ${P[$i]}"
done
fi
# else
# echo "Other protocol : ${L[0]}"
# fi
done < c2.txt
I noticed that when the ports are in sequence, for example:
tcp://1.2.3.4:20,21,22
or in sequence but listed in random order, like:
tcp://1.2.3.4:20,22,21
they could be covered by a single range statement. Right now they are printed as:
Logged from host 1.2.3.4 port 20
Logged from host 1.2.3.4 port 21
Logged from host 1.2.3.4 port 22
but how can we print them as:
Logged from host 1.2.3.4 range 20 22
In other words, if the comma-separated ports form a numerical sequence, whether or not they are listed in order in the log entry, they should be written as a 'range' instead of on separate lines.
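One possible way to get there, as a sketch under a few assumptions: sort the port list numerically first, then merge neighbours that differ by exactly 1. The function names collapse_ports and report_ports are made up for illustration, and the code assumes the list contains only plain decimal port numbers (no "*", no "a-b" ranges, no leading zeros).

```shell
#!/bin/bash
# collapse_ports (hypothetical helper): print one "lo hi" pair per run of
# consecutive ports in a comma-separated list, after sorting it numerically.
collapse_ports() {
    local lo= hi= p
    for p in $(printf '%s\n' "$1" | tr ',' '\n' | sort -n -u); do
        if [ -z "$lo" ]; then
            lo=$p; hi=$p                      # first port starts the first run
        elif [ "$p" -eq $((hi + 1)) ]; then
            hi=$p                             # extends the current run
        else
            echo "$lo $hi"                    # gap: flush and start a new run
            lo=$p; hi=$p
        fi
    done
    if [ -n "$lo" ]; then echo "$lo $hi"; fi  # flush the last run
}

# report_ports (hypothetical helper): $1 = message prefix, $2 = port list.
report_ports() {
    local lo hi
    collapse_ports "$2" | while read -r lo hi; do
        if [ "$lo" -eq "$hi" ]; then
            echo "$1 port $lo"
        else
            echo "$1 range $lo $hi"
        fi
    done
}

report_ports "Logged from host 1.2.3.4" "20,22,21"
# prints: Logged from host 1.2.3.4 range 20 22
```

In the script above, the "case A & C" loop could then presumably be replaced by a call such as report_ports "$IP" "${L[2]}", since ${L[2]} still holds the unsplit port list.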