Merge lines with varying characters

Hi, I have a large set of data (firewall logs) that I'm trying to summarize. I've been able to write a script to consolidate the ports, and am now looking to consolidate even further, based on IP.

Source Destination Type Port
192.168.5.108 192.168.11.12 TCP 1, 2, 3, 4, 5, 15
192.168.5.109 192.168.11.12 TCP 6, 7, 8, 9, 10, 11
192.168.5.110 192.168.11.12 TCP 12, 13
192.168.6.23 192.168.11.12 TCP 14, 15
192.168.5.108 192.168.11.13 TCP 10, 12, 13, 14, 15, 5
192.168.5.109 192.168.11.13 TCP 16, 17, 18, 19, 110, 111
192.168.5.110 192.168.11.13 TCP 112, 113
192.168.6.108 192.168.11.14 TCP 20, 22, 23, 24, 25, 6
192.168.6.109 192.168.11.14 TCP 26, 27, 28, 29, 210, 211
192.168.7.110 192.168.11.14 TCP 212, 213
192.168.6.23 192.168.11.14 TCP 214, 215

I'd like to script it so that the output groups all the source IPs, and their destination ports, going to the same destination IP:
SourceIP1,IP2,IP3,IP4 TCP DestinationIP DestinationPort1,P2,P3,P4,P5,P6......

For example, the first destination, 192.168.11.12, would be summarized like so:

192.168.5.108,192.168.5.109,192.168.5.110,192.168.6.23 192.168.11.12 TCP 1, 2, 3, 4, 5, 15, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15

Any help would be greatly appreciated!

An awk approach:

awk '
        {
                port = $0
                sub(/.*[a-zA-Z]+ /,x,port)
                if ( ! ( ( $2 FS $1 ) in A_S_IP ) )
                        A_R_S_IP[$2] = A_R_S_IP[$2] ? A_R_S_IP[$2] FS $1 : $1
                if ( ! ( ( $2 FS $3 ) in A_TYPE ) )
                        A_R_TYPE[$2] = A_R_TYPE[$2] ? A_R_TYPE[$2] FS $3 : $3
                if ( ! ( ( $2 FS port ) in A_PORT ) )
                        A_R_PORT[$2] = A_R_PORT[$2] ? A_R_PORT[$2] ", " port : port

                A_D_IP[$2]
                A_S_IP[$2 FS $1]
                A_TYPE[$2 FS $3]
                A_PORT[$2 FS port]
        }
        END {
                for ( k in A_D_IP )
                        print A_R_S_IP[k], k, A_R_TYPE[k], A_R_PORT[k]
        }
' file

Why don't you work directly on the input file, with the structure from your recent post?

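awk '
NR == 1 {print                          # pass the header line through unchanged
         next
        }

        {IX = $2 FS $3                  # group key: destination IP and type
         if (!CT[$1 FS $2 FS $3]++) a[IX] = a[IX]?a[IX] "," $1:$1    # add each source IP once per group
         b[IX] = b[IX]?b[IX] "," $4:$4  # append the port list held in $4
        }

END     {for (i in a) print a[i] FS i FS b[i]    # sources, key, ports (for-in order is unspecified)
        }
'  file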
Source Destination Type Port
192.168.5.108,192.168.5.109,192.168.5.110,192.168.6.23 192.168.11.12 TCP 1,2,3,4,5,15,6,7,8,9,10,11,12,13,14,15
192.168.5.108,192.168.5.109,192.168.5.110 192.168.11.13 TCP 10,12,13,14,15,5,16,17,18,19,110,111,112,113
192.168.6.108,192.168.6.109,192.168.7.110,192.168.6.23 192.168.11.14 TCP 20,22,23,24,25,6,26,27,28,29,210,211,212,213,214,215

There may be duplicate ports in the output which are not eliminated.

Thanks Rudi.

This somewhat works. The output I get combines the source IPs just fine, however the destination ports are incomplete.

192.168.5.108 192.168.11.12 TCP 1, 2, 3, 4, 5, 15
192.168.5.109,192.168.5.110,192.168.6.23 192.168.11.12 TCP 6,,12,,14,
192.168.5.108,192.168.5.109,192.168.5.110 192.168.11.13 TCP 10,,16,,112,
192.168.6.108,192.168.6.109,192.168.7.110,192.168.6.23 192.168.11.14 TCP 20,,26,,212,,214,

If you change the line:

         b[IX] = b[IX]?b[IX] "," $4:$4

in RudiC's script to:

         gsub(/, /, ",")
         b[IX] = b[IX]?b[IX] "," $4:$4

I think you'll get something closer to what you wanted.
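
To see why this helps, here is a small sketch on a single line of the posted data (the variable names `line`, `before`, and `after` are just for the demonstration). awk splits fields on whitespace, so with ports written as "1, 2, 3" the fourth field holds only the first port plus its trailing comma, which is where output like `6,,12,,14,` comes from. Calling `gsub` without a target edits `$0`, and awk then re-splits the record, so `$4` becomes the whole port list.

```shell
line='192.168.5.108 192.168.11.12 TCP 1, 2, 3, 4, 5, 15'

# Without the gsub: whitespace splitting leaves only the first port in $4.
before=$(printf '%s\n' "$line" | awk '{ print $4 }')

# With the gsub: no target is given, so it edits $0 and the fields are re-split.
after=$(printf '%s\n' "$line" | awk '{ gsub(/, /, ","); print $4 }')

printf 'before: %s\nafter:  %s\n' "$before" "$after"
```

This should print `1,` for the first case and `1,2,3,4,5,15` for the second.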

Thanks! That made the output much better. Is there a way to tag a thread as "solved"?

I'm glad that RudiC's suggestion and my minor tweak helped you get the output you wanted.

If you look at the tags attached to this thread at the top of this thread, you'll note that there are two tags already attached AND there is the following note:

So, just click on the words Edit Tags at the top right corner of the tags associated with this thread, and add the tag solved to mark the thread as solved.
