Bash/shell merge similar lines

Hello,

I've been working on a bash script to parse Cisco firewall logs. I'm nearing the end and have a dilemma.

My data looks like this (the actual data is several gigabytes of logs, without the header line):
sourceIP destinationIP destinationProtocol destinationPort

1.1.1.1   2.2.2.2         TCP                       22
1.1.1.1   2.2.2.2         TCP                       31
1.1.1.1   2.2.2.2         TCP                       45
1.1.1.1   2.2.2.2         TCP                       67
1.1.1.3   2.2.2.2         TCP                       22
1.1.1.3   2.2.2.2         TCP                       89
1.1.1.3   2.2.2.2         TCP                       78
1.1.1.1   2.2.2.3         TCP                       78
1.1.1.1   2.2.2.3         TCP                       79

I would like to script it so that the ports are put on a single line for matching IPs, like so:
sourceIP destinationIP destinationProtocol destinationPort

1.1.1.1   2.2.2.2         TCP                       22, 31, 45, 67
1.1.1.3   2.2.2.2         TCP                       22, 89, 78
1.1.1.1   2.2.2.3         TCP                       78, 79

Would awk or sed be able to do what I'm looking for? How?

Any help would be much appreciated.

Something along these lines (a bit terse on explanation, but...):

awk '
  # idx is a key built by concatenating the first 3 fields of the line
  {idx=$1 FS $2 FS $3}

  # a is an array indexed by idx.
  # if idx is already in a, append the fourth field to its entry; if not, store the fourth field as the first value
  {a[idx]=(idx in a)?a[idx] OFS $4:$4}

  # after all records/lines have been processed...
  # iterate through all the indices in array a, print out each index and
  # the corresponding value stored in the array under that index.
  END {for(i in a) print i FS a[i]}' OFS=, myInputFileGoesHere

Cool, that works! Would you mind giving a quick explanation of what the command is doing?
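In short, the command groups lines by their first three fields and collects the fourth field of every line that shares the same group. One caveat: awk's `for (i in a)` loop visits keys in an unspecified order, so the merged lines may not come out in the order they first appear in the log. Below is a minimal sketch of a variant that keeps first-seen order and joins ports with ", " as in the desired output. The function name `merge_ports` and the array names `ports` and `order` are made up for illustration, and skipping lines with fewer than four fields is an assumption about how blank lines should be handled.

```shell
# merge_ports: hypothetical helper; reads the log from file arguments or stdin.
merge_ports() {
  awk '
    # assumption: lines with fewer than 4 fields (e.g. blank lines) are ignored
    NF < 4 { next }

    {
      # key = source IP, destination IP and protocol joined by single spaces
      key = $1 FS $2 FS $3

      if (!(key in ports)) {
        order[++n] = key                  # remember the order keys first appear in
        ports[key] = $4                   # first port seen for this key
      } else {
        ports[key] = ports[key] ", " $4   # append later ports to the same entry
      }
    }

    # after the last line, print each key followed by its collected ports
    END {
      for (i = 1; i <= n; i++)
        print order[i], ports[order[i]]
    }
  ' "$@"
}
```

It could be called as `merge_ports firewall.log > merged.txt` (both file names are placeholders). Like the original one-liner, it holds one entry per unique sourceIP/destinationIP/protocol combination in memory and does not remove duplicate ports, so whether it fits several gigabytes of logs depends on how many distinct combinations there are.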