Bash/shell merge similar lines

umang2382 · February 17, 2016, 6:43pm

Hello,

I've been working on a bash script to parse through firewall logs (Cisco). I'm nearing the end and have a dilemma.

My data looks like this (the actual data is several gigs' worth of logs, without the headers):
sourceIP destinationIP destinationProtocol destinationPort

1.1.1.1   2.2.2.2         TCP                       22
1.1.1.1   2.2.2.2         TCP                       31
1.1.1.1   2.2.2.2         TCP                       45
1.1.1.1   2.2.2.2         TCP                       67
1.1.1.3   2.2.2.2         TCP                       22
1.1.1.3   2.2.2.2         TCP                       89
1.1.1.3   2.2.2.2         TCP                       78
1.1.1.1   2.2.2.3         TCP                       78
1.1.1.1   2.2.2.3         TCP                       79

I would like to script it so that the ports are put on a single line for matching IPs, like so:
sourceIP destinationIP destinationProtocol destinationPort

1.1.1.1   2.2.2.2         TCP                       22, 31, 45, 67
1.1.1.3   2.2.2.2         TCP                       22, 89, 78
1.1.1.1   2.2.2.3         TCP                       78, 79

Would awk or sed be able to do what I'm looking for? How?

Any help would be much appreciated.

vgersh99 · February 17, 2016, 6:55pm

Something along these lines (a bit terse on explanation, but...):

awk '
  # idx is a key built by joining the first 3 fields of the line
  {idx=$1 FS $2 FS $3}

  # a is an array indexed by idx.
  # if idx is already in a, append the fourth field to its entry;
  # if not, store the fourth field as the first entry for idx
  {a[idx]=(idx in a)?a[idx] OFS $4:$4}

  # after all records/lines have been processed...
  # iterate through all the indices in array a, print out the index and
  # the corresponding value stored in the array under that index.
  END {for(i in a) print i FS a[i]}' OFS=, myInputFileGoesHere
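
A variation on the same idea, in case the output order and the ", " separator from the question matter. With "for (i in a)" awk visits the keys in an unspecified order, so the merged lines may not come out in the order the IPs first appeared in the log. The sketch below keeps a second array to remember first-seen order. The function name merge_ports and the array names ports and order are made up for illustration, and it assumes every useful line has at least four whitespace-separated fields.

```shell
#!/bin/sh
# merge_ports: hypothetical helper name, not from the thread.
# Reads "src dst proto port" lines from stdin, or from files given as arguments.
merge_ports() {
  awk '
    NF < 4 { next }                      # skip blank or short lines
    {
      key = $1 " " $2 " " $3             # group on the first three fields
      if (!(key in ports)) {
        order[++n] = key                 # remember the order keys first appear in
        ports[key] = $4
      } else {
        ports[key] = ports[key] ", " $4  # append with the separator from the question
      }
    }
    END {
      for (i = 1; i <= n; i++) print order[i], ports[order[i]]
    }
  ' "$@"
}

# Example run on a few of the sample lines from the question
printf '%s\n' \
  '1.1.1.1   2.2.2.2   TCP   22' \
  '1.1.1.1   2.2.2.2   TCP   31' \
  '1.1.1.3   2.2.2.2   TCP   22' \
  '1.1.1.1   2.2.2.3   TCP   78' | merge_ports
# prints:
# 1.1.1.1 2.2.2.2 TCP 22, 31
# 1.1.1.3 2.2.2.2 TCP 22
# 1.1.1.1 2.2.2.3 TCP 78
```

On a real log it would be called as "merge_ports myInputFileGoesHere". Like the one-liner above, this holds one entry per unique source/destination/protocol combination in memory, so how well it copes with several gigs of input depends on how many distinct combinations there are rather than on the file size itself.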

umang2382 · February 17, 2016, 7:19pm

Cool, that works! Would you mind giving a quick explanation of what the command is doing?