Search and combine fields

Hi all,

1. I have a log file:
    2011/11/14 00:42:50 | 38:guess pid=008499 opened Testing 0, 1, 2, 3
    2011/11/14 11:43:42 | 38:guess pid=008499 closed
    2011/11/14 11:47:08 | 39:guess pid=017567 opened Testing 0, 1, 2, 3
    2011/11/14 11:47:08 | 40:guess pid=012780 opened Testing 0, 1, 2, 3
    2011/11/14 12:32:34 | 40:guess pid=012780 closed
    2011/11/14 12:39:05 | 41:guess pid=016015 opened Testing 0, 1, 2, 3

How do I write a ksh script that produces a new file combining the lines that have the same pid number? Thanks.

2011/11/14 00:42:50 |   38:guess   pid=008499 opened Testing 0, 1, 2, 3 | 2011/11/14 11:43:42 |   38:guess   pid=008499 closed
2011/11/14 11:47:08 |   40:guess   pid=012780 opened Testing 0, 1, 2, 3 | 2011/11/14 12:32:34 |   40:guess   pid=012780 closed
2. If the last entry is "opened", can we add a current timestamp? This must apply to the last line only.
2011/11/14 00:42:50 |   38:guess   pid=008499 opened Testing 0, 1, 2, 3 |2011/11/14 11:43:42 |   38:guess   pid=008499 closed
2011/11/14 11:47:08 |   40:guess   pid=012780 opened Testing 0, 1, 2, 3 |2011/11/14 12:32:34 |   40:guess   pid=012780 closed
2011/11/14 12:39:05 |   41:guess   pid=016015 opened Testing 0, 1, 2, 3 |current date time | still running

This should do it:

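awk -v now="$(date +"%Y/%m/%d %T")" '
{
  # $5 is the pid=NNNNNN field; append every line for that pid to one record
  out[$5]=out[$5] " |" $0
}
END {
  for(var in out)
  {
    l=out[var];
    # no "closed" entry for this pid, so mark it with the current time
    if(!match(l,"closed"))
    {
      l=l " |" now " | still running"
    }
    printf("%s\n",l);
  }
}
' logfile | sort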

Another option (strftime and systime are gawk extensions, not POSIX awk):

awk '!a[$5]++&&/opened/{b[$5]=$0}a[$5]&&/closed/{print b[$5] OFS $0;delete b[$5]}
    END{for (i in b) print b[i], strftime( "%Y/%m/%d %H:%M:%S" ,systime()) " | still running"}' OFS=\| infile

2011/11/14 00:42:50 | 38:guess pid=008499 opened Testing 0, 1, 2, 3|2011/11/14 11:43:42 | 38:guess pid=008499 closed
2011/11/14 11:47:08 | 40:guess pid=012780 opened Testing 0, 1, 2, 3|2011/11/14 12:32:34 | 40:guess pid=012780 closed
2011/11/14 11:47:08 | 39:guess pid=017567 opened Testing 0, 1, 2, 3|2011/11/15 10:58:37 | still running
2011/11/14 12:39:05 | 41:guess pid=016015 opened Testing 0, 1, 2, 3|2011/11/15 10:58:37 | still running

Looks like compactness matters, so here is an even less readable version of my script (with a small correction to match the OP's output format):

awk -v now="$(date +"%Y/%m/%d %T")" '{o[$5]=o[$5]"|"$0}
END{for(v in o){l=substr(o[v],2);if(!match(l,"closed")){l=l"|"now" | still running"}print l}}' logfile | sort

2011/11/14 00:42:50 | 38:guess pid=008499 opened Testing 0, 1, 2, 3|2011/11/14 11:43:42 | 38:guess pid=008499 closed
2011/11/14 11:47:08 | 39:guess pid=017567 opened Testing 0, 1, 2, 3|2011/11/15 02:24:45 | still running
2011/11/14 11:47:08 | 40:guess pid=012780 opened Testing 0, 1, 2, 3|2011/11/14 12:32:34 | 40:guess pid=012780 closed
2011/11/14 12:39:05 | 41:guess pid=016015 opened Testing 0, 1, 2, 3|2011/11/15 02:24:45 | still running

Edit: I suspect I missed the "must be last line only" requirement though.
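
For the "must be last line only" requirement, here is a minimal sketch rather than a definitive answer. It assumes the log is in chronological order, that the pid is always field 5, and that an unclosed pid should be reported only when it sits on the very last line of the file (so pid=017567 is dropped, as in the OP's expected output). The function name combine_by_pid and its optional second argument are made up for illustration; the second argument only overrides the timestamp.

```shell
# Hypothetical helper: combine_by_pid LOGFILE [NOW]
# NOW defaults to the current date and time.
combine_by_pid() {
  now=${2:-$(date +"%Y/%m/%d %T")}
  awk -v now="$now" '
    /opened/ { open[$5] = $0 }       # remember the opening line per pid ($5)
    /closed/ && ($5 in open) {       # matching close: print the combined pair
      print open[$5] "|" $0
      delete open[$5]
    }
    { lastpid = $5 }                 # pid seen on the most recent line
    END {
      # only the last line of the file may be reported as still running
      if (lastpid in open)
        print open[lastpid] "|" now " | still running"
    }
  ' "$1"
}
```

Call it as: combine_by_pid logfile > newfile. The pairs come out in the order they were closed, so no sort is needed.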

Thanks, I will try it.