Remove lines with the same time stamp, leaving the highest

Hi guys,

I have a log that looks like the one below. Columns 2 through 7, separated by commas, are the date/time stamp.

UUU,02,06,2010,10,00,00,00,0000000000000000,0000000000000000,0000000000001224
UUU,02,06,2010,10,05,00,00,0000000000000000,0000000000000000,0000000000001502
UUU,02,06,2010,10,10,00,00,0000000000000000,0000000000000000,0000000000001514
UUU,02,06,2010,10,10,00,00,0000000000000000,0000000000000000,0000000000001305
UUU,02,06,2010,10,10,00,00,0000000000000000,0000000000000000,0000000000001456
UUU,02,06,2010,10,15,00,00,0000000000000000,0000000000000000,0000000000001324

The problem is that it often contains several entries for the same time stamp. I want to remove the duplicates and keep only the entry with the highest number in column 11.

So the above log would be output as:

UUU,02,06,2010,10,00,00,00,0000000000000000,0000000000000000,0000000000001224
UUU,02,06,2010,10,05,00,00,0000000000000000,0000000000000000,0000000000001502
UUU,02,06,2010,10,10,00,00,0000000000000000,0000000000000000,0000000000001514
UUU,02,06,2010,10,15,00,00,0000000000000000,0000000000000000,0000000000001324

Does anyone know how to do it? It would be nice if it could be done in perl.

Dunno perl, but hope this awk script helps. I tested it against your input and it worked.
In the code below, replace /tmp/input.data with the path to your input data file.

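awk -F',' '{
  # Key: the date/time stamp in columns 2-7
  ts=$2","$3","$4","$5","$6","$7;
  if( x[ts] == "" ) {
    x[ts] = $0
  }
  else  {
    # Keep the stored line unless this one has a higher column 11
    split(x[ts],f11,",")
    if( $11+0 > f11[11]+0 ) {
      x[ts] = $0
    }
  }
}
END {
  # for-in order is unspecified, so output lines may not be in input order
  for(i in x)  {
    print x[i]
  }
}' /tmp/input.data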
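
If the output has to stay in the order the time stamps first appear in the log, something like the sketch below might do. It is only a variation on the script above: it keeps a second array with the first-seen order, since awk's for-in loop does not guarantee any order. The function name dedup_max and the array names are made up for this example, and it assumes column 11 is always a plain number.

```shell
# dedup_max: hypothetical helper name; reads the log on stdin.
dedup_max() {
  awk -F',' '
    {
      # Key: the date/time stamp in columns 2-7
      ts = $2 "," $3 "," $4 "," $5 "," $6 "," $7
      if (!(ts in best)) {
        order[++n] = ts           # remember first-seen order
        best[ts] = $0
        max[ts] = $11 + 0         # +0 forces a numeric value
      } else if ($11 + 0 > max[ts]) {
        best[ts] = $0
        max[ts] = $11 + 0
      }
    }
    END {
      # Print one line per time stamp, in first-seen order
      for (i = 1; i <= n; i++) print best[order[i]]
    }
  '
}
```

It would be called as: dedup_max < /tmp/input.data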

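Let's try another solution. It assumes lines with the same time stamp are adjacent, as in your sample, and that the first field is always 3 characters wide:

awk -F, 'function X(){return substr($0,5,19)};function Y(){z=$NF;Z[y]=$0}END{print Z[y]}NR==1{y=X();Y()}{x=X();if(x!=y){print Z[y];y=x;Y()}else if($NF+0>z+0)Y()}'  infile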

Use gawk, nawk or /usr/xpg4/bin/awk on Solaris.
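
For anyone who finds the one-liner hard to read, the same single-pass idea can be written out roughly as below. This is just a sketch: it assumes lines with the same time stamp are adjacent in the log, builds the key from columns 2-7 instead of using substr, and the function name dedup_stream and the variable names are invented for the example.

```shell
# dedup_stream: hypothetical helper name; reads the log on stdin.
# Assumes lines sharing a time stamp are adjacent.
dedup_stream() {
  awk -F',' '
    {
      ts = $2 "," $3 "," $4 "," $5 "," $6 "," $7
      # A new time stamp means the previous group is complete: emit its best line
      if (NR > 1 && ts != prev) print keep
      # Start a new group, or replace the kept line if column 11 is higher
      if (NR == 1 || ts != prev || $11 + 0 > max) {
        keep = $0
        max = $11 + 0
      }
      prev = ts
    }
    END { if (NR > 0) print keep }   # emit the last group
  '
}
```

Called as: dedup_stream < infile. Unlike the array version it only holds one line in memory and keeps the input order, but it will not merge duplicates that are separated by other time stamps.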

linuxpenguin you are +54