Sorting group of records and loading last record

patricjemmy6 · January 20, 2015, 7:58pm

Hi Everyone,
I have the record set below. The file is a fixed-width file.

101newjersyus 20150110
101nboston us 20150103
102boston   us 20140106
102boston   us 20140103

I need to group the records by their first 3 characters (101 and 102 in this case),
sort each group by the last 8 digits in ascending order, and print only the last record of each group.
So the output should be:

101newjersyus 20150110
102boston   us 20140106

Chubler_XL · January 20, 2015, 11:30pm

How about this awk + sort solution:

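awk '
{ k=substr($0,1,3)          # group key: first 3 characters
  v=substr($0,length-7)     # date: last 8 characters (YYYYMMDD)

  # YYYYMMDD strings compare chronologically, so keep the largest per key
  if(a[k]<v) {
     a[k]=v
     l[k]=$0
  }
}
END { for(k in a) print l[k] }' infile | sort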

Don_Cragun · January 21, 2015, 2:05am

You said that the input file is fixed width, but as can clearly be seen above (now that CODE tags have been added), some lines are longer than others. If the lines were fixed length (or, more importantly, if the date at the end of each line started in the same character position on every line), Chubler_XL's awk | sort pipeline could be changed to a sort | awk pipeline with a slightly more complex sort command and a much simpler awk command. But since your secondary sort key is neither in a fixed field nor in a fixed position, that won't work.
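
One possible way around the shifting date column, offered only as a sketch and not something posted in this thread, is to have awk prepend the 3-character key and the last 8 characters to every line, so that sort sees them as ordinary blank-separated fields, and then strip that prefix again. The temporary file and the variable names (infile, result, prev, line) are placeholders for illustration, and the sketch assumes every line really ends in an 8-digit date.

```shell
# Sample data from the first post (the lines differ in length).
infile=$(mktemp)
cat > "$infile" <<'EOF'
101newjersyus 20150110
101nboston us 20150103
102boston   us 20140106
102boston   us 20140103
EOF

# 1. Decorate: prefix each line with "<key> <date> " (3 + 1 + 8 + 1 = 13 chars).
# 2. Sort by key, then by date, both ascending.
# 3. Undecorate: print the last line of each key group without the prefix.
result=$(
  awk '{ print substr($0, 1, 3), substr($0, length($0) - 7), $0 }' "$infile" |
  LC_ALL=C sort -k1,1 -k2,2 |
  awk '
    NR > 1 && $1 != prev { print line }      # key changed: emit previous group
    { prev = $1; line = substr($0, 14) }     # remember original line, prefix removed
    END { if (NR) print line }               # emit the final group
  '
)
rm -f "$infile"
printf '%s\n' "$result"
```

Because the date is taken from the end of each line rather than from a fixed column, this does not depend on where the spaces fall, at the cost of two awk passes.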

RudiC · January 21, 2015, 5:21am

Were it a fixed-width file, things would be much simpler. The following does work on your sample but will fail if the positions of the spaces and fields shift:

sort -t. -k1.1,1.3n -k1.14r file4 | sort -uk1.1,1.3
101newjersyus 20150110
102boston   us 20140106

Don_Cragun · January 21, 2015, 4:11pm

This will work on many systems. Unfortunately, the standards are silent as to which of the lines with identical keys sort -u will print, so there is no guarantee that it will always print the first line in each set of lines whose first three characters are identical.

The following should work portably (as long as the 8 digit date string starts in column 13 or 14 and is preceded by a space if it starts in column 14):

sort -t. -k1.1,1.3n -k1.14r file|awk '(k=substr($0,1,3))!=K{print;K=k}'

But, of course, if you want to try this on a Solaris/SunOS system, you'd have to change awk to /usr/xpg4/bin/awk, /usr/xpg6/bin/awk, or nawk.
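
As a rough sketch of how a script might cope with the Solaris case without editing the command by hand, the awk to use can be chosen at run time and the same sort | awk pipeline run through it. The variable names (AWK, candidate, infile, picked) and the temporary sample file are made up for illustration; nawk would be another fallback but is not probed here.

```shell
# Prefer a POSIX awk if one of the Solaris XPG paths exists; otherwise use awk.
AWK=awk
for candidate in /usr/xpg6/bin/awk /usr/xpg4/bin/awk; do
  if [ -x "$candidate" ]; then AWK=$candidate; break; fi
done

# Sample data from the first post.
infile=$(mktemp)
cat > "$infile" <<'EOF'
101newjersyus 20150110
101nboston us 20150103
102boston   us 20140106
102boston   us 20140103
EOF

# Sort by the 3-digit key, then by the tail of the line (from column 14) in
# reverse, and let awk keep only the first line seen for each key.
picked=$(
  LC_ALL=C sort -t. -k1.1,1.3n -k1.14r "$infile" |
  "$AWK" '(k = substr($0, 1, 3)) != K { print; K = k }'
)
rm -f "$infile"
printf '%s\n' "$picked"
```

Unlike the sort -u variant, the choice of line here is made by awk on already sorted input, so it does not rely on which duplicate sort decides to keep.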