ksh scripting: Extract 1 most recent record for unique key

I'm loading multiple delimited files into an Oracle DB using sqlldr on Unix. I would like to get only the most recent record per each unique key. There may be multiple updates for each key, but I only want the most recent one.

There is a date column in my delimited files, so I'm using cat to combine multiple files into one file, then I sort that file in descending order by date, using the unix sort command.

When I load with sqlldr, I have a unique key restraint on my Oracle table, so I'm grabbing the first occurrence for each key (the most recent, since they're in *descending* order by date) and throwing away the rest (the earlier out-of-date ones), but it seems to me that it's very inefficient, and it creates a huge amount of "errors" in my sqlldr log file (I don't want to use the SILENT=errors option, or put my errors in /dev/null, since there may be genuine errors that I don't want to miss.)

Is there some way to get those "first" (most recent) occurrences using Unix scripting, instead of doing it in sqlldr?

Thanks,
OP

Well, you didn't supply any sample data so it's difficult to give you a specific answer, but presuming the second field is the unique key, and presuming the fields are separated by white space, this will output only the first record for each key:

awk '$2 in a { next } { a[$2]; print }' inputfile

Thanks for the help.

This is the sort instruction I use in my .ksh script (sort by 4th column in descending order):

sort -r -t! +3 -4 unsorted_file.txt -o sorted_file.txt

Here is some mocked up data. This is what it looks like after sorting:

19127171 ! SUCCEEDED ! ACTIVITY-4 ! 2008-10-31 05:58:48.820 ! XYZ !
19127171 ! INPROGRESS ! ACTIVITY-4 ! 2008-10-31 05:58:27.355 ! ABC !
19127171 ! SUCCEEDED ! ACTIVITY-3 ! 2008-10-31 05:58:26.308 ! JKL !
19127171 ! INPROGRESS ! ACTIVITY-3 ! 2008-10-31 05:58:12.482 ! TUV !
19127171 ! SUCCEEDED ! ACTIVITY-2 ! 2008-10-31 05:58:04.668 ! LMN !
19127171 ! INPROGRESS ! ACTIVITY-4 ! 2008-10-31 05:58:03.355 ! FGH !
19127171 ! INPROGRESS ! ACTIVITY-4 ! 2008-10-31 05:57:43.355 ! BCD !
19127171 ! INPROGRESS ! ACTIVITY-3 ! 2008-10-31 05:57:32.381 ! WXY !
19127171 ! SUCCEEDED ! ACTIVITY-1 ! 2008-10-31 05:57:31.266 ! PQR !
19127171 ! INPROGRESS ! ACTIVITY-2 ! 2008-10-31 05:57:21.717 ! PBJ !

The first three columns are my unique key. So, for ACTIVITY-4, I want only the most recent "INPROGRESS" record (line 2), as well as the "SUCCEEDED" record (line 1). The previous "INPROGRESS" records (lines 6,7) can be thrown away. Similarly, for "ACTIVITY-3", keep lines 3 & 4, throw away line 8.

Thanks!
OP