I have a pipe delimited file. Key is field 2, date is field 5 (as example, my real file is more complicated of course, but the KEY and DATE are accurate)
There can be duplicate rows for a key with different dates.
I need to keep only rows with latest date in this case.
Example data:
W|AAA|DD|D|20080101
W|BBB|CC|C|20080101
W|AAA|BB|B|20080201
W|CCC|DD|D|20080701
W|CCC|EE|E|20080801
W|AAA|DD|D|20081231
I would want to see:
W|AAA|DD|D|20081231
W|BBB|CC|C|20080101
W|CCC|EE|E|20080801
I want to use sort for this but am open to other options. I'm guessing awk could be involved, but I'm bad at writing awk.
I've searched but didn't find anything that seemed to match.
Ideas?
edit to add a little more info:
It could be that I have rows with same key and same date. In that case, I'd prefer to take the one that is last in the file (because of differences in the other fields of the rows and how the file gets built) -- but if that is not possible I understand.
I made something that is working on my small test file (the real file has too much data to really hand check) - I'd appreciate if someone could take a look and critique/agree/tell me I'm all wet.