File modify

Hi All

I am getting a file with the below pattern:

00150366 05/08/2015 07:14:32
8000186167+++ 50195281000000000371001010903236
800186167+++ 100209000000000
800000018617+++ 50295281000000000371001010900217================================3u4398482344334=432434
00150367 05/08/2015 07:14:32
80009000001+++ 50195281000000000371001010903236
8000900000186167+++ 50695281000000000371001010953001Y
8000900000186167+++ 50295281000000000371001010900217=================
00150366 05/08/2015 07:28:32
8000900186167+++ 50195281000000000371001010903236
8000900000186167+++ 50695281000000000371001010953001Y
900000186167+++ 50295281000000000371001010900217=================

I have to sort the file based on the first column (e.g. 00150366) of the header lines like 00150366 05/08/2015 07:14:32, remove any duplicates of that number across the whole file, and keep only the occurrence with the latest date and time.

Then remove those three header columns and process the rest of the file; the columns are only there for sorting. Every header will have 3 records under it.

If every record had these 3 columns in front of it, this would be simple, but I am not sure how to handle it with all these spaces.

Please let me know whether this is possible.

Not sure I understand that spec. Try

paste -sd"\t\t\t\n" file | sort -k3,3 | tail -1 | tr '\t' '\n'
00150366 05/08/2015 07:28:32
8000900186167+++ 50195281000000000371001010903236
8000900000186167+++ 50695281000000000371001010953001Y
900000186167+++ 50295281000000000371001010900217=================

This would fail if e.g. the time range spanned two or more days, if other than four lines per record were present, and maybe for other reasons.
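
For reference, here is the same pipeline spread over several lines with comments. It is only a sketch and relies on those same assumptions (four lines per record group, all times on the same day, input in a file named file):

paste -sd"\t\t\t\n" file |   # join every 4 input lines into one TAB-separated record
  sort -k3,3 |               # sort those records by field 3, the time of day
  tail -1 |                  # keep only the last one, i.e. the latest time
  tr '\t' '\n'               # split the TABs back into the original 4 lines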


Hi RudiC

Thanks for your help.

Maybe this will help.
The below should be the output:

80009000001+++ 50195281000000000371001010903236
8000900000186167+++ 50695281000000000371001010953001Y
8000900000186167+++ 50295281000000000371001010900217=================
8000900186167+++ 50195281000000000371001010903236
8000900000186167+++ 50695281000000000371001010953001Y
900000186167+++ 50295281000000000371001010900217=================

Basically: sort on the unique number (e.g. 00150366), and if there are duplicates for that number keep only the latest one; then output all the unique records with the unique-number and date/time fields removed.

Please use code tags as required by forum rules!

Try

paste -sd"\t\t\t\n" file | sort -k1,1 -k3,3r  | awk '!T[$1]++ {sub ($1 " " $2 " " $3 "\t", ""); gsub ("\t", "\n"); print}'
8000900186167+++ 50195281000000000371001010903236
8000900000186167+++ 50695281000000000371001010953001Y
900000186167+++ 50295281000000000371001010900217=================
80009000001+++ 50195281000000000371001010903236
8000900000186167+++ 50695281000000000371001010953001Y
8000900000186167+++ 50295281000000000371001010900217=================
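
In case it helps with analyzing how it works, here is the same pipeline spread over several lines with comments. Again just a sketch, subject to the same limitations noted earlier (four lines per record group, all times on the same date):

paste -sd"\t\t\t\n" file |            # join every 4 input lines into one TAB-separated record
  sort -k1,1 -k3,3r |                 # group by the key (field 1), latest time first within each key
  awk '!T[$1]++ {                     # keep only the first record seen per key, i.e. the latest one
    sub($1 " " $2 " " $3 "\t", "")    # strip the leading "key date time<TAB>" prefix
    gsub("\t", "\n")                  # turn the remaining TABs back into newlines
    print                             # output the restored 3-line record
  }'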

Awesome, it worked perfectly! Thanks a lot, buddy.

I will analyze how this is working. Thanks again!

Hi RudiC

The time part is working, but the date part is not: if the dates differ, it does not pick the record with the latest date.

Instead it picks the record with the latest time on an older date. Can you please help?

RudiC already told you that that script would only work if all of the times for a given field 1 were on the same date. We would hope that you would have used his suggestion as a starting point and would have figured out how to extend it to work with multiple dates.

Since your sample data was all on one day, testing for anything else is just guess work. (I'm guessing that the sample data you provided is for May 8, 2015; not August 5, 2015. If I guessed incorrectly, you'll have to modify the sort to make the month a higher order sort key than the day.)

The following produces the correct sets of lines for you, but the order of the sets of lines is different from that in your sample. You didn't specify any order for the output other than that the records selected for output had to be the ones with the most recent date and time for each key value. If you want a specific order for the keys, you should have said what that order should be. Try:

paste -sd"\t\t\t\n" file | sort -k2.7,2r -k2.1,2.5r -k3,3r  | \
    awk '!T[$1]++ {sub("[^\t]*\t", ""); gsub("\t", "\n"); print}'
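
The change from the earlier pipeline is the set of sort keys: instead of grouping by field 1, all of the joined records are ordered newest first by date and time, so the first record awk sees for each key is its latest one. A commented sketch of the intent (still assuming the MM/DD/YYYY format guessed above; note that without -b, sort counts key character positions from any blanks preceding the field, so the offsets are worth double-checking against your data):

# -k2.7,2r    intended to compare the year part of field 2 (e.g. 2015), newest first
# -k2.1,2.5r  intended to compare the month/day part of field 2 (e.g. 05/08), newest first
# -k3,3r      compares field 3, the time (e.g. 07:28:32), latest first
paste -sd"\t\t\t\n" file |
  sort -k2.7,2r -k2.1,2.5r -k3,3r |
  awk '!T[$1]++ {sub("[^\t]*\t", ""); gsub("\t", "\n"); print}'

Since field 1 is no longer a sort key, the key groups come out in date/time order rather than key order, which is why the output order can differ from your sample.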

Hi Don and RudiC, thanks a lot. This is working now.