Hi,
I would like to sort a list in different ways:
1> Unique based on Field 1, keeping the line with the highest Field 4
For Instance Input:
1678923450;11112222333344;11-1x;2_File.xml
1678923450;11112222333344;11-1x;5_File.xml
1234567890;11113333222244;11-1x;3_File.xml
Output:
1234567890;11113333222244;11-1x;3_File.xml
1678923450;11112222333344;11-1x;5_File.xml
2> Unique based on Field 2 combined with Field 1, keeping the line with the highest Field 4
For Instance Input:
1234567890;11113333222244;11-1x;3_File.xml
1234567890;11112222333344;11-1x;1_File.xml
1678923450;11112222333344;11-1x;2_File.xml
Output:
1234567890;11112222333344;11-1x;1_File.xml
1678923450;11112222333344;11-1x;2_File.xml
1234567890;11113333222244;11-1x;3_File.xml
1>
sort -t\; -k1,1 -k4,4d infile | awk -F\; '!($1 in A) { print $0; A[$1] }'
2>
sort -t\; -k1,1 -k2,2 -k4,4d infile | awk -F\; '!($1$2 in A) { print $0; A[$1$2] }'
Thanks for the reply.
1> Sorry, but I got this code output:
1234567890;11113333222244;11-1x;3_File.xml
1678923450;11112222333344;11-1x;2_File.xml
2> Code Output as expected.
Can you please explain the awk logic you've used, so I can modify it further for my reports?
Sorry, I should have checked my sort parameters: the 4th field was supposed to be sorted in descending order, and for that sort needs the r flag (not d). So your corrected commands are:
sort -t\; -k1,1 -k4,4r infile | awk -F\; '!($1 in A) { print $0; A[$1] }'
sort -t\; -k1,1 -k2,2 -k4,4r infile | awk -F\; '!($1$2 in A) { print $0; A[$1$2] }'
For the awk logic:
-F\;
use semicolon as the field separator; this lets awk split the line into fields $1, $2 ... $NF
!($1$2 in A)
means only process lines where the value of field1 concatenated with field2 (your unique key) is not already in the array A
print $0
print the entire line
So it prints the first line on which each unique key value occurs.
A[$1$2]
add the key (field1 concatenated with field2) to the array A
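Putting those pieces together, here is a self-contained run of the corrected case-1 pipeline, with the sample input fed through a here-document instead of infile:

```shell
# Sort by field 1, then field 4 descending; awk then keeps the first
# (i.e. highest-field-4) line seen for each field-1 value.
sort -t\; -k1,1 -k4,4r <<'EOF' | awk -F\; '!($1 in A) { print; A[$1] }'
1678923450;11112222333344;11-1x;2_File.xml
1678923450;11112222333344;11-1x;5_File.xml
1234567890;11113333222244;11-1x;3_File.xml
EOF
# Output:
# 1234567890;11113333222244;11-1x;3_File.xml
# 1678923450;11112222333344;11-1x;5_File.xml
```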
Thanks.
Is there a way to add a counter field at the end of each line? The counter should keep track of each occurrence of the 1st field.
For instance:
1234567890;11113333222244;11-1x;3_File.xml;Try-1
1234567890;11112222333344;11-1x;1_File.xml;Try-2
1678923450;11112222333344;11-1x;2_File.xml;Try-1
Use something like this, with an array named C[] to keep track of your counter:
sort -t\; -k1,1 -k2,2r -k4,4r infile | awk -F\; '!($1$2 in A) { print $0 ";Try-" ++C[$1]; A[$1$2] }'
That worked perfectly. Thanks a lot!
Hey,
Quick query.
If I have 2 files, File1 (2M lines) and File2 (5K lines), and I want to find and replace the 5K lines from File2 in File1, what's the fastest way?
I'm currently doing it with sed, which takes ages.
File1 (2M Lines)
A1;B1;C1;D1;E1
A2;B2;C2;D2;E2
A3;B3;C3;D3;E3
File 2 (5K Lines)
A2;B2;F2;G2
A4;B4;F4;G4
Pick each line from File2, find its match in File1, delete that line from File1, and insert the corresponding line from File2 in its place.
Merged & Replaced Output (2M Lines)
A1;B1;C1;D1;E1
A2;B2;F2;G2
A3;B3;C3;D3;E3
I couldn't go with join, since it compares only one field; I need to match the first 2 fields from File2 against File1.
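One approach that avoids running sed once per File2 line is the two-file awk pattern: load File2 into an array keyed on fields 1 and 2, then stream File1 once, printing the File2 line wherever the key matches. This is a sketch using the file names and semicolon separator from the examples above, and it assumes each field1;field2 pair appears at most once in File2:

```shell
# Pass 1 (NR==FNR is true only while reading File2): remember each File2
# line under its field1;field2 key.
# Pass 2 (File1): if the key was seen in File2, print the replacement
# line; otherwise print the File1 line unchanged. File2 lines with no
# match in File1 are simply not emitted, so the output keeps File1's
# line count.
awk -F\; '
    NR == FNR            { repl[$1 FS $2] = $0; next }
    ($1 FS $2) in repl   { print repl[$1 FS $2]; next }
                         { print }
' File2 File1
```

Both files are read exactly once, so this should be far faster than repeated sed edits on a 2M-line file; the memory cost is holding the 5K File2 lines in the repl[] array.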