Removing duplicate lines based on the first column with a pipe delimiter

Hi,

I have tried to remove duplicate lines based on the first column, using a pipe as the delimiter, but some of the unique lines are missing from the output.

Command :

sort -t'|' -nuk1 file.txt

Input :

38376KZ|09/25/15|1.057
38376KZ|09/25/15|1.057
02006YB|09/25/15|0.859
12593PS|09/25/15|2.803
14041NL|09/25/15|1.415
02006JAB|09/25/15|0.214

Output:

38376KZ|09/25/15|1.057
12593PS|09/25/15|2.803
14041NL|09/25/15|1.415

But the output should be :

38376KZ|09/25/15|1.057
12593PS|09/25/15|2.803
14041NL|09/25/15|1.415
02006JAB|09/25/15|0.214

Can you please help me understand why this is not working?

Please use code tags as required by forum rules!

I'm getting

02006YB|09/25/15|0.859
12593PS|09/25/15|2.803
14041NL|09/25/15|1.415
38376KZ|09/25/15|1.057

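The fourth line in your desired output is treated as a duplicate of the third input line because of the -n option: a numeric sort compares only the leading digits, so 02006YB and 02006JAB both evaluate to 2006.
A possible reason for the missing line is a <CR> character used as a line terminator somewhere in the file, which causes one line to be overwritten when displayed.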
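
To make the effect of -n visible, here is a minimal sketch that rebuilds the sample input from the question in a scratch file and runs both variants. The variable names (tmp, numeric, textual) are only illustrative, and the tr step is a precaution on the assumption that the real file may contain carriage returns.

```shell
# Recreate the sample input from the question in a scratch file.
tmp=$(mktemp)
printf '%s\n' \
  '38376KZ|09/25/15|1.057' \
  '38376KZ|09/25/15|1.057' \
  '02006YB|09/25/15|0.859' \
  '12593PS|09/25/15|2.803' \
  '14041NL|09/25/15|1.415' \
  '02006JAB|09/25/15|0.214' > "$tmp"

# With -n only the leading digits of the key are compared, so 02006YB
# and 02006JAB both count as 2006 and -u keeps just one of them.
numeric=$(sort -t'|' -n -u -k1,1 "$tmp")

# Without -n the first field is compared as text. -k1,1 ends the key
# at the first delimiter, and tr removes any carriage returns first.
textual=$(tr -d '\r' < "$tmp" | LC_ALL=C sort -t'|' -u -k1,1)

printf 'with -n:\n%s\n\nwithout -n:\n%s\n' "$numeric" "$textual"
rm -f "$tmp"
```

The run without -n keeps both 02006YB and 02006JAB, but the result is sorted rather than in input order.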


Hello parithi06,

Please use code tags for commands, code, and input in your posts, as the forum rules require. As for your requirement: you said you want unique lines based on the 1st column. If so, the output must also contain the line 02006YB|09/25/15|0.859. If that is not what you want, please describe the requirement more clearly. The following may help.

awk -F'|' 'FNR==NR{A[$1]=$0;next} ($1 in A){print $0;delete A[$1]}' Input_file Input_file

Output will be as follows.

38376KZ|09/25/15|1.057
02006YB|09/25/15|0.859
12593PS|09/25/15|2.803
14041NL|09/25/15|1.415
02006JAB|09/25/15|0.214
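
The same result can probably be had in a single pass over the file. The sketch below uses the common awk idiom of printing a line only the first time its key is seen; the array name seen and the variable deduped are arbitrary choices, and the input is the sample from the question fed in through printf.

```shell
# Print each line only the first time its first pipe-delimited field
# appears. seen[$1]++ is 0 (false) on first sight, so !seen[$1]++ is true.
deduped=$(printf '%s\n' \
  '38376KZ|09/25/15|1.057' \
  '38376KZ|09/25/15|1.057' \
  '02006YB|09/25/15|0.859' \
  '12593PS|09/25/15|2.803' \
  '14041NL|09/25/15|1.415' \
  '02006JAB|09/25/15|0.214' |
  awk -F'|' '!seen[$1]++')

printf '%s\n' "$deduped"
```

Unlike sort -u, this keeps the lines in their original order and always retains the first occurrence of each key.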
 

Thanks,
R. Singh
