Remove duplicate lines (the first matching line by field criteria)

Hello to all,

I have this file:

2002     1       23      0       0       2435.60         131.70   5.60   20.99    0.89      0.00         285.80  2303.90
2002     1       23      15      0       2436.60         132.90   6.45   21.19    1.03      0.00         285.80  2303.70
2002     1       23      30      0       2438.10         134.90   7.20   21.50    1.15      0.00         285.80  2303.20
2002     1       23      45      0       2437.85         134.65  11.64   21.47    1.86      0.00         285.80  2303.20
2002     2       0       0       0       2437.60         134.60  14.80   21.46    2.36      0.00         285.80  2303.00
2002     2       0       0       0       2442.70         139.70  16.00   22.27    2.55      0.00         285.80  2303.00
2002     2       0       15      0       2442.50         139.70  14.40   22.27    2.30      0.00         285.80  2302.80
2002     2       0       30      0       2442.30         139.70  12.60   22.27    2.01      0.00         285.80  2302.60
2002     2       0       45      0       2442.55         140.15  11.20   22.34    1.79      0.00         285.80  2302.40
2002     2       1       0       0       2443.30         141.40   9.60   22.54    1.53      0.00         285.80  2301.90
2002     2       1       15      0       2443.85         141.95   9.11   22.63    1.45      0.00         285.80  2301.90

and I want to remove the first of two consecutive lines whose timestamp fields (columns 1-4) match, keeping the second one, like this:

2002     1       23      0       0       2435.60         131.70   5.60   20.99    0.89      0.00         285.80  2303.90
2002     1       23      15      0       2436.60         132.90   6.45   21.19    1.03      0.00         285.80  2303.70
2002     1       23      30      0       2438.10         134.90   7.20   21.50    1.15      0.00         285.80  2303.20
2002     1       23      45      0       2437.85         134.65  11.64   21.47    1.86      0.00         285.80  2303.20
2002     2       0       0       0       2442.70         139.70  16.00   22.27    2.55      0.00         285.80  2303.00
2002     2       0       15      0       2442.50         139.70  14.40   22.27    2.30      0.00         285.80  2302.80
2002     2       0       30      0       2442.30         139.70  12.60   22.27    2.01      0.00         285.80  2302.60
2002     2       0       45      0       2442.55         140.15  11.20   22.34    1.79      0.00         285.80  2302.40
2002     2       1       0       0       2443.30         141.40   9.60   22.54    1.53      0.00         285.80  2301.90
2002     2       1       15      0       2443.85         141.95   9.11   22.63    1.45      0.00         285.80  2301.90

I've tried the uniq command

uniq -w 15 filename

and AWK

awk '!_[$1,$2,$3,$4]++' filename

but both keep the first line of each matching pair and remove the second, not the other way around.

thanks for any help.
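
For what it's worth: since awk '!_[$1,$2,$3,$4]++' keeps the first occurrence of each key, reversing the file, applying that same command, and reversing back keeps the last occurrence instead. A sketch, assuming GNU tac is available (on BSD systems, tail -r does the same job):

tac filename | awk '!_[$1,$2,$3,$4]++' | tac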

try this...

awk '{A[$4]=$0} END {for (i in A) print A[i]}' filename

thanks for the quick answer, but the output is not what I expected:

2002     2       0       45      0       2442.55         140.15  11.20   22.34    1.79      0.00         285.80  2302.40
2002     2       0       30      0       2442.30         139.70  12.60   22.27    2.01      0.00         285.80  2302.60
2002     2       1       0       0       2443.30         141.40   9.60   22.54    1.53      0.00         285.80  2301.90
2002     2       1       15      0       2443.85         141.95   9.11   22.63    1.45      0.00         285.80  2301.90

I only want to remove a line when the 4th column repeats in consecutive lines, not drop every line that shares a 4th-column value with another.
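
If only consecutive repeats should go, a single pass that holds each line and prints the held one whenever the key changes also works, and it preserves the input order without any sorting. A sketch keyed on the first four fields:

awk '{k = $1 SUBSEP $2 SUBSEP $3 SUBSEP $4; if (NR > 1 && k != pk) print pl; pk = k; pl = $0} END {print pl}' filename

This keeps the last line of every run of consecutive duplicates.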

Just altering the awk you tried:

awk '{A[$1,$2,$3,$4]=$0} END {for (i in A) print A[i]}' filename

Thanks, I think it worked, but the output is unsorted:

2002     2       1       15      0       2443.85         141.95   9.11   22.63    1.45      0.00         285.80  2301.90
2002     2       0       45      0       2442.55         140.15  11.20   22.34    1.79      0.00         285.80  2302.40
2002     1       23      45      0       2437.85         134.65  11.64   21.47    1.86      0.00         285.80  2303.20
2002     2       0       0       0       2442.70         139.70  16.00   22.27    2.55      0.00         285.80  2303.00
2002     2       1       0       0       2443.30         141.40   9.60   22.54    1.53      0.00         285.80  2301.90
2002     2       0       30      0       2442.30         139.70  12.60   22.27    2.01      0.00         285.80  2302.60
2002     1       23      30      0       2438.10         134.90   7.20   21.50    1.15      0.00         285.80  2303.20
2002     1       23      0       0       2435.60         131.70   5.60   20.99    0.89      0.00         285.80  2303.90
2002     2       0       15      0       2442.50         139.70  14.40   22.27    2.30      0.00         285.80  2302.80
2002     1       23      15      0       2436.60         132.90   6.45   21.19    1.03      0.00         285.80  2303.70

I think I can manage to sort the output as described above :slight_smile:

thanks a lot.

perl -wlane '$h{"@F[0..3]"}=$_ ; END{$,="\n" ; print sort values %h}' infile.txt

or using nawk:-

nawk '{_[$1,$2,$3,$4]=$0} END {for (i in _) print _[i]}' infile.txt | sort
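
Note that for (i in _) iterates in an unspecified order, which is why the sort is needed. If the aim is the original input order rather than a lexical sort, a variant (a sketch) tags each saved line with its line number, sorts numerically on that, and strips the tag afterwards:

nawk '{_[$1,$2,$3,$4] = NR "\t" $0} END {for (i in _) print _[i]}' infile.txt | sort -n | cut -f2-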

Just piping to sort solved the problem:

awk '{A[$1$2$3$4]=$0} END {for (i in A) print A[i]}' filename | sort

thanks a lot, it worked like a charm :slight_smile:
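
One caveat with that last version: A[$1$2$3$4] concatenates the fields with no separator, so different field combinations can collide as keys (for example, fields 1 and 23 form the same string as 12 and 3). The comma form joins the fields with awk's SUBSEP character and avoids this:

awk '{A[$1,$2,$3,$4]=$0} END {for (i in A) print A[i]}' filename | sort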


thank you both :b: