Delete Duplicates on the basis of two column values.

neeraj617 · January 6, 2011, 7:15am

Hi All,
I need to delete the two duplicate processes that are running on the same device type (column 1) and port ID (column 2). Here is the sample data:

p1sc1m1 15517 11325  0 01:00:24 ?         0:00 scagntclsx25octtcp 2967 in3v mvmp01 0 8000 N S 969 750@751@752@
p1sc1m1 15519 11325  0 01:00:24 ?         0:00 scagntclsx25octtcp 2968 in3v mvmp02 0 8000 N S 970 750@751@752@
p1sc1m1 15522 11325  0 01:00:24 ?         0:00 scagntclsx25octtcp 2969 in3v mvmp01 0 8000 N S 971 750@751@752@
p1sc1m1 15544 11325  0 01:00:24 ?         0:00 scagntclsx25octtcp 2949 innv mvmp02 0 8000 N S 977 750@751@752@
p1sc1m1 15546 11325  0 01:00:24 ?         0:00 scagntclsx25octtcp 2956 innv mvmp03 0 8000 N S 978 750@751@752@
p1sc1m1 17445 11325  0 01:00:43 ?         0:00 scagntclsx25octtcp 2950 zin5 mvmp02 0 8000 N S 1384 750@751@752
p1sc1m1 17451 11325  0 01:00:43 ?         0:00 scagntclsx25octtcp 2957 zin5 mvmp03 0 8000 N S 1385 750@751@752
p1sc1m1 17475 11325  0 01:00:44 ?         0:00 scagntclsx25octtcp 2952 zt4v mvmp02 0 8000 N S 1391 750@751@752
p1sc1m1 17478 11325  0 01:00:44 ?         0:00 scagntclsx25octtcp 2959 zt4v mvmp03 0 8000 N S 1392 750@751@752
p1sc1m1 17481 11325  0 01:00:44 ?         0:00 scagntclsx25octtcp 2970 zt5v mvmp01 0 8000 N S 1393 750@751@752
p1sc1m1 17487 11325  0 01:00:44 ?         0:00 scagntclsx25octtcp 2960 zt6v mvmp01 0 8000 N S 1395 750@751@752
p1sc1m1 17489 11325  0 01:00:44 ?         0:00 scagntclsx25octtcp 2962 zt6v mvmp03 0 8000 N S 1396 750@751@752

The first and third rows should be deleted because both column values are the same in them (in3v and mvmp01).

Thanks for the help in advance.
Neeraj Vashishty

anurag.singh · January 6, 2011, 7:28am

Assuming that ONLY the 10th and 11th columns are used to decide duplicates (no other column is checked):

awk 'NR==FNR{a[$10"_"$11]++;next;}{if(a[$10"_"$11] < 2) print $0}' inputFile inputFile
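To try the two-pass idea on a small input, here is a minimal sketch. The file name `sample.txt` and the function name `drop_dup_keys` are hypothetical, and the sample is trimmed to the first four rows from the question. The file is read twice: the first pass counts each key, the second pass prints only rows whose key was seen once.

```shell
# Hypothetical sample file: first four rows from the question.
# Fields 10 and 11 (e.g. "in3v" and "mvmp01") form the key.
cat > sample.txt <<'EOF'
p1sc1m1 15517 11325  0 01:00:24 ?         0:00 scagntclsx25octtcp 2967 in3v mvmp01 0 8000 N S 969 750@751@752@
p1sc1m1 15519 11325  0 01:00:24 ?         0:00 scagntclsx25octtcp 2968 in3v mvmp02 0 8000 N S 970 750@751@752@
p1sc1m1 15522 11325  0 01:00:24 ?         0:00 scagntclsx25octtcp 2969 in3v mvmp01 0 8000 N S 971 750@751@752@
p1sc1m1 15544 11325  0 01:00:24 ?         0:00 scagntclsx25octtcp 2949 innv mvmp02 0 8000 N S 977 750@751@752@
EOF

# drop_dup_keys FILE (hypothetical helper name)
# Pass 1 (NR==FNR is true only while reading the first copy): count each key.
# Pass 2: print a row only if its key occurred exactly once in the file.
drop_dup_keys() {
    awk 'NR==FNR { count[$10 "_" $11]++; next }
         count[$10 "_" $11] < 2' "$1" "$1"
}

drop_dup_keys sample.txt
```

On this sample it prints the rows with PIDs 15519 and 15544; both in3v/mvmp01 rows are dropped.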

neeraj617 · January 6, 2011, 7:43am

Thanks Anurag, it is working. One more thing: in case I want to display these duplicate values instead of deleting them, can you give me the command for that?

michaelrozar17 · January 6, 2011, 8:02am

awk 'NR==FNR{a[$10$11]=$0;next}(a[$10$11]!=$0)' inputfile inputfile # prints only the earlier copies of each duplicated key (every occurrence except the last)

awk 'NR==FNR{a[$10$11]=$0;next}(a[$10$11]==$0)' inputfile inputfile # prints one line per key (the last occurrence), i.e. drops the earlier duplicates

awk 'NR==FNR{a[$10$11]=$0;next}{print a[$10$11]==$0?$0:$0"--dup"}' inputfile inputfile # prints all lines, with "--dup" appended to the earlier copies of a duplicated key

anurag.singh · January 6, 2011, 8:02am

If you want ONLY the duplicates:

awk '{if(a[$10"_"$11]) print $0;a[$10"_"$11]=1}' inputFile
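The single-pass command above prints only the second and later occurrences of a key, so on the sample data it would show the third row but not the first. If every row of a duplicated group should be listed, a two-pass variant is one option. This is a minimal sketch: the file name `sample.txt` and the function name `show_dup_groups` are hypothetical, and the sample is trimmed to the first four rows from the question. The subscript `count[$10, $11]` joins the two fields with awk's built-in `SUBSEP`, which avoids accidental key collisions from plain concatenation.

```shell
# Hypothetical sample file: first four rows from the question.
cat > sample.txt <<'EOF'
p1sc1m1 15517 11325  0 01:00:24 ?         0:00 scagntclsx25octtcp 2967 in3v mvmp01 0 8000 N S 969 750@751@752@
p1sc1m1 15519 11325  0 01:00:24 ?         0:00 scagntclsx25octtcp 2968 in3v mvmp02 0 8000 N S 970 750@751@752@
p1sc1m1 15522 11325  0 01:00:24 ?         0:00 scagntclsx25octtcp 2969 in3v mvmp01 0 8000 N S 971 750@751@752@
p1sc1m1 15544 11325  0 01:00:24 ?         0:00 scagntclsx25octtcp 2949 innv mvmp02 0 8000 N S 977 750@751@752@
EOF

# show_dup_groups FILE (hypothetical helper name)
# Pass 1: count how often each (field 10, field 11) pair occurs.
# Pass 2: print every row whose pair occurs more than once.
show_dup_groups() {
    awk 'NR==FNR { count[$10, $11]++; next }
         count[$10, $11] > 1' "$1" "$1"
}

show_dup_groups sample.txt
```

On this sample it prints both in3v/mvmp01 rows (PIDs 15517 and 15522).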

neeraj617 · January 7, 2011, 5:13am

Thanks Anurag and Michael, it worked fine and I had a successful deployment today. Thanks for your help.