Delete Duplicates on the basis of two column values.

Hi All,
I need to delete duplicate processes that are running on the same device type (column 1) and port ID (column 2). Here is the sample data:

p1sc1m1 15517 11325  0 01:00:24 ?         0:00 scagntclsx25octtcp 2967 in3v mvmp01 0 8000 N S 969 750@751@752@
p1sc1m1 15519 11325  0 01:00:24 ?         0:00 scagntclsx25octtcp 2968 in3v mvmp02 0 8000 N S 970 750@751@752@
p1sc1m1 15522 11325  0 01:00:24 ?         0:00 scagntclsx25octtcp 2969 in3v mvmp01 0 8000 N S 971 750@751@752@
p1sc1m1 15544 11325  0 01:00:24 ?         0:00 scagntclsx25octtcp 2949 innv mvmp02 0 8000 N S 977 750@751@752@
p1sc1m1 15546 11325  0 01:00:24 ?         0:00 scagntclsx25octtcp 2956 innv mvmp03 0 8000 N S 978 750@751@752@
p1sc1m1 17445 11325  0 01:00:43 ?         0:00 scagntclsx25octtcp 2950 zin5 mvmp02 0 8000 N S 1384 750@751@752
p1sc1m1 17451 11325  0 01:00:43 ?         0:00 scagntclsx25octtcp 2957 zin5 mvmp03 0 8000 N S 1385 750@751@752
p1sc1m1 17475 11325  0 01:00:44 ?         0:00 scagntclsx25octtcp 2952 zt4v mvmp02 0 8000 N S 1391 750@751@752
p1sc1m1 17478 11325  0 01:00:44 ?         0:00 scagntclsx25octtcp 2959 zt4v mvmp03 0 8000 N S 1392 750@751@752
p1sc1m1 17481 11325  0 01:00:44 ?         0:00 scagntclsx25octtcp 2970 zt5v mvmp01 0 8000 N S 1393 750@751@752
p1sc1m1 17487 11325  0 01:00:44 ?         0:00 scagntclsx25octtcp 2960 zt6v mvmp01 0 8000 N S 1395 750@751@752
p1sc1m1 17489 11325  0 01:00:44 ?         0:00 scagntclsx25octtcp 2962 zt6v mvmp03 0 8000 N S 1396 750@751@752

The first and the third rows should be deleted, as their two column values are the same (in3v and mvmp01 in both rows; they are highlighted in red).

Thanks for the help in advance.
Neeraj Vashishty

Assuming that ONLY the 10th and 11th columns decide what counts as a duplicate (no other column is checked):

awk 'NR==FNR{a[$10"_"$11]++;next;}{if(a[$10"_"$11] < 2) print $0}' inputFile inputFile
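For anyone following along, here is a small sketch of the same two-pass idea, run on the first three rows of the sample. The function name drop_dup_keys and the file name sample.txt are made up for illustration. The file is read twice: the first pass counts each $10/$11 pair, and the second pass prints only the rows whose pair was seen once.

```shell
# Hypothetical helper: print only the rows whose (column 10, column 11) pair is unique.
# The same file is passed twice so awk can count first and filter second.
drop_dup_keys() {
    awk 'NR == FNR { count[$10 "_" $11]++; next }   # pass 1: count each key
         count[$10 "_" $11] < 2                      # pass 2: print rows with a unique key
        ' "$1" "$1"
}

# First three rows of the sample data from the question.
cat > sample.txt <<'EOF'
p1sc1m1 15517 11325 0 01:00:24 ? 0:00 scagntclsx25octtcp 2967 in3v mvmp01 0 8000 N S 969 750@751@752@
p1sc1m1 15519 11325 0 01:00:24 ? 0:00 scagntclsx25octtcp 2968 in3v mvmp02 0 8000 N S 970 750@751@752@
p1sc1m1 15522 11325 0 01:00:24 ? 0:00 scagntclsx25octtcp 2969 in3v mvmp01 0 8000 N S 971 750@751@752@
EOF

drop_dup_keys sample.txt
```

With those three rows, only the 15519 row (in3v mvmp02) should come out, since in3v mvmp01 appears twice.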

Thanks Arung, it is working. One more thing: just in case I want to display these duplicate values instead of deleting them, can you give me the command for that?

awk 'NR==FNR{a[$10$11]=$0;next}(a[$10$11]!=$0)' inputfile inputfile # prints the earlier occurrence(s) of each duplicated key (every row except the last one with that key)
awk 'NR==FNR{a[$10$11]=$0;next}(a[$10$11]==$0)' inputfile inputfile # prints one row per key (the last occurrence), i.e. drops the earlier duplicates
awk 'NR==FNR{a[$10$11]=$0;next}{print a[$10$11]==$0?$0:$0"--dup"}' inputfile inputfile # prints all rows, with "--dup" appended to the earlier occurrence(s) of a duplicated key

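If you want ONLY the duplicates (this prints the second and later rows for each key; the first row with that key is not printed):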

awk '{if(a[$10"_"$11]) print $0;a[$10"_"$11]=1}' inputFile
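A single pass cannot print the first row of a duplicated pair, because at that point the second one has not been read yet. If every row of a duplicated pair is wanted, a two-pass variant of the counting approach should do it. This is only a rough sketch; the function name show_dup_keys and the file name dup_sample.txt are made up for illustration.

```shell
# Hypothetical helper: print every row whose (column 10, column 11) pair occurs more than once.
show_dup_keys() {
    awk 'NR == FNR { count[$10 "_" $11]++; next }   # pass 1: count each key
         count[$10 "_" $11] > 1                      # pass 2: print rows with a repeated key
        ' "$1" "$1"
}

# First three rows of the sample data from the question.
cat > dup_sample.txt <<'EOF'
p1sc1m1 15517 11325 0 01:00:24 ? 0:00 scagntclsx25octtcp 2967 in3v mvmp01 0 8000 N S 969 750@751@752@
p1sc1m1 15519 11325 0 01:00:24 ? 0:00 scagntclsx25octtcp 2968 in3v mvmp02 0 8000 N S 970 750@751@752@
p1sc1m1 15522 11325 0 01:00:24 ? 0:00 scagntclsx25octtcp 2969 in3v mvmp01 0 8000 N S 971 750@751@752@
EOF

show_dup_keys dup_sample.txt
```

On those three rows this should list the 15517 and 15522 rows (both in3v mvmp01) and skip the 15519 row.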

Thanks Anurag and Michael, it worked fine and I had a successful deployment today :) Thanks for your help.