Hi,
This is a follow-up to my earlier post. I have a file like this:
him mno klm 20 76 . + . klm_mango unix_00000001;
alp fdc klm 123 456 . + . klm_mango unix_0000103;
her tkr klm 415 439 . + . klm_mango unix_00001043;
abc tvr klm 20 76 . + . klm_mango unix_00000001;
abc def klm 83 84 . + . klm_mango unix_0000103;
abc def klm 83 84 . + . klm_mango unix_1233333;
abc def klm 83 84 . + . klm_mango unix_845454;
abc def klm 83 84 . + . klm_mango unix_7875654;
abc def klm 83 84 . + . klm_mango unix_8784552;
Now, I want to delete all the duplicate records, comparing every column except the last one. Of each group of duplicates, only the first record should be kept and printed.
So my output should be:
him mno klm 20 76 . + . klm_mango unix_00000001;
alp fdc klm 123 456 . + . klm_mango unix_0000103;
her tkr klm 415 439 . + . klm_mango unix_00001043;
abc tvr klm 20 76 . + . klm_mango unix_00000001;
abc def klm 83 84 . + . klm_mango unix_0000103;
birei
April 3, 2012, 11:43am
3
Hi jacobs.smith,
One way with Perl:
$ cat infile
him mno klm 20 76 . + . klm_mango unix_00000001;
alp fdc klm 123 456 . + . klm_mango unix_0000103;
her tkr klm 415 439 . + . klm_mango unix_00001043;
abc tvr klm 20 76 . + . klm_mango unix_00000001;
abc def klm 83 84 . + . klm_mango unix_0000103;
abc def klm 83 84 . + . klm_mango unix_1233333;
abc def klm 83 84 . + . klm_mango unix_845454;
abc def klm 83 84 . + . klm_mango unix_7875654;
abc def klm 83 84 . + . klm_mango unix_8784552;
$ cat myscript.pl
use warnings;
use strict;

my %duplicate;
while ( <> ) {
    chomp;
    my @f = split;                                # split the record on whitespace
    my $key = join qq[], @f[ 0 .. $#f - 1 ];      # key on every field except the last
    if ( ++$duplicate{$key} == 1 ) {              # true only the first time a key is seen
        printf qq[%s\n], $_;
    }
}
$ perl myscript.pl infile
him mno klm 20 76 . + . klm_mango unix_00000001;
alp fdc klm 123 456 . + . klm_mango unix_0000103;
her tkr klm 415 439 . + . klm_mango unix_00001043;
abc tvr klm 20 76 . + . klm_mango unix_00000001;
abc def klm 83 84 . + . klm_mango unix_0000103;
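One caveat: because the fields are joined with an empty string, two different records could in theory collapse to the same key (for instance, the fields ab c and a bc both join to abc). If that is a concern, joining with a separator that cannot occur in the data, something like the line below, avoids it:

my $key = join qq[\0], @f[ 0 .. $#f - 1 ];

For your sample input it makes no difference, but it is a safer habit for arbitrary data.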
Thanks bartus, but it is printing the whole file again. It is not removing the duplicates.
Toiday
April 24, 2012, 2:42pm
6
awk '{ if (Previous_Line != $1 $2 $3 $4 $5) print; Previous_Line = $1 $2 $3 $4 $5 }' file
This assumes the duplicate lines are already adjacent, as in your example; otherwise, sort the file first and pipe the result into the awk statement.
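Alternatively, if you want to key on every field except the last (closer to the original requirement) and keep the first occurrence without sorting, an associative-array variant along these lines should work (a sketch, assuming whitespace-separated fields):

awk '{ key = ""; for (i = 1; i < NF; i++) key = key SUBSEP $i; if (!seen[key]++) print }' file

The seen array records each key the first time it appears, so later duplicates are skipped even when they are not adjacent, and the original line order is preserved.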