Hi all,
I have to remove duplicate lines in a file without changing the order. For example, if I have a record
pqr
def
abc
lmn
pqr
abc
mkh
hgf
the output should be
pqr
def
abc
lmn
mkh
hgf
Please help me, it is urgent.
Abhishek
Is this homework?
Try searching the forums.
#!/usr/bin/perl -w
use strict;

# Read all of standard input into memory.
my @lines;
while (<STDIN>)
{
    push (@lines, $_);
}

print "-\n";

# Print only the lines that occur exactly once, in input order.
# Note the string comparison ($_ eq $i): a grep { /$i/ } here would
# treat each line as a regex, which breaks on metacharacters and
# also matches substrings of longer lines.
foreach my $i (@lines)
{
    if (scalar (grep { $_ eq $i } @lines) == 1)
    {
        print $i;
    }
}
Usage:
Tsunami repeated_lines # perl repeat.pl
pqr
def
abc
lmn
pqr
abc
mkh
hgf
-
def
lmn
mkh
hgf
Tsunami repeated_lines #
To stop input, just hit Ctrl-D and the script will give you all the non-repeated strings in the input order.
You can also do something like:
Tsunami repeated_lines # cat lines | perl repeat.pl
-
def
lmn
mkh
hgf
Tsunami repeated_lines # cat lines
pqr
def
abc
lmn
pqr
abc
mkh
hgf
Tsunami repeated_lines #
Using awk ...
$ awk '! a[$0]++' file
pqr
def
abc
lmn
mkh
hgf
$
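The `! a[$0]++` idiom can be spelled out longhand, which may make it easier to see why it keeps only the first occurrence; a sketch using the sample data from the first post:

```shell
# Spelled-out version of '! a[$0]++': print a line only the first
# time it is seen, remembering each already-seen line in "seen".
printf 'pqr\ndef\nabc\nlmn\npqr\nabc\nmkh\nhgf\n' |
awk '{ if (!seen[$0]) { print; seen[$0] = 1 } }'
```

In the one-liner, `a[$0]++` is 0 (false) on the first sighting of a line and nonzero afterwards, so negating it selects exactly the first occurrences.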
As I said,
you just need to search the forums to find the following solutions:
awk '!_[$0]++' input
perl -ne'print unless $_{$_}++' input
or:
perl -ne'$_{$_}++||print' input
If the requirement is to keep the last instead of the first occurrence, it's only marginally harder.
perl -ne '$n{$_} = $.; END { print sort { $n{$a} <=> $n{$b} } keys %n }'
If the last line lacks a newline, that will count as a unique line. It's not terribly hard to fix, but I didn't want to complicate the script.
With GNU Awk:
awk 'END { for (k in r) t[sprintf("%10d", r[k])] = k
n = asorti(t, s)
while (++i <= n) print t[s[i]] }
{ r[$0] = NR }' filename
Or even better (ha!):
[GNU Awk]
WHINY_USERS=1 awk 'END {
for (k in r) t[sprintf("%10d", r[k])] = k
for (k in t) print t[k]}
{ r[$0] = NR }
' filename
If GNU Awk is not available:
awk '{ _[$0] = NR }
END { for (k in _) print _[k], k }
' OFS="\t" filename |
sort -n |
cut -f2-
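The same idea run on the sample data (here with `-v OFS` instead of a trailing assignment, and an explicit array name): since the array entry is overwritten on every occurrence, it is the last occurrence of each line that decides the final order.

```shell
# Portable keep-last pipeline: tag each distinct line with the NR of
# its last occurrence, sort numerically by that tag, drop the tag.
printf 'pqr\ndef\nabc\nlmn\npqr\nabc\nmkh\nhgf\n' |
awk -v OFS='\t' '{ n[$0] = NR } END { for (k in n) print n[k], k }' |
sort -n |
cut -f2-
```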