Hi all,
I have to remove duplicate lines in a file without changing the order. For example, if I have a record
pqr
def
abc
lmn
pqr
abc
mkh
hgf
the output should be
pqr
def
abc
lmn
mkh
hgf
Please help me, it is urgent.
Abhishek
Is this homework?
Try searching the forums.
#!/usr/bin/perl -w
use strict;

# Read all of standard input into memory.
my @lines;
while (<STDIN>)
{
    push (@lines, $_);
}

print "-\n";

# Print only the lines that occur exactly once, in input order.
# Note the string comparison ($_ eq $i): a grep { /$i/ } here would
# treat each line as a regex, which breaks on metacharacters and
# also matches substrings of longer lines.
foreach my $i (@lines)
{
    if (scalar (grep { $_ eq $i } @lines) == 1)
    {
        print $i;
    }
}
Usage:
Tsunami repeated_lines # perl repeat.pl
pqr
def
abc
lmn
pqr
abc
mkh
hgf
-
def
lmn
mkh
hgf
Tsunami repeated_lines #
To stop input, just hit Ctrl-D and the script will give you all the non-repeated strings in the input order.
You can also do something like:
Tsunami repeated_lines # cat lines | perl repeat.pl
-
def
lmn
mkh
hgf
Tsunami repeated_lines # cat lines
pqr
def
abc
lmn
pqr
abc
mkh
hgf
Tsunami repeated_lines #
Using awk ...
$ awk '! a[$0]++' file
pqr
def
abc
lmn
mkh
hgf
$
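The `! a[$0]++` idiom can be spelled out longhand, which may make it easier to see why it keeps only the first occurrence; a sketch using the sample data from the first post:

```shell
# Spelled-out version of '! a[$0]++': print a line only the first
# time it is seen, remembering each already-seen line in "seen".
printf 'pqr\ndef\nabc\nlmn\npqr\nabc\nmkh\nhgf\n' |
awk '{ if (!seen[$0]) { print; seen[$0] = 1 } }'
```

In the one-liner, `a[$0]++` is 0 (false) on the first sighting of a line and nonzero afterwards, so negating it selects exactly the first occurrences.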
As I said,
you just need to search the forums to find the following solutions:
awk '!_[$0]++' input
perl -ne'print unless $_{$_}++' input
or:
perl -ne'$_{$_}++||print' input
If the requirement is to keep the last instead of the first occurrence, it's only marginally harder.
perl -ne '$n{$_} = $.; END { print sort { $n{$a} <=> $n{$b} } keys %n }'
If the last line lacks a newline, that will count as a unique line. It's not terribly hard to fix, but I didn't want to complicate the script.
With GNU Awk:
awk 'END { for (k in r) t[sprintf("%10d", r[k])] = k
n = asorti(t, s)
while (++i <= n) print t[s[i]] }
{ r[$0] = NR }' filename
Or even better (ha!):
[GNU Awk]
WHINY_USERS=1 awk 'END {
for (k in r) t[sprintf("%10d", r[k])] = k
for (k in t) print t[k]}
{ r[$0] = NR }
' filename
If GNU Awk is not available:
awk '{ _[$0] = NR }
END { for (k in _) print _[k], k }
' OFS="\t" filename |
sort -n |
cut -f2-
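The same idea run on the sample data (here with `-v OFS` instead of a trailing assignment, and an explicit array name): since the array entry is overwritten on every occurrence, it is the last occurrence of each line that decides the final order.

```shell
# Portable keep-last pipeline: tag each distinct line with the NR of
# its last occurrence, sort numerically by that tag, drop the tag.
printf 'pqr\ndef\nabc\nlmn\npqr\nabc\nmkh\nhgf\n' |
awk -v OFS='\t' '{ n[$0] = NR } END { for (k in n) print n[k], k }' |
sort -n |
cut -f2-
```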