Hi,
I have an input file for a Perl script from which I need to remove duplicate entries.
For example:
one:two:three
one:four:five
two:one:three
must become:
one:two:three
two:one:three
Two entries count as duplicates when their first field is the same; only the first such line should be kept. I tried many options of the sort system command but couldn't find how to do this.
Can someone help?
Thanks
So simple, even *I* might be able to help with this one.
When I need to remove duplicate entries in a text file I use a Perl hash to get the job done.
Here is some sample code that I pulled from a previous message that I responded to. [Hint: Forum search is your friend]
#!/usr/bin/perl
# RemoveDupes.pl
# Auswipe 21 Feb 2002
# Auswipe sez: "Hey, no guarantees!"
# Usage:
#
# RemoveDupes.pl -file someTextFile
use strict;
use warnings;
use Getopt::Long;
my $opt_file;
GetOptions("file=s" => \$opt_file);
my %dataHash = ();    # line text => line number of its first occurrence
my $currentLine = 0;
if ($opt_file) {
    open(INPUTFILE, "<", $opt_file) || die "Error: $!";
    while (my $logEntry = <INPUTFILE>) {
        chomp($logEntry);
        if (!exists($dataHash{$logEntry})) {
            $dataHash{$logEntry} = $currentLine;
        }
        $currentLine++;
    }
    close(INPUTFILE);    # close the filehandle, not the file name
} else {
    print STDOUT "You didn't select a file!\n";
}
# Print the unique lines in their original order
foreach my $logOutput (sort { $dataHash{$a} <=> $dataHash{$b} } keys(%dataHash)) {
    print STDOUT "$logOutput\n";
}
D'oh!
I was re-reading your message and I see that you need to remove duplicates based upon the FIRST field of the colon-separated values.
That makes it a bit trickier but I'll see what I can do to help. The previous Perl script is still good for complete lines of duplicate text.
EDIT: This code might help, however there might be some problems. I sort the de-duplicated lines on the first pass and then remove dupes based upon the first colon-separated value. This might be a problem for you in your application.
Give it a try and lemme know if it gets the job done.
#!/usr/bin/perl
# RemoveDupes.pl
# Auswipe 21 Feb 2002
# Auswipe sez: "Hey, no guarantees!"
# Usage:
#
# RemoveDupes.pl -file someTextFile
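use strict;
use warnings;
use Getopt::Long;
my $opt_file;
GetOptions("file=s" => \$opt_file);
my %dataHash = ();    # line text => line number of its first occurrence
my $currentLine = 0;
if ($opt_file) {
    open(INPUTFILE, "<", $opt_file) || die "Error: $!";
    while (my $logEntry = <INPUTFILE>) {
        chomp($logEntry);
        if (!exists($dataHash{$logEntry})) {
            $dataHash{$logEntry} = $currentLine;
        }
        $currentLine++;
    }
    close(INPUTFILE);    # close the filehandle, not the file name
} else {
    print STDOUT "You didn't select a file!\n";
}
my %secondHash = ();  # first field => first line that carries it
foreach my $logOutput (sort { $dataHash{$a} <=> $dataHash{$b} } keys(%dataHash)) {
    my @columns = split(/:/, $logOutput);
    my $firstColumn = $columns[0];
    next unless defined $firstColumn;    # skip empty lines
    if (!exists($secondHash{$firstColumn})) {
        $secondHash{$firstColumn} = $logOutput;
    }
}
# Order by original line number; the stored values are text lines, so
# comparing them directly with <=> would not give a meaningful order.
foreach my $firstColumn (sort { $dataHash{$secondHash{$a}} <=> $dataHash{$secondHash{$b}} } keys(%secondHash)) {
    print STDOUT "$secondHash{$firstColumn}\n";
}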
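For comparison, the first-field rule can also be sketched in a single pass with one hash, which keeps the input order without any sorting. This is only a minimal sketch, not a replacement for the script above: dedupe_by_first_field is a made-up helper name, and the sample lines are the ones from the original question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: keep only the first line seen for each value of the
# first colon-separated field, preserving the original line order.
sub dedupe_by_first_field {
    my @lines = @_;
    my %seen;    # first-field value => number of times seen so far
    my @kept;
    foreach my $line (@lines) {
        # Limit of 2: only the first field matters, so stop splitting early.
        my ($first) = split /:/, $line, 2;
        $first = '' unless defined $first;    # an empty line gets the key ''
        push @kept, $line unless $seen{$first}++;
    }
    return @kept;
}

# Sample input taken from the question at the top of the thread.
my @sample = ('one:two:three', 'one:four:five', 'two:one:three');
print "$_\n" for dedupe_by_first_field(@sample);
```

The post-increment on $seen{$first} is false only the first time a key turns up, so each first field lets exactly one line through.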
Hi, thanks for the help, but it is not working.
I get an error:
my %dataHash = (/: unmatched () in regexp line 10
Here is my script:
#!/usr/bin/perl
use Getopt::Long;
GetOption(file=s);
my %dataHash = ();
my $currentline = 0;
$entry = "/var/yp/script/removefile";
if ($entry)
{
open (IN, "$entry") || die "Error: $!";
while ($logentry = <IN>)
{
chomp($logentry);
if(!exists($dataHash($logentry)))
{
$dataHash($logentry) = $currentline;
};
$currentline++;
};
close(IN);
} else {
print "You didn't select a file\n";
};
my %secondHash = ();
foreach $logOutput (sort { $dataHash{$a} <=> $dataHash{$b}} (keys(%dataHash)))
{
my @columns = split (/:/,$logOutput);
my $firstcolumn = $columns[0];
if (!exists($secondHash{$firstcolumn}))
{
$secondHash{$firstcolumn} = $logOutput;
}
}
foreach $firstcolumn (sort { $secondHash{$a} <=> $secondHash{$b} (keys(%dataHash)))
{
print "$secondHash{$firstcolumn}\n";
}
Thanks
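A possible reading of that error, offered as a guess rather than a confirmed diagnosis: in GetOption(file=s); the option spec is not quoted, so Perl appears to take the bare s) as the start of a substitution operator and reads the following lines as its pattern, which would explain the "unmatched () in regexp" message. The function is also spelled GetOptions, with an "s". Two further things in the posted script would fail once that is fixed: $dataHash($logentry) uses parentheses where a hash element needs braces, and the last foreach is missing the closing brace of its sort block and loops over %dataHash instead of %secondHash. The sketch below applies those fixes. It drops Getopt::Long because the path is hard-coded, and it reads from an in-memory sample instead of /var/yp/script/removefile; both are assumptions made so it runs anywhere.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Fix 1: the GetOption(file=s) call is removed. The path was hard-coded in
# the posted script, so Getopt::Long is not needed here.
# Assumption: this sample data stands in for /var/yp/script/removefile.
my $sample = "one:two:three\none:four:five\ntwo:one:three\n";
open(my $in, '<', \$sample) or die "Error: $!";

my %dataHash    = ();    # full line => line number of its first occurrence
my $currentline = 0;
while (my $logentry = <$in>) {
    chomp($logentry);
    # Fix 2: hash elements use braces, $dataHash{...}, not parentheses.
    if (!exists($dataHash{$logentry})) {
        $dataHash{$logentry} = $currentline;
    }
    $currentline++;
}
close($in);

my %secondHash = ();     # first field => first line carrying that field
foreach my $logOutput (sort { $dataHash{$a} <=> $dataHash{$b} } keys %dataHash) {
    my ($firstcolumn) = split(/:/, $logOutput);
    next unless defined $firstcolumn;    # skip empty lines
    $secondHash{$firstcolumn} = $logOutput unless exists $secondHash{$firstcolumn};
}

# Fix 3: close the sort block, iterate over %secondHash, and order by the
# stored line number instead of comparing the text lines with <=>.
my @result = map  { $secondHash{$_} }
             sort { $dataHash{$secondHash{$a}} <=> $dataHash{$secondHash{$b}} }
             keys %secondHash;
print "$_\n" for @result;
```

To run it against the real file, replace the in-memory open with open(my $in, '<', '/var/yp/script/removefile').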
No need to look any further!
I found another way to reach the final goal.
Thanks for the help