Your requirements are a bit vague, but here is a possible Perl solution:
#!/usr/bin/perl
use warnings;
use strict;
#use Data::Dumper; #uncomment for debugging

unless (scalar @ARGV == 2){
    die "Usage: perl scriptname.pl inputfile outputfile\n";
}
# The last argument is the output file; the input file stays in @ARGV for <>.
my $outfile = pop @ARGV;
my %names = ();
my %count = ();
while (<>){
    chomp;
    # Fixed-width columns: first (10), middle initial (2), last (11), state (2).
    my ($first,$mi,$last,$state) = unpack("a10a2a11a2",$_);
    # Trim leading and trailing whitespace from each field.
    (s/^\s*//, s/\s*$//) for ($first,$mi,$last,$state);
    $names{"$first,$last"}={count => ++$count{"$first,$last"},
                            name => "$first $mi $last $state",
                           };
}
#print Dumper \%names; #uncomment for debugging
open my $out , '>' , $outfile or die "Cannot open $outfile: $!";
foreach my $person (keys %names) {
    # Skip any first,last combination seen more than once.
    next if $names{$person}{count}>1;
    print $out $names{$person}{name},"\n";
}
close $out;
print STDOUT "finished\n";
exit(0);
$names{"$first,$last"} creates a hash key from the first and last name.
Its value is in turn a hash:
$names{"$first,$last"} = {count=>'' , name => '' };
The value of the "count" key comes from a second hash, %count, which keeps track of how many times each first,last combination has been found:
++$count{"$first,$last"}
so we can determine later whether it is a unique combination or not. If it has a count of 1 (one) then it is unique.
The value of the "name" key is the record rebuilt from the trimmed fields, which we print to the output file if the value of the "count" key is 1 (one).
You can uncomment the lines that say to "uncomment for debugging" and you will see the data structure of %names printed when the script finishes running.
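If you want to see the structure without setting up files, here is a minimal sketch that builds the same two hashes from a few in-memory records. The sample people and the build_names helper are made up for illustration; pack is used only to get the column widths right, so treat it as a sketch rather than part of the solution.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample records, built with pack so the column widths
# (10, 2, 11, 2) match the unpack template used in the script.
my @lines = map { pack("A10A2A11A2", @$_) } (
    [ 'John', 'W', 'Smith', 'CA' ],
    [ 'John', 'Q', 'Smith', 'NY' ],   # same first,last as the line above
    [ 'Mary', 'A', 'Jones', 'TX' ],
);

# Build %names the same way the script does, from a list of lines.
# (build_names is an illustrative helper, not part of the original script.)
sub build_names {
    my @records = @_;
    my (%names, %count);
    for my $line (@records) {
        my ($first, $mi, $last, $state) = unpack("a10a2a11a2", $line);
        s/^\s+|\s+$//g for ($first, $mi, $last, $state);
        $names{"$first,$last"} = {
            count => ++$count{"$first,$last"},
            name  => "$first $mi $last $state",
        };
    }
    return \%names;
}

my $names = build_names(@lines);
for my $key (sort keys %$names) {
    printf "%-12s count=%d name=%s\n",
        $key, $names->{$key}{count}, $names->{$key}{name};
}
```

With this sample data, John,Smith should end up with a count of 2 (and the name from the last matching line), while Mary,Jones has a count of 1 and would be the only one written to the output file.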
The (s/^\s*//, s/\s*$//) part removes leading and trailing spaces from each variable in the list. Internal spaces are kept because names can have spaces in them, and if you removed the internal spaces you could potentially create false matches, for example:
John W "Van Johnson" (last name in quotes to show it is one field)
John W VanJohnson
This is probably a rare circumstance (and not a very good example) but it is possible, especially if the names are not in English.
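To make the point concrete, here is a small sketch showing that trimming only the ends keeps the two last names apart. The trim helper and the padded strings are assumptions for the example; the script itself does the trimming inline.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Trim leading and trailing whitespace only; internal spaces are untouched.
# (trim is an illustrative helper, not part of the original script.)
sub trim {
    my ($s) = @_;
    $s =~ s/^\s+|\s+$//g;
    return $s;
}

# Build the same "first,last" style keys the script uses.
my $key_a = join ',', trim('John      '), trim('Van Johnson');
my $key_b = join ',', trim('John      '), trim('VanJohnson ');

print "[$key_a]\n";
print "[$key_b]\n";
print $key_a eq $key_b ? "same key\n" : "different keys\n";
```

Had the internal space been stripped as well, both records would have collapsed into the single key John,VanJohnson and been counted as duplicates.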
You're welcome. Actually that data structure could have been a bit simpler:
while (<>){
    chomp;
    my ($first,$mi,$last,$state) = unpack("a10a2a11a2",$_);
    (s/^\s*//, s/\s*$//) for ($first,$mi,$last,$state);
    $names{"$first,$last"}{count}++;
    $names{"$first,$last"}{name} = "$first $mi $last $state";
}
This eliminates the need for the separate hash to keep track of the counts. I like to use a separate hash for counts because in general the data is much more complex than this, and incrementing a count is often easier when it is kept separate.
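Here is a rough, self-contained sketch of the single-hash version, including the step that picks out the unique people. The unique_names function and the sample records are hypothetical, and the fields are passed in already split and trimmed to keep the example short.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the "name" strings whose first,last key was seen exactly once.
# Input: a list of array refs [first, mi, last, state], already trimmed.
# (unique_names is an illustrative helper, not part of the original script.)
sub unique_names {
    my @people = @_;
    my %names;
    for my $p (@people) {
        my ($first, $mi, $last, $state) = @$p;
        my $key = "$first,$last";
        $names{$key}{count}++;    # the inner hash is created on first use
        $names{$key}{name} = "$first $mi $last $state";
    }
    return map  { $names{$_}{name} }
           grep { $names{$_}{count} == 1 }
           sort keys %names;
}

print "$_\n" for unique_names(
    [ 'John', 'W', 'Smith', 'CA' ],
    [ 'John', 'Q', 'Smith', 'NY' ],
    [ 'Mary', 'A', 'Jones', 'TX' ],
    [ 'Ann',  'B', 'Lee',   'WA' ],
);
```

Perl creates the inner hash automatically the first time $names{$key}{count} is incremented (autovivification), which is why no separate initialization is needed. The keys are sorted here only to make the output order predictable; the original script prints in whatever order keys returns.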
This hash idiom associates the full record ($first $mi $last $state) with each $first,$last instance in hash %names.
If my language is not correct in this description, then my thinking is also incorrect. I am going to reread this article on hashes by Simon Cozens: perl.com: Hash Crash Course and try to get a fuller understanding.