I have a directory with a large number of files. I want to compare a string in each file with the string in the subsequent n file(s). If a string in one file matches the string in any of the next n file(s), the subsequent duplicate file(s) should be deleted. Here is sample input:
What constitutes "subsequent"? Is it the next file in the order shown by "ls -l"? And is a "subsequent" list terminated by the first file that does not contain the match characters, or can the "subsequent" list extend to any later file containing the match characters?
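for i in ???.txt
do
    # the string to compare is the first line of the file
    c=$(head -n 1 "$i")
    # emit "string|filename"; assumes the string itself contains no | character
    printf '%s|%s\n' "$c" "$i"
done | perl -e 'my %s; while (<>) { chomp; my ($st, $fn) = split(/\|/); if (!defined($s{$st})) { $s{$st} = $fn; print "$fn\n"; } }' | xargs ls -l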
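Description:
For each file matching ???.txt,
read the first line of the file as the string, then print the string, followed by a pipe symbol, followed by the filename.
At the end of the for loop, pass all of this output to the perl script via standard input.
The perl script splits each line on the pipe symbol,
then checks whether the string is already a key in the hash. If it is not, it stores the filename in the hash with the string as the key and prints the filename, so only the first file for each distinct string is printed.
This output is sent as standard input to xargs, which passes the filenames to the "ls -l" command. The files that are not listed are the duplicates.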