duplicate line in a text file

nixguy · April 25, 2008, 1:45pm

I would like to scan a file for duplicate lines and print the duplicates to another file.
Oh, and it has to be case-insensitive.

example

line1
line2
line2
line3
line4
line4

outputfile:
line2
line4

Any ideas?

era · April 25, 2008, 1:49pm

perl -ne '$l = lc(); print if $m{$l}++; ' file

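lc returns the specified string in lowercase; in this case, with no parameter, it defaults to the current input line ($_). $m{$l} is the number of times we have seen $l so far; if it's nonzero before the increment, the line is a duplicate, so we print it. Note that a line which occurs three times is printed twice, once for each repeat.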

nixguy · April 25, 2008, 1:54pm

Thanks, it worked for one file, but I forgot to add that I need it to run on a directory, and it should process regular files only, not subdirectories.

What modifications would you make?

era · April 25, 2008, 2:21pm

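Loop over the files in a directory and pull out the duplicates from each? Do you want to replace the files?

for f in directory/*; do
  test -d "$f" && continue   # skip if it's a subdirectory
  # write the duplicate lines of this file to a temporary file
  perl -ne '$l = lc(); print if $m{$l}++; ' "$f" >"$f.tmp"
  # Warning: this overwrites the original, leaving only its duplicate lines
  mv "$f.tmp" "$f"
done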
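If you'd rather keep the originals, a variant along these lines writes each report to a separate file. This is only a sketch: the dups/ output directory, the .dups suffix, and the sample files are assumptions for illustration, not something from the question.

```shell
# Hypothetical layout: inputs live in directory/, reports go to dups/.
mkdir -p directory/subdir dups
printf '%s\n' a A b > directory/one.txt
printf '%s\n' x y > directory/two.txt

for f in directory/*; do
  test -f "$f" || continue            # regular files only; skips subdirectories
  out="dups/$(basename "$f").dups"    # one report per input file
  perl -ne '$l = lc; print if $m{$l}++' "$f" > "$out"
done
```

Because perl is started once per file, the %m hash starts empty each time, so duplicates are detected within each file and not across files. A file with no duplicates gets an empty report.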

nixguy · April 25, 2008, 2:45pm

Thank you, era, it worked.

summer_cherry · April 29, 2008, 2:19am

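awk '{
  a[tolower($0)]++     # count each line, ignoring case
}
END {
  for (i in a)         # iteration order is unspecified
    if (a[i] > 1)      # seen more than once
      print i          # prints the lowercased line, once per duplicate
}' filename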
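For comparison, here is a sketch of an awk variant that prints each duplicated line only once and in file order, which the for (i in a) loop does not guarantee. The file names sample.txt and once.txt are placeholders.

```shell
# Hypothetical sample data; line2 occurs three times, once in upper case.
printf '%s\n' line1 line2 LINE2 line2 line3 line4 line4 > sample.txt

# seen[k]++ == 1 is true only on the second occurrence of a lowercased line,
# so each duplicated line is printed exactly once, with its original casing.
awk '{ k = tolower($0) } seen[k]++ == 1' sample.txt > once.txt

cat once.txt
```

The trade-off is that the line printed is the second occurrence as written in the file, so its casing may differ from the first occurrence.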