How to find duplicate entries

I have a file with the contents below.

I/P:

123456
123456
234567
987654
678905
678905

I have thousands of entries like the above.
I need output as below:
O/P:

123456
678905

I'm using uniq -d filename. It shows results, but it misses a few duplicate entries and I don't know why. Please help me.

awk 'A[$0]++==1' file

It is better to use $1 instead of $0, so that duplicate numbers are not missed because of leading/trailing whitespace.
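
To illustrate the difference, here is a small sketch. The sample data is made up (the first line carries a trailing space), and the array name seen is arbitrary:

```shell
# Hypothetical sample: the first line has a trailing space.
input=$(printf '%s\n' '123456 ' '123456' '234567' '678905' '678905')

# Keyed on the whole line ($0): "123456 " and "123456" are different keys.
by_line=$(printf '%s\n' "$input" | awk 'seen[$0]++ == 1')

# Keyed on the first field ($1): surrounding whitespace is ignored.
by_field=$(printf '%s\n' "$input" | awk 'seen[$1]++ == 1 { print $1 }')

printf 'keyed on $0:\n%s\n' "$by_line"
printf 'keyed on $1:\n%s\n' "$by_field"
```

With this input, the $0 version should report only 678905, while the $1 version should report both 123456 and 678905.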

Hi Buzzme,

awk '!d[$0]++' file

Correction: this will print only the unique entries (the first occurrence of each line), not the duplicates.

That will not work.
Revisit the OP's requirement.

elixir_sinari,
you are correct, that will not work! I did not understand the problem at first. Thanks for correcting me.

Buzzme,
> I'm using uniq -d filename. It shows results, but it misses a few duplicate entries and I don't know why.

  • You may need to use sort before uniq -d to have it work correctly. Have you tried that?

Please try it with the input sorted in numerical order:

sort -n file|uniq -d

Here is another version with uniq that also gives numerically sorted output:

sort -n file|uniq -c|awk '{if ($1>1) print $2}'

Enjoy, have fun!

You cannot meaningfully use a numeric sort, since uniq expects its data to be sorted lexicographically.

uniq will not consider "01" equal to "1", 1.0 equal to 1.00, or " 1" equal to "1". If leading/trailing zeroes or whitespace are a concern, then either the file needs to be preprocessed to normalize the entries, or a more capable tool should be used, e.g. perl or AWK.

Demonstration:

$ printf '%s\n' 1 01 001 '  1'
1
01
001
  1
$ printf '%s\n' 1 01 001 '  1' | sort -un
1
$ printf '%s\n' 1 01 001 '  1' | sort -n | uniq
  1
001
01
1

Notice how sort -un knows that it's doing a numeric comparison and considers all 4 terms to be equal. However, uniq considers each value to be distinct.
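
As a rough sketch of the AWK route, the entries can be compared by numeric value instead of by text. This assumes every entry is a plain integer small enough to be represented exactly; the sample data and the names key and seen are made up for illustration:

```shell
# Hypothetical sample: four spellings of the number 1, plus a lone 2.
# Adding 0 forces a numeric value, so "01", "001" and "  1" all become 1.
normalized_dups=$(
  printf '%s\n' 1 01 001 '  1' 2 |
    awk '{ key = $1 + 0 } seen[key]++ == 1 { print key }'
)

printf '%s\n' "$normalized_dups"
```

Here the value 1 should be reported once as a duplicate, and 2 not at all. Note that the normalized value is printed, not the original spelling.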

Regards,
Alister

Then they are not really duplicates.
Check if you have trailing spaces.

$ cat input # first line has trailing space
123456
123456
234567
987654
678905
678905
$ uniq -d input
678905
$ tr -d " " < input | uniq -d
123456
678905

It could also be the result of the file not being sorted, as was mentioned earlier.

$ printf '%s\n' 1 2 1 2 1 2 1 2 > file
$ cat file
1
2
1
2
1
2
1
2
$ uniq -d file
$ sort file | uniq -d
1
2

Regards,
Alister

Good point. I assumed that was already taken into account. But the sample input provided is not sorted.

Back to the original poster: uniq only works correctly on a sorted file. It runs on whatever you provide it, but to get meaningful results the input to uniq must be sorted, because uniq only looks for adjacent duplicated lines.
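
Putting both suggestions from this thread together, a pipeline along these lines may cover the two likely causes (stray whitespace and unsorted input). It is only a sketch: the sample below is a made-up reordering of the posted values, and it assumes that stripping all spaces and tabs is acceptable for the data:

```shell
# Hypothetical sample: duplicates are not adjacent, and the first line
# has a trailing space.
sample=$(printf '%s\n' '123456 ' 234567 678905 123456 987654 678905)

# Plain uniq -d: nothing is adjacent, so no duplicates are reported.
plain=$(printf '%s\n' "$sample" | uniq -d)

# Strip spaces and tabs, sort lexicographically, then report duplicates.
combined=$(printf '%s\n' "$sample" | tr -d ' \t' | sort | uniq -d)

printf 'uniq -d alone: [%s]\n' "$plain"
printf 'tr | sort | uniq -d:\n%s\n' "$combined"
```

For a real file, the same idea would read tr -d ' \t' < file | sort | uniq -d.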