Find duplicates in file with line numbers

Hello All,

This is a noob question. I tried searching for the answer, but the answers I found did not help me.

I have a file that can have duplicates.

100
200
300
400
100
150

the number 100 appears twice. I want to find the duplicate along with its line number.

expected output

100 5

please help

Regards
David

Welcome to the forum.

Are you sure none of the links given at bottom left (under "More UNIX and Linux Forum Topics You Might Find Helpful") can help? e.g. this one?

I did look at those links.
I tried

awk -F: 'x[$1]++ { print $1 " is duplicated"}' fileName

The output I get is

is duplicated

I want the duplicate value along with the line number

That seems to indicate your file has non-*nix DOS line terminators (&lt;carriage return&gt;, &lt;CR&gt;, \r, ^M, 0x0D). Remove them before continuing, e.g. with dos2unix.
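If dos2unix isn't available on your system, tr can strip the carriage returns just as well (this writes a cleaned copy rather than editing in place; fileName is your input file):

```shell
# Delete every carriage return byte; leaves the newlines intact
tr -d '\r' < fileName > fileName.clean
```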

There's no colon in the file - no reason to set it as FS. Try

awk 'a[$1]++ {print $0, NR}' file
100 5
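For what it's worth, the reason the first 100 isn't printed is that a[$1]++ tests the counter *before* incrementing it, so the pattern is false (0) on the first occurrence and true from the second occurrence on. A quick demo with an extra duplicate:

```shell
# The rule fires on the 2nd, 3rd, ... occurrence of each value, never the 1st
printf '100\n200\n100\n100\n' | awk 'a[$1]++ {print $0, NR}'
```

This prints "100 3" and "100 4" but nothing for line 1.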

If you don't want to use dos2unix before processing your input files, you can also build that code into your awk script:

awk '
{	sub(/\r$/, "")
}
a[$1]++ {
	print $0, NR
}' file

Note that the above code prints complete lines when the 1st field on the given lines is duplicated. If you want to only print the 1st field when the 1st field is duplicated, use:

awk '
{	sub(/\r$/, "")
}
a[$1]++ {
	print $1, NR
}' file

If you want to print complete lines when complete lines are duplicated, use:

awk '
{	sub(/\r$/, "")
}
a[$0]++ {
	print $0, NR
}' file
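And if you also want to see where the first copy of each duplicate sits, one way (a sketch, assuming that extra detail is wanted) is to remember the line number of the first occurrence:

```shell
awk '
{	sub(/\r$/, "")			# strip a trailing CR, as above
}
$0 in first {				# seen before: report both line numbers
	print $0, NR, "(first seen on line " first[$0] ")"
	next
}
{	first[$0] = NR			# first occurrence: just remember where
}' file
```

On the sample input this prints: 100 5 (first seen on line 1)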