Find duplicates in the first column of a text file

Hello,

My text file has input of the form

abc dft45.xml
ert  rt653.xml
abc ert57.xml

I need to write a Perl or shell script to find duplicates in the first column and write them to a text file of the form...

abc dft45.xml
abc ert57.xml

Can someone help me, please?

Hi

awk 'NR==FNR{a[$1]++;next;}{ if (a[$1] > 1)print;}' file file

You need to give the filename twice as shown above.
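
For example, with your sample saved as input.txt (the name is just an illustration), it produces exactly the output you asked for:

$ awk 'NR==FNR{a[$1]++;next;}{ if (a[$1] > 1)print;}' input.txt input.txt
abc dft45.xml
abc ert57.xml

Redirect the output (e.g. with > dupli.txt) to write it into a text file.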

Guru.

Can you please explain what the awk command is doing, and why you have given "file" twice?

Hi
The first time the file is processed, it counts the duplicates in the 1st column. The second time it is processed, it prints those lines whose count is more than 1.
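
The same command, spread out with comments:

awk 'NR==FNR {       # true only while the first copy of the file is being read
         a[$1]++     # first pass: count how often each column-1 value occurs
         next        # skip the block below during the first pass
     }
     {               # second pass: NR != FNR now
         if (a[$1] > 1) print
     }' file file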

btw, did it work?

Guru.


A single-pass version (with an increased RAM requirement, since all lines of the file are stored for use in the END block):

 awk '{a[NR]=$0; a[NR,"k"]=$1; k[$1]++} END {for (i=1; i<=NR; i++) if (k[a[i,"k"]] > 1) print a[i]}' data
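
The same logic, spread over multiple lines with comments:

 awk '{
          a[NR] = $0        # store every line, keyed by its line number
          a[NR,"k"] = $1    # remember the line's column-1 value
          k[$1]++           # count occurrences of each column-1 value
      }
      END {
          for (i = 1; i <= NR; i++)             # replay the stored lines in order
              if (k[a[i,"k"]] > 1) print a[i]   # print only those with a repeated key
      }' data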

Regards,
Alister

What is meant by "first time" and "second time" processing?

I will try it out and comment here ASAP, because there is a problem with my machine.


Can you explain what the code is doing?


It worked! Thanks... But I need to find the count of each occurrence.

awk '{ per[$1] += 1 }
     END { for (i in per)
               print i, per[i] }' dupli.txt > dupli_count.txt

In the above code I also need to print the total count as "Sum=????" (i.e., I need to total the 2nd column.)
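
One way to do that (a sketch, assuming "Sum" should be the grand total of the counts in the 2nd column of the output):

awk '{ per[$1] += 1 }
     END {
         for (i in per) {
             print i, per[i]   # key and its count
             sum += per[i]     # accumulate the counts
         }
         print "Sum=" sum      # grand total of the 2nd column
     }' dupli.txt > dupli_count.txt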