Hello,
My text file has input of the form
abc dft45.xml
ert rt653.xml
abc ert57.xml
I need to write a perl script/shell script to find duplicates in the first column and write it into a text file of the form...
abc dft45.xml
abc ert57.xml
Can someone help me, please?
Hi
awk 'NR==FNR{a[$1]++;next;}{ if (a[$1] > 1)print;}' file file
You need to give the filename twice as shown above.
Guru.
Can you please explain what the awk command is doing, and why you have mentioned "file" two times?
Hi
The first time the file is processed, it counts the occurrences of each first-column value. The second time it is processed, it prints those lines whose first column has a count greater than 1.
btw, did it work?
Guru.
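For example, with the sample data from the first post saved as "file" (a minimal sketch; the filename is an assumption):

```shell
# Create the sample input from the original post (filename "file" is assumed)
cat > file <<'EOF'
abc dft45.xml
ert rt653.xml
abc ert57.xml
EOF

# Pass 1 (NR==FNR is true only for the first file argument):
#   count each first-column value.
# Pass 2: print lines whose first column appeared more than once.
awk 'NR==FNR{a[$1]++;next} a[$1]>1' file file
# prints:
# abc dft45.xml
# abc ert57.xml
```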
A single-pass version (increased ram requirement since all lines of the file are stored for END use):
awk '{a[NR]=$0; a[NR,"k"]=$1; k[$1]++} END {for (i=1; i<=NR; i++) if (k[a[i,"k"]] > 1) print a[i]}' data
Regards,
Alister
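Run against the sample data from the first post (a sketch; the data file name "data" follows the command above, and note the per-line print is `a[i]`):

```shell
# Sample input from the thread, saved as "data" (filename per the command above)
cat > data <<'EOF'
abc dft45.xml
ert rt653.xml
abc ert57.xml
EOF

# Single pass: cache every line (a[NR]) and its key (a[NR,"k"]),
# count the keys, then in END print only the cached lines whose
# key occurred more than once.
awk '{a[NR]=$0; a[NR,"k"]=$1; k[$1]++}
     END {for (i=1; i<=NR; i++) if (k[a[i,"k"]] > 1) print a[i]}' data
# prints:
# abc dft45.xml
# abc ert57.xml
```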
What is meant by the "first time" and "second time" the file is processed?
I will try it out and comment here ASAP, because there is a problem with my machine.
---------- Post updated at 08:05 PM ---------- Previous update was at 07:52 PM ----------
alister:
A single-pass version (increased ram requirement since all lines of the file are stored for END use):
awk '{a[NR]=$0; a[NR,"k"]=$1; k[$1]++} END {for (i=1; i<=NR; i++) if (k[a[i,"k"]] > 1) print a[i]}' data
Regards,
Alister
Can you explain what the code is doing?
---------- Post updated 06-28-10 at 02:16 PM ---------- Previous update was 06-27-10 at 08:05 PM ----------
guruprasadpr:
Hi
The first time the file is processed, it counts the occurrences of each first-column value. The second time it is processed, it prints those lines whose first column has a count greater than 1.
btw, did it work?
Guru.
It worked! Thanks... But I also need to find the count of each occurrence:
awk '{ per[$1] += 1 }
     END { for (i in per)
               print i, per[i] }' dupli.txt > dupli_count.txt
In the above code, I also need to print the total count as "Sum=????" (i.e., I need to sum the counts in the 2nd column of the output).
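One way to get that total is to accumulate a running sum while printing the per-key counts (a sketch, assuming "Sum=" should be the total of all the printed counts, and that dupli.txt holds the duplicate lines found earlier):

```shell
# Sample duplicates file (contents assumed from the earlier step)
cat > dupli.txt <<'EOF'
abc dft45.xml
abc ert57.xml
EOF

# Print each key with its count, accumulate the total as we go,
# and finish with a Sum= line giving the total of all counts.
awk '{ per[$1] += 1 }
     END { for (i in per) { print i, per[i]; sum += per[i] }
           print "Sum=" sum }' dupli.txt > dupli_count.txt

cat dupli_count.txt
# prints:
# abc 2
# Sum=2
```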