Find duplicate records using awk

We usually use the following awk code to delete or find duplicate records.

awk -F, '{a[$1]++} END {for (i in a) if (a[i]>=2) print i, a[i]}' file

My question is: how can I print the whole record? The following code doesn't work.

awk -F, '{a[$1]++} END {for (i in a) if (a[i]>=2) print $0}' file

Thank you!

Try this:

awk '{a[$0]++} END {for (i in a) if (a[i]>=2) print i}' file

And one needn't wait until the whole file has been read to detect and print the duplicate records:

awk 'a[$0]++==1' file
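
To illustrate the one-liner above, here is a minimal sketch on made-up data (the sample lines are an assumption for illustration, fed through a pipe instead of a file). The pattern a[$0]++==1 is true only when a line is seen for the second time, so each duplicated line is printed once, as soon as its first repeat is read.

```shell
# Hypothetical sample input: "x,1" appears three times, "y,2" twice, "z,3" once.
result=$(printf '%s\n' 'x,1' 'y,2' 'x,1' 'x,1' 'y,2' 'z,3' |
  awk 'a[$0]++==1')   # post-increment: the old count is compared with 1

printf '%s\n' "$result"
# prints:
# x,1
# y,2
```

Note that this keys on the whole line ($0), not on the first field.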

Sorry, I didn't express myself clearly. What I want is to print the whole record whenever its $1 is duplicated.

Hi,

Try this one,

awk -F, '{a[$1]++;if(v[$1]){v[$1]=v[$1] ORS $0;}else{v[$1]=$0;}} END {for (i in a) if (a[i]>=2) print v[i]}' file

If you want to display only the repeated lines (every occurrence after the first),

awk -F, '{a[$1]++;}a[$1]>1{if(v[$1]){v[$1]=v[$1] ORS $0;}else{v[$1]=$0;}} END {for (i in a) if (a[i]>=2) print v[i]}' file

Cheers,
Ranga :)

With some assumptions:

sort -t, -k1,1 file|awk -F, 'p1==$1{if(p) print p0;p=0;print;next}{p1=$1;p0=$0;p=1}'

It works, Thank you!

Nice! Thank you! Can you explain the code in awk? I never saw that kind of code.

Sure. First of all, I'd made a mistake in my earlier script; it is now corrected in my post.
The pipeline sorts the input file on the first field (delimited by commas) so that records with the same first field end up adjacent.
The awk script keeps track of the previous first field (p1) and the record it came from (p0). When p1 equals the current first field, a "bunch" of duplicates has started: the saved record p0 is printed once (the flag p remembers whether it is still unprinted), followed by the current record. When the first field changes, the new record is saved in p0 and p is set again.
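
The explanation above can be followed step by step in this commented sketch of the same pipeline, run on made-up data (the sample lines are an assumption for illustration and are piped in instead of being read from a file).

```shell
# Hypothetical sample data: keys 1 and 3 occur more than once in field 1.
result=$(printf '%s\n' '3,c' '1,a' '2,b' '1,d' '3,e' '1,f' |
  sort -t, -k1,1 |
  awk -F, '
    # Current key equals the previous key: we are inside a duplicate group.
    p1 == $1 {
      if (p) print p0   # first record of the group has not been printed yet
      p = 0             # mark it as printed
      print             # print the current record
      next
    }
    # Key changed: remember this record in case a duplicate follows.
    { p1 = $1; p0 = $0; p = 1 }
  ')

printf '%s\n' "$result"
# prints:
# 1,a
# 1,d
# 1,f
# 3,c
# 3,e
```

The record 2,b never appears because no later record shares its first field.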

Now I know a little more about awk. Thanks!