xshang
November 6, 2012, 12:02am
1
We usually use the following awk code to delete or find duplicate records.
awk -F, '{a[$1]++} END {for (i in a) if (a[i]>=2) print i, a[i]}' file
My question is how can I print the whole record. The following code doesn't work.
awk -F, '{a[$1]++} END {for (i in a) if (a[i]>=2) print $0}' file
Thank you!
ripat
November 6, 2012, 12:58am
2
Try this:
awk '{a[$0]++} END {for (i in a) if (a[i]>=2) print i}' file
1 Like
And one needn't wait until the whole file has been read to detect and print the duplicate records:
awk 'a[$0]++==1' file
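For illustration, a minimal run (the sample records are made up, not from the thread):

```shell
# a[$0]++ returns the count seen so far, so the pattern is true only on the
# second occurrence of a record: each duplicate line prints exactly once,
# as soon as it is seen, with no END block needed.
printf 'a,1\nb,2\na,1\n' | awk 'a[$0]++==1'
# Prints:
# a,1
```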
1 Like
xshang
November 6, 2012, 1:18am
4
Sorry, I couldn't express my requirement clearly. What I want is to print the whole record whenever its $1 is duplicated.
Hi,
Try this one,
awk -F, '{a[$1]++; if (v[$1]) {v[$1]=v[$1] ORS $0} else {v[$1]=$0}} END {for (i in a) if (a[i]>=2) print v[i]}' file
If you want to display only the duplicated lines,
awk -F, '{a[$1]++} a[$1]>1 {if (v[$1]) {v[$1]=v[$1] ORS $0} else {v[$1]=$0}} END {for (i in a) if (a[i]>=2) print v[i]}' file
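For illustration, a run of the grouping idea from the first script above on made-up data (the sample records and the written-out subscripts a[i]/v[i] are mine, not from the thread):

```shell
# Hypothetical three-record input; key "1" occurs twice, key "2" once.
printf '1,aa\n2,bb\n1,cc\n' |
awk -F, '{ a[$1]++                           # count occurrences of field 1
           if (v[$1]) v[$1] = v[$1] ORS $0   # append record to its key group
           else       v[$1] = $0 }           # first record for this key
         END { for (i in a) if (a[i] >= 2) print v[i] }'  # duplicated groups only
# Prints:
# 1,aa
# 1,cc
```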
Cheers,
Ranga
1 Like
With some assumptions:
sort -t, -k1,1 file|awk -F, 'p1==$1{if(p) print p0;p=0;print;next}{p1=$1;p0=$0;p=1}'
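A quick demonstration of this pipeline on sample data (the records are made up for illustration):

```shell
# Hypothetical sample: key "x" is duplicated, key "y" is unique.
# After sorting on field 1, the awk script prints every record whose
# first field matches its neighbour's.
printf 'x,first\ny,only\nx,second\n' |
sort -t, -k1,1 |
awk -F, 'p1==$1{if(p) print p0;p=0;print;next}{p1=$1;p0=$0;p=1}'
# Prints:
# x,first
# x,second
```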
1 Like
xshang
November 6, 2012, 1:38am
7
rangarasan:
Hi,
Try this one,
awk -F, '{a[$1]++; if (v[$1]) {v[$1]=v[$1] ORS $0} else {v[$1]=$0}} END {for (i in a) if (a[i]>=2) print v[i]}' file
If you want to display only the duplicated lines,
awk -F, '{a[$1]++} a[$1]>1 {if (v[$1]) {v[$1]=v[$1] ORS $0} else {v[$1]=$0}} END {for (i in a) if (a[i]>=2) print v[i]}' file
Cheers,
Ranga
It works, thank you!
---------- Post updated at 02:38 AM ---------- Previous update was at 02:35 AM ----------
Nice! Thank you! Can you explain the awk code? I have never seen that kind of code.
Sure. First of all, I'd made a mistake in my earlier script. Corrected now in my post.
It sorts the input file on the first field (delimited by commas) so that the duplicate records (w.r.t. the first field) are adjacent.
The awk script keeps track of the previous first field (p1) and the previous record (p0). When p1 matches the current first field, that marks the start of a duplicate "bunch".
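The same one-liner, expanded with comments (the input records here are made up for illustration):

```shell
printf 'k1,a\nk2,b\nk1,c\nk1,d\n' |
sort -t, -k1,1 |
awk -F, '
p1 == $1 {            # first field repeats: we are inside a duplicate bunch
    if (p) print p0   # emit the stored first record of the bunch, only once
    p = 0             # mark the stored record as printed
    print             # emit the current duplicate record
    next
}
{ p1 = $1; p0 = $0; p = 1 }   # new key: remember record, flag it unprinted
'
# Prints:
# k1,a
# k1,c
# k1,d
```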
1 Like
xshang
November 6, 2012, 2:12am
9
elixir_sinari:
Sure. First of all, I'd made a mistake in my earlier script. Corrected now in my post.
It sorts the input file on the first field (delimited by commas) so that the duplicate records (w.r.t. the first field) are adjacent.
The awk script keeps track of the previous first field (p1) and the previous record (p0). When p1 matches the current first field, that marks the start of a duplicate "bunch".
I know a little more about awk again. Thanks!