How can I delete duplicate lines based on one column?

rdhanek · August 4, 2009, 5:20am

My data looks something like this:

(08/03/2009 22:57:42.414)(:) king aaaaaaaaaaaaaaaa bbbbbbbbbbbbbbbbbbbbbb
(08/03/2009 22:57:42.416)(:) John cccccccccccc cccccvssssssssss baaaaa
(08/03/2009 22:57:42.417)(:) Michael ddddddd tststststtststts
(08/03/2009 22:57:42.425)(:) Ravi vvvvvvvvvvvvvvvvvvsssssssss bsbbbbs
(08/03/2009 22:57:42.426)(:) John bgbhhhhhhhhhhhhhhhhh dddddddddddddd
(08/03/2009 22:57:42.427)(:) king hhhhhhhhhhhhhssssss rr

Here I need to use the 3rd column as the key for finding duplicate rows. The output should keep only one row per key: one king, one John, and so on (the first occurrence of each).

Expected output:

(08/03/2009 22:57:42.414)(:) king aaaaaaaaaaaaaaaa bbbbbbbbbbbbbbbbbbbbbb
(08/03/2009 22:57:42.416)(:) John cccccccccccc cccccvssssssssss baaaaa
(08/03/2009 22:57:42.417)(:) Michael ddddddd tststststtststts
(08/03/2009 22:57:42.425)(:) Ravi vvvvvvvvvvvvvvvvvvsssssssss bsbbbbs

Can someone help me with this? It would be very helpful for my script.

johnbach · August 4, 2009, 5:51am

May not be efficient

awk '!arr[$3]++ {print}'  file
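For readers new to this idiom: arr[$3] is an associative array entry that is empty (numeric 0) the first time a given third-column value appears, so !arr[$3] is true exactly once per value, and the post-increment then marks that value as seen. The file is read once, with one array entry per distinct key. A longhand sketch of the same logic, wrapped in a shell function whose name (dedup_col3) is purely illustrative:

```shell
# dedup_col3 is a hypothetical helper name; pass file names as arguments
# or pipe the data in on standard input.
dedup_col3() {
    awk '
    {
        # seen[$3] is empty (numeric 0) the first time this key appears.
        if (seen[$3] == 0)
            print        # first line with this column-3 value: keep it
        seen[$3]++       # count the key so later duplicates are skipped
    }' "$@"
}
```

Calling dedup_col3 file on the sample data should give the same four lines as the one-liner. The Solaris caveat about which awk to use applies here too.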

rdhanek · August 4, 2009, 6:04am

I am getting syntax error with that command. Could you verify the syntax please?

Franklin52 · August 4, 2009, 6:20am

Use nawk or /usr/xpg4/bin/awk on Solaris.
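A minimal sketch for a script that may run on Solaris or elsewhere, assuming a Bourne-compatible sh: it prefers /usr/xpg4/bin/awk, then nawk, and otherwise falls back to plain awk (assumed, not guaranteed, to be POSIX-capable on non-Solaris systems). The variable name AWK is arbitrary.

```shell
# Pick an awk that understands the !arr[$3]++ idiom.
# On Solaris, /usr/bin/awk is the old awk, so prefer the POSIX one.
if [ -x /usr/xpg4/bin/awk ]; then
    AWK=/usr/xpg4/bin/awk
elif type nawk >/dev/null 2>&1; then
    AWK=nawk
else
    AWK=awk    # assumption: plain awk is POSIX-compatible on this system
fi
```

The rest of the script can then run "$AWK" '!arr[$3]++' file.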

Regards

rdhanek · August 4, 2009, 6:38am

I tried using

nawk '!arr[$3]++ {print}' file

It's not removing the duplicates; it's just printing all the rows.

ilikecows · August 4, 2009, 7:02am

Very inefficient:

awk '{
    x = $3
    if (x != y) print    # compares with the previous line only
    y = $3               # remember this line's key for the next line
}' file
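The previous-line comparison only drops a line when the line directly before it has the same third column. In the sample data the two king lines and the two John lines are not adjacent, so nothing is removed. A sketch of the same comparison applied after grouping equal keys with sort; the function name is hypothetical, the output ends up ordered by column 3 rather than by timestamp, and LC_ALL=C is set only to make that ordering predictable:

```shell
# dedup_adjacent_col3 is a hypothetical helper name.
dedup_adjacent_col3() {
    # Group lines with equal column-3 values so duplicates become adjacent.
    LC_ALL=C sort -b -k 3,3 "$@" |
    # Print a line only when its key differs from the previous line's key
    # (the "" concatenation forces a string, not numeric, comparison).
    awk '{ key = $3 "" } NR == 1 || key != prev { print } { prev = key }'
}
```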

rdhanek · August 4, 2009, 7:09am

This prints all the lines without removing the ones with a duplicate column 3.

Franklin52 · August 4, 2009, 7:38am

This should work:

awk '!arr[$3]++' file

This is my output:

$ cat file
(08/03/2009 22:57:42.414)(:) king aaaaaaaaaaaaaaaa bbbbbbbbbbbbbbbbbbbbbb
(08/03/2009 22:57:42.416)(:) John cccccccccccc cccccvssssssssss baaaaa
(08/03/2009 22:57:42.417)(:) Michael ddddddd tststststtststts
(08/03/2009 22:57:42.425)(:) Ravi vvvvvvvvvvvvvvvvvvsssssssss bsbbbbs
(08/03/2009 22:57:42.426)(:) John bgbhhhhhhhhhhhhhhhhh dddddddddddddd
(08/03/2009 22:57:42.427)(:) king hhhhhhhhhhhhhssssss rr
$
$ awk '!arr[$3]++' file
(08/03/2009 22:57:42.414)(:) king aaaaaaaaaaaaaaaa bbbbbbbbbbbbbbbbbbbbbb
(08/03/2009 22:57:42.416)(:) John cccccccccccc cccccvssssssssss baaaaa
(08/03/2009 22:57:42.417)(:) Michael ddddddd tststststtststts
(08/03/2009 22:57:42.425)(:) Ravi vvvvvvvvvvvvvvvvvvsssssssss bsbbbbs

Regards

ilikecows · August 4, 2009, 7:45am

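awk '{
    x = x + 1            # line counter; keys are stored from index 1
    y[x] = $3            # store this line's key
    # scan earlier keys; reaching index x means no earlier line matched
    for ( z = 1; z <= x; ++z ) {
        if ( z == x )
            print
        if ( y[z] == $3 )
            break
    }
}' file1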

kshji · August 4, 2009, 11:33am

sort -u -k 3,3 file
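sort -u -k 3,3 keeps one line per third-column value, but its output is ordered by column 3 rather than by the original line order, and POSIX does not specify which of the equal lines survives. A sketch that uses only sort, awk and cut yet keeps the first occurrence of each key in its original position (decorate each line with its line number, group, then restore the order); the function name is hypothetical:

```shell
# dedup_col3_keep_order is a hypothetical helper name.
dedup_col3_keep_order() {
    # 1. Prefix every line with its line number; the key is now field 4.
    awk '{ print NR, $0 }' "$@" |
    # 2. Group equal keys; within a group, the lowest line number comes first.
    LC_ALL=C sort -b -k 4,4 -k 1,1n |
    # 3. Keep only the first line of each group (string comparison of keys).
    awk '{ key = $4 "" } NR == 1 || key != prev { print } { prev = key }' |
    # 4. Restore the original order, then strip the line numbers.
    sort -n | cut -d ' ' -f 2-
}
```

This does the same job as the awk one-liner with more passes over the data; it is mainly useful as an illustration of why plain sort -u changes the line order.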

rdhanek · August 5, 2009, 1:36am

Hi Franklin,
I'm afraid I tried that on my SunOS box and I'm still getting the syntax error... I'm not sure whether it works on a Sun box.

Franklin52 · August 5, 2009, 2:16am

Use nawk or /usr/xpg4/bin/awk on Solaris.

Regards