Duplicate lines in a file

faiz1985 · May 6, 2010, 4:09am

Hi All,

I am trying to remove the duplicate entries in a file and print them just once. For example, if my input file has:

00:44,37,67,56,15,12
00:44,34,67,56,15,12
00:44,58,67,56,15,12
00:44,35,67,56,15,12
00:59,37,67,56,15,12
00:59,34,67,56,15,12
00:59,35,67,56,15,12
00:59,58,67,56,15,12
01:14,35,68,53,15,12
01:14,37,68,53,15,12
01:14,34,68,53,15,12
01:14,58,68,53,15,12

I am trying to get the output as :

00:44,37,67,56,15,12
00:59,37,67,56,15,12
01:14,35,68,53,15,12

So basically, if a line appears more than once, I want it to be printed (or stored in a file) just once.

I have tried the "uniq" command, but it doesn't seem to work.

Any help would be greatly appreciated. Thanks in advance!!

Scott · May 6, 2010, 4:15am

Hi.

None of your lines are actually duplicated (they're all unique). You mean the first field?
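One way to check this is sketched below. It assumes comma-separated input whose first field is the key, and the function names dup_lines and dup_keys are only illustrative. uniq only collapses adjacent lines that are identical in full, so the sketch sorts first and asks uniq -d for repeated lines. It then counts the first field on its own.

```shell
# Whole lines that occur more than once (no output means no exact duplicates).
# uniq compares complete lines, and only adjacent ones, hence the sort first.
dup_lines() {
  sort "$@" | uniq -d
}

# First fields (the timestamp here) that occur more than once, with their count.
dup_keys() {
  cut -d, -f1 "$@" | sort | uniq -c | awk '$1 > 1 { print $2, $1 }'
}

# Usage (illustrative): dup_lines inputfile; dup_keys inputfile
```

On the sample data dup_lines prints nothing, while dup_keys reports 00:44, 00:59 and 01:14 with a count of 4 each. The duplication is in the first field, not in the whole line.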

$ awk -F, '!A[$1]++' inputfile
00:44,37,67,56,15,12
00:59,37,67,56,15,12
01:14,35,68,53,15,12
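Here is how the one-liner works. A[$1] is an associative array indexed by the first field, and A[$1]++ returns the old count before incrementing it. That count is 0 the first time a key is seen, so !A[$1]++ is true only for that first line, and awk's default action prints it. The sketch below writes the same logic out long-hand. It is wrapped in an illustrative shell function, first_per_key, and it assumes a POSIX awk (on Solaris that means nawk or /usr/xpg4/bin/awk, as comes up further down).

```shell
# Print the first line seen for each distinct first field; later lines with
# the same first field are skipped. Reads the files given, or stdin if none.
first_per_key() {
  awk -F, '
    {
      # An unset array element compares equal to 0, so this is true
      # exactly once per distinct first field.
      if (seen[$1] == 0)
        print
      # Record the occurrence so later lines with this key are skipped.
      seen[$1]++
    }
  ' "$@"
}

# Usage (illustrative): first_per_key inputfile
```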

faiz1985 · May 6, 2010, 4:22am

Hi Scottn,

Yes, I do mean the first field; sorry for leaving that out earlier.

The awk command does not seem to work for me:

hws006a001: awk -F, '!A[$1]++' diskspace_Dywhapp_DR
awk: syntax error near line 1
awk: bailing out near line 1

Any idea what the error is?

Just for the record, the Unix environment is SunOS, and I have tried using both ksh and bash. Thanks..

Scott · May 6, 2010, 4:23am

Good ol' Solaris awk!

Use nawk, or /usr/xpg4/bin/awk
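If the same script has to run both on Solaris and elsewhere, one option is to pick the awk at run time. This is a sketch only. /usr/xpg4/bin/awk is the path given above, but /usr/bin/nawk is an assumption about where nawk lives. Falling back to plain awk also assumes that awk is POSIX-compatible on the other systems. pick_awk is a made-up name.

```shell
# Print the path of an awk that understands '!A[$1]++'.
# Assumption: on Solaris one of these two paths exists; elsewhere the
# awk found on PATH is taken to be POSIX-compatible.
pick_awk() {
  for candidate in /usr/bin/nawk /usr/xpg4/bin/awk; do
    if [ -x "$candidate" ]; then
      echo "$candidate"
      return 0
    fi
  done
  echo awk
}

# Usage (illustrative):
#   AWK=$(pick_awk)
#   "$AWK" -F, '!A[$1]++' inputfile
```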

faiz1985 · May 6, 2010, 4:25am

nawk worked perfectly

Thanks a lot for your help buddy

alister · May 6, 2010, 7:35am

Hi, faiz1985:

scottn's awk approach works great, but it doesn't hurt to know other possible solutions:

sort -ut, -k1,1 inputfile
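Here is the same command with the options separated and commented. unique_by_first_field is an illustrative name. Unlike the awk version, the output comes back sorted by the first field rather than in input order.

```shell
# One line per distinct first field, output sorted by that field.
#   -t,     fields are separated by commas
#   -k1,1   the sort key starts and ends at field 1 (the timestamp)
#   -u      keep only one line from each group whose keys compare equal
# Which line of a group survives is left to the sort implementation.
unique_by_first_field() {
  sort -u -t, -k1,1 "$@"
}

# Usage (illustrative): unique_by_first_field inputfile
```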

Cheers,

Alister

aigles · May 6, 2010, 7:51am

The two solutions don't give the same result:
Scottn's solution: the first record for each key is selected.

00:44,37,67,56,15,12
00:59,37,67,56,15,12
01:14,35,68,53,15,12

alister's solution: the first record in alphabetic order for each key is selected.

00:44,34,67,56,15,12
00:59,34,67,56,15,12
01:14,34,68,53,15,12

Jean-Pierre.

alister · May 6, 2010, 10:36am

Good point, aigles. sort's -u option does not specify which line of a set with identical keys is returned; that is implementation-defined. Your sort's result differs from the awk solution, but mine gives an identical result.
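If it matters which line of each group is kept, the choice can be made explicit instead of being left to sort -u. The sketch below assumes a POSIX awk (nawk or /usr/xpg4/bin/awk on Solaris), and the function names are illustrative. LC_ALL=C pins sort to plain byte order so that the "smallest" line does not depend on the locale.

```shell
# First line for each first field, in input order
# (the same thing awk -F, '!A[$1]++' does).
first_seen_per_key() {
  awk -F, '!seen[$1]++' "$@"
}

# Lexicographically smallest whole line for each first field:
# sort complete lines in byte order, then keep the first line of every key.
smallest_per_key() {
  LC_ALL=C sort "$@" | awk -F, '!seen[$1]++'
}

# Usage (illustrative): first_seen_per_key inputfile; smallest_per_key inputfile
```

On the sample data, first_seen_per_key gives the awk output above. smallest_per_key gives the three ",34," lines that aigles's sort returned.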

Regards,
Alister