sed sorting command explanation

mukeshguliao · March 14, 2011, 10:59pm

 
sed '$!N; /^\(.*\)\n\1$/!P; D'

I found this command, which is supposed to remove duplicates from a file whether it is sorted or not, keeping the first occurrence and removing later ones.

Can anyone explain how this works?
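Here is the same sed program with each command commented. This is a sketch of how it behaves. It keeps a two-line window in the pattern space. N appends the next line. P prints the first line of the window unless the two lines are identical. D drops that first line and restarts the cycle without reading new input. Because each line is only ever compared with the line right after it, the command removes consecutive duplicates only, much like uniq. On an unsorted file, a repeat that is not adjacent to its earlier copy survives. The function name dedup_adjacent is a hypothetical name used only for illustration.

```shell
#!/bin/sh
# dedup_adjacent is a hypothetical name; the sed program is the one-liner above.
#   $!N               on every line except the last, append a newline and the
#                     next input line to the pattern space (a two-line window)
#   /^\(.*\)\n\1$/!P  if the two lines in the window are NOT identical, print
#                     the first one (P prints up to the first newline)
#   D                 delete up to the first newline and restart the cycle with
#                     the remaining line, without reading new input
dedup_adjacent() {
    sed '$!N; /^\(.*\)\n\1$/!P; D' "$@"
}

# The second "a" is not next to the first one, so it is kept.
printf '%s\n' a a b a c c | dedup_adjacent
```

Run as shown, this prints a, b, a and c on separate lines. The adjacent pairs collapse, but the later a is kept. That is why this command does not give the first-occurrence-only result on unsorted input.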

I need to remove duplicates from the following file. The duplicate criterion is not the complete line but only the first three fields, i.e. (a,b,c) or (e,r,t).

 
a,b,c,d
a,b,c,s
e,r,t,y
a,b,c,a
a,b,c,e
e,r,t,y

I need output like this:

 
a,b,c,d
e,r,t,y

That is, take the first occurrence of each key and ignore the rest.

Please help; I am getting nowhere with it.

yinyuemi · March 14, 2011, 11:07pm

Do you mean this?

awk '!++a[$1$2$3]' FS=","
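As written, this pattern prints nothing. ++a[k] is a pre-increment, so it yields the count after incrementing, and that count is already 1 the first time a key is seen. As a result, !++a[k] is false for every line. The post-increment form a[k]++ yields the old count, which is 0 the first time a key is seen, so !a[k]++ is true only for the first occurrence. The sketch below keeps the same $1$2$3 key in both versions so that only the increment differs. The variable name data is just for illustration.

```shell
#!/bin/sh
# Sample lines; only the first three comma-separated fields matter.
data='a,b,c,d
a,b,c,s
e,r,t,y'

# Pre-increment: ++a[k] returns the NEW count (1 on first sight),
# so !++a[k] is always false and no line is ever printed.
printf '%s\n' "$data" | awk -F, '!++a[$1$2$3]'

# Post-increment: a[k]++ returns the OLD count (0 on first sight),
# so !a[k]++ is true only the first time a key appears.
printf '%s\n' "$data" | awk -F, '!a[$1$2$3]++'
```

The first command produces no output. The second prints a,b,c,d and e,r,t,y.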

Chubler_XL · March 14, 2011, 11:26pm

Funny, but the above didn't work for me; I needed the following:

awk -F, '!a[$1,$2,$3]++'

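Note: the commas in the subscript make awk join the fields with SUBSEP. This avoids the false matches that plain concatenation ($1$2$3) would produce between "aa,b,c" and "a,ab,c", or between ",a,b" and "a,,b".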
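The note can be checked directly. In awk, a[$1,$2,$3] joins the subscripts with the built-in variable SUBSEP, which defaults to "\034", a control character unlikely to appear in ordinary data. By contrast, $1$2$3 just glues the fields together. This sketch feeds the four lines from the note to both versions. The variable name tricky is only for illustration.

```shell
#!/bin/sh
# The four lines mentioned in the note above.
tricky='aa,b,c
a,ab,c
,a,b
a,,b'

# Plain concatenation: "aa"+"b"+"c" and "a"+"ab"+"c" both give "aabc",
# and ",a,b" / "a,,b" both give "ab", so two lines are wrongly dropped.
printf '%s\n' "$tricky" | awk -F, '!a[$1$2$3]++'

echo ---

# Comma-separated subscripts are joined with SUBSEP, so all four keys differ
# and every line is printed.
printf '%s\n' "$tricky" | awk -F, '!a[$1,$2,$3]++'
```

The concatenated key keeps only aa,b,c and ,a,b. The SUBSEP key keeps all four lines.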

mukeshguliao · March 14, 2011, 11:48pm

 
awk -F, '!a[$1,$2,$3]++' UnixEg.dat

I tried the above, but got an error:

a[$1,$2,$3]++': Event not found

I guessed it could be because of the !, so I escaped it with \:

 
awk -F, '\!a[$1,$2,$3]++' UnixEg.dat

Then I got:

awk: syntax error near line 1
awk: bailing out near line 1

Am I missing something? I'm trying to run this in the csh shell.

---------- Post updated at 10:48 PM ---------- Previous update was at 10:39 PM ----------

I fixed it, though.

I changed to the ksh shell and ran this:

 
nawk -F, '!a[$1,$2,$3]++' UnixEg.dat

It works superbly.

Thanks, man.
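Another way around csh's history expansion of ! (the source of the "Event not found" error above) is to keep the awk program in a file and pass it with -f. The shell never parses that file's contents. This is a minimal sketch. The program file name dedup3.awk is hypothetical. AWK defaults to awk here but can be set to nawk on a system like the one in this thread, where nawk was the command that worked.

```shell
#!/bin/sh
# Use nawk where plain awk is too old (e.g. AWK=nawk); defaults to awk.
AWK=${AWK:-awk}

tmp=$(mktemp -d)                 # scratch directory for the example files
trap 'rm -rf "$tmp"' EXIT

# The program lives in a file, so no shell ever sees the "!".
# dedup3.awk is a hypothetical file name.
cat > "$tmp/dedup3.awk" <<'EOF'
# Print a line only the first time its first three fields are seen.
!seen[$1, $2, $3]++
EOF

# Sample input from the question (UnixEg.dat in the thread).
cat > "$tmp/UnixEg.dat" <<'EOF'
a,b,c,d
a,b,c,s
e,r,t,y
a,b,c,a
a,b,c,e
e,r,t,y
EOF

"$AWK" -F, -f "$tmp/dedup3.awk" "$tmp/UnixEg.dat"
```

On the sample data this prints a,b,c,d followed by e,r,t,y, which is the first occurrence of each three-field key.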