Hi, I have a file that is a list of people & their credentials which I receive frequently. The issue is that when I cat this list, duplicate entries exist & are NOT CONSECUTIVE (i.e. uniq may not work here, since it only removes adjacent duplicates).
I'm trying to write a script that will remove the duplicate entries.
The file is typically made up of the following:
--------------
Ms AA
Unique to A
More of A
Mr BB
Mr CC
Ms AA
Unique to A
More of A
Mr DD
Mr EE
Mr BB
------------
Some of the techniques I've tried just aren't working quite right, especially when it comes to ignoring white space (maybe sed can help here).
(e.g. awk -F, '! mail[$3]++' inputfile)
Any tips?
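For what it's worth, one sketch of a way to handle the whitespace problem: key the duplicate check on a trimmed copy of each line, so entries that differ only in leading/trailing blanks still count as duplicates (this assumes whole-line comparison rather than the comma-separated $3 field in the example above):

```shell
# Sample data where the repeated entry differs only by leading spaces.
# The trimmed copy is used as the array key; the original line is printed.
printf 'Ms AA\nMr BB\n  Ms AA\nMr CC\n' |
  awk '{ key = $0; gsub(/^[ \t]+|[ \t]+$/, "", key) } !seen[key]++'
# prints: Ms AA / Mr BB / Mr CC
```

The `gsub` works on the copy in `key`, so the output preserves each line exactly as it first appeared.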
No worries
I've seen this in a sed manual somewhere...
# delete duplicate, consecutive lines from a file (emulates "uniq").
# First line in a set of duplicate lines is kept, rest are deleted.
sed '$!N; /^\(.*\)\n\1$/!P; D'
# delete duplicate, nonconsecutive lines from a file. Beware not to
# overflow the buffer size of the hold space, or else use GNU sed.
sed -n 'G; s/\n/&&/; /^\([ -~]*\n\).*\n\1/d; s/\n//; h; P'
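For the sample in the first post, that second one-liner keeps the first occurrence of each line even when the repeats are far apart. A trimmed-down demonstration (GNU sed assumed, per the buffer-size caveat above):

```shell
# Non-consecutive duplicates: the second 'Ms AA' and 'Mr BB' are dropped.
printf 'Ms AA\nMr BB\nMr CC\nMs AA\nMr DD\nMr BB\n' |
  sed -n 'G; s/\n/&&/; /^\([ -~]*\n\).*\n\1/d; s/\n//; h; P'
# prints: Ms AA / Mr BB / Mr CC / Mr DD
```

It works by accumulating every printed line in the hold space and deleting any incoming line already found there.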
ALSO
XXXXX '!($0 in a); {a[$0]=1}' logfile
# where XXXXX = awk for Linux & nawk for Solaris
Anyone know how to integrate ignoring blank lines into the above scripts?
bump ;
Any clever sed / awk'ers out there who know how to ignore blank lines & can integrate that into the above examples.....?
Ignore blank lines:
sed '/^$/d'
# or
grep -v '^$'
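One sketch of a way to combine the two in a single awk pass: test for blank lines before the duplicate check (`NF == 0` also catches whitespace-only lines), so blanks pass through untouched instead of being deduplicated down to one:

```shell
# Blank/whitespace-only lines always print; other lines print once.
printf 'Ms AA\n\nMr BB\n\nMs AA\n' |
  awk 'NF == 0 || !seen[$0]++'
# prints: Ms AA, blank, Mr BB, blank (the second Ms AA is dropped)
```

Swap the condition for `NF > 0 && !seen[$0]++` if blank lines should be removed entirely rather than kept.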
What is the expected output given your sample data?