I'm a biologist trying to analyse some data, and I'd appreciate some help with the following problem. I have a column of character strings, and in each line I'd like to delete the duplicated characters and report only the unique ones. No sorting should be done. E.g.
The original data:
GTG
CTC
CTC
CTC
GCGAGC
GCGAGC
GCGAGC
GCGAGC
GATGTG
GATGTG
GATGTG
GATGTG
A
A
C
I have tried something with Python. ghostdog74, please suggest.
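def u(chars):
    # Return the items of chars in first-seen order, without repeats.
    # (Renamed from list/set to avoid shadowing the built-ins.)
    seen = {}
    return [seen.setdefault(c, c) for c in chars if c not in seen]

for line in open("input.txt"):
    # Strip the trailing newline so it is not counted as a character.
    p = u(line.strip())
    print("/".join(p))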
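For comparison, here is a minimal sketch of the same idea, assuming Python 3.7 or later, where a plain dict keeps insertion order. The request can be read two ways, so the sketch covers both: unique characters within each line, and unique lines within the column. The names unique_chars and unique_lines are hypothetical helpers made up for this illustration, and the sample list stands in for the contents of input.txt.

```python
def unique_chars(text):
    """Return the characters of text in first-seen order, without repeats."""
    # dict.fromkeys keeps the first occurrence of each key and preserves
    # insertion order (guaranteed from Python 3.7), so no sorting happens.
    return "".join(dict.fromkeys(text))


def unique_lines(lines):
    """Return the lines in first-seen order, without repeats."""
    # Strip surrounding whitespace so "CTC\n" and "CTC" count as the same line.
    return list(dict.fromkeys(line.strip() for line in lines))


if __name__ == "__main__":
    # Assumed sample data, copied from the column shown in the post.
    sample = ["GTG", "CTC", "CTC", "GCGAGC", "GCGAGC", "GATGTG", "A", "A", "C"]

    # Reading 1: drop repeated characters inside each line.
    for line in sample:
        print("/".join(unique_chars(line)))

    # Reading 2: drop repeated lines from the column.
    for line in unique_lines(sample):
        print(line)
```

Under the first reading, GCGAGC becomes G/C/A; under the second, the repeated CTC, GCGAGC and A lines collapse to one each. To run it on a real file, pass open("input.txt") to unique_lines, or loop over the file and call unique_chars(line.strip()) on each line.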