delete duplicated characters in each line

ivpz · December 21, 2009, 9:33pm

I'm a biologist trying to analyse some data, and I'd appreciate some help with the following problem. I have a column of character strings, and in each line I'd like to delete the duplicated characters and report only the unique ones, in their original order (no sorting should be done). E.g.

The original data:

GTG
CTC
CTC
CTC
GCGAGC
GCGAGC
GCGAGC
GCGAGC
GATGTG
GATGTG
GATGTG
GATGTG
A
A
C

And I'm hoping to get:

G/T/
C/T/
C/T/
C/T/
G/C/A/
G/C/A/
G/C/A/
G/C/A/
G/A/T/
G/A/T/
G/A/T/
G/A/T/
A/
A/
C/

I've tried using tr, and awk with if conditions, but I'm getting nowhere.

Thank you.

daptal · December 21, 2009, 10:15pm

perl -e '
while (<>) {
    chomp;
    my %hash;    # characters already seen on this line
    # print each character the first time it appears, followed by "/"
    print "$_/" for grep { !$hash{$_}++ } split //;
    print "\n";
}' abc.txt

HTH,
PL

jaduks · December 21, 2009, 10:57pm

I have tried something with Python. ghostdog74, please suggest improvements.

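def unique(chars):
    # Keep only the first occurrence of each character, in input order.
    seen = {}
    return [seen.setdefault(c, c) for c in chars if c not in seen]

for line in open("input.txt"):
    # Strip the newline so it is not treated as one of the characters.
    print("".join(c + "/" for c in unique(line.rstrip("\n"))))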

$ python mt.py
G/T/
C/T/
C/T/
C/T/
G/C/A/
G/C/A/
G/C/A/
G/C/A/
G/A/T/
G/A/T/
G/A/T/
G/A/T/
A/
A/
C/

rdcwayx · December 21, 2009, 11:53pm

awk -F "" '{for (i=1;i<=NF;i++) { a[$i]++ ; if (a[$i]==1) printf "%s/", $i }} {printf "\n"} {for (i in a) delete a[i]} ' urfile

ivpz · December 22, 2009, 5:35am

Thanks everyone for your help!

radoulov · December 22, 2009, 5:57am

perl -nle'
  %_=()or print join("/",grep!$_{$_}++,split//),"/";
  ' infile
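
For anyone reading this later with Python 3, the same "keep the first occurrence" idea used in the answers above can be sketched as follows. This is only an illustration, not any poster's code: the function name unique_chars and the sample list are made up for the example, and it relies on dict preserving insertion order, which is guaranteed from Python 3.7 onward.

```python
def unique_chars(line, sep="/"):
    """Return the distinct characters of `line` in first-seen order,
    each followed by `sep` (hypothetical helper for illustration)."""
    # dict.fromkeys keeps insertion order (Python 3.7+), so every
    # repeated character collapses onto its first occurrence.
    return "".join(ch + sep for ch in dict.fromkeys(line))


# A few of the lines from the original question.
sample = ["GTG", "CTC", "GCGAGC", "GATGTG", "A"]
for s in sample:
    print(unique_chars(s))
```

No sorting is involved, so the characters come out in the order they first appear on each line. To process a file instead of the sample list, loop over the open file and strip the trailing newline from each line before calling the function.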