Replace string ids with unique numbers

ryan9011 · May 14, 2012, 3:44am

Hello,

I have a file with about 1000 IDs in the form of strings. I want to replace each ID with a unique number throughout the whole file. Each ID is repeated across all the columns. I know I can use the sed command, but there are many IDs in the file that need to be converted.

example of input file

B752 B295 B289
B295 Y710 B921
B289  B294 B294
B294 B289 B752
B584 B294 X216
B023 B584 B000
B99 B023 B584
B921 B99 B584
B000 T563 B000
24752 Y710  B295
T563 X216 B294 
Y710 B289 B289
X216 B752 B295
T53635 Y710 Y710
B629 24752 T53635 
BX99 B000 B289
BT24 B629 B294

Thanks in advance.
/ryan

ygemici · May 14, 2012, 4:01am

What is your expected output?

bakunin · May 14, 2012, 4:07am

Yes! That's the right attitude! ;-))

You could prepare a translation table in a second file and let a script read this file and invoke sed on every line. For instance (just a sketch):

The translation file:

0001=B752
0002=B295
0003=B289
...

The script:

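#! /bin/ksh

chCode=""
chID=""
fTranslationTable="/path/to/some/file.xlate"
fWork="/path/to/your/file"

# Each table line is "code=ID"; IFS='=' makes read split the line at the "=".
while IFS='=' read -r chCode chID ; do
     # Replace every occurrence of the ID with its code.
     sed "s/${chID}/${chCode}/g" "${fWork}" > "${fWork}.tmp"
     mv "${fWork}.tmp" "${fWork}"
done < "${fTranslationTable}"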

If the codes you want to replace the IDs with are only required to be unique, you could even create them dynamically: split the lines of your original file so that every ID sits on a line of its own. sort -u will then give you a list of unique IDs, and you can automatically create a code for each of them, which gives you the translation table I used above. From there on you could use my method to replace the IDs with these codes.
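A minimal sketch of that idea, for illustration only. The helper name make_table, the 4-digit code width and the sample file names are assumptions, not part of the original setup. It puts every whitespace-separated ID on its own line, keeps one copy of each with sort -u, and numbers the result in the code=ID format shown above.

```shell
#!/bin/sh
# Sketch: build a "code=ID" translation table from the IDs found in a data file.
# make_table is a hypothetical helper name; the 4-digit width is an assumption.
make_table() {
    # Print every whitespace-separated field on a line of its own,
    # keep one copy of each ID, then number the IDs in sorted order.
    awk '{ for (i = 1; i <= NF; i++) print $i }' "$1" |
        LC_ALL=C sort -u |
        awk '{ printf "%04d=%s\n", NR, $0 }'
}

# Small demonstration on the first three lines of the sample input.
work=$(mktemp -d)
printf 'B752 B295 B289\nB295 Y710 B921\nB289  B294 B294\n' > "$work/sample.txt"
make_table "$work/sample.txt" > "$work/sample.xlate"
cat "$work/sample.xlate"
```

Because the list is sorted, the codes follow the sort order of the IDs rather than their order of appearance in the file.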

I hope this helps.

bakunin

ryan9011 · May 14, 2012, 4:47am

For example, if the number ID for string B752 is 001, the code will replace B752 in the file with 001.

The output for the first row and its linked row will be as follows:

001 B295 B289
X216 001 B295

The rest of the strings will be converted accordingly.

ygemici · May 14, 2012, 6:18am

Did you try @bakunin's solution?
Here is a similar one:

# ./justdoit infile trans
001 002 003
002 0012 008
003  004 004
004 003 001
005 004 0013
006 005 009
007 006 005
008 007 005
009 0011 009
0010 0012  002
0011 0013 004
0012 003 003
0013 001 002
0014 0012 0012
0015 0010 0014
0016 009 003
0017 0015 004

# more trans
B752=001
B295=002
B289=003
B294=004
B584=005
B023=006
B99=007
B921=008
B000=009
24752=0010
T563=0011
Y710=0012
X216=0013
T53635=0014
B629=0015
BX99=0016
BT24=0017

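## justdoit ##
# Usage: ./justdoit infile trans   (trans holds one ID=number pair per line)
cp "$1" "${1}.bck"        # keep a backup of the original file
while IFS="=" read -r id val
do
    # replace every occurrence of the ID with its number
    sed "s/$id/$val/g" "$1" > "${1}.tmp"
    mv "${1}.tmp" "$1"
done < "$2"
more "$1"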
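One caveat with running sed once per ID: the pattern matches substrings, so an ID such as B99 would also hit the inside of a longer ID such as B999, and the file is rewritten once for every table line. A possible alternative, sketched here under assumptions (the helper name translate_ids and the demo file names are made up for illustration), is a single awk pass that loads the ID=number table first and then replaces whole fields only.

```shell
#!/bin/sh
# Sketch: replace whole fields using an ID=number table in one awk pass.
# translate_ids is a hypothetical helper; arguments: table file, then data file.
translate_ids() {
    awk '
        # First file (the table): remember the number for each ID.
        NR == FNR { split($0, pair, "="); map[pair[1]] = pair[2]; next }
        # Second file (the data): swap every field that has a table entry.
        {
            for (i = 1; i <= NF; i++)
                if ($i in map) $i = map[$i]
            print
        }
    ' "$1" "$2"
}

# Demonstration with two made-up IDs where one is a prefix of the other.
work=$(mktemp -d)
printf 'B99=007\nB999=008\n' > "$work/trans"
printf 'B999 B99\nB99 X1\n' > "$work/data"
translate_ids "$work/trans" "$work/data"
```

Note that awk rebuilds any line in which it changes a field, so runs of spaces between columns collapse to a single space, and fields without a table entry (X1 in the demo) pass through unchanged.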