Help with \u0401 codes? Unicode or something

system · October 8, 2010, 2:39pm

Hello,
there are some strange code symbols that look like this:
\u0438 \u0247. Unicode, I think.
JavaScript can read these codes, so I need them.
I need to convert casual characters to these codes with Perl.
At the moment I'm stuck with ord, chr, pack, etc., but they give me other digits.
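The "other digits" are most likely the same number in a different base: ord gives the character's code in decimal (1080 for и), while a \u escape wants it in hexadecimal, padded to four digits (0438). Below is a minimal sketch of that formatting in JavaScript, the language the escapes are meant for; the function name charToEscape is made up for illustration. In Perl the equivalent should be sprintf("\\u%04X", ord($char)), assuming the text has already been decoded into characters (for example with `use utf8` for literals in the script, or the Encode module for input). On undecoded UTF-8 bytes, ord returns byte values instead.

```javascript
// Sketch: format one character the way a JavaScript \uXXXX escape expects.
// A \uXXXX escape is the character's code (a UTF-16 code unit) written as
// four hexadecimal digits; charCodeAt(), like Perl's ord(), returns the
// same number in decimal.
function charToEscape(ch) {
  const code = ch.charCodeAt(0);                // "и" -> 1080 (decimal)
  const hex = code.toString(16).toUpperCase();  // 1080 -> "438"
  return "\\u" + hex.padStart(4, "0");          // "438" -> "\u0438"
}

console.log("и".charCodeAt(0));       // 1080
console.log(charToEscape("и"));       // \u0438
console.log(charToEscape("\u0247"));  // \u0247 (small e with stroke)
```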

fpmurphy · October 8, 2010, 4:17pm

Looks like CYRILLIC SMALL LETTER I (i.e. the reversed small N) and LATIN SMALL LETTER E WITH STROKE (i.e. small e with a forward stroke through it), but my bet is that they are something else as far as your application is concerned.

Can you provide an example of the "casual characters" you need to convert?

system · October 9, 2010, 1:30am

Yes, Cyrillic, but Latin too.
Here's an example:
<table><tr><td> </td></tr></table>

I need to encode this line into a line of codes like \u0203\u0472 etc. (I don't actually know the right codes for those characters.)
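To encode a whole line, the same hex formatting can be applied to every character in turn, markup included. A sketch in JavaScript; the function name escapeLine and the sample line are hypothetical, chosen only for illustration.

```javascript
// Sketch: escape every character of a line as \uXXXX, markup included.
// charCodeAt() works on UTF-16 code units, so a character outside the
// Basic Multilingual Plane comes out as two escapes (a surrogate pair),
// which JavaScript string literals also accept.
function escapeLine(line) {
  let out = "";
  for (let i = 0; i < line.length; i++) {
    const hex = line.charCodeAt(i).toString(16).toUpperCase();
    out += "\\u" + hex.padStart(4, "0");
  }
  return out;
}

// Hypothetical sample line: table-cell markup around the Cyrillic word "Да".
console.log(escapeLine("<td>Да</td>"));
// \u003C\u0074\u0064\u003E\u0414\u0430\u003C\u002F\u0074\u0064\u003E
```

Putting the output between double quotes in JavaScript source (or passing it through JSON.parse) should give the original line back.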

niterobin · October 9, 2010, 2:20am

I don't know if this will be of use to you - it's a website with character lookup tables.

Free Online Unicode Character Map

HTH, Rob.

(I've never done any unicode translation myself, but found this with a web search "unicode lookup".)

system · October 9, 2010, 5:51am

Well, those codes look similar to what I need, but I still need to get them from casual characters through Perl.
Perl can convert them somehow, but I can't find out how.

---------- Post updated at 12:27 PM ---------- Previous update was at 10:40 AM ----------

I found a JavaScript function like this, with the whole charmap:

// Convert a string whose characters are KOI8-R byte values (0x00..0xFF)
// into the corresponding Unicode string.
function koi2unicode(str) {
   // Unicode characters for KOI8-R bytes 0x80..0xFF, in byte order.
   // The \uXXXX escapes are decoded by the string literal itself.
   var charmap =
"\u2500\u2502\u250C\u2510\u2514\u2518\u251C\u2524\u252C\u2534\u253C\u2580\u2584\u2588\u258C\u2590"+
"\u2591\u2592\u2593\u2320\u25A0\u2219\u221A\u2248\u2264\u2265\u00A0\u2321\u00B0\u00B2\u00B7\u00F7"+
"\u2550\u2551\u2552\u0451\u2553\u2554\u2555\u2556\u2557\u2558\u2559\u255A\u255B\u255C\u255D\u255E"+
"\u255F\u2560\u2561\u0401\u2562\u2563\u2564\u2565\u2566\u2567\u2568\u2569\u256A\u256B\u256C\u00A9"+
"\u044E\u0430\u0431\u0446\u0434\u0435\u0444\u0433\u0445\u0438\u0439\u043A\u043B\u043C\u043D\u043E"+
"\u043F\u044F\u0440\u0441\u0442\u0443\u0436\u0432\u044C\u044B\u0437\u0448\u044D\u0449\u0447\u044A"+
"\u042E\u0410\u0411\u0426\u0414\u0415\u0424\u0413\u0425\u0418\u0419\u041A\u041B\u041C\u041D\u041E"+
"\u041F\u042F\u0420\u0421\u0422\u0423\u0416\u0412\u042C\u042B\u0417\u0428\u042D\u0429\u0427\u042A";
   // Bytes 0x80..0xFF are looked up in the table; everything else
   // (plain ASCII) passes through unchanged.
   var code2char = function(code) {
      if (code >= 0x80 && code <= 0xFF) return charmap.charAt(code - 0x80);
      return String.fromCharCode(code);
   };
   var res = "";
   for (var i = 0; i < str.length; i++) res += code2char(str.charCodeAt(i));
   return res;
}

but I still need it in Perl :\

---------- Post updated at 01:51 PM ---------- Previous update was at 12:27 PM ----------

EXAMPLE
unpack("U",u046F) gives 117
117 is the code of the character "u" (chr(117) = "u", ord("u") = 117).

So HOW do I get u046F back from 117???

pack("U",117) gives "u", but not its code,
and hex(117) gives 279...
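Two things seem to be going on in this example. Without `use strict`, the bareword u046F is just the five-character string "u046F", so unpack("U", ...) returns the code of its first character, "u" (117). And hex() reads its argument as a hexadecimal string, so hex(117) treats "117" as 0x117 = 279. The missing step is formatting the decimal number as hexadecimal; in Perl, sprintf("%04X", 117) should give "0075", and hex("046F") gives 1135. The same arithmetic, sketched in JavaScript:

```javascript
// Decimal code vs. the hex digits of a \uXXXX escape.
const code = "u".charCodeAt(0);           // 117, like Perl's ord("u")

// Reading the digits "117" as hexadecimal (what Perl's hex() does): 0x117 = 279.
const misread = parseInt("117", 16);      // 279

// The missing step: format the number as hex, padded to four digits.
const digits = code.toString(16).toUpperCase().padStart(4, "0"); // "0075"

// Parsing those digits as hex gives the original code back.
const roundTrip = parseInt(digits, 16);   // 117

// The escape from the post: \u046F is code 0x046F = 1135 in decimal.
const fromEscape = parseInt("046F", 16);  // 1135

console.log(code, misread, digits, roundTrip, fromEscape);
```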

fpmurphy · October 9, 2010, 12:28pm

Have you looked at Text::Iconv?

system · October 9, 2010, 1:23pm

Why? I don't need to convert the text to another encoding;
I need the character code of every symbol in that text.

Anyway, I've done it with a regexp and "hard replacement":
just replacing every Cyrillic symbol with its \u0... code using s///.
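A per-letter table of s/// replacements works, but the code can also be computed inside the replacement itself. In Perl that would be something like s/([^\x00-\x7F])/sprintf('\u%04X', ord $1)/ge (the /e flag evaluates the replacement as code), again assuming the text is decoded into characters. The same idea in JavaScript, where replace() accepts a callback; the function name escapeNonAscii is made up for this sketch.

```javascript
// Sketch: escape only non-ASCII characters, leaving markup and Latin
// letters as they are. Without the "u" flag the regex matches UTF-16
// code units, so characters outside the BMP become two escapes.
function escapeNonAscii(text) {
  return text.replace(/[^\x00-\x7F]/g, (ch) =>
    "\\u" + ch.charCodeAt(0).toString(16).toUpperCase().padStart(4, "0"));
}

console.log(escapeNonAscii("<td>Привет</td>"));
// <td>\u041F\u0440\u0438\u0432\u0435\u0442</td>
```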

fpmurphy · October 9, 2010, 9:11pm

Well, now that you have the KOI8-R (Kod Obmena Informatsiey, 8 bit, if I am not mistaken) to Unicode remapping, it should only take you a few minutes to convert it into Perl.
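Applied directly to the task in this thread, the KOI8-R table from koi2unicode() can turn raw KOI8-R bytes into \uXXXX escapes in one pass. A sketch in JavaScript with hypothetical names (KOI8R_HIGH, koi8rBytesToEscapes), assuming the input is a list of byte values such as an array, a Uint8Array or a Node Buffer. In Perl, the Encode module's decode("koi8-r", $bytes) should make a hand-written table unnecessary, after which each character can be formatted with sprintf("\\u%04X", ord $char).

```javascript
// KOI8-R bytes 0x80..0xFF mapped to Unicode, one character per byte, in the
// same order as the charmap in koi2unicode(). Bytes below 0x80 are ASCII.
const KOI8R_HIGH =
  "\u2500\u2502\u250C\u2510\u2514\u2518\u251C\u2524\u252C\u2534\u253C\u2580\u2584\u2588\u258C\u2590" +
  "\u2591\u2592\u2593\u2320\u25A0\u2219\u221A\u2248\u2264\u2265\u00A0\u2321\u00B0\u00B2\u00B7\u00F7" +
  "\u2550\u2551\u2552\u0451\u2553\u2554\u2555\u2556\u2557\u2558\u2559\u255A\u255B\u255C\u255D\u255E" +
  "\u255F\u2560\u2561\u0401\u2562\u2563\u2564\u2565\u2566\u2567\u2568\u2569\u256A\u256B\u256C\u00A9" +
  "\u044E\u0430\u0431\u0446\u0434\u0435\u0444\u0433\u0445\u0438\u0439\u043A\u043B\u043C\u043D\u043E" +
  "\u043F\u044F\u0440\u0441\u0442\u0443\u0436\u0432\u044C\u044B\u0437\u0448\u044D\u0449\u0447\u044A" +
  "\u042E\u0410\u0411\u0426\u0414\u0415\u0424\u0413\u0425\u0418\u0419\u041A\u041B\u041C\u041D\u041E" +
  "\u041F\u042F\u0420\u0421\u0422\u0423\u0416\u0412\u042C\u042B\u0417\u0428\u042D\u0429\u0427\u042A";

// Sketch: convert KOI8-R bytes directly to \uXXXX escapes.
function koi8rBytesToEscapes(bytes) {
  let out = "";
  for (const b of bytes) {
    // High bytes go through the table; ASCII bytes are already Unicode codes.
    const code = b >= 0x80 ? KOI8R_HIGH.charCodeAt(b - 0x80) : b;
    out += "\\u" + code.toString(16).toUpperCase().padStart(4, "0");
  }
  return out;
}

// "Привет" encoded in KOI8-R.
console.log(koi8rBytesToEscapes([0xF0, 0xD2, 0xC9, 0xD7, 0xC5, 0xD4]));
// \u041F\u0440\u0438\u0432\u0435\u0442
```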