UTF8 encoding

Hi experts,

I have a .gz file from another system (Solaris), which is FTPed to our system (also Solaris).

After gunzip, the file is an XML file, and we are using Oracle's built-in XML transformation tool, ORAXSL, to transform the XML into TXT.

The problem is that we get the following UTF8 error:
Error occurred while parsing RDC2010052100149253.xml: Invalid UTF8 encoding.

As advised by our SA/DBA, we set NLS_LANG as shown below in the shell script, but we still get the same error. (We even ran dos2unix -ascii on the unzipped XML file.)

export NLS_LANG=AMERICAN_AMERICA.UTF8

Is there any way to fix this issue?

Any clue would be highly appreciated.

First of all, does your XML file contain an encoding declaration on its first line? Something like:

<?xml version="1.0" encoding="UTF-8"?>

If not, please let us know what the encoding declaration is.
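To answer that question and to find where the parser gives up, a small diagnostic can help. The sketch below is a minimal, hypothetical helper script (not part of ORAXSL or any Oracle tooling): it reads the file as raw bytes, reports the encoding named in the XML declaration (if any), and reports the byte offset of the first sequence that is not valid UTF-8. One common cause, assumed here rather than confirmed for this case, is a file that was actually written in a single-byte encoding such as ISO-8859-1, where any byte above 0x7F usually fails to decode as UTF-8.

```python
import re
import sys

# Matches the encoding pseudo-attribute of an XML declaration at the very start
# of the file, optionally preceded by a UTF-8 byte order mark.
DECL_RE = re.compile(
    rb'^(?:\xef\xbb\xbf)?<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def declared_encoding(data: bytes):
    """Return the encoding named in the XML declaration, or None if absent."""
    m = DECL_RE.match(data)
    return m.group(1).decode("ascii") if m else None


def first_invalid_utf8(data: bytes):
    """Return (offset, bad_bytes) for the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")
        return None
    except UnicodeDecodeError as e:
        return e.start, data[e.start:e.end]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_xml.py RDC2010052100149253.xml
    with open(sys.argv[1], "rb") as f:
        data = f.read()
    print("declared encoding:", declared_encoding(data))
    bad = first_invalid_utf8(data)
    if bad is None:
        print("file decodes cleanly as UTF-8")
    else:
        offset, chunk = bad
        print(f"first invalid UTF-8 at byte offset {offset}: {chunk!r}")
```

If the file decodes cleanly but the parser still complains, the problem is more likely one of the characters XML itself forbids, rather than the UTF-8 byte encoding.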

Next, not all valid UTF-8 characters are valid XML characters. See Section 2.2 of Extensible Markup Language (XML) 1.0 (Second Edition). This applies to CDATA sections as well. If this is the case, all you can do is write a filter to scrub the offending characters.
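A filter like that can be sketched as follows. This is a minimal example, assuming the input really is UTF-8 (undecodable bytes are turned into U+FFFD rather than rejected, which is a choice you may not want). It removes every character outside the XML 1.0 `Char` production from Section 2.2. The function names `scrub_xml_text` and `scrub_file` are hypothetical, chosen for illustration.

```python
import re

# Characters NOT allowed by the XML 1.0 Char production (Section 2.2):
#   Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
INVALID_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def scrub_xml_text(text: str, replacement: str = "") -> str:
    """Remove (or replace) every character that XML 1.0 does not allow."""
    return INVALID_XML_CHARS.sub(replacement, text)


def scrub_file(src_path: str, dst_path: str) -> None:
    """Write a scrubbed UTF-8 copy of src_path to dst_path."""
    # errors="replace" turns undecodable bytes into U+FFFD (an allowed XML char);
    # newline="" keeps the original line endings untouched.
    with open(src_path, encoding="utf-8", errors="replace", newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(scrub_xml_text(text))
```

For example, `scrub_file("in.xml", "clean.xml")` would produce a copy to feed to ORAXSL. Deleting the characters is the simplest policy; passing a `replacement` such as `"?"` makes the removals easier to spot afterwards.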