An invalid XML character (Unicode: 0x1a)

While uploading an XML file to my application on Solaris 10, the upload failed with this error:

Error! Parsing Error: /SPLM/TC83/tcdata83/model/model_dbextract.xml  Line:65576 Column:73 An invalid XML character (Unicode: 0x1a) was found  in the value of attribute "unitOfMeasureSymbol" and element is  "TcUnitOfMeasure".
Please check the errors.
Aborting...
Exception Encountered!!!
java.lang.NullPointerException

What I found is that when I open the XML file in Windows, the failing line looks something like this:

<TcUnitOfMeasure unitOfMeasureName="Microampere" unitOfMeasureSymbol="µA"/>
        <TcUnitOfMeasure unitOfMeasureName="Microfarad" unitOfMeasureSymbol="µF"/>

The same line, after transferring the file to Unix using the ascii option in ftp, looks like:

 <TcUnitOfMeasure unitOfMeasureName="Microampere" unitOfMeasureSymbol="\265A"/>
                <TcUnitOfMeasure unitOfMeasureName="Microfarad" unitOfMeasureSymbol="\265F"/>

If I use the binary transfer option in ftp, it looks like:

<TcUnitOfMeasure unitOfMeasureName="Microampere" unitOfMeasureSymbol="^ZA"/>
                <TcUnitOfMeasure unitOfMeasureName="Microfarad" unitOfMeasureSymbol="^ZF"/>

Hence the mu symbol (as in microfarad) is not being parsed on Unix. Can the experts help me solve this issue?

Thank you
Raghu

Your file seems to be encoded in ISO-8859-1 by windows while UTF-8 is likely expected.
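
To make the mismatch concrete, here is a small sketch that dumps the bytes of the symbol in each of the forms seen in this thread. The helper name hex_of is made up for the illustration, and octal escapes are used because printf handles them portably. A lone 0xb5 byte is not a valid UTF-8 sequence, and 0x1a is the ASCII SUB control character, which XML 1.0 does not allow.

```shell
# hex_of: hypothetical helper, prints stdin as one compact hex string
hex_of() {
    od -An -tx1 | tr -d ' \n'
    echo
}

printf '\265A'     | hex_of   # ISO-8859-1 micro sign + "A"  -> b541
printf '\302\265A' | hex_of   # UTF-8 micro sign + "A"       -> c2b541
printf '\032A'     | hex_of   # SUB (displayed as ^Z) + "A"  -> 1a41
```

Only the second form is what a parser reading a file declared as UTF-8 would accept as the micro sign.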

Is an encoding specified in its header ?
Something like:

<?xml version="1.0" encoding="utf-8" ?> 

?

In any case, this should work:

 unitOfMeasureSymbol="&#181;A"

Header is

<?xml version="1.0" encoding="UTF-8" standalone="no"?>

I checked the locale settings on the server:

-> locale
LANG=
LC_CTYPE="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_COLLATE="C"
LC_MONETARY="C"
LC_MESSAGES="C"
LC_ALL=
and locale -a lists this output

C
POSIX
hi_IN.UTF-8
iso_8859_1
ja
ja_JP.PCK
ja_JP.UTF-8
ja_JP.eucJP
ko
ko.UTF-8
ko_KR.EUC
ko_KR.EUC@dict
ko_KR.UTF-8
ko_KR.UTF-8@dict
th
th_TH
th_TH.ISO8859-11
th_TH.TIS620
th_TH.UTF-8
zh
zh.GBK
zh.UTF-8
zh_CN.EUC
zh_CN.EUC@pinyin
zh_CN.EUC@radical
zh_CN.EUC@stroke
zh_CN.GB18030
zh_CN.GB18030@pinyin
zh_CN.GB18030@radical
zh_CN.GB18030@stroke
zh_CN.GBK
zh_CN.GBK@pinyin
zh_CN.GBK@radical
zh_CN.GBK@stroke
zh_CN.UTF-8
zh_CN.UTF-8@pinyin
zh_CN.UTF-8@radical
zh_CN.UTF-8@stroke
zh_HK.BIG5HK
zh_HK.BIG5HK@radical
zh_HK.BIG5HK@stroke
zh_HK.UTF-8
zh_TW
zh_TW.BIG5
zh_TW.BIG5@pinyin
zh_TW.BIG5@radical
zh_TW.BIG5@stroke
zh_TW.BIG5@zhuyin
zh_TW.EUC
zh_TW.EUC@pinyin
zh_TW.EUC@radical
zh_TW.EUC@stroke
zh_TW.EUC@zhuyin
zh_TW.UTF-8

So does that mean I don't have a correct UTF-8 locale?

I used unitOfMeasureSymbol="&#181;A", still no success.

You may have something else going on. What parser are you using? Can your parser handle the following short XML document?

<?xml version="1.0" encoding="utf-8" ?>
< ="2.5-7">
 < id="14">
  <>45-3454-123</>
  <>1512</>
  < xml:lang="ja"></>
 </>
 < id="64">
  <>45-7894-456</>
  <>1435</>
  < xml:lang="ja"></>
 </>
</>

Okay. Then that is the problem. Your µ is not in UTF-8 in this file.
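
If the file on the server really holds the single byte 0xb5, one possible way to re-encode it is iconv. This is only a sketch: the file names are hypothetical scratch files standing in for model_dbextract.xml, and codeset names differ between platforms, so check the iconv documentation on your system for the exact spellings.

```shell
# Hypothetical scratch files standing in for model_dbextract.xml
src=sample_latin1.xml
dst=sample_utf8.xml

# One attribute line with the micro sign as the single ISO-8859-1 byte 0xb5 (octal 265)
printf '<TcUnitOfMeasure unitOfMeasureName="Microampere" unitOfMeasureSymbol="\265A"/>\n' > "$src"

# Re-encode from ISO-8859-1 to UTF-8
iconv -f ISO-8859-1 -t UTF-8 "$src" > "$dst"

# Byte counts before and after; the copy grows by one byte per micro sign
wc -c < "$src"
wc -c < "$dst"

# A UTF-8 to UTF-8 pass serves as a cheap validity check on the converted copy
if iconv -f UTF-8 -t UTF-8 "$dst" > /dev/null 2>&1; then
    echo "$dst is valid UTF-8"
fi
```

This does not help with a copy that already contains 0x1a instead of 0xb5: in that case the original character is gone and has to be restored from the Windows file.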

It seems you are using a 7-bit ASCII locale. What says

set|grep LC

?

This is odd. What error message do you get ?

Here is what it returns:

-> set|grep LC
MAILCHECK=600

About the error: it's the same error, at the same line, where "microfarad" starts.

Localization Extraction Completed.
Please refer [/SPLM/TC83/server_root/logs/business_model_extractor_2011_05_14_08-09-49.log] for log information
An invalid XML character (Unicode: 0x1a) was found in the value of attribute "unitOfMeasureSymbol" and element is "TcUnitOfMeasure".
org.xml.sax.SAXParseException: An invalid XML character (Unicode: 0x1a) was found in the value of attribute "unitOfMeasureSymbol" and element is "TcUnitOfMeasure".
at org.apache.xerces.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
at org.apache.xerces.util.ErrorHandlerWrapper.fatalError(Unknown Source)
at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
at org.apache.xerces.impl.XMLScanner.reportFatalError(Unknown Source)
at org.apache.xerces.impl.XMLScanner.scanAttributeValue(Unknown Source)
at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanAttribute(Unknown Source)
at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
at com.teamcenter.bmide.base.core.loader.XMLContentParser.parseWithValiation(Unknown Source)
at com.teamcenter.bmide.base.core.loader.XMLContentParser.parse(Unknown Source)
at com.teamcenter.bmide.base.core.loader.XMLContentParser.parse(Unknown Source)
at com.teamcenter.bmide.foundation.core.loader.BusinessDataContentParser.parse(Unknown Source)
at com.teamcenter.bmide.foundation.core.util.ServerCoreUtil.buildModels(Unknown Source)
at com.teamcenter.bmide.foundation.core.util.ServerCoreUtil.buildModels(Unknown Source)
at com.teamcenter.bmide.tcplmxml.xsdgen.impl.TcPlmXmlXsdInstallToTC.install(Unknown Source)
at com.teamcenter.bmide.tcplmxml.xsdgen.impl.TcPlmXmlXsdInstallToTCMain.main(Unknown Source)
Aborting...
Hello fpmurphy

My server couldn't handle the test XML file you gave. I transferred it in both ascii and binary mode with ftp and checked:

-> cat test.xml
<?xml version="1.0" encoding="utf-8" ?>
<????????? ??????="2.5-7">
 <?????? id="14">
  <????????????>45-3454-123</????????????>
  <????????>1512</????????>
  <???????? xml:lang="ja">?????</????????>
 </??????>
 <?????? id="64">
  <????????????>45-7894-456</????????????>
  <????????>1435</????????>
  <???????? xml:lang="ja">???????�???</????????>
 </??????>
</?????????>infodba-ie10ux013:/home/infodba

You have no locale set. What says:

cat /etc/default/init | grep -v "^#"

?

It looks like you didn't replace all occurrences of "µ" by "&#181;".
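
One way to make sure no raw byte is left behind is to count the lines that still contain it and let sed do the replacement everywhere. The sketch below assumes the character is stored as the single byte 0xb5 and that the numeric reference &#181; is the form you want; the file name is a hypothetical scratch file.

```shell
# Hypothetical scratch file standing in for model_dbextract.xml
file=sample_units.xml
printf '<TcUnitOfMeasure unitOfMeasureName="Microampere" unitOfMeasureSymbol="\265A"/>\n'  > "$file"
printf '<TcUnitOfMeasure unitOfMeasureName="Microfarad" unitOfMeasureSymbol="\265F"/>\n' >> "$file"

mu=$(printf '\265')   # the raw ISO-8859-1 micro sign byte

# Count the lines that still contain the raw byte (LC_ALL=C makes grep work byte-wise)
LC_ALL=C grep -c "$mu" "$file"

# Replace every occurrence; "\&" is needed because a bare "&" means "the match" in sed
LC_ALL=C sed "s/$mu/\\&#181;/g" "$file" > "$file.fixed"

# Count the lines that now carry the character reference
LC_ALL=C grep -c '&#181;' "$file.fixed"
```

Writing to a separate .fixed file keeps the original untouched until the result has been checked.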

Set your locale to hi_IN.UTF-8 and test your XML document again. You are using the Xerces processor, which is extremely robust.

There are many Unicode characters that are not allowed in an XML document, according to the XML specification. See sections 2.2 and 4.1 of the 1.0 specification. Typical disallowed characters are control characters (such as 0x1a), even if you escape them using the character reference form, i.e. &#xxxx;. I would examine your XML document to see if there is a spurious 0x1a there (use od -hc, xxd, etc.).
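
As a rough sketch of such a check: XML 1.0 allows only tab, newline and carriage return below 0x20, so a byte-wise grep for the remaining control bytes points at the offending lines. The file name is a hypothetical scratch file, and NUL is not covered because a shell variable cannot hold it.

```shell
# Hypothetical scratch file; line 2 carries a stray SUB (octal 032, hex 1a)
file=sample_ctrl.xml
printf '<?xml version="1.0" encoding="UTF-8"?>\n'         > "$file"
printf '<TcUnitOfMeasure unitOfMeasureSymbol="\032A"/>\n' >> "$file"

# Bracket expression of the control bytes XML 1.0 forbids: everything below
# 0x20 except tab (011), newline (012) and carriage return (015)
ctrl=$(printf '[\001-\010\013\014\016-\037]')

# Print the line numbers that contain such a byte
LC_ALL=C grep -n "$ctrl" "$file" | cut -d: -f1

# Dump the first offending line so the byte is visible
n=$(LC_ALL=C grep -n "$ctrl" "$file" | cut -d: -f1 | sed 1q)
sed -n "${n}p" "$file" | od -c
```

On the real file, the line number printed should match the one in the parser's error message if the 0x1a byte is the culprit.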

The output of the command cat /etc/default/init | grep -v "^#":

-> cat /etc/default/init | grep -v "^#"
TZ=Asia/Calcutta
CMASK=022

Yes, I didn't replace all occurrences. I replaced only the first two lines, to check whether the error would move on from that line, but it didn't; it stayed stuck at the first line where I had made changes.

That doesn't make sense. The error message you posted "An invalid XML character (Unicode: 0x1a) was found in the value of attribute" implies the line hasn't been modified.

I did replace all the occurrences:

<TcUnitOfMeasure unitOfMeasureName="Microampere" unitOfMeasureSymbol="&#181;A"/>
<TcUnitOfMeasure unitOfMeasureName="Microfarad" unitOfMeasureSymbol="&#181;F"/>
<TcUnitOfMeasure unitOfMeasureName="Microgram/liter" unitOfMeasureSymbol="&#181;GL"/>
<TcUnitOfMeasure unitOfMeasureName="Microgram/cubic meter" unitOfMeasureSymbol="&#181;GQ"/>
<TcUnitOfMeasure unitOfMeasureName="micro Hertz" unitOfMeasureSymbol="&#181;HZ"/>
<TcUnitOfMeasure unitOfMeasureName="Microliter" unitOfMeasureSymbol="&#181;L"/>
<TcUnitOfMeasure unitOfMeasureName="Micrometer" unitOfMeasureSymbol="&#181;M"/>

but somehow my server is not picking up the characters and still throws the exception at the same point/line.

---------- Post updated at 08:51 PM ---------- Previous update was at 08:37 PM ----------

In the meantime I am trying to install a UTF-8 locale. As per my search, I need to execute the localeadm command:

-> localeadm -l -v
Verbose mode
You do not appear to have created a fresh config file since you began using this application.
If you have a set of Solaris install images available to you, it is recommended that you do so before proceeding.


Do you wish to create a new config file? [y/n]: y

Please select the option that was used to install Solaris

1.  CD installation/net installed CD images
2.  DVD installation/net installed combined image

Please enter your choice:

Does this mean I need to ask my admin team to get the install CDs and install from them? Is there any way I can download a package file and install these locales?

What says:

grep 'TcUnitOfMeasure unitOfMeasureName="Microampere"' model_dbextract.xml | od -c

?

---------- Post updated at 07:59 ---------- Previous update was at 07:52 ----------

You don't need to, if one of Hindi, Japanese, Korean, Thai or Chinese is okay for you.
You can simply set this variable in your profile and log in again:

LC_ALL=hi_IN.UTF-8
export LC_ALL
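
If you would rather not hard-code the name, a small sketch like the following picks the first UTF-8 entry from the installed list. The helper name first_utf8_locale is made up for the illustration; with the locale -a output posted above it would select hi_IN.UTF-8.

```shell
# first_utf8_locale: hypothetical helper that reads a `locale -a` style list on
# stdin and prints the first entry ending in UTF-8 (also matches the utf8 spelling)
first_utf8_locale() {
    grep -i 'utf-*8$' | sed 1q
}

candidate=$(locale -a 2>/dev/null | first_utf8_locale)
if [ -n "$candidate" ]; then
    LC_ALL=$candidate
    export LC_ALL
    locale charmap   # reports the codeset now in effect
else
    echo "no UTF-8 locale installed" >&2
fi
```

The setting only applies to the current shell and the processes started from it, which is why it belongs in the profile of the user that runs the upload.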

Hello experts jlliagre & fpmurphy,

Today I tried temporarily setting the locale to

LC_ALL=hi_IN.UTF-8
export LC_ALL

The upgrade went fine, and I have now reverted to the old locale by removing the entry in .profile. Meanwhile I have asked my IS team to load the locales on my servers.

Thank you very much for all the effort you spent helping to fix this issue. Along the way I learnt quite a bit about the locale feature in Solaris.