Problem With UTF-8 Byte Order Mark

Hi

I'm migrating a few websites from my old web server (CentOS 5) to a new server (CentOS 6). One of these websites is multilingual and has a lot of UTF-8 files (HTML, PHP) in different languages (e.g. Arabic, Persian, Russian, etc.).

On the old server, when I run:

file mailer.php

I get:

mailer.php: UTF-8 Unicode C++ program text

But when I transfer these files to the new server and run the file command again, I get this:

mailer.php UTF-8 Unicode (with BOM) C++ program text

And the pages display question marks ("????????") when I browse the website!

What should I do to stop the OS from adding a BOM to these files?

In all likelihood, the OS didn't change the files; the file utility was simply made smarter at distinguishing between different types of C++ program text.

Look at /etc/magic on both systems and see if there is something containing the string "with BOM". It may help you understand why CentOS 6 adds that phrase to the output of the file utility. However, since it is a programming language check, the rules that file is using might not be in /etc/magic.
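
One way to check for a BOM without relying on the file utility's magic rules is to look at the first three bytes of the file directly. The following is a minimal Python sketch, not anything taken from the file package itself; the function name has_utf8_bom is made up for illustration.

```python
import codecs


def has_utf8_bom(path):
    """Return True if the file starts with the UTF-8 BOM bytes EF BB BF.

    Hypothetical helper for illustration; it only inspects the first
    three bytes and does not validate the rest of the file as UTF-8.
    """
    with open(path, "rb") as fh:
        return fh.read(3) == codecs.BOM_UTF8
```

Calling has_utf8_bom("mailer.php") on each server should give the same answer if the files really are byte-for-byte identical, regardless of what the file command reports.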

I've checked both systems: only CentOS 6 has an /etc/magic file, and there is nothing in it.

Any other suggestions would be appreciated :)

On each of the two machines, calculate each file's md5 digest. Are they identical?

If not, then you need to specify exactly how you transferred them.

If they are identical, the problem lies elsewhere.
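
If md5sum is not handy on one of the machines, the digest comparison can also be sketched in Python using the standard hashlib module. The function name md5_of_file and the chunk size are assumptions chosen for illustration.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in binary chunks.

    Hypothetical helper for illustration. Binary mode matters here: a BOM
    or any other byte-level difference changes the digest.
    """
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Run it against the same file on both servers and compare the two hex strings; any difference means the transfer altered the bytes.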

Simply mishandling a BOM is unlikely to generate a lot of "???????" sequences. I suspect that the browser is using an incorrect encoding (perhaps because either the webserver or php is not configured correctly). Have you compared the headers sent from each server? Or have you checked that in both cases the browser is using the same encoding?

Regards,
Alister

The checksums are the same, and the PHP/Apache configurations are also the same on both servers.

Are you using the exact same browser (same user account, same version, same computer) for both sites? Did you check the HTTP headers that the browser is receiving? Did you check the encoding that the browser is using in each case? Did you try forcing the encoding in the browser to see if the page's multilingual text renders correctly?
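
The header check can also be done outside the browser. The following is a rough Python sketch using only the standard library; the function names are hypothetical, and the URL passed to fetch_charset would be whatever page shows the problem on each server.

```python
from email.message import Message
from urllib.request import urlopen


def charset_from_content_type(value):
    """Extract the charset parameter from a Content-Type header value.

    Returns the charset in lower case, or None if the header declares none.
    """
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_charset()


def fetch_charset(url):
    """Fetch a page and return the charset declared in its HTTP headers.

    Hypothetical helper; requires network access to the server under test.
    """
    with urlopen(url) as response:
        content_type = response.headers.get("Content-Type", "")
    return charset_from_content_type(content_type)
```

If fetch_charset returns different values for the same page on the old and new servers, the web server or PHP configuration differs even though the files do not.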

Regards,
Alister

Problem solved.

This is how:
I just found out that the files have a BOM on both servers. The old server (CentOS 5) doesn't show it because its old "file" RPM package has no "BOM" entry in its magic file, "/usr/local/share/file/magic". On the new server (CentOS 6), the newer "file" package includes it, so it can detect the BOM.
The reason I got question marks in the output was a PHP option called "zend.multibyte", which is supposed to handle files with a BOM but doesn't! So I recompiled PHP without this option, and everything is back to normal.
