Solaris 5.8 - BinaryFile - Endianness

Dear Users,
How do I account for endianness while reading, on a Windows XP (x86) system, a binary file generated on Solaris 5.8 (SPARC)? I just know that it is a binary file; I have no knowledge of its native datatypes.

If the file came from tape or other media created on a Solaris machine, then it is big endian.

If you (s)ftp'ed the file to Windows then you can read it on Windows. Windows is little endian and so is the file.

Otherwise you will have to convert the file; search the forums for 'change endianness of file'.

Hi,
Thanks for the reply.
This file arrives in an encapsulated form from some other Unix machine over the network; the target OS, i.e. Solaris 5.8, reads it, extracts the relevant part from the encapsulation as binary data, and saves it in a file on the hard disk. This saved file is my input binary file, which I copied off via a USB drive or FTP.

So, how do I interpret the endianness here?

A file's endianness has not that much to do with the OS's endianness. It's the file format that matters, so if you have no idea about the format, why bother?
Windows on x86 can certainly create big-endian files if the binary format mandates it, just as Solaris on SPARC can create binary files with little-endian data inside. Any combination of big and little endian can be found in binary files; likewise, binary files flowing over the internet can be little endian, big endian, or a mixture of the two.

Ok. I agree.
But the problem is that the Unix C code which parses this binary file is not working on the Win32 platform, although it compiles successfully in VS 2005. It starts parsing from the file's base address and tries to read data at particular hardcoded offsets from the base, and these memory locations return garbage values on Windows. Hence the question of endianness arose.

Any thoughts on this behavior?

Not only would the data be garbage, but so would the offsets, were they not hardcoded.

You didn't mention that you had the parser source code. In that case, you need to modify that code so that it can correctly parse the data. You can use the htonl family of functions on the x86 machine to convert from network (big-endian) to host (little-endian) byte order.

I used a hex editor and converted the binary file to big-endian format, then ran the same parser code. It still returns garbage :confused:

So, keeping the original binary file intact, how do I determine, while reading, the native datatype for the byte-swapping order? These network APIs suggest something like this:

ntohs() - "network to host short"
Converts the unsigned short integer netshort from network byte order to host byte order.
ntohl() - "network to host long"
Converts the unsigned integer netlong from network byte order to host byte order.

A hexadecimal editor is of no use with a structured binary file, which I guess yours is.

It's not the whole file that you need to convert, but only the various portions, of potentially various sizes, that are spread out at various locations.

When I dump the binary file on Solaris using the "strings" and "od" commands and compare that dump with the same binary file opened in a hex editor on Windows, I can see that the values sit at exactly the same offsets on both OSes.

Am I missing something while parsing (that I have not noticed yet)?

You are missing what endianness is about. It is absolutely expected that a file's content is the same regardless of the CPU architecture. What differs is how numerical values are interpreted.

Thanks for the assistance.
While interpreting the numerical values, the parser does something like this:

  1. It reads 2048 bytes and stores them in a char* buffer.
  2. Then it reads an "int" like this:
    int count = *(int *)(buffer + 8);

So, to take care of endianness, should I first swap the bytes, i.e. swap based on sizeof(datatype), and then assign the value to the variable? Will this suffice for endianness?

That's the correct way.

Thank You.
Now, if I do it that way, is it necessary to swap the bytes of every value (based on sizeof(datatype))? Because at some memory locations I get the expected numerical value directly (without swapping), while other values appear correct only after swapping. Is that correct?

This looks odd. It would help if you could have a look at the source code that wrote the data.

Hi,
I am getting the expected values the way I described in my previous two posts. I think the source code (not available, as of now) that wrote this binary file was itself written that way. Otherwise, how would it be possible to get the expected values?


Any thoughts on this behavior?

What is the exact data type at those locations? What is the source code you're using to read it? The more specific data you can present, the better.

Cross-platform coding is tedious: literally every bit has to be accounted for. That is why, in cases where space and performance aren't too much of an issue, I like to use portable formats like XML or even simple ASCII text.

True.
I am inferring the exact datatypes at those locations by looking at the parser code, because the source that generated the original binary file is not available. And yes, this is very tedious! That is why, as you said, we have XML. But we always have some legacy systems to which we sometimes have to adjust.

It doesn't make sense for a binary file to contain both big- and little-endian values, but this happens, and sometimes there are even explanations for it.
However, if your Windows little-endian code correctly reads, say, a 16-bit or 32-bit integer value AND the very same code compiled on a big-endian machine reads the very same value at the same location without byte swapping, then you are just lucky enough to have a value that reads the same in both byte orders, like (assuming 16-bit unsigned integers) 0, 257, 514, 771, ..., 65535.

What about a 32-bit unsigned long?
The parser code tries to read something like this:

unsigned long temp = *(unsigned long *) ptr;

It gives a value like "2030043136", which is not expected. So I tried to swap the 32 bits like this:

unsigned long temp = Swap32Bits(*(unsigned long *) ptr);

but I am still not sure of the correct value! How do I test such cross-platform code?

Not sure how Swap32Bits is implemented, but your value byte-swaps to 121, which looks like a pretty valid number.