Writing a binary file (C++)

lamachejo · June 10, 2011, 9:52am

Hi guys, I am writing a vector into a binary file (on Linux, of course), but something seems to be wrong with this code: the binary output file is BIGGER than the plain-text file I get from writing the same vector as text. As far as I understand, a binary file should be smaller than the same data saved as text.

Here's the code

	
    int aux=0;
    ofstream fo(file, ios::out | ios::binary);
    fo.write("MP-PER-B \n",10*sizeof(char));          // 10-byte tag, no terminating '\0'
    fo.write((const char*) (&numero),sizeof(int));    // number of elements
    for(int i=0; i<numero; i++){
        aux=vector[i];                                // copy element i
        fo.write((const char *) (&aux),sizeof(int));  // raw bytes of one int
    }
    fo.close();

Also, not being able to read it from the Linux command line makes things difficult (I can't check whether it wrote the numbers, since I don't know any command that can display the file).

DGPickett · June 10, 2011, 10:40am

I like cat -vt and od -bc for looking at binary files, to see what is extra. If you prefer hex or decimal to octal, adjust the od options per man od.

Corona688 · June 10, 2011, 10:54am

Depends what's in it. Your integers are probably four bytes each, so a small number written as text, like "32\n" (three bytes), would be slightly smaller. A binary integer, on the other hand, takes the same number of bytes whatever its value.

hexdump -C filename
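
To make the size comparison concrete, here is a minimal sketch that counts the bytes each format would need. The function names text_size and binary_size are made up for this illustration, and the text format is assumed to be one decimal number per line.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Bytes needed to store the values as decimal text, one number per line
// (assumed text layout; the thread does not show the text writer).
std::size_t text_size(const std::vector<int>& values) {
    std::size_t total = 0;
    for (int v : values) {
        total += std::to_string(v).size() + 1;  // digits (and sign) plus '\n'
    }
    return total;
}

// Bytes needed to store the same values as raw ints.
std::size_t binary_size(const std::vector<int>& values) {
    return values.size() * sizeof(int);
}
```

Assuming a 4-byte int, the vector {1, 2, 3, 4, 5} needs 10 bytes as text but 20 bytes as binary, while two ten-digit values need 22 bytes as text and only 8 as binary. Binary only wins once the numbers get long.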

lamachejo · June 10, 2011, 11:00am

Doing the hexdump -C shows letters and symbols, (and the MP-PER-B at the first line)...

It's not supposed to do that, right? It should print numbers, since I wrote (or at least that was my intention) numbers in the file.

DGPickett · June 10, 2011, 11:04am

Yes, but realize that on x86 and other little-endian systems, integers and floats are stored least significant byte first, so a short of 258 (hex 0102) shows up in the file as bytes 02 01, whereas on SPARC and other big-endian systems, and in Internet packet headers, it is 01 02.
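
A small sketch of the byte-order point, using the same value 258. The helper names is_little_endian and to_little_endian are hypothetical; the second one shows one way to pin down the byte order in a file regardless of the host.

```cpp
#include <cstdint>
#include <cstring>

// Returns true when the host stores the least significant byte first.
bool is_little_endian() {
    const std::uint16_t value = 258;  // hex 0102
    unsigned char bytes[sizeof value];
    std::memcpy(bytes, &value, sizeof value);
    return bytes[0] == 0x02;          // low byte comes first on little-endian
}

// Splits a 16-bit value into bytes in little-endian order on any host,
// using shifts instead of copying the in-memory representation.
void to_little_endian(std::uint16_t value, unsigned char out[2]) {
    out[0] = static_cast<unsigned char>(value & 0xFF);
    out[1] = static_cast<unsigned char>(value >> 8);
}
```

Writing the bytes produced by to_little_endian instead of the raw object keeps the file layout the same on x86 and on big-endian machines.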

Second, "#pragma pack" tells the compiler what modulus to allocate storage on, so if you write a struct or such, it may be padded.
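
The padding effect can be seen with sizeof. This is only a sketch: the struct names are invented, and #pragma pack(push, 1) is a compiler extension (accepted by GCC, Clang and MSVC), not standard C++.

```cpp
#include <cstddef>
#include <cstdint>

// Default layout: the compiler usually inserts padding after 'tag'
// so that 'value' starts on a 4-byte boundary.
struct Padded {
    char tag;
    std::int32_t value;
};

// Packed layout: alignment forced to 1 byte, so no padding is inserted.
#pragma pack(push, 1)
struct Packed {
    char tag;
    std::int32_t value;
};
#pragma pack(pop)

static_assert(sizeof(Packed) == 5, "1 byte for tag + 4 bytes for value");
```

On typical x86 compilers sizeof(Padded) is 8, so writing a Padded object with write() or fwrite() puts three padding bytes into the file along with the data.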

lamachejo · June 10, 2011, 11:07am

Is it any different on a 64-bit OS?

Corona688 · June 10, 2011, 11:14am

Look closer.

#include <stdio.h>
int main(void)
{
        int n=32;
        fwrite(&n, 1, sizeof(n), stdout);
}

$ ./a.out > bin.bin
$ hexdump -C bin.bin
00000000  20 00 00 00                                       | ...|
00000004

Imagine that; you get four binary bytes representing the integer 32. Because this is a little-endian system, they end up all backwards. Try echo $((0x00000020)) in your shell.

It is.

They are numbers. Binary numbers.

---------- Post updated at 09:14 AM ---------- Previous update was at 09:08 AM ----------

'int' types are 32-bit even on most 64-bit systems. long integers, though, are generally 64-bit on 64-bit Linux and other Unix-like systems (and 32-bit on 32-bit systems, as well as on 64-bit Windows). And if your system is neither 32 nor 64 bits, all bets are off.

If you're concerned about your integers changing size when your code gets moved, you can #include <stdint.h> and use int32_t to get a 32-bit integer that'll always be a 32-bit integer.
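
As a sketch of that suggestion in C++, the writer below uses std::int32_t from <cstdint> for both the count and the elements, so each number always takes 4 bytes in the file. The function name write_int32_block is hypothetical, and the bytes are still written in the host's byte order.

```cpp
#include <cstdint>
#include <ostream>
#include <sstream>  // only needed to try the function on an in-memory stream
#include <vector>

// Writes a 32-bit element count followed by the elements, each as 4 raw
// bytes in host byte order.
void write_int32_block(std::ostream& out, const std::vector<std::int32_t>& values) {
    const std::int32_t count = static_cast<std::int32_t>(values.size());
    out.write(reinterpret_cast<const char*>(&count), sizeof count);
    if (!values.empty()) {
        // A vector stores its elements contiguously, so one write is enough.
        out.write(reinterpret_cast<const char*>(values.data()),
                  static_cast<std::streamsize>(values.size() * sizeof(std::int32_t)));
    }
}
```

Five values produce 4 + 5 * 4 = 24 bytes regardless of how wide int or long is on the machine that compiles the code.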

lamachejo · June 10, 2011, 11:18am

Thanks! Well it seems it actually works

writing 5 random numbers

00000000 4d 50 2d 50 45 52 2d 42 20 0a 05 00 00 00 02 00 |MP-PER-B .......|
00000010 00 00 01 00 00 00 02 00 00 00 04 00 00 00 01 00 |................|
*
00000020

this is the output of the binary file
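
Besides hexdump, the file can be checked by reading it back. The sketch below assumes the layout written by the code in the first post (a 10-byte tag, a count, then the values), assumes int was 32 bits when the file was written, and assumes the reader runs on a machine with the same byte order. The name read_values is made up for this example.

```cpp
#include <cstdint>
#include <cstring>
#include <istream>
#include <sstream>  // only needed to try the function on an in-memory stream
#include <string>
#include <vector>

// Reads: 10-byte tag "MP-PER-B \n", a 32-bit count, then 'count' 32-bit values.
// Returns false if the tag does not match or the stream ends early.
bool read_values(std::istream& in, std::vector<std::int32_t>& values) {
    static const char kTag[] = "MP-PER-B \n";  // 10 characters; the '\0' is not in the file
    char tag[10];
    if (!in.read(tag, sizeof tag) || std::memcmp(tag, kTag, sizeof tag) != 0) {
        return false;
    }
    std::int32_t count = 0;
    if (!in.read(reinterpret_cast<char*>(&count), sizeof count) || count < 0) {
        return false;
    }
    values.clear();
    for (std::int32_t i = 0; i < count; ++i) {
        std::int32_t v = 0;
        if (!in.read(reinterpret_cast<char*>(&v), sizeof v)) {
            return false;  // file is shorter than the count claims
        }
        values.push_back(v);
    }
    return true;
}
```

With a real file, the stream would be a std::ifstream opened with ios::in | ios::binary.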

Corona688 · June 10, 2011, 11:24am

Makes sense, yeah. It looks like hexdump squeezed some zeroes together though, where it put the *, got to watch for that.

You can write entire structures and such to file by the way, you don't have to deal with individual atomic types.

int arr[512];

struct
{
        int a;
        float b;
        char c[32];
} stuff={1, 3.14159, "stuff"};

fwrite(arr, sizeof(int), 512, fileptr);     /* 512 elements of sizeof(int) bytes each */
fwrite(&stuff, sizeof(stuff), 1, fileptr);  /* one whole struct, padding included */

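You can't safely write entire classes this way, though, if they have virtual functions or members such as std::string or std::vector that manage their own memory. The same goes for any structure containing pointers. Their raw bytes hold addresses that won't make sense when reloaded.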
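
One common workaround, sketched here under assumptions, is to write such an object field by field, storing a length followed by the characters instead of the pointer. The Record type, the save and load functions, and the 1 MiB name limit are all hypothetical, and the bytes are written in host byte order.

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>  // only needed to try the functions on an in-memory stream
#include <string>

struct Record {            // hypothetical example type
    std::int32_t id;
    std::string name;      // owns heap memory, so its raw bytes are useless on reload
};

// Writes id, then the name length, then the name's characters.
void save(std::ostream& out, const Record& r) {
    out.write(reinterpret_cast<const char*>(&r.id), sizeof r.id);
    const std::uint32_t len = static_cast<std::uint32_t>(r.name.size());
    out.write(reinterpret_cast<const char*>(&len), sizeof len);
    out.write(r.name.data(), static_cast<std::streamsize>(len));
}

// Reads the fields back in the same order; returns false on a short or bad file.
bool load(std::istream& in, Record& r) {
    std::uint32_t len = 0;
    if (!in.read(reinterpret_cast<char*>(&r.id), sizeof r.id)) return false;
    if (!in.read(reinterpret_cast<char*>(&len), sizeof len)) return false;
    if (len > (1u << 20)) return false;  // arbitrary sanity limit for this sketch
    std::string name(len, '\0');
    if (len > 0 && !in.read(&name[0], static_cast<std::streamsize>(len))) return false;
    r.name = name;
    return true;
}
```

The file then contains only the data itself, so it can be reloaded in another process where the original heap addresses no longer exist.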