help with data type sizes

omega666 · January 14, 2011, 4:18pm

I'm running a C program on a Linux server. I printed the addresses of two int variables and of two char variables and compared them, which gave me the size of an int and the size of a char.
Why is the size of an int (4 bytes) bigger than the size of a char (1 byte)?

Also, if I print &a-&b I get 1, but if I work out the difference between the two addresses myself I get 4. Why is that?

DGPickett · January 14, 2011, 4:44pm

When loading physical accumulators and registers with numbers, narrower values are stretched to fit: with zeros if unsigned, or by repeating the sign bit into the high-order bits of the two's complement representation. Similarly, C converts values to the bigger type before it compares or computes.

Usually, a char is 1 byte; whether plain char is signed (-128 to 127) or unsigned (0 to 255) depends on the compiler and platform. An int is usually 4 bytes signed (+/- 2 billion plus). Old C models had int and short at 2 bytes and long at 4, most 32-bit models have int and long at 4, and most 64-bit UNIX models have int at 4 and long at 8.

Floating-point formats used to vary a lot between computers, but nowadays a C float is almost always 4 bytes (32 bits), and there is an IEEE spec (IEEE 754) on floating numbers that makes them portable. The other float types are the 64-bit double and long double, which is often an 80-bit extended format. A float has some bits for the mantissa and some for the exponent. In older formats the mantissa was usually a fraction between .5 and < 1.0 when normalized, where 010000- is 1/2, and the exponent was usually a two's complement power of 2, but sometimes a power of 16 or an "excess" number, where the boundary between negative and positive is shifted to a more useful range. IEEE formats use a separate sign bit, a mantissa normalized to between 1.0 and < 2.0 with the leading 1 left implicit, and an excess (biased) exponent.

C increments pointers for each type by its size. Headers such as sys/types.h and stdint.h give a lot of info on alternative C types. Some APIs just typedef their int types int1, int2, int4 and int8 or unsigned uint1, uint2, uint4, uint8.

Positions of items in a struct are usually padded so that each member sits on its natural alignment boundary (many compilers let you change this with #pragma pack), and generally you can save space by declaring like-sized types adjacent.

omega666 · January 14, 2011, 4:47pm

not sure if i understand why &a-&b gets me 1 when i print it, but if i get the difference manually via calculator i get 4.

DGPickett · January 14, 2011, 5:05pm

Well, & is the address, and it goes up one for every char and usually 4 for every int. Addresses within the same array are guaranteed to be subtractable even if they are in odd formats internally. Also, adding an integer to a pointer is the same as offsetting into an array, so for the array "char c[] = "Hello" ;", c + 5 is the same as &(c[5]), and c[5], which is the same as *(c+5), is the null terminator of the string after the 'o' of Hello.

For int arrays, adding one moves the pointer 4 in char terms, and if you subtract int addresses, you get the byte difference divided by 4 unless you cast them to (char*) first. So, for "int i[8];", &(i[5]) - &(i[2]) = 3, which turns out to be more useful than 12 unless you are doing i/o or malloc(), which are byte (char) operations.

If you are doing an array of struct, the struct size is multiplied for you into any increment or offset. You can discover the real size with sizeof() or by casting the addresses of two adjacent array elements to (char*) and subtracting, a popular interview question in some places.

Corona688 · January 14, 2011, 5:06pm

It's giving the offset in integer-sizes, I suppose. If you want it to just be integers for easy arithmetic, cast pointers into unsigned long before doing arithmetic.

omega666 · January 14, 2011, 5:09pm

still not sure if i understand, say i do this
a=6; b=7;
print (&a-&b)
i would get 1
but if i do
c=&a; d=&b;
print(c-d)
i would get 4

why is this?

DGPickett · January 14, 2011, 5:19pm

Well, the first makes arithmetic sense, and the second makes memory allocation sense. Whether 4-byte ints are in the heap or the stack, they are packed tight and so are 4 apart. Automatic variables are in the stack, so each pulls the stack pointer down by 4, but static or global variables are in the heap, which moves up.

---------- Post updated at 05:19 PM ---------- Previous update was at 05:14 PM ----------

I am not sure that is portable for all CPUs!

I think there is a printf option for printing pointers that is portable.

Corona688 · January 14, 2011, 5:28pm

I think it's reasonably portable among 32-bit and 64-bit varieties of UNIX, and 32-bit varieties of Windows. Perhaps not 64-bit windows, in which long inexplicably doesn't change size even though the integer width of the processor doubles...

The printf "%p" option is portable, but what it actually ends up printing is entirely platform and library-specific. Sometimes you get hex, sometimes you don't. Sometimes you get a leading 0x, sometimes you don't. On an old DOS compiler I got ????:???? for segment and offset -- making pointers portable on that was a whole new ball of wax, it had at least two different kinds!

omega666 · January 14, 2011, 5:31pm

so when i do the print (&a-&b) which gets me a 1, that means that the values were in the heap, and when i do the print(c-d), c and d were in the stack.
but why does values in the heap have a different arithmetic then the values in the stack?

Corona688 · January 14, 2011, 5:38pm

Where the variable is stored has nothing to do with how it does pointer math. I can't see your code from here, but I'd venture they were all on the stack, with nothing in the heap. What type it's stored in does matter. Math on pointers is done in multiples of the base size, math on integers is just plain math.

So &a-&b is arithmetic on pointers themselves, and happens in multiples of the base size.

But if you convert them to integers first -- integers, not pointers to integers -- the compiler is no longer aware of any base size and does just plain arithmetic.

Try this:

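#include <stdio.h>

int main(void)
{
        int a;
        int *b=&a;
        int *c=b+1;     /* one past a: fine to compute, not to dereference */
        unsigned long d=(unsigned long)b;       /* addresses as plain integers */
        unsigned long e=(unsigned long)c;

        printf("diff between pointers is: %ld\n", (long)(c-b));
        printf("diff between ints is: %lu\n", e-d);
        return 0;
}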

This makes it more obvious what's going on.

When you do math on a pointer, the compiler puts it in terms of its base size. An integer pointer plus one actually gets sizeof(int) added to it, in this case 4. For plain integers, it makes no adjustment.

omega666 · January 14, 2011, 5:40pm

ok, can you explain more about
"When you do pointer arithmetic directly, the compiler does a little more legwork for you, delivering the result in multiples of the base size."

Corona688 · January 14, 2011, 5:47pm

I think I did, and we crossposted. But to simplify a bit:

int a;
unsigned long n=(unsigned long)&a;
n++; // n is an integer.  it adds only one.
int *o=&a;
o++; // o is a pointer.  it adds sizeof(int).

Pointer math happens relative to the base size. integer math is just plain math.

omega666 · January 14, 2011, 7:08pm

for chars if i do a="a" then get the address (&a) i get some large negative number, but i know the size of a char is 1 byte or 8 bits, so shouldn't the value be between 0 and 255 ?

also if converting an address of an int to hex and printing it why is it never negative?

Driver · January 15, 2011, 9:41am

The reason pointer arithmetic is scaled for the base type is simply to support array operations.

Given

int x[2];

... we access the second element of the array by writing x[1] or equivalently *(x + 1). You cannot fully separate arrays and pointers because the value of an array is a pointer to its first element, such that most array operations will involve pointers to their elements. It would be quite inconvenient and error-prone if we had to multiply the element number with the base type size to access the element.

The address of your char has a different type - it's char *, not char - and will probably (but not necessarily) have the same size as any other pointer, and typically amounts to 32 or 64 bits on most systems.

Signedness is a matter of how the data is interpreted. The %x format specifier expects an unsigned argument and will therefore treat all bits as data, whereas if you're using %d, the most significant bit of your value will be treated as sign bit - if it's set, the number is negative, otherwise it isn't.

Pointer encoding is system-specific, but it is generally not signed, so you should not print it as a signed value.

On 64-bit Windows (and arguably other systems as well, though I tend to use unsigned long) it is best to convert your pointer to size_t, because that will give you a 64-bit type (an unsigned __int64) where unsigned long is still only 32 bits.

omega666 · January 15, 2011, 4:14pm

int a=1;
int b=2;
for the example with &a-&b = 1, can you explain how the pointer math (where its relative to the base size) works and how its different from normal math arithmetics, and why it becomes 1.

for this example i get
&a = -1078176900
and
&b = -1078176904

Corona688 · January 16, 2011, 1:07am

They're integers, and the address of a is precisely one sizeof(int) away from the address of b. It's that simple.

They're not really negative numbers. Since they're in stack space their addresses are very high, and end up flipping over to negative when you print them as signed integers. Try %x or %u instead of %d. (or really, to be portable, you should use %p).

omega666 · January 16, 2011, 12:50pm

Even though the addresses are negative, why does %x (convert to hex) make them positive? Is it because %x expects only unsigned integers, and since this is C, integers are signed by default, so the conversion treats the leftmost bit as part of the number's magnitude, which makes the hex value always positive?

Corona688 · January 16, 2011, 1:53pm

Look at it this way. Is this group of 32 bits signed or unsigned?

11111111111111111111111100000000

by itself it's neither... Whether it's signed or not is wholly up to how you interpret it.

When you print it as a %d, it assumes the leftmost bit is the sign bit, meaning that bit adds -2147483648. It even does this if you try to print an unsigned int with %d.

When you print it as %x or %u, it just considers that bit +2147483648.

So, whether the variable's signed or not, it's just a 32-bit "thing" as far as printf's concerned, it formats it according to how you tell it to.

DGPickett · January 17, 2011, 4:51pm

Heaps are allocated up from 0 and stacks down from the virtual address range ceiling, so subsequent allocations go opposite ways. This should usually exceed ( 3,000,000,000 / 4 ) if it does not overflow the accumulator:

{
 static int heap ;
 int stack ;
 printf( "%lu\n", ((unsigned long)&stack - (unsigned long)&heap) / sizeof(int) );
}

Driver · January 18, 2011, 7:33am

I don't think the way you are using this term is common. Most of the time when people refer to "the heap" of a program, they mean the area(s) from which dynamic memory is allocated (malloc()/calloc()/realloc() in C with new/delete added in C++).

Your variable named "heap" is static and uninitialized, so it will commonly be stored in a "BSS" area which is allocated and fixed at program startup time. Dynamically allocated memory can be (but does not need to be) obtained using (s)brk(), which is a legacy way that indeed implies a block area of dynamic memory. However, the more modern mmap() interface to obtain memory may behave in a completely different manner and grab pages from varying distinct locations (one obvious newfangled justification comes to mind immediately: address space randomization), and that's what many modern malloc() implementations use instead of (s)brk().

Also, the issue is even less clear-cut in multithreaded applications - which have more than one stack.

Counter example: HP-UX, where the stack "grows upwards".

To summarize, address space layout is strongly system-specific and it is rarely possible, meaningful or useful to generalize its characteristics.

What point were you trying to make?