Pointer and address

yifangt · May 7, 2013, 3:07pm

This code is to print out the program name and arguments list one by one:

  1 #include<stdio.h>
  2 
  3 int main(int argc, char *argv[])
  4 {
  5         int iCount = 0;
  6         while (iCount < argc) {
  7                 printf("argc:%d\t%s\n",iCount, argv[iCount]);
  8                 iCount++;
  9         }
 10 }

I understand it is about pointers and the addresses stored in the array. Can somebody explain to me why line 7 should not be written as:

printf("argc:%d\t%s\n",iCount, *argv[iCount]);

I thought *argv[iCount] was the correct way, since argv[iCount] is the address of one of the array members, not the content the pointer points to. But my understanding is wrong! What did I miss? Thanks a lot!

DGPickett · May 7, 2013, 3:24pm

argv is an array of pointers to arrays of char, so first think of it as char**, something that needs one dereference ( * or [n] ) to get to char*, which C uses as a string. The actual rule for argv is that the last element, argv[argc], is a null pointer, so you can size it by scanning for the terminator, somewhat analogous to strlen(). The argc is just for your convenience, provided by the loader. Since printf's '%s' demands a char*, one layer of dereferencing is right. Another way to think of '*' is '[0]', so *argv is argv[0], a char*, and **argv is argv[0][0], the first char of argv[0]. Note that char** is not the same thing as a two dimensional char[][]: a 2D array is one contiguous block of chars, while argv is an array of separate pointers.

An array name converts to a pointer to its first element in most expressions, even if the array is local, but sizeof a local array gives the size of the array, not the size of a pointer. So, 'char *x = "Y" ;' is a local pointer variable (typically 4 or 8 bytes) initialized to point to a constant string, two hunks of storage, but 'char x[] = "Y" ;' is a local array size 2 of characters Y and null (sized implicitly by the initializer), one small hunk of storage, and there is no pointer storage, just an array name that converts to a pointer when used in an expression.

hanson44 · May 7, 2013, 3:31pm

Suppose you had "int int_array [12]". Then "int_array [n]" would be an int.

argv is declared as "char *argv[]". So "argv [n]" is a "char *".

"char *" is what "printf %s" expects and what makes sense, because you are trying to print a string.

"argv [iCount]" is the content of an array slot. The content happens to be an address, because "char *" is an address.

Suppose "argv [1]" is "123456". Then "*argv [1]" is '1', the single character 1. You could do an experiment and change the code to the following and verify this (untested).

printf("argc:%d\t%c\n",iCount, *argv[iCount]);

The basic answer is that argv[iCount] is compatible with %s and *argv[iCount] is compatible with %c.

yifangt · May 7, 2013, 3:45pm

Actually I did many similar tests as you suggested, but I always got a message like:

program.c: In function 'main':
program.c:7:3: warning: format '%c' expects argument of type 'int', but argument 3 has type 'char *' [-Wformat]

I think this is the point that I did not catch

argv [iCount] , The content  happens to be an address,

My confusion was also with a similar code I was trying:

  1 #include<stdio.h>
  2 
  3 void Print(char *[]);
  4 int main()
  5 {
  6         char * pn[] = {"Fred", "Barney", "Betty", "Wilma", NULL};
  7         Print(pn);
  8 }
  9 void Print(char * arr[])
 10 {
 11         while (*arr != NULL) {
 12                 printf("arr %p \t %s\n", (void *)&arr, *arr);
 13                 arr++;
 14         }
 15 }

In this code, line 12 was what I expected and printed out the correct thing.
The output is:

arr 0x7fffd4f3bb58      Fred
arr 0x7fffd4f3bb58      Barney
arr 0x7fffd4f3bb58      Betty
arr 0x7fffd4f3bb58      Wilma

Why was *arr used in line 12 this time instead of arr?

DGPickett · May 7, 2013, 4:24pm

Yes, %s is for char*, a pointer to a null terminated string like "Hi!", a pass by reference, but %c is for just one char, like '!'. A char is a small integer (8 bits on nearly every system) that is promoted to int and passed here by value, just like an int to %d. That is why the warning says %c expects an int.

hanson44 · May 7, 2013, 5:18pm

Because arr[0] is exactly the same as *arr.

In general, arr[n] is exactly the same as *(arr + n).

This is one of the most difficult and fundamental concepts of C programming. It took me years of reading and re-reading to "get it". You need to keep at this, because it's so important to understand pointers and arrays.

As a function parameter, char *arr[] is really char **arr: a pointer to the first element of an array of pointers. The loop keeps incrementing that pointer, so it points at each element in turn. Each time, it tests the element arr currently points at, and if == NULL stops the loop. Each time, it prints that element, because *arr and arr[0] mean the same thing.

--------------------------------

If you can compile, run, and understand this, it will likely help:

$ cat array.c
#include<stdio.h>

int main(int argc, char *argv[])
{
        int iCount;

        for (iCount = 0; iCount < argc; iCount++) {
                printf("argc:%d\t%s\n",iCount, argv[iCount]);
        }
        for (iCount = 0; iCount < argc; iCount++) {
                printf("argc:%d\t%c\n",iCount, *argv[iCount]);
        }
        for (iCount = 0; iCount < argc; iCount++) {
                printf("argc:%d\t%c\n",iCount, argv[iCount][0]);
        }
}

$ gcc array.c
$ a.out 123 xyz
argc:0  a.out
argc:1  123
argc:2  xyz
argc:0  a
argc:1  1
argc:2  x
argc:0  a
argc:1  1
argc:2  x

The last two loops are doing exactly the same thing, just written two different ways: *argv[iCount] and argv[iCount][0].

yifangt · May 7, 2013, 6:13pm

Thanks a lot! I had thought I had understood pointer, but actually not at all!
Can I say the address of a pointer is an array of char in these two cases (strings!)? In the second sample, *arr was printed out by looping through the actual char array, but argv[iCount] was not looped through, and *argv[iCount] was only the first char of its contents. Is this correct?
Thank you so much,

hanson44 · May 7, 2013, 9:57pm

Here's how I would put it. Much of this you probably already know.

If you declare int n = 4; , then the compiler reserves a special location to store an int value, and stores 4 there. No big deal. The location for n is an address, some big number the program uses to keep track of where that n variable is located. We cannot change the address of n, so that's an important point. And we normally don't need to know the value of the address. The address value for n might change each time the program runs.

If you declare int *n_ptr = &n; , then the compiler reserves a special location to store a pointer value, and stores the address of n there. n and n_ptr are both variables. Again, n_ptr has an address, some big number used to keep track of where it is. Again, we cannot change the address of n_ptr. Of course, *n_ptr is the contents of the n var, so 4 in this case.

The difference between n and n_ptr is that the program "knows" (because it reads the declaration) that n holds an int, and that n_ptr holds the address of an int variable.

Another difference is that n++ results in adding one (like 4 -> 5), but n_ptr++ results in adding something like four or eight (like 1000000 -> 1000004) to move the address to the next slot that can hold an int. If the size of an int is four bytes, then n_ptr++ increases n_ptr by 4 (not 1).

&n and &n_ptr both make sense. You can take an address of either variable. You could do int **n_pp = &n_ptr;

But only *n_ptr makes sense (would equal 4). *n makes no sense, does not compile, because the n variable does not point to anything else. You can only dereference (with *) a pointer variable, or an array.

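Inside the running program, a pointer (like char *ptr_var; ) and an array (like char array_var[10]; ) are treated the same in many ways, but are different things.

Some similarities: &ptr_var and &array_var both work, though they mean different things. &ptr_var is the address of the pointer variable itself, while &array_var is the address of the whole array, numerically the same as the address of its first element. If ptr_var has been set to point at array_var, then *ptr_var, *array_var, ptr_var[0], array_var[0] all work and produce the same result.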

A difference: ptr_var++ works, and increases the stored value. But array_var++ does not work, because it will try to increase the address of array_var and this does not make sense, is not allowed. Just like we cannot change the address of the n variable, we cannot change the address of the array_var. You could say array_var[0]++ and that's OK.

--------------------------------

I would say the pointer has an address and also holds an address.

I would say argv was looped through in both cases, but just a different way, either by increasing iCount or by increasing the address held in the argv variable.

Yes, *argv[iCount] is only the first char of the contents, and is exactly the same as argv[iCount][0]

----------------------------

This stuff is confusing for many or most C programmers. But it's very interesting and really important for effective programming. As you are doing, one good way to learn is to write little test programs, and see what happens.

yifangt · May 7, 2013, 11:37pm

Thanks a lot Hanson!
I am always told "without pointer you cannot get C"! I need more reading and digestion. With your help, and those from the forum, I feel optimistic to catch it. Thank you very much again!
Yifangt

DGPickett · May 8, 2013, 1:21pm

Better "The address in pointer argv points to memory containing an array of pointers, each pointing to memory containing a character array, except the last (highest) pointer in the array is null." The argv pointer is stored on the call stack as a parameter, first bit of memory, points to the array of pointers, second bit of memory, and if there are, say, 5 pointers in the array, 4 point to additional areas of memory with null terminated character arrays, and the 5th, highest is set to zeros (null). For instance, &argv might be 0xFFFFE078, containing a heap address 0x00031244, and the array of pointers is in 0x00031244-57 inclusive. The first pointer in 0x00031244-7 might be 0x00030711. The first character array may occupy 0x00030711-5, loaded with "haha", 4 char and a null.

alister · May 8, 2013, 3:24pm

To the OP:
If what follows confuses the issue, please ignore it. It's not critical to what you are dealing with at the moment.

If you'll pardon a few nits ...

The null pointer is not required to be all zeroes. Its representation is implementation defined. Further, null pointers to different types are allowed to have different internal representations (even though a zero in source code in a pointer context is always converted to the correct internal representation for a null pointer of that type).

You mention argv pointing to the heap. You did not state that this is invariably the case, nor is it my intention to imply that you did. However, I wanted to mention that main's arguments and the environment can be found above the stack (at least on Linux and *BSD x86/amd64). On all of my systems, heap < stack < argv.

A typical result from a 32-bit x86 Linux system:

heap: 0x804a000
stack: 0xbf9023ac
argv: 0xbf9023d4

I would be genuinely interested in knowing if some of the proprietary unices (Solaris, HP-UX, AIX, etc) do things differently, but such posts would muddle the OP's thread. If you (or anyone else) are interested, please visit the thread "O argv, argv, wherefore art thou argv?"

Regards,
Alister

DGPickett · May 8, 2013, 3:41pm

Some would argue the heap is only things malloc()'d/new, and the other stuff is code pages, constants pages, preloaded initialized variables.

I have noticed the environment at the end of core dumps, so exec() may copy it to automatic storage before calling main. It'd be interesting to see if what you pass exec as environment can affect the core image.

Generally, executable stuff and control structures are at the bottom. The OS may want constants in separate pages marked read only, code in pages marked read/execute no write and of course the modifiable heap in read write no execute pages. The stack is growing down from the top, and the heap up from the bottom, and an mmap() may take out a hunk of address space.

Of course, with 64 bit CPUs, the addresses double in size, increasing reach and slowing execution. It's about time for someone to make a CPU with variable length addresses, like utf-8. Past CPUs have had variable length addressing to speed smaller bits of code needing only short relative reach.

alister · May 8, 2013, 6:19pm

I would consider that to be a weak argument, even though it has historical appeal. Once upon a time, the limits of malloc'd memory coincided with sbrk(0). This is no longer necessarily the case.

Several modern malloc implementations utilize mmap, meaning that the allocated memory is no longer contiguous. It does not make much sense to refer to malloc'd memory as a heap if it is fragmented.

glibc malloc uses sbrk for allocations less than MMAP_THRESHOLD and mmap for allocations greater than or equal to that value (MMAP_THRESHOLD is 128 kB by default and is adjustable using mallopt()). FreeBSD and NetBSD mallocs use mmap but can be configured to use sbrk. OpenBSD malloc uses mmap exclusively (in part because memory mismanagement of non-contiguous regions is more likely to trigger a segfault and expose bugs, and has done so in the past).

I have no knowledge of proprietary UNIX mallocs, but I welcome any information on that front.

To me, the heap ends with sbrk(0).

Forgive me if I seem to be quibbling over semantics, because that is not my intention. Nor is it my intention to win an "Internet argument" that is ultimately of little importance. I simply consider this to be an interesting and relevant topic on which many people are misinformed (I am not suggesting that you are one of them).

Regards,
Alister

yifangt · May 8, 2013, 9:14pm

Now the discussion is way beyond my current comprehension, but it is what I am expecting to understand. I am lost! Thank you guys!

DGPickett · May 9, 2013, 1:29pm

One needs to know 1 layer too much to know you know enough. It is great that the compiler and linker take care of the allocation of all the bits, but you should know that 'char *x;' makes one uninitialized pointer x on the local stack; 'char *x = "123";' makes an initialized character array of 4 bytes in static storage (usually read-only, and writing to it is undefined behavior) and one pointer x on the stack initialized to point to that array; but 'char x[] = "123" ;' creates a writable 4 character array on the stack, and x is the name of that array: no pointer is stored in run time memory, the address is just known to the compiler. Some things are compile time only, and some are run time.

Java and I suppose parts of C# have automatic housekeeping so junk created on the heap is recycled when all references to it are released. So, 'String x = "123" ;' creates a local String class reference x and an immutable String object on the heap holding the three characters. If later in the code flow you say 'x = "234" ;', x no longer refers to "123", so once nothing else references that object it can be garbage collected, and x now refers to a different String object on the heap.

In C/C++, there is no such housekeeping: if you overwrite a char* that was your only pointer to malloc()'d memory, the old char[] is lost in a memory leak.