Unclear pointer and array

Hello,
The purpose of the program is to print a substring of a string given on the command line. I do not understand why a char pointer does not work but a char array does at Line 40 and Line 41.

./a.out thisisatest 0 8
substring = "thisisat"

And my code is:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char *substring(size_t start, size_t stop, const char *src, char *dst,
        size_t size)
{
    unsigned int count = stop - start;
    if (count >= --size) {
    count = size;
    }
    sprintf(dst, "%.*s", count, src + start);
    return dst;
}

int main(int argc, char **argv)
{

    char *text; //Line 40, change to text[100] will work
    char *a;    //Line 41, change to a[100] will work
    int start, end;

    if (argc != 4) {
    printf("Error! Usage:\n\t \
argv[0]=program;\n\t \
argv[1]=input string\n\t \
argv[2]=start_position of string\n\t \
argv[3]=end_postion of string\n");

    return 1;
    }

    strcpy(text, argv[1]);
    start = atoi(argv[2]);
    end = atoi(argv[3]);

    printf("substring = \"%s\"\n",
       substring(start, end, text, a, sizeof(a)));

    return 0;
}

Thanks a lot!

Hi yifangt,

Allocate memory like this in your program:

char *text=(char*)malloc(64);
char *a=(char*)malloc(64);

char a[7] = "HELLO";

Here the string "HELLO" is stored in the character array 'a', whose size is 7. The array stores its characters in contiguous memory locations. After initialization each element holds the following value; the last element is not needed by the string (C fills it with '\0' as well).

| 0 | 1 | 2 | 3 | 4 |  5 |  6 |
|'H'|'E'|'L'|'L'|'O'|'\0'|'\0'|

With this form the elements are accessed directly in the array's own storage.

char *a = "HELLO";

The string "HELLO" is stored in an anonymous array at some location chosen by the compiler. We do not know that location in advance, but the pointer a holds the string's starting address.

here Address = [Base Address of Anonymous Array] +

Suppose you want to access a[3]: the program reads the pointer value, adds 3 to it, and fetches the character at the resulting address.

#include <stdio.h>

int main()
{
  char (*ptr)[7];
  char arr[7] = {'y','i','f','a','n','g','t'};
  int i;

  ptr = &arr;
  for(i=0; i<7; i++)
  {
    /* (*ptr)[i] and arr[i] refer to the same element */
    printf("Pointer value: %c Array value: %c\n", (*ptr)[i], arr[i]);
  }
  return 0;
}
$ ./a.out 
Pointer value: y Array value: y
Pointer value: i Array value: i
Pointer value: f Array value: f
Pointer value: a Array value: a
Pointer value: n Array value: n
Pointer value: g Array value: g
Pointer value: t Array value: t

Thanks Akshay!
So the problem is that, at Lines 40 and 41 of my code, the pointers (*text, *a) were only declared and never given any memory to point to, while the arrays text[100] and a[100] get their memory automatically when they are declared. Is my understanding right? I am not clear on these points.
And, when I tried your suggestion:

char *text=(char*)malloc(64);
char *a=(char*)malloc(64);

There is a bug that seems to be related to the memory allocation.

$ ./a.out thisisatest 0 6
substring = "thisis"
$ ./a.out thisisatest 0 7
substring = "thisisa"
$ ./a.out thisisatest 0 8
substring = "thisisa"
$ ./a.out thisisatest 0 9
substring = "thisisa"
$ ./a.out thisisatest 0 10
substring = "thisisa"

It seems to me that *a only gets 7 bytes plus the '\0' at the end, 8 bytes in total. Why not the 64 bytes allocated at Line 41?

The declarations:

    char *text;
    char *a;

allocate space on the stack for two pointers to characters. And the size of a is the size of a pointer (4 bytes in a 32-bit application; 8 bytes in a 64-bit application).

With these declarations, the initial value of these pointers is whatever random bytes happen to have been on the stack. When you copy data or read data into an area pointed to by an uninitialized pointer, you will get a memory fault, a bus error, or overwrite data at some random location, depending on what random bytes on the stack happen to underlie your pointers. If you malloc() space for arrays of characters and assign the pointers that malloc() returned to your pointers, then you won't be overwriting a random location in memory, but you would still have a problem because the sizeof(a) in:

       substring(start, end, text, a, sizeof(a)));

is the size of the pointer; not the number of bytes allocated by malloc() to the array pointed to by a .

The declarations:

    char text[100];
    char a[100];

allocate two arrays of 100 bytes each on your stack. And the size of a is 100 bytes.

Thanks Don!
I understand your points about the array and the pointer. But when the substring is fewer than 8 characters long the output is correct, which suggests the first 7 bytes are not "randomly overwritten". Why?
So it seems I have to use arrays in the original program (Line 40, Line 41). What is the correct way to do it if pointers are used? Thanks a lot again!

I repeat:

And, since you're getting 8 bytes for sizeof(a) , we know that you are building your application as a 64-bit app; not as a 32-bit app. You don't want sizeof(a) when a is a pointer; you need to use the number of bytes allocated to the space pointed to by that pointer instead.


Yes, got it!
I want to get the pointer version working. After I changed the call to:

substring(start, end, text, a, strlen(a)));

it seems to work, but I need to confirm whether I did it the correct way.

That doesn't sound right. Please show us all of your current code.

Here it is:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char *substring(size_t start, size_t stop, const char *src, char *dst)
{
    unsigned int count = stop - start;
    size_t size = strlen(dst);

    if (count >= --size) {
    count = size;
    }
    sprintf(dst, "%.*s", count, src + start);
    return dst;
}

int main(int argc, char **argv)
{
    char *text=(char*)malloc(100*sizeof(char));  //Line 40, change to text[100] will work
    char *a=(char*)malloc(100*sizeof(char));    //Line 41, change to a[100] will work
    int start, end;

    if (argc != 4) {
    printf("Error! Usage:\n\t \
argv[0]=program;\n\t \
argv[1]=input string\n\t \
argv[2]=start_position of string\n\t \
argv[3]=end_postion of string\n");

    return 1;
    }

    strcpy(text, argv[1]);
    start = atoi(argv[2]);
    end = atoi(argv[3]);

    printf("substring = \"%s\"\n", substring(start, end, text, a));

    return 0;
}

I rewrote the substring() function; the original version could only handle character arrays.
I have reached the limit of my knowledge of string pointers in C: how do you get the length of a string when its memory was allocated dynamically? In other words, after allocating memory for a string pointer, how do you get the string length?

Try this instead:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define IOBufLen        100

char *substring(size_t start, size_t stop, const char *src, char *dst, size_t size)
{
    unsigned int count = stop - start;
    if (count >= --size) {
        count = size;
    }
    sprintf(dst, "%.*s", count, src + start);
    return dst;
}

int main(int argc, char **argv)
{
    char *text=(char*)malloc(IOBufLen);  //Line 40, change to text[100] will work
    char *a=(char*)malloc(IOBufLen);   //Line 41, change to a[100] will work
    int start, end;

    if (argc != 4) {
        printf("Error! Usage:\n\t \
argv[0]=program;\n\t \
argv[1]=input string\n\t \
argv[2]=start_position of string\n\t \
argv[3]=end_postion of string\n");

        return 1;
    }

    strncpy(text, argv[1], IOBufLen - 1); // copy at most IOBufLen - 1 bytes
    text[IOBufLen - 1] = '\0';            // strncpy() adds no '\0' when it truncates
    start = atoi(argv[2]);
    end = atoi(argv[3]);

    printf("substring = \"%s\"\n", substring(start, end, text, a, IOBufLen));

    return 0;
}

Do you see why these changes make it work?


Thanks.
Does your code waste some memory for a when you have

#define IOBufLen        100

and this output line:

printf("substring = \"%s\"\n", substring(start, end, text, a, IOBufLen));

given that a only holds a substring of text? Put another way,

strncpy(text, argv[1], IOBufLen - 1); // -1 to be sure text is null terminated.

does this line always allocate 100 bytes to text even when text is only 10 characters long?
I am concerned because I need to process 100 million lines (my sequences).

It wastes exactly the same amount of space your code wasted. My changes just keep you from overwriting data following the space pointed to by text if your input string is longer than the space you allocated for your buffer.

Are you going to keep 100 million lines of data in memory at once, or are you going to process a line at a time? If you are processing a line at a time, wasting something less than 100 bytes is absolutely nothing to worry about.

If you're going to try to malloc() 100 million buffers for your output substrings, you need to reconsider lots of issues as you design code to deal with that much data. (Note that you're talking about 800,000,000 bytes just to hold the pointers to your buffers!) Obviously, you could allocate space for your output buffer in your substring() function based on the length of the string it is actually going to store into that buffer.


Yes, but I do not know how to do that at the moment. Thanks a lot!

Hi yifangt, try this simplified version:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char *substring(size_t start, size_t stop, const char *src)
{
    
    unsigned int count = stop - start;   /* characters to copy; assumes start <= stop */
    char *dst = malloc(count + 1);       /* +1 for the terminating '\0' */
    sprintf(dst, "%.*s", count, src + start);
    return dst;
}

int main(int argc, char **argv)
{
    if (argc != 4) {
    printf("Error! Usage:\n\t \
            argv[0]=program;\n\t \
            argv[1]=input string\n\t \
            argv[2]=start_position of string\n\t \
            argv[3]=end_postion of string\n");

    return 1;
    }

    printf("%s\n",substring(atoi(argv[2]), atoi(argv[3]), argv[1]));
    return 0;
}

Thanks Akshay!
I think I understand that sprintf() builds a string, but could you please explain this line?

sprintf(dst, "%.*s", count, src + start);

Suppose the input string is "A test string" with start = 3 and end = 9.
More specifically, I mean the "%.*s" part, which I assume is a format specification. How does it work? Thanks again.

Could this help you?

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    int start = 3;
    int stop  = 9;
    const char src[] = "A test string";
    int nchar = 9;

    // print 6 chars (stop - start) from src+start
    printf("%.*s\n", stop - start, src + start);

    // &src[3] == src+3
    printf("%s %s\n", &src[3], src + 3);

    // what is src+start ?
    printf("src + %d = %s\n", start, src + start);

    // print nchar chars from src+start
    printf("%.*s\n", nchar, src + start);

    return 0;
}
$ gcc test2.c
$ ./a.out 
est st
est string est string
src + 3 = est string
est strin

---edit----

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    int start = 3;
    int stop  = 9;
    const char src[] = "A test string";

    // The variable "count" left me unsure whether you want to print
    // characters start..stop (3 - 9) or n characters from start.
    int count = stop - start;

    // print 6 chars from src+start
    printf("char %d-%d = %d ~ %.*s\n", start, stop, count, count, src + start);

    // print n chars from src+start, with n taken from stop
    printf("Total char %d from start ~ %.*s\n", stop, stop, src + start);

    return 0;
}
$ ./a.out 
char 3-9 = 6 ~ est st
Total char 9 from start ~ est strin

You need to look at the man page for the printf() function. You'll see that:

sprintf(dst, "%.*s", count, src + start);

is shorthand for:

char fmt[10];
sprintf(fmt, "%%.%ds", count);
sprintf(dst, fmt, src + start);

without the need for the space for the intermediate format string.


Hi Don!
Your reply confused me more. (I assume you meant the man page for sprintf() rather than printf(), right?) Yes, I did look at the man page first. The closest part of the sprintf man page is:

int sprintf(char *str, const char *format, ...);

sprintf(), snprintf(), vsprintf() and vsnprintf() write to the character string str.
One can also specify explicitly which argument is taken, at each place where an argument is required, by writing "%m$" instead of '%' and "*m$" instead of '*', where the decimal integer m denotes the position in the argument list of the desired argument, indexed starting from 1. Thus,
    printf("%*d", width, num);    and    printf("%2$*1$d", width, num);
are equivalent.

It is still not clear to me.
For Akshay: I am fine with moving the pointer, as in src+start. The difficult part is the "%.*s", which I am seeing for the first time, and I was thinking that:

sprintf(dst, "%.*s", count, src + start);

is equal to:

sprintf(dst, "%s %d", count, src + start);

Is that right? But Don's line

sprintf(fmt, "%%.%ds", count);

confused me even more.
Format specifications for pointers and addresses in printf() are among the most difficult parts for me. Whenever I see warnings or error messages like:

warning: format ‘%c’ expects type ‘int’, but argument 2 has type ‘char *’ [-Wformat]
warning: format ‘%c’ expects type ‘int’, but argument 2 has type ‘size_t *’ [-Wformat] etc.

I panic and bang my head on the table.

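No, those are not equivalent. To make "%s %d" valid you would have to swap the arguments, like this: sprintf(dst, "%s %d", src + start, count); but that writes the whole string followed by the number instead of limiting the string's length.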

Let me take one example: printf("%*d", 5, 10) will result in "   10" being printed, with a total width of 5 characters, and
printf("%.*s", 3, "akshay") will result in "aks" being printed.

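#include <stdio.h>

int main(void)
{
    printf("%*d\n", 5, 10);        // width 5
    printf("%.*d\n", 3, 10);       // precision 3: at least 3 digits
    printf("%.*s\n", 3, "akshay"); // precision 3: at most 3 characters
    return 0;
}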

$ ./a.out 
   10
010
aks

#include <stdio.h>

int main(void)
{
    // Equal
    printf("%.*s\n", 3, "akshay");
    printf("%.3s\n", "akshay");
    return 0;
}
$ ./a.out 
aks
aks

Don's approach is not confusing once you look at the intermediate format string:

#include <stdio.h>

int main(void)
{
    int count = 3;
    char fmt[10];
    char src[] = "akshay";   // sized by the compiler, including the '\0'
    char dst[6];

    // Don's approach: the following statement builds the format string,
    // so fmt will be "%.3s"
    sprintf(fmt, "%%.%ds", count);

    printf("Format is : %s\n", fmt);

    // the following statement is equal to sprintf(dst, "%.3s", src)
    sprintf(dst, fmt, src);
    printf("output is : %s\n", dst);
    return 0;
}

$ ./a.out 
Format is : %.3s
output is : aks

Now I got it!

sprintf(fmt, "%%.%ds", count)

Don used a "recursion-like" style and escaped the % symbol as %%, which I had never used before. (By the way, I wish there were a book covering those tricks and exceptions that you can't find in the K&R book.) Thank you both so much!