Understanding the strtok_r() function

yifangt · January 16, 2014, 12:16pm

Hello,
I was trying to better understand the strtok_r() function with the following code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
/* *From http://www.gsp.com/cgi-bin/man.cgi?section=3&topic=strtok_r
 A FreeBSD man pages  * */

int main()
{
    char string1[80];
    char *sep = "\\/:;=-";               /*Delimiters*/
    char *word, *tmp1;                   /*For the loop, *word is the return value, tmp1 is the temporary holder of the rest*/

    strcpy(string1, "This;is.a:test:of=the/string\\tokenizer-function.");

    while( (word=strtok_r(string1, sep, &tmp1)) != NULL) {   
        printf("So far we're at %s\n", word);
        string1 = tmp1;                             //Want move string1 to the next token. Error!!!
    }
    return 0;
}

The problem is in the loop, at the line string1 = tmp1, where I want to move the pointer to the next token:

error: incompatible types when assigning to type 'char[80]' from type 'char *'

An explanation and a correction would be greatly appreciated.

migurus · January 16, 2014, 12:32pm

Make sure to follow proper usage of this function, where on the first call the 1st argument should point to the input string, then on the following calls it should be NULL.

Corona688 · January 16, 2014, 1:46pm

I have no idea what you're even trying to do with string1 = tmp1;. That operation does not make sense.

Every time you call strtok_r(string1, ...) you are telling it to start over! Remember how strtok works.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
/* *From http://www.gsp.com/cgi-bin/man.cgi?section=3&topic=strtok_r
 A FreeBSD man pages  * */

int main()
{
    char string1[80]="This;is.a:test:of=the/string\\tokenizer-function."; // You can assign directly, without strcpy
    const char *sep = "\\/:;=-";               /*Delimiters*/
    char *word, *tmp1;                   /*For the loop, *word is the return value, tmp1 is the temporary holder of the rest*/

    word=strtok_r(string1, sep, &tmp1);

    while(word != NULL)
    {
        printf("word = '%s'\n", word);
        word=strtok_r(NULL, sep, &tmp1);
    }

    return 0;
}

$ gcc strtok.c
$ ./a.out

word = 'This'
word = 'is.a'
word = 'test'
word = 'of'
word = 'the'
word = 'string'
word = 'tokenizer'
word = 'function.'

$

yifangt · January 16, 2014, 2:37pm

Thanks Corona!
My previous thread in this forum, about the strtok() function, triggered this one: I had a hard time understanding how the pointer moves along the line.

The other reason is that when I use an extra pointer, it works fine, but I do not know why!

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Example how to use strtok_r() function
 *
 *From http://www.gsp.com/cgi-bin/man.cgi?section=3&topic=strtok_r
 A FreeBSD man pages
 * */

int main()
{
    char string1[80];
    char *sep = "\\/:;=-";            /*Delimiters*/
    char *word, *tmp1;                /*For the loop, *word is the return value, tmp1 to hold the rest of the parsed line */
    char *ptr;  
 
    //I want strcpy to handle FILE stream line-by-line later, if it is not the wrong way.
    strcpy(string1, "This;is.a:test:of=the/string\\tokenizer-function.");  
    ptr = string1;                    /*Extra variable to hold string1*/
    
    while( (word=strtok_r(ptr, sep, &tmp1)) != NULL) {
        printf("So far we're at %s\n", word);
        ptr = tmp1;  //string1 = tmp1;     //Want move string1 to the next token. Error!!! 
    }
    return 0;
}

So why did it work by using the extra pointer ptr?

Corona688 · January 16, 2014, 3:04pm

It doesn't work "because" of the extra pointer. You found a way around the compile error: string1 is an array, and an array cannot be assigned to, while ptr is a pointer, and it's valid to assign a pointer to a pointer. But I can't imagine it's actually giving the intended results.

Every time you give strtok_r a string as the first parameter, you are telling it to start over. Keep giving it the same string, and you will keep getting the same results. Give it NULL instead, and it will advance.

That is really not how you're supposed to be using it. I repeat, this is how it's supposed to work:

char original_string[]="this is a string";
const char *sep=" ";
char *tmp; // strtok_r keeps its position here between calls
char *token=strtok_r(original_string, sep, &tmp); // Give it the original string once and only once.

while(token != NULL)
{
        printf("token is '%s'\n", token);
        token=strtok_r(NULL, sep, &tmp); // Give it NULL.  Not the previous token.  Not the original string.  ONLY NULL!
}
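
To see the "start over" behaviour for yourself, here is a minimal sketch. The helper name second_call and its restart flag are made up for this illustration; only strtok_r itself is the real library function. It makes two calls and hands back whatever the second one returned.

```c
#define _POSIX_C_SOURCE 200809L  /* asks the headers to declare strtok_r */
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not a library function.
 * Calls strtok_r twice on buf and returns the token from the second call.
 * restart != 0: the second call passes buf again (the mistake).
 * restart == 0: the second call passes NULL (the intended usage). */
char *second_call(char *buf, const char *sep, int restart)
{
    char *save;
    char *first = strtok_r(buf, sep, &save);
    char *second = strtok_r(restart ? buf : NULL, sep, &save);

    printf("first call:  '%s'\n", first ? first : "(null)");
    printf("second call: '%s'\n", second ? second : "(null)");
    return second;
}
```

Called on a writable copy of "This;is:a:test" with the separators ";:", the second token should be "This" again when restart is non-zero, and "is" when restart is zero.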

yifangt · January 16, 2014, 3:51pm

Yes, your code worked perfectly and is much simpler. The only thing that bugs me is the NULL in the second strtok_r() call.

Swallowing it without digestion bothers me a lot. I was thinking this way, according to the explanation:

word = strtok_r(ptr, sep, &tmp1);   //pop one token from the beginning; the rest (tmp1) is the leftover
ptr = tmp1;                         //move the pointer to the beginning of the rest (tmp1)

But I am not sure, especially without the new pointer ptr. It seems I just have to remember it and use it your way. I am trying to learn more about the prototype of strtok_r() and to understand reentrancy, a related concept that is new to me. I hate C but I love it more.
OK, I MUST follow the usage of the function, as migurus said:

...... on the first call the 1st argument should point to the input string, then on the following calls it (1st argument) should be NULL.

Corona688 · January 16, 2014, 4:08pm

From man strtok:

DESCRIPTION
       The strtok() function parses a string into a sequence  of  tokens.   On
       the  first call to strtok() the string to be parsed should be specified
       in str.  In each subsequent call that should parse the same string, str
       should be NULL.

strtok() wants a NULL because somewhere inside there is an if(string==NULL) { /* use the string we had last time */ } check, and for literally no other reason. It's just a weird old library call that insists you use it in a very particular way.
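
To make that NULL check concrete, here is a rough sketch of how a strtok_r-style function could be written. The name my_strtok_r is made up, and this is not the source of any particular libc; it only assumes the standard strspn() and strcspn() from string.h.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative sketch only; my_strtok_r is a hypothetical name. */
char *my_strtok_r(char *str, const char *delim, char **saveptr)
{
    char *end;

    if (str == NULL)                  /* NULL means: carry on where we stopped */
        str = *saveptr;

    str += strspn(str, delim);        /* skip any leading delimiters */
    if (*str == '\0') {               /* nothing left to return */
        *saveptr = str;
        return NULL;
    }

    end = str + strcspn(str, delim);  /* first delimiter after the token */
    if (*end != '\0')
        *end++ = '\0';                /* cut the token off, step past the cut */
    *saveptr = end;                   /* remember where the rest begins */
    return str;
}
```

In this sketch the only memory between calls is *saveptr, which belongs to the caller. A plain strtok() written the same way would have to keep that pointer in a hidden static variable instead.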

There is no point worrying what the contents of tmp1 are either. It might not even be the same in a different libc.

I think your confusion is related to the concept of re-entrancy. A re-entrant function keeps no hidden state of its own between calls: if you call it twice with the exact same parameters, it does the exact same thing. strtok() violates this, because it remembers what string you gave it last time you called it.

Imagine you're breaking up the string "a:b:c|d:e:f|g:h:i" with strtok() on "|". It gives you "a:b:c", and you call strtok() again with ":" to break it into "a", "b", "c". In doing so, strtok() will forget the original string!

This is perfectly okay with strtok_r, since you can give each loop its own tmp variable. Those variables, not the function itself, remember where it was last, so there is no conflict.
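
For contrast, here is a small sketch of the nested case done with plain strtok(), so you can watch it lose its place. The function name count_outer_tokens_with_strtok is made up for this illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: nests two strtok() loops the WRONG way and
 * returns how many outer ("|"-separated) tokens the outer loop saw. */
int count_outer_tokens_with_strtok(void)
{
    char str[] = "a:b:c|d:e:f|g:h:i";
    int outer = 0;
    char *tok1 = strtok(str, "|");

    while (tok1 != NULL) {
        char *tok2;

        outer++;
        printf("Outer token:  '%s'\n", tok1);

        tok2 = strtok(tok1, ":");   /* replaces strtok's one hidden position */
        while (tok2 != NULL) {
            printf("\tInner Token:  '%s'\n", tok2);
            tok2 = strtok(NULL, ":");
        }

        tok1 = strtok(NULL, "|");   /* resumes the inner scan, not str */
    }
    return outer;
}
```

The string holds three outer tokens, but I would expect this to return 1: once the inner loop has run to the end of "a:b:c", the outer strtok(NULL, "|") has nothing left to resume from and returns NULL.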

Corona688 · January 16, 2014, 4:33pm

An example:

//strtok-ing a string strtok gave you.
// You can do this with strtok_r, but not strtok, because
// strtok would lose its place.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main() {
        char rstr[]="a:b:c|d:e:f|g:h:i";
        char *tmp1, *tok1;

        tok1=strtok_r(rstr, "|", &tmp1);
        while(tok1 != NULL)
        {
                char *tmp2, *tok2;
                printf("Outer token:  '%s'\n", tok1);

                tok2=strtok_r(tok1, ":", &tmp2);
                while(tok2 != NULL)
                {
                        printf("\tInner Token:  '%s'\n", tok2);
                        tok2=strtok_r(NULL, ":", &tmp2);
                }

                tok1=strtok_r(NULL, "|", &tmp1);
        }

}

$ gcc strtok_r.c
$ ./a.out

Outer token:  'a:b:c'
        Inner Token:  'a'
        Inner Token:  'b'
        Inner Token:  'c'
Outer token:  'd:e:f'
        Inner Token:  'd'
        Inner Token:  'e'
        Inner Token:  'f'
Outer token:  'g:h:i'
        Inner Token:  'g'
        Inner Token:  'h'
        Inner Token:  'i'

$

yifangt · January 16, 2014, 4:55pm

This is way beyond what I had thought!!! Thanks a lot!