Segfault When Parsing Delimiters In C

Another project, another bump in the road and another chance to learn. I've been trying to open gzipped files and parse data from them, and hit a snag. The gzips contain a place followed by an IP or IP range, like this:

Some place:x.x.x.x-x.x.x.x

I was able to modify some code I found, and it works fine for parsing the data to show only the IPs:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main (void) {
char str[128];
char *ptr;

strcpy (str, "Some place:x.x.x.x-x.x.x.x");
strtok_r (str, ":", &ptr);

printf ("%s\n", ptr);
return 0;

}

Result:

$ ./test       
x.x.x.x-x.x.x.x

However, when I add it to the code I have for opening the gzips and reading them I get a segmentation fault. Here is the code I am trying to work from now:

#include <stdlib.h>
#include <string.h>
#include <errno.h>

int main(int argc, char *argv[])
{
  const char prefix[] = "zcat ";
  const char *arg;
  char *strip;
  char *range;
  char *cmd;
  FILE *in;
  char buf[4096];

  if (argc != 2) {
    fprintf(stderr, "Usage: %s file\n", argv[0]);
    return 1;
  }

  arg = argv[1];
  cmd = malloc(sizeof(prefix) + strlen(arg) + 1);
  if (!cmd) {
    fprintf(stderr, "%s: malloc: %s\n", argv[0], strerror(errno));
    return 1;
  }

  sprintf(cmd, "%s%s", prefix, arg);

  in = popen(cmd, "r");
  if (!in) {
    fprintf(stderr, "%s: popen: %s\n", argv[0], strerror(errno));
    return 1;
  }

  while (fscanf(in, "%*s %99[^\n]", buf) == 1){
    strcpy (strip, buf);
    strtok_r (strip, ":", &range);
    printf("%s\n", range);
  }

  if (ferror(in)) {
    fprintf(stderr, "%s: fread: %s\n", argv[0], strerror(errno));
    return 1;
  }
  else if (!feof(in)) {
    fprintf(stderr, "%s: %s: unconsumed input\n", argv[0], argv[1]);
    return 1;
  }

  return 0;
}

I tried to look at this with strace and it seems to die directly after reading the first line. Any thoughts appreciated.

You forgot to include stdio.h for printf, etc. Without the prototypes, the compiler assumes functions like popen return int, which truncates the returned pointer and can crash a 64-bit program.

The biggest problem:

char *strip;

...

strcpy(strip,buf);

You use strip without ever pointing it at valid memory, either by allocating some as you did for cmd with malloc, or by making it an array from the start like buf.

There's no point copying it either, use buf directly.

There's also no point making your program more complicated with strtok_r for a program this simple.

You also forgot error checking after calling strtok, which would be a crash-causing error for any line not containing :

Also, that's a clever use of scanf, but there's a built-in function called fgets which does that faster and more simply.

Also, you forgot to call pclose when the program's done, which can cause zombie processes.

Also, prefix should be a define, not a variable. (What the array declaration really does is copy the string literal into a separate local array at runtime. Using a #define, or just a "" string, just uses the original source.)

Also, never use sizeof() to determine the lengths of strings. It only worked here because prefix is an array sized by its contents, and it will surprise you in other contexts. sizeof(buf) would always be 4096. sizeof(cmd) would be either 4 (32-bit systems) or 8 (64-bit systems). strlen() avoids that ambiguity.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>

#define PREFIX "zcat "

int main(int argc, char *argv[])
{
  const char *arg;
  char *cmd, buf[4096];
  FILE *in;

  if (argc != 2) {
    fprintf(stderr, "Usage: %s file\n", argv[0]);
    return 1;
  }

  arg = argv[1];
  cmd = malloc(strlen(PREFIX) + strlen(arg) + 1);
  if (!cmd) {
    fprintf(stderr, "%s: malloc: %s\n", argv[0], strerror(errno));
    return 1;
  }

  sprintf(cmd, "%s%s", PREFIX, arg);

  in = popen(cmd, "r");
  if (!in) {
    fprintf(stderr, "%s: popen: %s\n", argv[0], strerror(errno));
    return 1;
  }

  while(fgets(buf, 4096, in)) {
    char *tok=strtok(buf, ":"); // First token, 'place'
    // Second token, 'xxx-xxx'
    if(tok != NULL) tok=strtok(NULL, ":");
    // If anything was found, print it
    if(tok != NULL) printf("%s", tok);
  }

  if (ferror(in)) {
    fprintf(stderr, "%s: fgets: %s\n", argv[0], strerror(errno));
    return 1;
  }
  else if (!feof(in)) {
    fprintf(stderr, "%s: %s: unconsumed input\n", argv[0], argv[1]);
    return 1;
  }

  pclose(in);

  return 0;
}

This is a much easier shell script than a C program, by the way.

#!/bin/sh

zcat "$@" | awk -F: '$2 { print $2 }'

Thank you Corona688. I actually had stdio.h in my code, but did not copy it correctly when pasting to this thread. Most of the examples I looked up for handling delimiters in C used strtok, and I went with strtok_r so it would be thread-safe later.

There's no doubt this could be more easily done with Bash, but when you are parsing multiple lists that are millions of lines long, C seemed like a better option. Plus, I have more to do in this project that would be better done in C.

I appreciate all your suggestions and will review all this tonight in hopes of making this better and cleaner. As with most of my projects, I think being sleep deprived got to me.

Thanks

The more complicated you make it, the better off you'd be using a text processing language for text processing.

That was not a BASH script, that was an awk one. awk is quite efficient.

It is dead easy to read gz files if you use this:

http://www.zlib.net/manual.html#Utility

It works on plain files too, so you don't need to worry whether it's gz or not.

If you can find a suitable scanf-format, you might be able to do it all with one fscanf. If you have to change things later, this could be useful, or not. Probably not, but you are the judge.

while (fscanf(fp, "%*[^:]:%4095[^\n]\n", buf) == 1)
  ...

Juha

Once again, there is a built-in function to do that, fgets.