Strcpy vs strdup

Perhaps someone can help me explain this: both strcpy and strdup are pointers to strings according to the man pages, but a segmentation fault is returned when replacing line 36 with strcpy(node->country, "QQ"); in the following code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAXSIZE 100

typedef struct sData {
  int date;
  char *country;
  char *symbol;
  struct sData *next;
} Data;

Data *first = NULL;
Data *last = NULL;

int main (int argc, char *argv[]) {
  char line[MAXSIZE], symbol[MAXSIZE];
  int date = 0;
  Data *node;

  FILE *input = fopen(argv[1], "r");;

  while (fgets (line, sizeof(line), input) != NULL) {
    line[strlen (line)-1] = '\0';

    if (sscanf (line, "%s %d", symbol, &date) != 2) {
      printf ("Line didn't scan properly: %s\n", line);
      return 1;
    }

    node = malloc (sizeof (Data));

    // Calculate members of the data node
    node->date = date;
    node->country = strdup("QQ");     // USING strdup HERE INSTEAD OF strcpy
    node->symbol = strdup (symbol);
    node->next = NULL;
    printf("%d %s %s\n", node->date, node->country, node->symbol);
    // if the list has been populated already, create the next node
    if (first != NULL) {
      last->next = node;
      last = node;
    // else create the list anew
    } else {
      first = node;
      last = node;
    }
  }

  fclose (input);
  return 0;
}

My input file is as follows:

ZWC 20220906
ZWC 20220907
ZWC 20220908
ZWC 20220909
ZWC 20220912
ZWC 20220913
ZWC 20220914
ZWC 20220915
ZWC 20220916
ZWC 20220919

Compilation: gcc structtest_.c -g -Wall -o structtest
Execution: ./structtest symbols.data

Why would the use of strcpy on a struct-member give a segmentation fault and strdup not?

strcpy expects the place(pointer to) its copying into has enough space - if there's none or not-enough space then expect the program to fault

man strcpy

strdup allocates memory into which it copies the string being duplicated
man strdup

2 Likes

In addition to the excellent answer by @munkeHoller , there is an additional subtlety.

strcpy takes 2 args, dest and source, and copies the string. It returns a copy of dest which can be passed to another function, but will never be different to what you passed. It is almost always ignored.

strdup only takes a source arg. You have to store the returned char* (or you can't even access the copy), and you have to eventually call free() on it (or you have a memory leak).

I thought that is what node = malloc (sizeof (Data)); was supposed to do? I still don't understand why strcpy would overwrite memory it shouldn't, whereas strdup has no such behaviour. Should there be another memory allocation step for just node->country between the malloc step and the strcpy step?

I interpret this comment to mean that constructs like the following are seldomly used:
usemelater = strcpy(node->country, "QQ");
Otherwise, what exactly is almost always ignored?

Thank you for pointing that out, however, wouldn't memory be freed upon normal termination of the program? Or are you suggesting that downstream calculations could overwrite memory that is not freed?

memory is 'freed' upon program completion/exit/crash

memory will 'leak' when you leave the function it was in and never go back or worse call the function repeatedly - ie each strdup call results in 'new' memory being allocated so any previously allocated memory is no longer accessible as the pointer has been overwritten with the new address ...

crude example

    const char* s1= "Hello World";
    char *theword = strdup (s1);   /* memory allocated and string copied etc */
    theword = strdup(s1);  /* 'new' memory allocated and string copied, 1st lot of memory no longer accessible - its lost - constitutes a 'leak' */

#
# 
#
    char *theword = strdup (s1);   /* memory allocated and string copied etc */
    free(theword); /* memory allocate now released back to pool, */
    theword = strdup(s1);  /* memory allocated and string copied, no leak  */

if not familiar with valgrind its an excellent tool wrt debugging etc

1 Like

all node = malloc (sizeof (Data)); is doing is allocating space to hold a structure of
Data size which is

int 4 bytes to store a number in
all the others are pointer types * , so either 4 or 8 bytes
the pointer types are placeholders they don't have associated 'storage' that needs to be allocated and the address of that allocated storage is assigned to the appropriate pointer.

sizeof( Data ) will tell you the actual size.

1 Like

usemelater has no obvious purpose because it will be a copy of node->country, which you already have.

In fact, usemelater would be a potential risk of later errors. For example, you might free both node->country and usemelater, which could corrupt the free list. You might free the memory with either of them, and later access that memory via the other pointer when it has already been re-used by another malloc.

My personal style is to keep data structs such that there is only one primary chain of addresses to any object -- no cross-links exist.

Memory is indeed taken back by modern OS when the process ends. (Some old Windows systems failed do be reliable in this respect.) However, many long-running processes continually churn memory as they process different events, and leaks can accumulate until the system eventually struggles with performance.

1 Like

If country and symbol were character arrays like char country[40];, then sizeof (Data) would include space for the string, but it would be uninitialised. It would be up to you to copy the string into it, after ensuring it would fit in (39 chars plus a '\0').

Defining char *country avoids any size restriction on the text, and also makes Data smaller because it now contains an 8-byte pointer instead of a 40-byte char array.

The downside is now that country is a pointer that is uninitialised, not a char text. So you need to both create a pointer (malloc) and copy in the text (stdcpy).

strdup (symbol) is basically shorthand for:

char *addr = malloc (strlen (symbol) + 1);
return ((addr == NULL) ? NULL : strcpy (addr, symbol));

And that is probably also the only time I ever used the return address from strcpy.

1 Like