Concatenating struct members for sorting purposes

I don't understand why the key is correct in the first instance, but defaulted to the last known value in the second instance:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LINES (int)1E03

const int MAX_LINE_SIZE = 30;
const char delim[2] = "\t";

struct temperature {
  char *date;
  char *country;
  char *location;
  char *key;
  float temp;
};
struct temperature temps[MAX_LINES];

/* compare function for sorting purposes, note that this sorting relies on there not being a duplicate */
int compare (const void *a, const void *b) {
  return strcmp((*(struct temperature*)a).key, (*(struct temperature*)b).key);
}

/* main function */
int main (int argc, char *argv[]) {
  char line[MAX_LINE_SIZE];
  FILE *f = NULL;
  f = fopen(argv[1], "r");

  int n = 0;
  char key[100];
  while (fgets(line, MAX_LINE_SIZE, f)) {
    temps[n].date = strtok(strdup(line), delim);
    if (temps[n].date != NULL) {
      temps[n].country = strtok(NULL, delim);
      temps[n].location = strtok(NULL, delim);
      temps[n].temp = atof(strtok(NULL, delim));
      // create a key (string) for sorting purposes: first by country, then by location, then by date
      strcat(strcat(strcpy(key, temps[n].country), temps[n].location), temps[n].date);
      temps[n].key = key;
      printf("n: %d\t%s\n", n, temps[n].key);
      n++;
    }
  }
  fclose(f);
  printf("Number of lines read: %d\n", n);

  for (int i = 0; i < n; i++) { printf("i: %d\t%s\t%s\t%s\t%.2f\t%s\n", i, temps[i].date, temps[i].country, temps[i].location, temps[i].temp, temps[i].key); }
  // qsort(temps, n, sizeof(struct temperature), compare);
  // for (int i = 0; i < n; i++) { printf("i: %d\t%s\t%s\t%s\t%.2f\t%s\n", i, temps[i].date, temps[i].country, temps[i].location, temps[i].temp, temps[i].key); }

  return 0;
}

Input file:

2021-11-09	UK	ACK	16.3
2021-11-09	UK	AOO	15.6
2021-11-09	UK	AGE	22.4
2021-11-09	UK	ABI	11.8
2021-11-09	UK	APA	20.7
2021-11-09	UK	ARG	28.5
2021-11-09	UK	COF	21.6
2021-11-10	UK	ACK	15.3
2021-11-10	UK	AOO	14.6
2021-11-10	UK	AGE	11.8
2021-11-10	UK	ABI	22.4
2021-11-10	UK	APA	29.6
2021-11-10	UK	ARG	28.4
2021-11-10	UK	COF	14.1

The output is as follows, where the key is correct in the first instance, but incorrect in the second instance:

n: 0	UKACK2021-11-09
n: 1	UKAOO2021-11-09
n: 2	UKAGE2021-11-09
n: 3	UKABI2021-11-09
n: 4	UKAPA2021-11-09
n: 5	UKARG2021-11-09
n: 6	UKCOF2021-11-09
n: 7	UKACK2021-11-10
n: 8	UKAOO2021-11-10
n: 9	UKAGE2021-11-10
n: 10	UKABI2021-11-10
n: 11	UKAPA2021-11-10
n: 12	UKARG2021-11-10
n: 13	UKCOF2021-11-10
Number of lines read: 14
i: 0	2021-11-09	UK	ACK	16.30	UKCOF2021-11-10
i: 1	2021-11-09	UK	AOO	15.60	UKCOF2021-11-10
i: 2	2021-11-09	UK	AGE	22.40	UKCOF2021-11-10
i: 3	2021-11-09	UK	ABI	11.80	UKCOF2021-11-10
i: 4	2021-11-09	UK	APA	20.70	UKCOF2021-11-10
i: 5	2021-11-09	UK	ARG	28.50	UKCOF2021-11-10
i: 6	2021-11-09	UK	COF	21.60	UKCOF2021-11-10
i: 7	2021-11-10	UK	ACK	15.30	UKCOF2021-11-10
i: 8	2021-11-10	UK	AOO	14.60	UKCOF2021-11-10
i: 9	2021-11-10	UK	AGE	11.80	UKCOF2021-11-10
i: 10	2021-11-10	UK	ABI	22.40	UKCOF2021-11-10
i: 11	2021-11-10	UK	APA	29.60	UKCOF2021-11-10
i: 12	2021-11-10	UK	ARG	28.40	UKCOF2021-11-10
i: 13	2021-11-10	UK	COF	14.10	UKCOF2021-11-10

Using gcc. The key defaults to "UKCOF2021-11-10" in all instances and I do not know why and it prevents me from using qsort. Can anyone point out why this is happening?

hi @technossomy ,

the variable key is being overwritten, ie the last one read/processed, hence they all have the 'same' key
as they were all pointing to the same instance of key that declared in main char key[100];,
temps[n].key = key that assignment is 'the problem' :smiley:
each temp.key instance needs its own memory allocated to hold its key.
something like this snippet below should do the trick

sprintf( key, "%s%s%s", temps[n].country, temps[n].location, temps[n].date );
temps[n].key = strdup( key );
5 Likes

@munkeHoller Has identified and solved the problem.

However, strdup() returns a new char array for every key (new as in malloc()), so you will need to go back and free() all the new key fields when you need to free up temps. Doing a large number of malloc/key for small strings is wasteful.

A neater solution would be to have a more complicated compare() function that works on all three parts of the key, in the order of significance. Then, you would not need a composite key at all.

I would initially create temporary pointers like struct temperature *A = (struct temperature*) a; to clarify things.

Use r = strcmp (A.country, B.country); on the most significant sub-key. If that resolves r != 0, no need to go further, just return (r).

For r == 0, go on to test location in the same way. If that does not resolve the match, test date.

2 Likes

My last C program is more than 20 years ago, and it took me close to an hour to wrap my mind around that little piece of code and understand @munkeHoller's solution and @Paul_Pedant's objection and proposal.

Still I'd like to discuss an idea: I see that line is strduped and holds all the tokens that are pointed to by the temps array elements. How about extending that strduped line by the key length and store the key there?

Happy to have my less than basic understanding corrected.

3 Likes

hi, this 'program' by most accounts is (imho) nothing more than a bit of 'test' code, I took it as such, the issue was obvious (but having made/seen/fixed so many code issues im my 40+yrs coding this particular instance was trivial, my explanation took longer than the fix - i edited it 3 or 4 times ), I made (and make) no comment on the how/why of what was being done and focused purely on the 'error' at hand. any/all other feedback/critique dutifully noted and ignored (as it's not of interest to me per se , regardless of its merit or otherwise). If asked how to 'improve' (by the author) then that's a different kettle of fish, but there's others keen/willing to do so regardless, so I'll leave that to them.

2 Likes

@munkeHoller Has identified and solved the problem as noted by Paul.

I would like to add that every field in the struct, except for temp has the problem that the fields are all pointers with no memory assigned.
Since you've already decided to use a fixed number of data elements, if you decide on the maximum string length for your fields, just use those as opposed to pointers.
For example, I updated the structure as follows:

struct temperature {
  char date[11];
  char country[3];
  char location[4];
  char key[16];
  float temp;
};

and then just perform a string copy in the if block:

      strcpy( temps[n].country,  strtok(NULL, delim));
      strcpy( temps[n].location, strtok(NULL, delim));
      temps[n].temp = atof(strtok(NULL, delim));
      // create a key (string) for sorting purposes: first by country, then by location, then by date
      strcat(strcat(strcpy(key, temps[n].country), temps[n].location), temps[n].date);
      strcpy( temps[n].key, key );
      printf("n: %d\t%s\n", n, temps[n].key);
      n++;
1 Like

given this is as I previously assumed a test (noddy) piece of code, if you want to simplify the comparison, then something akin to the below would suffice without adding additional 'complexity' or substantial overhead wrt time/memory etc and do away with the need of the 'key' field in the structure.

my last word (promise) on this thread :smiley:

int compare (const void *a, const void *b) {
   static char keyA[255];  /* or calculate the strlen( country + location + date) and allocate/free as appropriate for the comparison */
   static char keyB[255]; /* ditto as for a */
   sprintf( keyA, "%s%s%s", (*(struct temperature*)a).country, (*(struct temperature*)a).location, (*(struct temperature*)a).date );
   sprintf( keyB, "%s%s%s", (*(struct temperature*)b).country, (*(struct temperature*)b).location, (*(struct temperature*)b).date );
   return strcmp(keyA, keyB);
}
2 Likes

Rats, I missed that one. The issue is that memory is assigned to all the fields in each struct, but the date member is special -- it gets the malloc() address for the entire dup of the input, but the other members get pointers offset (as returned by strtok) into the same area. That is totally counter-intuitive, and freeing all the allocations is going to be a rats-nest for maintenance.

I'm pretty fuzzy about the status of 'test' code, and the distinction between bug-fix and mentoring. In my mind, this bug is a result of an incomplete concept for the data structures, and being more rigorous about the data (like using char arrays for the members) would have sidestepped the bug completely.

1 Like