Sscanf variable based on the previous field from fgets in pure C

I need to parse input line into structure that contains union. In each line the 6th field is based on the previous (5th) field. Here is the sample data(with header):

ID		Name		Gender	Age		Job		Class/Teaching
1001	Johnson 	M		18		S			501
1002	Lily	 	F		19		S			302
1009	Lyon		M		36		T			Math
1010	Marilyn		F		32		T			Art
1011	Longha		M		20		S			601

Tried to use structure array to hold the record of each line. fgets is used to read in each line which is then parsed with sscanf to fill the members of the structure. The 6th field (Class/Teaching) contains string or integer based on the 5th field (Job), so that a union is used to hold the data in this field. Following is my code:

struct info {
	int id;
	char name[20];
	char gender;
	int age;
	char job;
	union {						/*Use union inside the struct */
		int classNumber;
		char Teaching[10];
	} category;
};

typedef struct info INFO;		/*Define INFO in place of struct info*/

INFO person[15] = {0};
char line[256];
unsigned int i = 0;				//increment counter

while (fgets(line, 256, stdin) != NULL) {
	sscanf(line, "%d %s %c %d %c", &(person[i].id), person[i].name, &(person[i].gender), &(person[i].age), &(person[i].job));
	if (person[i].job == 'T') {		/*according to the Job field, receive different data types in the union*/
		sscanf(line, "%d %s %c %d %c %s", 
	  		   &(person[i].id), person[i].name, &(person[i].gender), &(person[i].age), 
			   &(person[i].job), person[i].category.Teaching);
	}
	else if (person[i].job == 'S') {
		sscanf(line, "%d %s %c %d %c %d", 
			  &(person[i].id), person[i].name, &(person[i].gender), &(person[i].age), 
			  &(person[i].job), &(person[i].category.classNumber));
		}
		i++;
}

Because the 6th field (Class/Teaching) is based on the 5th field (Job), so I used sscanf() twice to parse the line. The first sscanf parses the first 5 fields (until Job field) to decide which type of data should be used for the 6th field (Class/Teaching). The second sscanf parses all the 6 fields till the end of each line.
The code seems working (attached is the full code with sample data), but I was wondering:

  1. Is this the correct way to do the job because sscanf() is used twice and the first 5 fields are scanned twice?
  2. I was thinking the way to parse the line till the 5th field, check its type, then continue to parse the 6th field so that each line is scanned only once. How to achieve this?
    For practice I push myself using pure C only. Thanks in advance.
    struct18_forum.txt (1.8 KB)

Hi @yifangt,

there is no need to parse each line twice. It is enough to read the last column as a char[] and then convert it to int if Job has the value S:

char class_or_teaching[10]; // same size as in union

while (fgets(line, 256, stdin)) {
    if (i++) { // skip header. note that persons start at index 1 then
        sscanf(line, "%d %s %c %d %c %s", &person[i].id, person[i].name, &person[i].gender, &person[i].age, &person[i].job, class_or_teaching);
        if (person[i].job == 'S') {
            if (!sscanf(class_or_teaching, "%d", &person[i].category.classNumber)) {
                printf("E: class %s is not a number in line %d\n", class_or_teaching, i+1); // converts to 0 in that case
            }
            printf("%s %d\n", person[i].name, person[i].category.classNumber);
        }
        else if (person[i].job == 'T') {
            sscanf(class_or_teaching, "%s", person[i].category.Teaching); // could also use strdup()
            printf("%s %s\n", person[i].name, person[i].category.Teaching);
        }
        else {
            printf("E: unknown job %c in line %d\n", person[i].job, i+1);
        }
    }
}
// you could also put the counter and fgets in a for loop:
// for (unsigned int i = 0; fgets(line, 256, stdin); i++)
// and you could use a switch statement for checking person[i].job

You could also omit the Job column entirely, since it's redundant: If the last column is an int, the data represents a student, otherwise it represents a teacher.

4 Likes

@yifangt , in addition to @bendingrodriguez advice, you could also save a few bytes by changing the int's to short if appropriate (and/or by aligning the members on [long]word boundaries).

( test using sizeof(struct) .... )

@bendingrodriguez @munkeHoller Thanks!
Yes, it works well with this one-round parsing (and replace ints with short based on the example data).
To confirm my understanding: char class_or_teaching[10]; is used as a temporary variable for a second parsing based on Job. Is this correct?
I like this smart use of sscanf() that I should have understood.

yes, it's only used for reading in the last column, then its value is converted resp. copied into the corresponding union member.

Thanks a lot!