Pipe usage error while visiting directories

To begin with, FYI: I spent about an hour simplifying this question as much as I could before asking, and I left out the error checks to keep the code readable.

I intend to have parent and child communicate through a pipe. The following program traverses the given path and its subdirectories and calculates, for each directory individually, the total size of the regular files it contains. When a directory is done, its process writes its PID, size and path into `COMMON_FIFO`, which is eventually read in `main()`. My questions are as follows:

  • Can data get corrupted while writing to the `FIFO` because of the limit on atomic write size, given that the main process waits for all children to be reaped before reading? If yes, how can I solve the problem?
  • Is the written information (each directory's PID, size and path) guaranteed to arrive in post-order, since the directories are entered in order? If not, how can that be achieved?

    char const * COMMON_FIFO = "FIFO_TEST";
    
    int walk_dir_and_calculate_sizes(char * path) {
    	opendir() ..
    	int totalRegularSizes = 0;
    	while ((entry = readdir) != NULL) {
    		char const *name = entry->d_name;
    		COMBINE path and name for example path is a/, name is b.txt(a/b.txt) or another DIRECTORY c(a/c)
    		IF (entry->d_type is DIRECTORY) {
    			if (name is "." or name is "..")
    			continue;
    
    			childPid = fork();
    			if (childPid == 0) {		//child
    				walk_dir_and_calculate_sizes(COMBINED_PATH);
    				_exit(0);
    			}
    
    			if (childPid > 0) {			// parent
    				continue;
    				// continue traversing if remaining dirs or files exist
    			}
    		}
    		ELSE {	// regular file like .txt, .pdf etc.
    			totalRegularSizes += another_func_giving_size_of_file(COMBINED_PATH);
    		}
    	}
    
    	closedir(..);
    	while (wait(&status) > 0);
    
    	RESTORE COMBINED_PATH to directory path, it is ok
    
    	int fd = OPEN(COMMON_FIFO, O_WRONLY);
    	char folderSizeInformation[100] = Current PID, directory size(= totalRegularSizes), directory name combinations
    	WRITE(fd, folderSizeInformation, strlen(folderSizeInformation)
    	CLOSE(fd)
    
    	return 123456 not important I think;
    
    }
    
    int main(char ** argv) {
    	mkfifo(COMMON_FIFO, 0644);
        OPEN(COMMON_FIFO, O_RDONLY|O_NONBLOCK);
    
        walk_dir_and_calculate_sizes(argv[1]);
    
        PRINT FIFO ON SCREEN
    
        exit(0);
        
    }

Let's modify the previous `walk_dir_and_calculate_sizes` function, in which each directory's process sums only the regular files that directory contains, into `WITH_PIPE_walk_dir_and_calculate_sizes`, in which I intend to pass each subdirectory's calculated size up to its parent through a pipe. For example, directory A contains B, which contains C. Individually, `A's size is 10kb, B's is 5kb, C's 1kb`. `walk_dir_and_calculate_sizes` reports `A's size is 10kb, B's is 5kb, C's 1kb`, whereas `WITH_PIPE_walk_dir_and_calculate_sizes` should report `A's size is 16kb, B's is 6kb, C's 1kb`, since C has no child (subdirectory).

    char const * COMMON_FIFO = "FIFO_TEST";
    
    int walk_dir_and_calculate_sizes(char * path) {
    
    	/* PIPE */
    	int pfd[2];
    
    	opendir() ..
    	int totalRegularSizes = 0;
    	while ((entry = readdir) != NULL) {
    		char const *name = entry->d_name;
    		COMBINE path and name for example path is a/, name is b.txt(a/b.txt) or another DIRECTORY c(a/c)
    		IF (entry->d_type is DIRECTORY) {
    			if (name is "." or name is "..")
    			continue;
    
    			/* PIPE */
    			pipe(pfd);
    
    
    			childPid = fork();
    			if (childPid == 0) {		//child
    				walk_dir_and_calculate_sizes(COMBINED_PATH);
    				_exit(0);
    			}
    
    			if (childPid > 0) {			// parent
    				continue;
    				// continue traversing if remaining dirs or files exist
    			}
    		}
    		ELSE {	// regular file like .txt, .pdf etc.
    			totalRegularSizes += another_func_giving_size_of_file(COMBINED_PATH);
    		}
    	}
    
    	closedir(..);
    
    	/* PIPE COMMUNICATE READ */
    	while (wait(&status) > 0) {
    		close(pfd[1]);
    		int readVal = -1;
    		READ(pfd[0], &readVal, sizeof(readVal));
    		totalRegularSizes += readVal
    	}
    
    	RESTORE COMBINED_PATH to directory path, it is ok
    
    	int fd = OPEN(COMMON_FIFO, O_WRONLY);
    	char folderSizeInformation[100] = Current PID, directory size(= totalRegularSizes), directory name combinations
    	WRITE(fd, folderSizeInformation, strlen(folderSizeInformation)
    	close(fd)
    
    
    	/* PIPE COMMUNICATE WRITE */
    	// Since it is recursive function the deepest directory doesn't have any child(subdirectory)
    	// it omits wait() function and comes here
    
    	close(pfd[0]);
    	WRITE(pfd[1], &totalRegularSizes, sizeof(totalRegularSizes));
    
    
    	return 123456 not important I think;
    
    }

But in the `WITH_PIPE_walk_dir_and_calculate_sizes` function I get a Bad File Descriptor error when closing the read end before writing to the pipe. I'm really stuck. Why is my idea wrong? How can I achieve what I intend? I'm not experienced, so I may well have other mistakes or oversights; please point them out.

Thanks a lot.

@Edit: by the way, I overlooked that I create only one pipe, as if a directory had a single subdirectory, but a directory can of course have more than one subdirectory. So I think a parent needs more pipes.
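A minimal sketch of the one-pipe-per-child idea, not the original program: the pipe is created before each `fork()`, the child writes one value and exits, and the parent keeps only the read ends and drains them after the loop. `child_work()` and `sum_from_children()` are hypothetical names, and `child_work()` merely stands in for the recursive directory walk.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the recursive walk of one subdirectory. */
static long child_work(int index) { return (index + 1) * 100L; }

/* Fork n children with one pipe each; return the sum of the values
 * they report, or -1 if any step fails. */
long sum_from_children(int n)
{
    if (n <= 0)
        return 0;
    int *read_fds = malloc((size_t)n * sizeof *read_fds);
    if (read_fds == NULL)
        return -1;

    int started = 0;
    for (int i = 0; i < n; i++) {
        int pfd[2];
        if (pipe(pfd) == -1)
            break;
        pid_t pid = fork();
        if (pid == -1) {
            close(pfd[0]);
            close(pfd[1]);
            break;
        }
        if (pid == 0) {
            /* Child: drop its own read end and the read ends
             * inherited from earlier siblings, then report once. */
            close(pfd[0]);
            for (int j = 0; j < started; j++)
                close(read_fds[j]);
            long value = child_work(i);
            ssize_t w = write(pfd[1], &value, sizeof value);
            close(pfd[1]);
            _exit(w == (ssize_t)sizeof value ? 0 : 1);
        }
        close(pfd[1]);               /* parent only reads */
        read_fds[started++] = pfd[0];
    }

    int ok = (started == n);
    long total = 0;
    for (int i = 0; i < started; i++) {
        long value = 0;
        if (read(read_fds[i], &value, sizeof value) == (ssize_t)sizeof value)
            total += value;
        else
            ok = 0;
        close(read_fds[i]);
    }
    while (wait(NULL) > 0)
        ;                            /* reap every child */
    free(read_fds);
    return ok ? total : -1;
}
```

With the stand-in above, `sum_from_children(3)` adds up 100, 200 and 300. Each process only ever closes descriptors that it actually obtained from `pipe()` or inherited across `fork()`.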

If it's saying bad file descriptor, it probably means it. Without seeing your actual code, I can't tell why you're closing a bad file descriptor. You should print the FDs to stderr when you open a pipe, and print them to stderr again when you try to close them, to see what's going on. But I have some further comments.
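A small sketch of that kind of logging, assuming two hypothetical wrappers, `logged_pipe()` and `logged_close()`, that are not part of any library. They tag every message with the PID so output from different processes can be told apart.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Create a pipe and report both descriptors on stderr.
 * Returns what pipe() returned. */
int logged_pipe(int pfd[2])
{
    int rc = pipe(pfd);
    if (rc == -1)
        fprintf(stderr, "[pid %ld] pipe() failed: %s\n",
                (long)getpid(), strerror(errno));
    else
        fprintf(stderr, "[pid %ld] pipe opened: read=%d write=%d\n",
                (long)getpid(), pfd[0], pfd[1]);
    return rc;
}

/* Close fd and report the outcome on stderr.
 * Returns 0 on success, or the errno value on failure. */
int logged_close(int fd, const char *label)
{
    if (close(fd) == -1) {
        int err = errno;
        fprintf(stderr, "[pid %ld] close(%s=%d) failed: %s\n",
                (long)getpid(), label, fd, strerror(err));
        return err;
    }
    fprintf(stderr, "[pid %ld] closed %s=%d\n",
            (long)getpid(), label, fd);
    return 0;
}
```

Closing a descriptor that was never opened in the current process, or closing the same one twice, shows up as `EBADF` in this output together with the offending number.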

First, fork() is pointless here. Disks don't multithread. Forcing a spinning disk to read 19 directories at once makes it seek back and forth between them, which is likely to make the whole job slower, not faster. You already benefit from the caching and read-ahead built into the OS, too.

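Second, there's a system function for what you want to do: ftw(), or its newer sibling nftw(), which also tells the callback how deep in the tree it is. It operates depth-first and reports a directory before its contents, so every time you see a new second-level folder, you'll know everything afterwards will be within that folder until it leaves.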

    #define _XOPEN_SOURCE 500       // exposes nftw() and struct FTW in <ftw.h>
    #include <ftw.h>
    #include <sys/stat.h>
    #include <stdio.h>
    #include <string.h>

    #define MAX_RESULTS 64

    struct result_t {
            char name[256];
            long int size;
    } result[MAX_RESULTS];

    int last_result=0;

    int ftw_callback(const char *fpath, const struct stat *sb,
            int typeflag, struct FTW *ftwbuf);

    int main(void) {
            int n;
            const char *ROOT="./";
            strcpy(result[0].name, ROOT);
            // nftw(), unlike plain ftw(), hands the callback a struct FTW
            // carrying the depth level; 8 is the max number of open FDs
            if(nftw(ROOT, ftw_callback, 8, 0) == -1)
                    perror("nftw");

            for(n=0; n<=last_result; n++)
                    printf("%s\t%ld bytes\n", result[n].name, result[n].size);
            return(0);
    }

    int ftw_callback(const char *fpath, const struct stat *sb,
            int typeflag, struct FTW *ftwbuf) {

            // Found a new second-level folder
            if((typeflag == FTW_D) && (ftwbuf->level == 1)) {
                    // Table full: a nonzero return stops the walk
                    if(last_result + 1 >= MAX_RESULTS) return(1);
                    last_result++;
                    snprintf(result[last_result].name,
                            sizeof(result[last_result].name), "%s", fpath);
            }

            if(typeflag == FTW_F)
            {
                    int res=last_result;
                    // Special case for level-1 files, those are in ROOT
                    if(ftwbuf->level == 1) res=0;
                    result[res].size += sb->st_size;
            }

            return(0);
    }

If you really wanted to do IPC between processes just to count directories, though, shared memory beats pipes IMO. mmap() an anonymous segment, and each fork()ed process will have access to it. Give each child a unique index to mess with so they don't stomp on each other, wait() for each child to quit, and tada.
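A rough sketch of that, under a few assumptions: `fake_dir_size()` is a hypothetical stand-in for sizing one directory, `sum_via_shared_memory()` is an illustrative name, and `MAP_ANONYMOUS` is taken to be available (it is on Linux and the BSDs, though it is not part of POSIX).

```c
#define _DEFAULT_SOURCE              /* for MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for computing the size of one directory. */
static long fake_dir_size(int index) { return 1000L * (index + 1); }

/* Fork n children; child i writes only slot i of a shared array.
 * Returns the sum of all slots, or -1 on failure. */
long sum_via_shared_memory(int n)
{
    if (n <= 0)
        return 0;

    size_t bytes = (size_t)n * sizeof(long);
    long *slots = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (slots == MAP_FAILED)
        return -1;

    int forked = 0;
    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid == -1)
            break;
        if (pid == 0) {
            slots[i] = fake_dir_size(i);   /* unique index per child */
            _exit(0);
        }
        forked++;
    }

    while (wait(NULL) > 0)
        ;                                  /* wait for every child */

    long total = 0;
    for (int i = 0; i < n; i++)
        total += slots[i];                 /* unused slots stay zero */

    munmap(slots, bytes);
    return forked == n ? total : -1;
}
```

Because every child touches a different slot and the parent reads only after `wait()` has reaped them all, no locking is needed in this layout.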
