Problems understanding pipes

I am trying to create a csh clone, but am having problems implementing piped commands. Specifically, the code below simply hangs after input of ls | grep <text>.
It does, however, filter the output and display it correctly; it appears that grep hasn't exited and my shell never comes back to the waiting parent.
I have doubts about whether I am using the pipe correctly.
Following is an extract of the relevant code (pseudo):

save stdin, stdout and stderr using dup.
For every command
{
    int fd_pipe[2];
    if(not last command in pipe)
    {
        if(count == 0)
        {
            if(pipe(fd_pipe) < 0)
            {
                perror("pipe");
            }
        }

        if(dup2(fd_pipe[1], STDOUT_FILENO) < 0)
        {
            perror("dup2");
        }
        if(dup2(fd_pipe[0], STDIN_FILENO) < 0)
        {
            perror("dup2");
        }
        ++count;
    }
    else /* last command */
    {
        if(dup2(fd_pipe[0], STDIN_FILENO) < 0)
        {
            perror("dup2");
        }
        if(dup2(bstdout, STDOUT_FILENO) < 0)
        {
            perror("dup2");
        }
    }
    Execute the command in a subshell (fork and exec)
}
restore stdin, stdout and stderr.

---------- Post updated at 03:37 AM ---------- Previous update was at 03:35 AM ----------

Aargh... I've cracked my head on this till 4:30 am, but I can't figure out what's wrong.
I tried using a separate file as intermediate storage for the pipe output.
But then grep fails to filter the input from the file.

---------- Post updated at 12:14 PM ---------- Previous update was at 04:37 AM ----------

Anyone there?

To the moderator who moved the post...seems like moving it to a more "fitting" forum killed the chances of a reply :frowning:

Some of us might be in a different timezone than you, and this forum is populated by volunteers; we are not "on call".

You also agreed to not bump posts when you registered.

I can't tell if your pseudocode's wrong or not since you didn't mention fork() at all in there.

Writing up some pseudocode for you.

---------- Post updated at 10:52 AM ---------- Previous update was at 10:41 AM ----------

The read-end won't EOF until all of the write-ends are closed, and the write-end won't die with SIGPIPE until all of the read-ends are closed, which is usually why this hangs: Forgetting to close all ends of the pipe you weren't using.

Each process you fork() gets independent copies of any pipe FD's that were open when you fork()ed, any unused ones need to be closed separately.

Also, you need to close the write-end too, once you're done with it, for the reader to hit EOF. Or the writer quitting works, too.

// parent gets writing-end of pipe, child gets reading-end of pipe.
int pipefd[2];

pid_t pid;
pipe(pipefd);

pid=fork();

if(pid < 0)
{
        perror("couldn't fork");
        exit(1);
}
else if(pid == 0) // in child
{
        dup2(pipefd[0], 0); // Overwrite STDIN with read end of pipe
        close(pipefd[0]);   // done with the original descriptor
        // Close writing end of pipe!  ESSENTIAL!
        close(pipefd[1]);
        execlp("/bin/cat", "cat", (char *)NULL); // exec REPLACES the current process
        perror("Couldn't exec"); exit(1);
}

// If we reach here, we must be the parent
const char *str="the owls are not what they seem\n";
int status;
dup2(pipefd[1], 1); // overwrite STDOUT with write end of pipe
close(pipefd[1]); // don't leave a second copy of the write end open
close(pipefd[0]); // close the read-end!  ESSENTIAL!
write(1, str, strlen(str));  // send data to cat
close(1); // close the last write end so the child will get EOF.  Also essential!  wait() would wait forever otherwise.
wait(&status);
fprintf(stderr, "Returned status %d\n", WEXITSTATUS(status));

@Corona Apologies if the bumping was excessive.
The fork is done as part of the step "Execute the command in a subshell".
Thanks for the explanation though; let me check whether I am correctly closing the ends of the pipe.

EDIT: A follow-up question though: is it necessary to use different pipes if there are more than 2 commands in my pipeline?
Or can I reuse the same pipe?
Say I want to run ls | grep <pattern> | more.
Once ls is done with its task and grep done with the reading, the pipe would no longer be used, right? So can the same one be used between grep and more? [I think not, as grep would need access to 2 pipes simultaneously: one to read from and another to write to. In that case I need to address the correct file descriptors in my 2D array.]

Yes, make n-1 pipes. Writing an example...

---------- Post updated at 01:27 PM ---------- Previous update was at 01:11 PM ----------

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

int main(void)
{
        const char *cmd[]={"ls", "tac", "less", NULL};
        int curpipe[2]={-1, -1}, lastpipe=-1, n;
        pid_t pids[64];

        for(n=0; cmd[n] != NULL; n++)
        {
                // If this isn't the last in the chain,
                // make a pipe for cmd[n] to write into.
                if(cmd[n+1] != NULL) pipe(curpipe);

                switch(pids[n]=fork())
                {
                case -1:
                        perror("couldn't fork");
                        break;
                default:// parent code

                        // Our latest child has a copy of the writing
                        // end, we don't need it anymore.
                        if(curpipe[1] >= 0) close(curpipe[1]);

                        // ...but we'll need the reading end next time.
                        // and we should close the last loop's reading end.
                        if(lastpipe >= 0) close(lastpipe);
                        lastpipe=curpipe[0];

                        // don't use those FD's again.
                        curpipe[1]=-1;  curpipe[0]=-1;
                        break;

                case 0: // child code

                        // If we have a reading end, use it.
                        if(lastpipe >= 0)
                        {
                                dup2(lastpipe, 0);
                                close(lastpipe);
                        }

                        // if we have a writing end, use it.
                        if(curpipe[1] >= 0)
                        {
                                // make a copy.
                                dup2(curpipe[1], 1);
                                // close both ends.
                                close(curpipe[1]);
                                close(curpipe[0]);
                        }

                        execlp(cmd[n], cmd[n], (char *)NULL);
                        perror("couldn't exec");
                        exit(1);
                        break;
                }
        }

        for(n=0; cmd[n] != NULL; n++)
        {
                int status;
                wait(&status);
        }
}

@Corona

That is one elegant piece of code. Initially, I was going to bug you for further clarification on how the code works, but then decided that after so much effort on your part, the least I could do was trace the code myself.

I did and it resolved those doubts :slight_smile: (Big surprise!)

I also went to your projects page burningsmell, and it appears that I have been talking to one awesome programmer.
Keep up the good work! And Thanks!
P.S. - An additional, somewhat unanswered question:
Instead of creating n-1 pipes for n commands, would it be possible to use the same single pipe by suitably adjusting the FDs?
My initial attempt tried to do that, but I got confused as to what's to be closed and what not.
From your code I surmised that we need to close anything in the parent that we don't use. But a close() call means that the FD no longer refers to any file. So when the parent closes the write end of the pipe with, say, curpipe = {5, 6}, how is the child still allowed to use 6 to refer to the write end of the pipe? {Is it because, in its address space, 6 is not closed?}
Sorry if these questions sound too basic, but I am unable to clearly visualize the address spaces like that. I think we can debug the child process using gdb, but I am not very familiar with debugging multiple threads.

Note: In what follows, a "file description" and a "file descriptor" are not synonymous.

When you open() a file or use the pipe() system call, the kernel will create what's called a file description. This file description is a data structure that keeps track of the file offset, permissions, access mode, etc, associated with the opened resource. Aside from creating that file description, an entry is added to the process file descriptor table and you are given an integer index which points to that new entry; this is the file descriptor.

Both the file description and file descriptor tables are inside the kernel's address space. A file description is a system-wide entity. File descriptor tables are a per-process data structure. Each process has its own descriptor table. There can be multiple file descriptors pointing to the same underlying file description.

When you fork, the newly-created process is provided with its own copy of the parent's descriptor table. Initially, each entry in the child's descriptor table points to the same underlying open file description as its counterpart in the parent's table. The same is true when you exec() a new executable image, except that file descriptors which have had their close-on-exec flag set are closed.

An open file description is not closed until all file descriptors in all processes which point to that file description are closed.

Since different file descriptors in different processes can manipulate the same underlying file description, it can be considered a mode of interprocess communication.

That's probably a lot of jargon to digest at once, but I believe it covers the essentials.

Regards,
Alister


Thanks, alister.

The clarification was very helpful indeed.
While we are on the subject (and I know there are scattered resources available on Google for this), how is the pipe's underlying structure implemented?
If it is just another file, is it possible to see the contents of the pipe {so as to know what's being passed between the read and write ends}?
Finally,
Is there a way to find out which FDs point to the same underlying description?
If it's hidden somewhere in lsof, I'll dig deeper, but if not, do let me know.

Thanks again. The community here on this site is much more forgiving towards the beginners. :slight_smile:

Nope. One pipe is one pipe, for a chain of 10 processes you need 9 pipes.

Not sure what you mean by "adjusting the FD". FD 6 isn't "pipe number 6", it just happens to be the sixth file your process opened. Add two to it and the kernel won't give you another pipe at FD 8, just say "What?"

Or worse -- maybe there really does happen to be an FD 8. You just spun the roulette wheel and landed on a number your process opened already. What is it? Who knows, but whatever it is, you're reading it.

6 is just a number, perhaps the 6th file your process happened to open, it doesn't mean "pipe number 6". Closing #6 doesn't close everyone else's #6.

It's completely okay to have the same file open and in use in many different processes, too -- that's how shells work. When you run echo a or cat, they receive copies of the shell's own open file descriptors. That's what fork() does -- creates an almost-perfect clone of the parent, right down to memory and open files. Then they run exec() to become a different program, but keep the same open files.

So echo, cat, et al. don't have to tell the shell to write to the terminal -- they do so directly.

Pipes obviously know to wait until the process writing to them finishes before saying they're done. That works for more than one process too. If you have two processes with copies of the write-end and one process with the read end, the kernel will wait for both write ends to close before the pipe gives up -- even if you just left that one open by accident. The same logic goes for the read-end.

Every process is independent. Close everything you don't need.

fork() clones a new, independent process. Each process is its own separate little universe and the only thing linking it with any other is sockets, files, pipes, and/or mapped memory.

Threads are something else entirely. When you create a thread it works in the same process, literally sharing all the same memory, all the same files. Change it in one thread and it changes in all of them. That's why threads can be so tricky -- it's easy to rip the floor out from under your threads by altering something they're using simultaneously.

Regarding the last portion,

What I meant to say is I am having a hard time debugging the child process once it execs, as gdb is attached to the parent.
(I thought as there is no explicit concept of threads in Linux, it would be OK to call the child process a thread, but i guess that lead to confusion.)

This point is what I missed out. Will keep this in mind in the future.

What system are you on? On Linux, you can use strace, which lists all system calls your program and its children make (-f means 'follow children'). Just system calls, only system calls, and nothing but system calls -- no line numbers or source code. But it's useful for clearing up mysteries like "why is my program freezing" -- it's stuck on write(). I've also used it to track down where some silly programs were looking for config files -- just hunt for open calls to see what files they're trying to open...

It ends up as an awful big list, but it's easy to cut down with grep.

$ gcc multipipe.c
$ strace -f ./a.out 2> log
<process runs and finishes>
$ egrep "(fork|execve|open|close|pipe|dup|read|write)\(" log
execve("./a.out", ["./a.out"], [/* 28 vars */]) = 0
open("/etc/ld.so.cache", O_RDONLY)      = 3
close(3)                                = 0
open("/lib/libc.so.6", O_RDONLY)        = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\20m\1\0004\0\0\0"..., 512) = 512
close(3)                                = 0
pipe([3, 4])                            = 0
[pid  9380] close(4 <unfinished ...>
[pid  9381] close(4 <unfinished ...>
[pid  9380] pipe( <unfinished ...>
[pid  9381] close(3 <unfinished ...>
[pid  9381] execve("/usr/local/bin/ls", ["ls"], [/* 28 vars */] <unfinished ...>
[pid  9380] close(5 <unfinished ...>
[pid  9381] execve("/usr/bin/ls", ["ls"], [/* 28 vars */] <unfinished ...>
[pid  9380] close(3 <unfinished ...>
[pid  9381] execve("/bin/ls", ["ls"], [/* 28 vars */] <unfinished ...>
[pid  9382] close(3Process 9383 attached
[pid  9380] close(4 <unfinished ...>
[pid  9382] close(5 <unfinished ...>
[pid  9382] close(4 <unfinished ...>
[pid  9382] execve("/usr/local/bin/tac", ["tac"], [/* 28 vars */] <unfinished ...>
[pid  9381] open("/etc/ld.so.cache", O_RDONLY <unfinished ...>
[pid  9382] execve("/usr/bin/tac", ["tac"], [/* 28 vars */] <unfinished ...>
[pid  9383] close(4 <unfinished ...>
[pid  9381] close(3 <unfinished ...>
[pid  9381] open("/lib/librt.so.1", O_RDONLY <unfinished ...>
[pid  9381] read(3,  <unfinished ...>
[pid  9382] open("/etc/ld.so.cache", O_RDONLY <unfinished ...>
[pid  9382] close(3 <unfinished ...>
[pid  9381] close(3 <unfinished ...>
[pid  9382] open("/lib/libc.so.6", O_RDONLY <unfinished ...>
[pid  9383] execve("/usr/local/bin/less", ["less"], [/* 28 vars */] <unfinished ...>
[pid  9382] read(3,  <unfinished ...>
[pid  9383] execve("/usr/bin/less", ["less"], [/* 28 vars */] <unfinished ...>
[pid  9381] open("/lib/libacl.so.1", O_RDONLY) = 3
[pid  9381] read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0000\32\0\0004\0\0\0"..., 512) = 512
[pid  9382] close(3 <unfinished ...>
[pid  9383] open("/etc/ld.so.cache", O_RDONLY <unfinished ...>
[pid  9383] close(3 <unfinished ...>
[pid  9383] open("/lib/libncurses.so.5", O_RDONLY <unfinished ...>
[pid  9383] read(3,  <unfinished ...>
[pid  9381] close(3)                    = 0
[pid  9381] open("/lib/libc.so.6", O_RDONLY) = 3
[pid  9381] read(3,  <unfinished ...>
[pid  9383] close(3 <unfinished ...>
[pid  9383] open("/lib/libc.so.6", O_RDONLY <unfinished ...>
[pid  9383] read(3,  <unfinished ...>
[pid  9382] open("/tmp/tacpBNVeU", O_RDWR|O_CREAT|O_EXCL|O_LARGEFILE, 0600 <unfinished ...>
[pid  9383] close(3 <unfinished ...>
[pid  9383] open("/lib/libdl.so.2", O_RDONLY <unfinished ...>
[pid  9383] read(3,  <unfinished ...>
[pid  9382] read(0,  <unfinished ...>
[pid  9383] close(3 <unfinished ...>
[pid  9381] close(3 <unfinished ...>
[pid  9381] open("/lib/libpthread.so.0", O_RDONLY <unfinished ...>
[pid  9381] read(3,  <unfinished ...>
[pid  9381] close(3 <unfinished ...>
[pid  9381] open("/lib/libattr.so.1", O_RDONLY <unfinished ...>
[pid  9381] read(3,  <unfinished ...>
[pid  9381] close(3 <unfinished ...>
[pid  9383] open("/etc/terminfo/x/xterm", O_RDONLY|O_LARGEFILE <unfinished ...>
[pid  9383] read(3,  <unfinished ...>
[pid  9383] close(3)                    = 0
[pid  9383] open("/usr/bin/.sysless", O_RDONLY|O_LARGEFILE <unfinished ...>
[pid  9383] open("/etc/sysless", O_RDONLY|O_LARGEFILE <unfinished ...>
[pid  9383] open("/home/username/.less", O_RDONLY|O_LARGEFILE <unfinished ...>
[pid  9383] open("/home/username/.lesshst", O_RDONLY|O_LARGEFILE <unfinished ...>
[pid  9383] read(3,  <unfinished ...>
[pid  9383] read(3,  <unfinished ...>
[pid  9383] close(3 <unfinished ...>
[pid  9383] open("/dev/tty", O_RDONLY|O_LARGEFILE <unfinished ...>
[pid  9381] open(".", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY|O_CLOEXEC <unfinished ...>
[pid  9381] close(3 <unfinished ...>
[pid  9383] write(1, "\33[?1049h\33[?1h\33=", 15 <unfinished ...>
[pid  9383] read(0,  <unfinished ...>
[pid  9381] write(1, "a.out\nlog\nmultipipe.c\n", 22 <unfinished ...>
[pid  9382] read(0,  <unfinished ...>
[pid  9381] close(1 <unfinished ...>
[pid  9382] write(3, "a.out\nlog\nmultipipe.c\n", 22 <unfinished ...>
[pid  9381] close(2 <unfinished ...>
[pid  9382] read(3, "a.out\nlog\nmultipipe.c\n", 22) = 22
[pid  9382] close(0)                    = 0
[pid  9382] write(1, "multipipe.c\nlog\na.out\n", 22 <unfinished ...>
[pid  9383] write(1, "multipipe.c\33[m\nlog\33[m\na.out\33[m\n", 31 <unfinished ...>
[pid  9382] close(1 <unfinished ...>
[pid  9383] read(0,  <unfinished ...>
[pid  9382] close(2 <unfinished ...>
[pid  9383] write(1, "\33[7mlines 1-3/3 (END) \33[27m\33[K", 30 <unfinished ...>
[pid  9383] read(3,  <unfinished ...>
[pid  9383] write(1, "\r\33[K\33[?1l\33>\33[?1049l", 19) = 19

# what about just process 9383?  what was it doing?
$ egrep "(fork|execve|open|close|pipe|dup|read|write)\(" log | grep 9383

[pid  9382] close(3Process 9383 attached
[pid  9383] close(4 <unfinished ...>
[pid  9383] execve("/usr/local/bin/less", ["less"], [/* 28 vars */] <unfinished ...>
[pid  9383] execve("/usr/bin/less", ["less"], [/* 28 vars */] <unfinished ...>
[pid  9383] open("/etc/ld.so.cache", O_RDONLY <unfinished ...>
[pid  9383] close(3 <unfinished ...>
[pid  9383] open("/lib/libncurses.so.5", O_RDONLY <unfinished ...>
[pid  9383] read(3,  <unfinished ...>
[pid  9383] close(3 <unfinished ...>
[pid  9383] open("/lib/libc.so.6", O_RDONLY <unfinished ...>
[pid  9383] read(3,  <unfinished ...>
[pid  9383] close(3 <unfinished ...>
[pid  9383] open("/lib/libdl.so.2", O_RDONLY <unfinished ...>
[pid  9383] read(3,  <unfinished ...>
[pid  9383] close(3 <unfinished ...>
[pid  9383] open("/etc/terminfo/x/xterm", O_RDONLY|O_LARGEFILE <unfinished ...>
[pid  9383] read(3,  <unfinished ...>
[pid  9383] close(3)                    = 0
[pid  9383] open("/usr/bin/.sysless", O_RDONLY|O_LARGEFILE <unfinished ...>
[pid  9383] open("/etc/sysless", O_RDONLY|O_LARGEFILE <unfinished ...>
[pid  9383] open("/home/username/.less", O_RDONLY|O_LARGEFILE <unfinished ...>
[pid  9383] open("/home/username/.lesshst", O_RDONLY|O_LARGEFILE <unfinished ...>
[pid  9383] read(3,  <unfinished ...>
[pid  9383] read(3,  <unfinished ...>
[pid  9383] close(3 <unfinished ...>
[pid  9383] open("/dev/tty", O_RDONLY|O_LARGEFILE <unfinished ...>
[pid  9383] write(1, "\33[?1049h\33[?1h\33=", 15 <unfinished ...>
[pid  9383] read(0,  <unfinished ...>
[pid  9383] write(1, "multipipe.c\33[m\nlog\33[m\na.out\33[m\n", 31 <unfinished ...>
[pid  9383] read(0,  <unfinished ...>
[pid  9383] write(1, "\33[7mlines 1-3/3 (END) \33[27m\33[K", 30 <unfinished ...>
[pid  9383] read(3,  <unfinished ...>
[pid  9383] write(1, "\r\33[K\33[?1l\33>\33[?1049l", 19) = 19

$

lots of options for strace too, see the manpage.

---------- Post updated at 04:52 PM ---------- Previous update was at 04:36 PM ----------

It's not a file on disk. It's a memory buffer inside the kernel itself. The kernel keeps track of how many processes have which ends open and which processes need to be stopped or started when data becomes available or room in the buffer becomes available. When everything using the pipe finally closes it or exits, the kernel sees that nothing needs the buffer anymore and deletes it.

The buffer can vary in size between systems. On most Linux systems I think it's 64 kilobytes.

In Linux and a few UNIXes, you can see that under /proc/. Try this:

$ echo | ls -l /proc/self/fd
lr-x------ 1 username users 64 Oct  7 16:47 0 -> pipe:[618262]
lrwx------ 1 username users 64 Oct  7 16:47 1 -> /dev/pts/0
lrwx------ 1 username users 64 Oct  7 16:47 2 -> /dev/pts/0
lr-x------ 1 username users 64 Oct  7 16:47 200 -> /home/username/.ssh-agent
lr-x------ 1 username users 64 Oct  7 16:47 3 -> /proc/10073/fd

'self' is just a special folder meaning 'my own process number', so ls is listing its own open files. You could 'ls -l /proc/1234/fd' to list process 1234's open files.

0 is stdin, the pipe attaching it to 'echo'. I think 618262 is a unique number specific to that particular pipe. It's not a valid link, you can't open it -- it's just informational.

1 and 2 are stdout and stderr, both attached to the same terminal here. They're actually valid symlinks, try

echo asdf > /proc/self/fd/1

200 is a file my terminal opens on login, just a little script I set up to keep my SSH keys straight.

3 is the directory /proc/self/fd, which ls opened so it could list its own open files. The kernel decided 'self' meant 10073.

Sockets also show up in this list, if you have any open, being that sockets are FD's too...


I am on Ubuntu 11.04 64-bit. Lots to digest in one post. :slight_smile:
Will try out some stuff and get back.

---------- Post updated at 07:49 PM ---------- Previous update was at 07:13 PM ----------

Thanks guys for the input, but I was wondering whether mainstream shells like bash/csh use a structure similar to Corona's example for their execution.

If that's the case, they'd need to create separate executables for all their builtin commands as part of their initialization sequences. That wouldn't really make the builtins much different from external commands then, would it?

There are external commands to match almost every shell builtin -- the UNIX standard requires it. You don't always get a shell as fancy as you want, so there has to be a fallback.

But no -- I don't think so. Shell builtins are definitely not the same as externals; builtins are clearly faster.

Look carefully at what builtins do for you. There are commands like read which read from stdin, and commands like echo which write to stdout -- but you don't get things like cat which do both. That's intentional -- it keeps the builtin entirely inside the shell without risking deadlock (parts of the shell itself waiting for other parts of the shell itself). It just does whatever's next in the list and carries on; if there's any wait involved, it's not its fault.

echo is particularly simple to do as a builtin. I tried to build a shell once, and managed situations like echo | cat like this:

int pipefd[2], status;
pipe(pipefd);

write(pipefd[1], "the owls are not what they seem\n", 32);
close(pipefd[1]);

if(fork() == 0)
{
        dup2(pipefd[0], 0);
        close(pipefd[0]);
        execlp("cat", "cat", (char *)NULL);
        perror("couldn't exec");
        exit(1);
}

close(pipefd[0]);
wait(&status);

As long as the message is smaller than the pipe's buffer, you don't have to wait -- just squirt it in the write end and close.

---------- Post updated at 06:38 PM ---------- Previous update was at 06:20 PM ----------

I think what I ended up doing was building a list much more complicated than {"echo", "tac", "more", NULL}: it was a structure with all three file descriptors (stdin/stdout/stderr) and a string for the name.

I opened everything in advance, including all pipes and redirections.

If I wanted process 0 to read from file.txt I could just go processlist[0].fds[0]=open("file.txt", O_RDONLY);

If I wanted processes 2 and 3 joined with a pipe, I'd do

pipe(pipefd);
processlist[2].fds[1]=pipefd[1];
processlist[3].fds[0]=pipefd[0];

If I wanted a builtin to print into the first process, I'd create a pipe, squirt in a message, and shove the read-end in the array along with everything else.

And then I'd do one big loop to create every process.

for(n=0; n<numprocs; n++)
{
        if(fork()==0)
        {
                for(fileno=0; fileno<3; fileno++)
                if(processlist[n].fds[fileno] >= 0)
                        dup2(processlist[n].fds[fileno], fileno);

                // Close all the pipes! ALL OF THEM
                for(q=0; q<numprocs; q++)
                {
                        close(processlist[q].fds[0]);
                        close(processlist[q].fds[1]);
                        close(processlist[q].fds[2]);
                }

                execvp(processlist[n].name, processlist[n].args);
                exit(255);
        }
}

In retrospect, this was silly. Every time I fork()ed, there was a huge wad of pipes to close -- so much junk that didn't need cloning in the first place.

Might be better to just do it as you go. Or maybe I should have played with close-on-exec and only copied the pipes I actually needed. (I slightly lied earlier: descriptors marked close-on-exec are still cloned by fork(), but the kernel closes them automatically when the child exec()s, so they never leak into the commands you run.)

Ok, I went the other way round,

I implemented all builtins first, thinking I could use them as needed in my pipes.
I tried to mesh that in with the previous example you gave of IPC via pipes. But then I'd need to create 2 versions of my builtins:

if the builtin is the 1st command in the pipe, it's executed in the same shell;
if the builtin is somewhere in the middle,
then it needs to be executed in a separately forked process, which means the builtin needs a corresponding external command for exec to work.

i.e.

if(piped commands)
{
    if(isbuiltin())
    {
        execbuiltin(); // function where I implemented all builtins
        squirt into the first pipe
    }
    else
    {
        exec as in your sample prev.
        (here it fails, as that code would need me to exec an external
         command always - which may or may not exist)
    }
}

Perhaps I need to rethink my approach.

Are you checking for builtins after you fork? :confused: You should check before. The whole point is that builtins don't need fork at all since they can happen wholly inside the shell.

I don't understand why you'd be using builtins in the middle of a pipe chain in the first place. They don't work there in csh. Unless you're trying to build in things like cat, which I don't think is a good idea.

Of course external commands must exist to use external commands. What's wrong with that? :confused:

Why shouldn't builtins work in the middle of the pipe?

In my home directory,
I tried ls | set | grep path
The output given was:

path (/usr/local/sbin /usr/local/bin /usr/sbin /usr/bin /sbin /bin /usr/games)

So what I'm basically asking is: if a builtin command occurs in the middle of a pipe sequence, is it executed in the parent shell process or as a subprocess?

In case it is executed in the parent process, then we have the tedious task of closing the appropriate FDs before executing the next command, as the code never reaches, say, case 0: of your example.
The code for that would look like so:

for(all piped commands)
{
    if(builtin)
    {
        execute the builtin command code here
        close appropriate pipe ends to accommodate code for external commands
    }
    else
    {
        if(c->next != NULL)
        {
            if(pipe(fd_pipe) < 0)
            {
                perror("pipe");
            }
        }

        switch(pid = fork())
        {
        case -1:
            perror("fork");
            break;
        default:
            //Close parent's copy of the write end of the pipe
            if(fd_pipe[1] >= 0) close(fd_pipe[1]);

            //Store the intermediate read end, which will be used as
            //the read end for the next process in the pipe.
            //Close it for the last command as there is no one to read after it.
            if(intermed_desc >= 0) close(intermed_desc);
            intermed_desc = fd_pipe[0];

            //Throw away pipe
            fd_pipe[0] = -1;
            fd_pipe[1] = -1;
            break;
        case 0:
            //Map the stored intermediate read end to STDIN
            if(intermed_desc >= 0)
            {
                if(dup2(intermed_desc, STDIN_FILENO) < 0)
                {
                    perror("dup2");
                }
                close(intermed_desc);
            }
            //Map write end of pipe to STDOUT
            if(fd_pipe[1] >= 0)
            {
                if(dup2(fd_pipe[1], STDOUT_FILENO) < 0)
                {
                    perror("dup2");
                }
                close(fd_pipe[1]);
                close(fd_pipe[0]);
            }

            resolv_path(c, portion);
            if(execv(portion, c->args) < 0)
            {
                perror("execv");
            }
            exit(1);
        }
    }
}

In case we execute the external version directly, then we needn't worry about closing the pipe ends correctly, as the builtin would be exec'ed like a normal external command.

In short :confused:.

First, there's generally no point to doing so -- set doesn't read from standard input.

There are constructs that can:

grep pattern filename | while read LINE
do
        echo "$LINE"
done

...but there are two problems:

1) csh can't do this. Just one of its many design flaws.
2) The Bourne shell can, but it must launch a subshell to do so. It's run in a shell, but not your shell. (As a side-effect, LINE doesn't get set in the rest of the program, just inside the while-loop.)

I already hinted at why it needs to do this...

Imagine a shell where all of those are builtins, running inside the same shell process. How does it decide which gets to go when? If you tried read when the pipe was empty it'd freeze, you'd never have a chance to run grep.

You could set everything nonblocking and just poll, running each thing a bit at a time. It'd never freeze, but your program would consume 100% CPU even sitting there waiting for sleep.

You could cheat and just run them in order, saving to temp file, waiting for one to finish before sending the file into the next one. I think this is how DOS pretended to have rudimentary pipes even though it didn't have processes at all.

Or you could use buffers and logic and mutual exclusion between things to decide which gets to run when and prevent one part of the program from waiting forever for the other... Except there's already a system on your computer which does that -- the operating system.

Instead of reinventing the wheel, they fork() off another process to run a specific piece of script-within-a-script independently of the rest. When it waits for input, it's not freezing the entire program.

Basically -- the shell fork()'s to run it. But since it's pure shell code, it doesn't exec() anything, the child process is used to run a subsection of shell code.

Again, compounded by the problem that csh is bad at this.

Good news everyone!
I got my shell to work in almost all cases. There are one or two bugs which I need to iron out, which I'll have a crack at tonight. Thank you for all your inputs!