tee into 2 named pipes

zzol · December 11, 2006, 1:47pm

The following code does not work (zsh, Solaris), but it works without the first line (that is, with files instead of pipes):

mkfifo p1 p2
echo "Hello" | tee p1 > p2 &
paste p1 p2

I would highly appreciate any help fixing it.

Corona688 · December 11, 2006, 2:13pm

As you probably know, opening one end of a named pipe blocks until something opens the other end. Because "> p2" is a redirection, the shell opens p2 before tee is even run, and since nothing has opened the other end yet, it waits in the background. Our paste command apparently opens files in the order given, so it opens p1 first. Nothing has opened p1 for writing, because the shell is still waiting on p2 and tee has not started, so the two sides deadlock.

tyler@mecgentoo ~/code/sh/fifo $ mkfifo p1 p2
tyler@mecgentoo ~/code/sh/fifo $ echo "Hello" | tee p1 > p2 &
[1] 6368
tyler@mecgentoo ~/code/sh/fifo $ paste p1 p2
deadlocked

If you reverse the order in paste -- paste p2 p1 instead of paste p1 p2 -- then it works.

tyler@mecgentoo ~/code/sh/fifo $ mkfifo p1 p2
tyler@mecgentoo ~/code/sh/fifo $ echo "Hello" | tee p1 > p2 &
[1] 6351
tyler@mecgentoo ~/code/sh/fifo $ paste p2 p1
Hello   Hello
[1]+  Done                    echo "Hello" | tee p1 > p2
tyler@mecgentoo ~/code/sh/fifo $
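
Another way to see the same ordering rule is to keep the FIFOs out of the shell's redirections altogether and let tee open both of them. The following is only a sketch: it assumes tee opens its file operands in the order given (as GNU tee does) and that paste opens all of its operands before reading, which is what the transcripts above suggest. The scratch directory is an addition for the example.

```shell
# Work in a scratch directory so stale FIFOs do not interfere.
dir=$(mktemp -d) && cd "$dir" || exit 1
mkfifo p1 p2

# tee itself opens p1 and then p2; the shell redirects nothing to a FIFO.
echo "Hello" | tee p1 p2 > /dev/null &

# paste opens p1 and then p2, the same order as the writer, so each
# open finds its partner and nobody waits forever.
out=$(paste p1 p2)
wait
printf '%s\n' "$out"
```

With the writer and the reader opening the pipes in the same order, the original "paste p1 p2" no longer needs to be reversed.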

zzol · December 11, 2006, 3:06pm

Thank you very much!

zzol · December 11, 2006, 9:40pm

Sorry,
I tried to apply it to a slightly more complicated example

mkfifo p1 p2
echo "Hello" | tee p1 > p2 &
paste =(cut -c1-3 p2) p1

and it failed again; this time, swapping p1 and p2 in the third line did not save me :((

Perderabo · December 11, 2006, 10:28pm

What is this supposed to do? Could you mean
paste $(cut -c1-3 p2) p1

Ignoring the deadlock issue, that inner cut would return "Hel". Then you try
paste Hel p1
Do you have a file named "Hel"? You need to clarify what you want to have happen here.

Also I assume that you realize that a simple sed command can handle everything that you seem to be trying...
$ echo Hello | sed 's/\(...\)\(.*\)/\1 \1\2/'
Hel Hello

zzol · December 11, 2006, 11:11pm

Thanks, the line

paste =(cut -c1-3 p2) p1

in zsh returns "Hel Hello", if p1, p2 are files rather than pipes.

What I really need is to use "join" to join 2 text files on a field at a fixed-width position: e.g. to select all lines from zipped file A where the field "cut -c30-40 A" exists in file B as "cut -c40-50 B". I'd like to use named pipes to maximize performance. I understand how to make "Hel Hello" using sed or awk, and I think it could also be done with paste and tee with two pipes, but I can't escape the deadlock.
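
A likely reason the =(...) version deadlocks in either order: zsh runs the command inside =(...) to completion and stores its output in a temporary file before paste is started. cut cannot finish until tee closes its pipe, and tee cannot get going until paste opens the other pipe, but paste has not been started yet. One possible way around it, sketched below, is to run cut concurrently through a third FIFO. The name p3 and the scratch directory are additions for the example, and the sketch assumes paste opens all of its operands before reading.

```shell
# Work in a scratch directory so stale FIFOs do not interfere.
dir=$(mktemp -d) && cd "$dir" || exit 1
mkfifo p1 p2 p3

echo "Hello" | tee p1 > p2 &   # shell opens p2 for writing, then tee opens p1
cut -c1-3 p2 > p3 &            # shell opens p3 for writing, then cut opens p2

# paste opens p3 first. That releases cut, whose open of p2 releases tee;
# tee's open of p1 then pairs with paste's open of p1.
out=$(paste p3 p1)
wait
printf '%s\n' "$out"
```

In zsh or bash, writing paste <(cut -c1-3 p2) p1 should have a similar effect, since <(...) runs the inner command concurrently instead of waiting for it to finish.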

Perderabo · December 11, 2006, 11:24pm

Hmmm... interesting about zsh. You need to point out things like zsh-specific syntax; not everyone uses zsh, to say the least. As for your problem, is file B also zipped? Roughly how many lines are in file B?

zzol · December 11, 2006, 11:54pm

I specified zsh in the first line of the thread, but if code works in bash or ksh it usually works in zsh.
File A could be ~1e7 - 1e8 lines (in reality it's a set of files), B ~1e6 lines; in practice, instead of
echo "Hello" | tee p1 > p2 &
it could be something like
uncompress -c A_2006*.Z | tee p1 > p2 &
or a loop like
for f in A_2006*.Z
{uncompress -c $f | tee p1 > p2 &
...

Perderabo · December 12, 2006, 5:34am

I would write a lot of this in C. If that is not an option, then perl. This program would read A-style records and write out those that pass the test. To do that, it would first preprocess file B: I would read file B and build an array. An array of 1,000,000 elements of 11 bytes each is large but not prohibitive. I would explore reducing the 11 bytes though. Then I would sort the array and write a binary search function for it. To validate each A record, the program then just does one binary search on the array. Ideally, enough physical memory should be available that the program fits into core without paging.

A shell script would unzip the data files and feed them to the program. Ideally multiple CPUs would be available. This would allow the unzip and the C program to settle onto dedicated CPUs.
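
If neither C nor perl is at hand, the same "load B, then test each A record" idea can be sketched with awk driven from the shell. Note the swap: this uses awk's associative array (a hash lookup) in place of the sorted array and binary search described above. The function name filter_a_by_b is made up for the example, and the column positions are the ones quoted earlier in the thread (30-40 in A, 40-50 in B).

```shell
# filter_a_by_b is a hypothetical helper name, not something from the thread.
# Usage: uncompress -c A_2006*.Z | filter_a_by_b B
filter_a_by_b() {
    # "$1" is file B and is read completely first; "-" makes awk read the
    # A records from standard input afterwards.
    # Assumes B is not empty (otherwise NR == FNR would also match A lines).
    awk '
        NR == FNR { keys[substr($0, 40, 11)] = 1; next }  # B: columns 40-50
        substr($0, 30, 11) in keys                          # A: columns 30-40
    ' "$1" -
}
```

Each A record costs one lookup, and A does not need to be sorted. Whether awk is fast enough for 1e7 - 1e8 lines would have to be measured.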

zzol · December 12, 2006, 7:10pm

Thanks, that's a better way than what I tried to do: to use join I would first have to sort file A, which is ~ N log(N) operations, so join is only a good fit if A is already sorted.