UNIX Pipe - Exit when there are no bytes to read

Hi All,

I'm creating a program which reads millions of bytes from a pipe and does some processing. Since there is so much data, the idea is to read the pipe in parallel.

Sun Solaris 8
See the code below:

#!/bin/sh
MAXTHREAD=30
c=1
STRING=""
awk '{print $1}' metadata.csv > nvpipe &
while [ $c -le $MAXTHREAD ]
do
   ${BIN}/parallel_wot.sh &
   PID=$!
   sleep 3
   c=`expr $c + 1`
   STRING=$STRING","$PID
done

parallel_wot.sh

cd $PIPEDIR/
while IFS=',' read DIR1 DIR2
do
   echo starting at `date`
   ${COMPUSETBIN}/prg1.sh prg2.sh $DIR1 $DIR2
   echo ending at `date`
   echo ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
done < nvpipe

Now the problem is that, since the pipe is read in parallel, once the pipe is emptied all the other processes except one sit waiting to read from it. The question I now have is: how do I stop reading from the pipe when there are no more records?

At any rate, you need a wait statement after done in the first script.
Also, no pipe is being created or removed in these scripts.
This code would create a regular file called "nvpipe", rather than a named pipe.
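To illustrate the difference, here is a minimal sketch (the file names plainfile and namedpipe are made up for the example): redirection with `>` creates an ordinary file, while mkfifo (or `mknod ... p` on older systems such as Solaris) creates a FIFO special file.

```shell
# Hypothetical names for illustration only.
echo hello > plainfile        # '>' creates an ordinary file
mkfifo namedpipe              # creates a named pipe (FIFO special file)
ls -l plainfile namedpipe     # the FIFO's mode string starts with 'p'
rm -f plainfile namedpipe
```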

I will assume that you have already made a pipe file with something like mknod /$PIPEDIR/nvpipe p; however, I would be concerned that you have no idea which thread is reading the (now) input at any time.

You might find that the first reading process locks up the pipe; I'm not sure. It might be more sensible to ignore the pipe altogether and do something more like this:-

  • Split metadata.csv into 30 roughly equal files
  • Fire off 30 processes that read a separate input file each to do whatever processing you need

If you can get the number of lines in your file, you should be able to get the required line count like this:-

#!/bin/bash
threads=30                                            # How many threads you want to work with

all_lines=$(wc -l < metadata.csv)                     # Count all the lines in your full input file
req_lines=$(echo "$all_lines / $threads + 1" | bc)    # Get the lines required in each split file
                                                      # Rules of BIDMAS apply, so the 1 is added after the divide

split -d -l $req_lines metadata.csv metadata.         # Note the trailing dot. This will generate $threads files
                                                      # of up to $req_lines lines each in the format metadata.nn,
                                                      # so up to 100 threads if you choose
for split_file in metadata.??
do
   ( while IFS=',' read DIR1 DIR2
     do
        printf "Start at $(date)\n"
        ${COMPUSETBIN}/prg1.sh prg2.sh $DIR1 $DIR2
        printf "Ending at $(date)\n"
        printf "++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\n"
     done < $split_file ) &
done

wait               #  All threads must complete before this script will exit

It's untested, but does it get you started a bit better?
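As an aside, the division step can also be done with plain POSIX shell arithmetic, with no call to bc at all; the sketch below computes a true ceiling (the value of all_lines is hard-coded here purely for illustration, in the script above it would come from wc -l):

```shell
# Ceiling division in POSIX shell arithmetic: (a + b - 1) / b
all_lines=95       # hard-coded for illustration
threads=30
req_lines=$(( (all_lines + threads - 1) / threads ))
echo $req_lines    # prints 4, the ceiling of 95/30
```

Unlike `all_lines / threads + 1`, this form does not over-count when the line total divides evenly by the thread count.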

There are probably better ways to do this in a single awk. What do your prg1.sh and prg2.sh actually do?

Kind regards,
Robin

Named pipes aren't shared. To have 30 different readers you need 30 different writers, and reopening the same pipe 30 times in the same process doesn't count - it all still goes to the first. That's why your program does what it does.

To make a pipe see "end of file", you close the writing end. The reading end will then signal end-of-file to whatever's reading it.
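A small sketch of that behaviour (the pipe name demopipe is made up for the example): the read loop ends as soon as the single writer closes its end of the FIFO.

```shell
mkfifo demopipe                         # hypothetical pipe name
( printf 'one\ntwo\n' > demopipe ) &    # writer: the pipe is closed when printf finishes
while read line
do
   echo "got: $line"
done < demopipe                         # read fails at end-of-file, so the loop exits
wait
rm -f demopipe
```

With 30 readers and one writer, though, only the readers that happen to win the race see any data; the rest block until the writer's end is closed.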
