Shells, forks, subprocesses... oh my

All,
I've been reading to try to get an abstract idea of the process efficiency of commands such as sed, bash, perl, awk, find, grep, etc.

Which of them spawn processes, fork, or launch a subshell, and under what conditions?
How do you know which commands have the faster and better stdio implementation?

So I'm looking for some guru advice instead of running thousands of use cases for different configurations.

Example: finding a specific line in multiple files spanning a volume.

I can use something like this:

sed 'LINENOq;d' $dir/$filename

which seems very fast for searching many (60,000+) files of <10 KB each (ASCII, UTF-8), but one could also use

tail -n+LINENO $dir/$filename | head -n1

which seems fairly fast as well. One could also probably come up with a few one-liners in perl.
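To make the two commands above concrete, here is a minimal sketch with LINENO filled in as a shell variable. The sample file, the variable names and the awk variant are assumptions added for illustration, and mktemp is assumed to be available. It only checks that the approaches return the same line; it says nothing about which is faster.

```shell
# Build a small throwaway sample file (hypothetical data).
sample=$(mktemp)
printf 'line %d\n' 1 2 3 4 5 > "$sample"
n=3

# sed: delete every line, except that at line n it quits first,
# and quitting prints the current line.
by_sed=$(sed "${n}q;d" "$sample")

# tail starts output at line n; head keeps only the first of those lines.
by_tail=$(tail -n "+$n" "$sample" | head -n 1)

# awk: print line n, then stop reading the rest of the file.
by_awk=$(awk -v n="$n" 'NR == n { print; exit }' "$sample")

printf '%s\n' "$by_sed" "$by_tail" "$by_awk"   # prints "line 3" three times
rm -f "$sample"
```

All three stop reading at or shortly after line n, so on small files the cost is likely dominated by starting the processes rather than by the read itself.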

sed, bash, perl, awk, find, grep are all processes. A subshell is a process. A fork is a fork is a fork.

Whether any of these are faster or slower than other ways to solve your problem really depends on your problem and the algorithm you use to solve it. So "one solution to solve everything, forever" may be out the window.

There are some cardinal sins to avoid:

  • Don't reprocess the same file repeatedly. You can almost always do everything in one pass that you could do in two.
  • Don't launch whole processes to process tiny amounts of data. echo "a b c" | awk '{ print $1 }' is a tragic waste; this is when shell builtins would be thousands of times more efficient.
  • Running your innermost loop in the shell will be slow. A while read loop going line by line over a file will be slower than awk '{ something }' filename. Shell is for the high-level things, not the nitty-gritty bulk work. This is when externals would be thousands of times more efficient.
  • If you're doing cat | awk | sed | cut | tr | kitchen | sink, put it all in one awk. awk is a programming language capable of replacing all of these with some near-trivial code, and one awk call will be faster than ten of anything else.
  • Useless Use of Cat. Don't do that. Nothing needs a cat | in front of it to read a file.
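A rough sketch of two of the points above: splitting a tiny string with shell builtins instead of awk, and doing the filtering, field selection and case conversion of a pipeline in a single awk pass. The data, file and variable names are made up for illustration, and mktemp is assumed to be available.

```shell
# Tiny data: split a short string with builtins, no external process needed.
line="a b c"
first=${line%% *}        # parameter expansion: drop everything from the first space on
read -r f1 f2 f3 <<EOF
$line
EOF
echo "$first $f2"        # prints "a b"

# Bulk data: one awk pass, reading the file directly (no cat in front),
# instead of a chain like grep | cut | tr plus a shell loop to add things up.
data=$(mktemp)
printf '%s\n' 'alice:10' 'bob:20' 'carol:30' > "$data"
summary=$(awk -F: '
    $2 >= 20 { names = names toupper($1) " "; total += $2 }   # filter, pick, uppercase, sum
    END      { print names total }
' "$data")
echo "$summary"          # prints "BOB CAROL 50"
rm -f "$data"
```

The first half stays entirely inside the shell, which is what you want for a handful of bytes. The second half hands the per-line work to one external process, which is what you want for whole files.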

Many thanks. This is what I was looking for: general rules of thumb.