How to use an input pipe?

LeNouveau · August 16, 2011, 4:53pm

Hi all,

I would like to use an input pipe properly, like this:

cat myFile.txt | myCommand.sh

The solution I always find is this:

while read line; do ...; done

but I get a big loss of performance!

On a big file, with a simple grep, it can take about 2400 times longer! oO
(from 0.023 s to 1 min)

So, what did I miss? ^^'
Thx a lot!

Corona688 · August 16, 2011, 6:35pm

That is a useless use of cat, which may account for a little (but probably not most) of the lost performance.

I don't know -- I can't see the contents of this myCommand.sh from here. There may well be things that can be improved if you show us what you actually did. Frequent mistakes include the use of cut/awk/sed and so forth many times per loop instead of shell builtins, burdensomely long pipe chains, etc, etc, etc.

In any case, grep is a purpose-built utility made in raw C -- of course it's faster.
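
To illustrate the "shell builtins instead of cut/awk/sed" point, here is a minimal sketch. The name=value input format and the function names are made up for the example, they are not from anyone's actual script. Both functions print the same thing for lines containing a single "=", but the first starts extra processes for every line while the second stays inside the shell.

```shell
#!/bin/sh
# Hypothetical input: lines of the form "name=value" (one "=" per line).

# Slow: every line starts a subshell plus an external cut process.
values_with_cut() {
    while IFS= read -r line; do
        printf '%s\n' "$line" | cut -d= -f2
    done
}

# Faster: parameter expansion is done by the shell itself, no new process.
values_with_builtin() {
    while IFS= read -r line; do
        # Strip everything up to and including the first "=".
        printf '%s\n' "${line#*=}"
    done
}

printf 'a=1\nb=2\n' | values_with_builtin
```

On a handful of lines the difference is invisible; it only starts to matter when the loop runs thousands of times.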

LeNouveau · August 17, 2011, 3:25am

The "cat myFile.txt | ..." command line was only an example, and it was present in both of my tests so that they would be comparable. ^^

For example, here is test 1:

cat myFile.txt | grep -o "toto"

And here is the second test:

cat myFile.txt | ./test2.sh

with this in test2.sh:

while read line
  do echo $line | grep -o "toto"
done

So the difference between those tests is the "while" and the "echo".
Sure, the "echo" costs some performance, but I don't think it matters much. So the slowdown would come from the "while".

Of course, grep is fast because it is written in C, but that is true in both cases. So it should not change so much...

So, what did I miss? ^^

---------- Post updated at 02:25 AM ---------- Previous update was at 02:07 AM ----------

So, I made some other tests:

Test 1:

while read line
  do echo $line; grep "coucou"
done

This is very slow!

Test 2:

while read line
  do echo $line; grep "coucou"
  echo $line; grep "hello"
done

Twice as slow!

Test 3:

while read line
  do grep "coucou"
done

Almost as quick as a normal grep.

So my problem would not come from my pipe, but from how I use it.
Two possible explanations:

"echo" slows my script.
the use of the variable $line slows my script.

I think it's the second one. And you? ^^'

Corona688 · August 17, 2011, 11:49am

All of your examples could have been done without cat. You didn't need to use cat to make them "equal".
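
A small sketch of what "without cat" can look like. The sample file content is invented so the example is self-contained; the three forms should give identical output, and only the first needs an extra process and an extra pipe.

```shell
#!/bin/sh
# Build a small sample file (hypothetical content) so the sketch runs anywhere.
tmp=$(mktemp)
printf 'toto\ntata\ntoto toto\n' > "$tmp"

# With cat: one more process, one more pipe.
with_cat=$(cat "$tmp" | grep -o "toto")

# Without cat: the shell opens the file and hands it to grep as standard input.
with_redirect=$(grep -o "toto" < "$tmp")

# grep can also open the file itself.
with_arg=$(grep -o "toto" "$tmp")

printf '%s\n' "$with_redirect"
rm -f "$tmp"
```

The same redirection works for a script: ./test2.sh < myFile.txt gives the script the file as standard input.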

Of course running grep once is faster than running the same program once for each individual line.

No. Running grep for each individual line has a ton of overhead. You're not supposed to do that.

With a semicolon instead of a pipe, the echo doesn't feed the first line into grep at all. You ran grep without a pipe or file parameter, so it reads from the loop's standard input: it runs once, slurps up all the remaining data, and causes the loop to end.

Well, of course your third test is almost as quick as a normal grep: it only runs grep once. Without the pipe, where do you think it's reading from? With no echo feeding it, it reads from standard input, slurps in all the data in one go and hits EOF, which makes read fail on the next iteration and ends the loop.

It also ignores the first line because the while loop got to it first.
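
The slurping can be seen directly with a tiny sketch. Here sed stands in for grep so that every line it receives is visible; the function name and the three input lines are made up for the demonstration.

```shell
#!/bin/sh
# Any command inside the loop that reads standard input shares the loop's input.
slurp_demo() {
    while IFS= read -r line; do
        echo "read got: $line"
        # No pipe and no file argument: this reads everything that is left.
        sed 's/^/inner command got: /'
    done
}

printf 'one\ntwo\nthree\n' | slurp_demo
# read got: one
# inner command got: two
# inner command got: three
```

The loop body runs exactly once: read takes the first line, the inner command takes all the rest, and the second read hits EOF.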

Please don't take offense, but you have no clue what you're doing. Substituting semicolons for pipes doesn't just make it "faster", it totally alters the meaning of your program. You should learn how to use a shell before complaining that they're slow.

There is a high cost to running and quitting an external shell utility. It has to load and map quite a few files, and for all that work you only feed it a handful of bytes before starting over. This is why constructs like echo single-line | grep | sed | awk | cut are wasteful -- they create and destroy tons of processes per loop, wasting tons of time loading files and writing between pipes that could've been used to do actual work. Imagine only being allowed to say one word per telephone call and calling over and over.
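
As a rough sketch of the two usual ways out, assuming the goal is still to find "toto" in piped input: either let one grep read the script's whole standard input, or, if a loop is really needed, test each line with a shell builtin. The function names are invented for the example.

```shell
#!/bin/sh
# Version 1: what test2.sh could contain. Commands in a script inherit the
# script's standard input, so a single grep sees the whole piped file.
match_once() {
    grep -o "toto"
}

# Version 2: if per-line logic is needed, "case" does the matching inside
# the shell, with no echo and no grep started per line.
# Note: this prints whole matching lines, unlike grep -o.
match_in_loop() {
    while IFS= read -r line; do
        case $line in
            *toto*) printf '%s\n' "$line" ;;
        esac
    done
}

printf 'toto\nnothing\nxx toto yy\n' | match_once
printf 'toto\nnothing\nxx toto yy\n' | match_in_loop
```

Version 1 is usually the right answer; version 2 only makes sense when the loop has other work to do for each line.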

---------- Post updated at 09:49 AM ---------- Previous update was at 09:04 AM ----------

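An example of everything that happens when you pipe "echo asdf" into grep. Almost all of it is overhead; the only useful work is the read of "asdf\n" from standard input and the write of the matching line to standard output, near the end.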

execve("/bin/grep", ["grep", "asdf"], [/* 27 vars */]) = 0
brk(0)                                  = 0x871f000
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb78bf000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY)      = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=36795, ...}) = 0
mmap2(NULL, 36795, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb78b6000
close(3)                                = 0
open("/lib/libpcre.so.0", O_RDONLY)     = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\260\22\0\0004\0\0\0"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=201952, ...}) = 0
mmap2(NULL, 204928, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb7883000
mmap2(0xb78b4000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x30) = 0xb78b4000
close(3)                                = 0
open("/lib/libc.so.6", O_RDONLY)        = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\20m\1\0004\0\0\0"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=1339676, ...}) = 0
mmap2(NULL, 1345832, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb773a000
mmap2(0xb787d000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x143) = 0xb787d000
mmap2(0xb7880000, 10536, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb7880000
close(3)                                = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7739000
set_thread_area({entry_number:-1 -> 6, base_addr:0xb77396c0, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}) = 0
mprotect(0xb787d000, 8192, PROT_READ)   = 0
mprotect(0xb78b4000, 4096, PROT_READ)   = 0
mprotect(0x805b000, 4096, PROT_READ)    = 0
mprotect(0xb78de000, 4096, PROT_READ)   = 0
munmap(0xb78b6000, 36795)               = 0
brk(0)                                  = 0x871f000
brk(0x8740000)                          = 0x8740000
read(0, "asdf\n", 32768)                = 5
fstat64(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 2), ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb78be000
write(1, "asdf\n", 5)                   = 5
read(0, "", 32768)                      = 0
close(1)                                = 0
munmap(0xb78be000, 4096)                = 0
exit_group(0)                           = ?

It can work really fast once it's loaded, but it takes a bit of work to get there. So it makes sense to run external programs only when you have a reasonable amount of work for them to do.
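
To put a number on it, here is a small sketch that counts how often grep gets started in each approach. The 500-line test data and the variable names are invented for the example; the point is only that the loop pays the startup sequence above once per line while the single grep pays it once.

```shell
#!/bin/sh
# Hypothetical test data: 500 lines, every 5th one contains "asdf".
data=$(mktemp)
loop_out=$(mktemp)
i=1
while [ "$i" -le 500 ]; do
    if [ $((i % 5)) -eq 0 ]; then
        echo "line $i asdf"
    else
        echo "line $i"
    fi
    i=$((i + 1))
done > "$data"

# One grep per line: each iteration pays the whole startup sequence again.
launches=0
while IFS= read -r line; do
    printf '%s\n' "$line" | grep "asdf"
    launches=$((launches + 1))
done < "$data" > "$loop_out"

# One grep for the whole file: the startup cost is paid once.
single=$(grep "asdf" "$data")

if [ "$single" = "$(cat "$loop_out")" ]; then same=yes; else same=no; fi
echo "grep started $launches times in the loop, once outside it; same output: $same"
rm -f "$data" "$loop_out"
```

Running each half under time on a bigger file should show the gap; the exact numbers depend on the machine.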

LeNouveau · August 17, 2011, 2:52pm

I understand now where my mistake was in the third example.
I had not understood what was really happening. Thx.

And I did not realise there were so many operations behind a simple pipe (which is not so simple ^^').
Now I understand why my script was so slow.
Thx a lot!