Preserve output order when redirecting stdout and stderr

Hi,

I already searched through the forum and tried to find an answer to my problem, but I didn't find a fully working solution, which is why I'm starting this new thread in the hope that someone can help out.

I'm surprised that I'm not able to find a working solution for the following scenario:

Working in bash, I have some standard script or executable which writes some messages to stdout and some to stderr.
I want to write the output to two separate files and also get the complete output on the console, in the order it is printed by the called script.

Redirecting into two files plus the additional output to the console works with some nice redirections and/or subshells, but in every case the order of the printed lines ends up mixed ...
With subshells, the output is printed more or less randomly, depending on how long each subshell takes to execute.
With redirections, either all the stdout lines are printed first and all the stderr lines afterwards, or vice versa (depending on which of them is logged first with tee).

I hacked up these two examples, which demonstrate the problems described.
The first uses subshells, the second uses redirections:

1:

./test.sh 2> >(while read -r l; do echo "$l" >> stderr.log; echo "err: $l"; done) 1> >(while read -r l; do echo "$l" >> stdout.log; echo "out: $l"; done)

2:

(./test.sh 2>&1 1>&3 | tee errors.log) 3>&1 1>&2 | tee output.log

If you try them, you will see that both print all the output to the console and write it to the separate files, but the lines are sometimes out of order ... (In the second version, fd 3 is used to swap the two streams so that each tee sees only one of them.)

Can somebody help me out and provide a working solution for this annoying problem!?
I would also be glad about something in Perl or another common scripting language.

Thanks in advance!
Steffen

Ok,
the problem is that, when connected to a pipe, standard output is fully buffered by stdio while standard error is never buffered.

I suppose the only way to achieve what you're asking for is to modify your program/script like this, avoid pipes entirely, and use temporary files, sort and grep.

#!/bin/bash
# prepend a global sequence number to every line, on both streams
for i in {1..5}; do
  printf '%s:%s\n' "$((++c))" "out $i"
  printf '%s:%s\n' "$((++c))" "err $i" >&2
done

When run on a terminal the order is preserved, but as soon as you pipe it, only stdout goes through the pipe and the order is lost. This is the behavior you want to avoid:

$ ./s
1:out 1
2:err 1
3:out 2
4:err 2
5:out 3
6:err 3
7:out 4
8:err 4
9:out 5
10:err 5
$ ./s | sort -n
2:err 1
4:err 2
6:err 3
8:err 4
10:err 5
1:out 1
3:out 2
5:out 3
7:out 4
9:out 5

You could do something like this:

$ ./s > output_all 2>&1
$ sort -n output_all
1:out 1
2:err 1
3:out 2
4:err 2
5:out 3
6:err 3
7:out 4
8:err 4
9:out 5
10:err 5

And then use grep to pull out what you need:

$ fgrep :err output_all | sort -n
2:err 1
4:err 2
6:err 3
8:err 4
10:err 5
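
To get all three outputs you asked for, the same idea in one pass (file names are just examples):

$ ./s > output_all 2>&1                        # both streams, numbered at the source
$ sort -n output_all > out_and_err.log         # everything, back in write order
$ fgrep :out output_all | sort -n > out.log    # stdout only
$ fgrep :err output_all | sort -n > err.log    # stderr only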

Another option is to use expect like this (notice that you still need to modify your program to distinguish error from output).

% cat s
#!/usr/bin/bash

for i in {1..5}; do
  printf '%s\n' "out $i"
  printf '%s\n' "err $i" >&2
done
% cat unbuffer 
#!/bin/sh
#
# Description: unbuffer stdout/stderr of a program
# Author: Don Libes, NIST
#
# Option: -p  allow the spawned program to read from stdin (for use in a pipeline)
#
# Note that Tcl continues a comment line ending in a backslash, so when
# expect re-reads this file the exec line below is still part of the
# comment, while sh executes it and re-runs this script under expect,
# regardless of where the expect binary is found in the PATH.
# \
exec expect -- "$0" ${1+"$@"}

if {[string compare [lindex $argv 0] "-p"] == 0} {
   # pipeline
   set stty_init "-echo"
   eval spawn -noecho [lrange $argv 1 end]
   interact
} else {
   set stty_init "-opost"
   eval spawn -noecho $argv
   set timeout -1
   expect
}

You can run your program unbuffered and then filter whatever you need:

% unbuffer ./s
out 1
err 1
out 2
err 2
out 3
err 3
out 4
err 4
out 5
err 5
% unbuffer ./s | fgrep err
err 1
err 2
err 3
err 4
err 5
% unbuffer ./s | fgrep out
out 1
out 2
out 3
out 4
out 5

With Perl you can control the buffering of Perl's own filehandles,
but I don't believe you can control the stdio buffering that the called program's libc does when writing to a shell pipe.
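
For example, turning on autoflush inside a Perl script looks like this (a minimal sketch; it only affects Perl's own output, not a spawned child's):

#!/usr/bin/perl
# $| enables autoflush on the currently selected handle (STDOUT);
# STDERR is unbuffered by default
$| = 1;
for my $i (1 .. 5) {
    print STDOUT "out $i\n";
    print STDERR "err $i\n";
}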

There is also a program called pty that can do what the above expect script does.
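
As an aside, newer GNU coreutils also ship a stdbuf command that forces a dynamically linked program's stdout into line-buffered mode. That removes the buffering half of the problem as long as both streams share a single pipe, though it does not help once you split them across two reader processes:

stdbuf -oL prog 2>&1 | tee combined.log    # combined output in write order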

I'm curious too if there are other solutions.

Hi radoulov,

thanks for your reply.

I tried the version with expect, but it didn't work in all respects.
I ran both of my commands through "unbuffer", but error.log was never filled.
These great tools were new to me, so I searched around for information, and the reason for the empty error.log is that expect merges stdout and stderr internally, so the information about which stream a line came from (stderr or stdout) is lost.
So when catching each line with expect, one cannot decide whether to write it to output.log or to error.log.
It was a nice try, and maybe this tool will help me with other work :wink:, but for this particular problem I found no solution with it.

The other idea, prepending a line number to each line, had also crossed my mind.
The problem is that I cannot add the number to every line: in my particular case I execute PHP scripts, and if PHP throws a fatal error, there is no way to catch it and prepend a number; that is a limitation of PHP.

The same problem exists with binary executables.
There is likewise no way to prepend line numbers to each output line ...

So unless there is some wrapper to put "around" the binary or PHP script, one which adds the leading numbers while preserving the channel the output was written to (so that stdout stays stdout and stderr stays stderr), this is not a working solution for my problem ...

Do you know of such a wrapper somewhere in the wild!?

:slight_smile:

Yes,
that's why I said you'll still need to modify your program to distinguish errors from output.
The idea is to "mark" the errors with "error:", for example, and the output with "output:" :slight_smile:

Well,
I don't know PHP; maybe you should avoid using external commands and shell pipes in those cases.
I believe the only solution is to modify the libc you're using (I'm joking :)).

Could you explain why you need such a thing?
There could be a different, better way to solve your problem.

---------- Post updated 12-24-09 at 10:30 AM ---------- Previous update was 12-23-09 at 04:46 PM ----------

Well,
now that I think about it again: with shells that support process substitution you can do something like this:

prog 2> >(
  awk '{
    # run "date +%N" once, on the first line, to record this reader's start time
    c++ || "date +%N" | getline sdt
    # prepend a per-stream line counter, the start time and a stream tag
    $0 = c ":" sdt ":ERR:" $0
    print > "err.log"
    # the trailing pattern "1" also prints every annotated line to stdout
    }1'
  ) > >(
    awk '{
      c++ || "date +%N" | getline sdt
      $0 = c ":" sdt ":OUT:" $0
      print > "out.log"
      }1'
    ) |
      sort -n

Notice that here I'm using GNU date's %N format specifier (nanoseconds), which is not portable. If your date implementation does not support it,
you can use Perl's Time::HiRes gettimeofday. We only need the start time of each reader in order to be able to sort the merged output properly.
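
A drop-in replacement for "date +%N" could look like this (a sketch that prints seconds and microseconds since the epoch as one number):

perl -MTime::HiRes=gettimeofday -e 'printf "%d%06d\n", gettimeofday'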

For example:

$ ./s 2> >(
  awk '{
    c++ || "date +%N" | getline sdt
    $0 = c ":" sdt ":ERR:" $0
    print > "err.log"
    }1'
  ) > >(
    awk '{
      c++ || "date +%N" | getline sdt
      $0 = c ":" sdt ":OUT:" $0
      print > "out.log"
      }1'
    ) | sort -n > out_and_err.log
$ head *log
$ head *log
==> err.log <==
1:694103195:ERR:err 1
2:694103195:ERR:err 2
3:694103195:ERR:err 3
4:694103195:ERR:err 4
5:694103195:ERR:err 5

==> out_and_err.log <==
1:693535244:OUT:out 1
1:694103195:ERR:err 1
2:693535244:OUT:out 2
2:694103195:ERR:err 2
3:693535244:OUT:out 3
3:694103195:ERR:err 3
4:693535244:OUT:out 4
4:694103195:ERR:err 4
5:693535244:OUT:out 5
5:694103195:ERR:err 5

==> out.log <==
1:693535244:OUT:out 1
2:693535244:OUT:out 2
3:693535244:OUT:out 3
4:693535244:OUT:out 4
5:693535244:OUT:out 5
$

Here I'm using AWK, but you can use Perl or whatever scripting language you like.
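
For the record, a rough Perl equivalent of the ERR reader above might look like this (a sketch; with -p every annotated line is also printed to stdout, just like the trailing "1" pattern in awk):

perl -MTime::HiRes=gettimeofday -pe '
  BEGIN {
    $t = sprintf "%d%06d", gettimeofday;    # start time, taken once
    open LOG, ">", "err.log" or die "err.log: $!";
  }
  $_ = ++$c . ":" . $t . ":ERR:" . $_;      # same prefix as the awk version
  print LOG $_;                             # copy to the log file
'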

Hi radoulov,

nice solution with the subshells :), but unfortunately this doesn't fully work either.
It works well as long as the output to stdout and stderr alternates strictly, but as soon as several consecutive lines go to only one of the streams, the counter advances for that stream alone, not for the other, so the merged sort interleaves them wrongly ...

Just tried it out :wink: ...
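
For example, with a hypothetical test script that writes two lines to stdout before one to stderr:

echo "out 1"
echo "out 2"
echo "err 1" >&2
# out.log numbers its lines 1 and 2, err.log numbers its single line 1,
# so "sort -n" places "err 1" before "out 2" although it was written last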

But I will explain why I was searching for a solution to this:
The requirement is pretty simple, and that's exactly why it's so annoying that it cannot be met with standard shell features or some additional command.

I have to execute PHP scripts on the command line, where some messages are written to stdout and some to stderr from within the script itself.
Furthermore, there may be a warning, notice, fatal error or exception thrown by the PHP interpreter itself, which cannot be caught within PHP; that is a limitation of PHP.

And I just want both the stdout and the stderr written by the script to go to separate files and to the console, in exactly the order they were written ...
The problem is a general one, shared by any other scripting language or binary where one cannot catch all errors or cannot modify the executable ...

Since this seems impossible by any means, I have decided to work around the problem as smartly as I can, even if that is not really the solution I wished for :).

Thanks for your help and the nice ideas ... I learned some new stuff even though the problem is not solved ... but learning new stuff is always good :D.

Bye
Steffen

This works for me in cases where "2>&1" does not retain the order of the messages, for example with the unix "traceroute" command. Both streams are opened in append mode, so every write from either stream lands at the current end of the file:

traceroute 10.10.10.10 2>>logfile.log 1>>logfile.log

The original poster wants two separate logs with stdout and stderr respectively and one with both of them in the correct order.

---------- Post updated at 04:54 PM ---------- Previous update was at 04:40 PM ----------

You're right.
It just doesn't work.

Thanks radoulov.
Having now read the original post a little more carefully, all I can suggest is to build a list of either "known good" or "known error" messages and use it to filter the combined log. With my method the combined log should be in chronological order.
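
A sketch of that filtering, assuming a file known_errors.txt with one known error pattern per line:

./prog >> combined.log 2>> combined.log                    # combined, chronological
grep -F -f known_errors.txt combined.log > error.log       # lines matching known errors
grep -F -v -f known_errors.txt combined.log > output.log   # everything else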

Yes,
I don't see any alternative for now either.