Stdout and stderr combined and separate in single command

Greetings all,

I am new to bash scripting, and I am running Linux Mint. I am tearing my hair out trying to get the following to work:
All I want to do is send stdout and stderr to separate files, and also combine them into a single file, in the same command. I have experimented with tee, xargs, and redirection, but I haven’t been able to get it to work.

Can someone help?

Assuming you want the combined output interleaved?

Welcome!
The following shell command should do it:

{ { yourcommand | tee stdout.txt; } 2>&1 1>&3 | tee stderr.txt; } 3>both.txt >&3

A test command in the shell:

{ { { echo okay; echo error >&2; echo ok2; } | tee stdout.txt; } 2>&1 1>&3 | tee stderr.txt; } 3>both.txt >&3

tee buffers, so it's hard to predict the order of the lines in both.txt.

Explanation:
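the outer { } redirects descriptor 3 to both.txt and then descriptor 1 to where 3 goes (also both.txt),
the inner { } redirects descriptor 2 to where 1 goes (the pipe into tee stderr.txt), then descriptor 1 to where descriptor 3 goes (both.txt). So each tee writes its own file and also copies its input into both.txt.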
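The same command can be laid out over several lines with a comment at each step. This is only a re-indented, commented copy of the command above; yourcommand is defined here as a hypothetical stand-in shell function so that the sketch runs on its own.

```bash
# Hypothetical stand-in for the real program: two lines to stdout, one to stderr.
yourcommand() { echo okay; echo error >&2; echo ok2; }

{                                  # outer group: fd 3 opens both.txt, fd 1 copies fd 3
  {                                # inner group: fd 2 -> the pipe into "tee stderr.txt",
                                   # then fd 1 -> both.txt (a copy of fd 3)
    yourcommand | tee stdout.txt   # copies stdout into stdout.txt and into both.txt
  } 2>&1 1>&3 | tee stderr.txt     # copies stderr into stderr.txt and into both.txt
} 3>both.txt >&3
```

The order of the lines inside both.txt still depends on how the two tee processes get scheduled.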

Would stdbuf --output=L resolve any issue with the combined tees? I’m guessing that tee does not use stdio, so probably not.

Rats: the man page for stdbuf notes that some common utilities (including tee, dd and cat) bypass or adjust stdio buffering. Maybe busybox or awk (with fflush) would take care of the issue.

Suck it and see.

It may be fairly hard to test, and the result may be inconclusive. As stdout is (generally) block-buffered, we need the stdout and stderr streams to be larger than one block to force any kind of interleaving. We may even need to build in some disruptive timing anomalies, and maybe a test to highlight interleaving. I started by examining the buffering by itself, sending it 50 lines of plain text (6866 bytes).

head -n 50 leipzig1M.txt | strace stdbuf -o L tee zigFoo > zig1M.txt 2> zig1M.log

tee reads 8192 bytes at a time, and writes the same size to both files. stdbuf has no effect.

head -n 50 leipzig1M.txt |
    strace awk '{ print; print > "zig.tee"; }' > zig1M.txt 2> zig1M.log

awk reads 4096 bytes at a time, and writes that same size to both files.

head -n 50 leipzig1M.txt |
   strace stdbuf -o L awk '{ print; print > "zig.tee"; }' > zig1M.txt 2> zig1M.log

That reads stdin 4096 bytes at a time. It writes stdout line-buffered, but zig.tee is block-buffered.

head -n 50 leipzig1M.txt | strace awk '{ print; print > "zig.tee"; fflush (); }' > zig1M.txt 2> zig1M.log

That line-buffers both stdout and zig.tee. Awks other than GNU awk 5.1.0 may not behave the same way, and the performance may be woeful.

I will attempt to merge this into the solution proposed by @MadeInGermany, and then provide a convincing test strategy.

I felt that any issues with buffering would appear at block boundaries, and that my data should have a well-defined format so that I could write a verification script. I have a bulk text file of a million lines, and I massaged it into a million lines, each with a serial number, random text, and a tag:

500249 The credits are blends of development aid and commercial exp STDOUT
500259 Investors who purchase biotechnology sh STDERR

The STDOUT lines are adjusted to 97 bytes and make up a random 70% of the data; the STDERR lines are 73 bytes and make up the other 30%. My substitute YourCommand is an awk that writes each line to stdout or stderr according to its tag, with default block buffering.
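Test data of this shape, and a substitute command, could be produced along these lines. This is only a sketch: make_tagged and fake_command are hypothetical names, the random seed is arbitrary, and the padding of lines to 97 and 73 bytes is left out.

```bash
# make_tagged FILE: turn any text file into records of the form
# "SERIAL TEXT TAG", tagging roughly 30% of the lines STDERR.
make_tagged() {
  awk 'BEGIN { srand(1) }
       { printf "%d %s %s\n", NR, $0, (rand() < 0.3 ? "STDERR" : "STDOUT") }' "$1"
}

# fake_command FILE: stand-in for YourCommand. Sends each record to stdout or
# stderr according to its tag. awk's stdout is block-buffered by default when
# it is not a terminal.
fake_command() {
  awk '$NF == "STDERR" { print > "/dev/stderr"; next } { print }' "$1"
}
```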

I replaced the standard tee command with a GNU awk script, tested three flavours of buffering (Tee is the name of the additional file), and then closely checked the outputs.

awk -v Tee="${1}" '{ print; print > Tee; }'

That mangled the data, truncating and combining lines: diff found 38,777 lines that were not in the input line format. That confirms there is a serious issue when block-buffered streams are combined.

stdbuf -o L awk -v Tee="${1}" '{ print; print > Tee; }'

That preserves the text lines exactly: evidently awk uses stdio and does not override the buffering. In the Both file, the stdout and stderr lines are each ordered as in the input, but they are mingled arbitrarily because the original command's block buffering of the two streams is not synchronised.
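A checker for the two properties described here (every line intact, and each tag's serial numbers ascending) might look like this. verify_both is a hypothetical name; it assumes the "SERIAL TEXT TAG" record format shown above and is not the author's actual verification script.

```bash
# verify_both FILE: count lines that are not "SERIAL TEXT STDOUT|STDERR", and
# records whose serial is not larger than the previous one with the same tag.
verify_both() {
  awk '
    !/^[0-9]+ .* (STDOUT|STDERR)$/ { bad++; next }
    {
      if (($NF in last) && $1 + 0 <= last[$NF]) disorder++
      last[$NF] = $1 + 0
    }
    END { printf "malformed=%d out_of_order=%d\n", bad, disorder }
  ' "$1"
}
```

A clean Both file should report zero for both counts, however the two streams happen to be mingled.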

This stdbuf version does have some issues, though.

(a) It only changes the buffering of the standard streams, not of other files the program writes, so it is somewhat restrictive.

(b) It unnecessarily line-buffers the Tee files: only the Both file needs line-buffering.

awk -v Tee="${1}" '{ print; print > Tee; fflush("/dev/stdout"); }'

This runs about twice as fast as the stdbuf version, as the Teed files are still block-buffered. It also preserves the data exactly. It could also flush any combination of output files, not just the standard ones.
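A minimal sketch of dropping a line-flushing awk tee into @MadeInGermany's command follows. linetee and split_run are hypothetical names. It calls fflush() with no argument, which flushes stdout in both gawk and mawk; GNU awk flushes every open output that way, so on gawk the fflush("/dev/stdout") form above is the faster choice. Running the program under stdbuf -oL -eL only helps if it uses stdio and does not set its own buffering.

```bash
# linetee FILE: like "tee FILE", but flushes stdout after every line.
# FILE itself stays block-buffered.
linetee() {
  awk -v Tee="$1" '{ print; print > Tee; fflush() }'
}

# split_run CMD [ARGS...]: stdout.txt, stderr.txt and both.txt, using the
# same descriptor juggling as the original one-liner.
split_run() {
  { { stdbuf -oL -eL "$@" | linetee stdout.txt; } 2>&1 1>&3 | linetee stderr.txt; } 3>both.txt >&3
}
```

For example, split_run ./yourcommand arg1 arg2 runs the program once and writes all three files.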

Hi, hello, hi: I’ve made an account for the sole purpose of replying to this post:
I believe the command is the following:

command 2>/path/to/stderr.txt 1>/path/to/stdout.txt
# File descriptor 2 refers to /dev/stderr, while fd1 is standard output.
# To use both:
command 2>&1 >/dev/null # Replace /dev/null with whatever file.
# A fun use:
printf 'error: %s\n' "Invalid argument!" >/dev/stderr
# Since the output is sent to stderr, one can ignore the errors of the program:
sh script-test.sh # -> "error: Invalid argument!"
sh script-test.sh 2>/dev/null # Outputs nothing, since stderr is sent to null.

A repetition of the command is not a solution.
The second time the command can produce a different result.

The order for combined writing is

command >filename 2>&1

Redirect stdout to the file, redirect stderr to where stdout goes (to the file).
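The effect of the order can be seen with a throwaway function (both_streams is a hypothetical name). Redirections are applied left to right, so in 2>&1 >file, stderr is copied from stdout before stdout is moved to the file.

```bash
# Hypothetical command that writes one line to each stream.
both_streams() { echo out; echo err >&2; }

both_streams >combined.txt 2>&1         # stdout -> file, then stderr -> the same file
captured=$(both_streams 2>&1 >out.txt)  # stderr -> the old stdout (captured here),
                                        # then stdout -> out.txt
```

After this, combined.txt holds both lines, out.txt holds only "out", and "err" ended up in $captured rather than in the file.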


Yes, I just noticed my blunder. I didn't read it right, sorry.

Well, I suggest appending stderr to a copy of stdout:

# command 2>stderr.txt 1>stdout.txt  # guessed command, run only once
# <stdout.txt >all.txt  # zsh only: a bare redirection runs $NULLCMD (cat by default); in bash it just truncates all.txt
# cp stdout.txt all.txt  # this works too, in place of the first cat
cat stdout.txt >all.txt
cat stderr.txt >>all.txt

We don’t know a whole lot about this requirement: for example, whether the primary program can be modified, whether its buffering can be changed with stdbuf, or even why the merged output is required. I’m assuming that the program runs for a long time, and we are attempting to get the stderr messages close to any corresponding stdout messages. Appending them when the execution ends does not achieve that.

As things are, the two initial block-buffered outputs grow at different rates, so one of the 4 KB buffers (probably the stderr one) gets written later than the corresponding stdout lines.

Even if we line-buffer those outputs, the two pipes are also re-buffered (up to 64 KB), and the two Tee processes are separately scheduled so they read their respective pipes out of sync.

I can see three ways of fixing this to get the lines properly ordered, but they all require modifying the primary program – once any lines get buffered by stdio it is too late.

(a) Timestamp or sequence-mark all the outputs, write the two files, and use tails and merge to write the Both file in the required order. That might require an extreme amount of buffering.

(b) Mark each line Out or Err, write those to the Both file, and have a tail that uses those markers to write the Out and Err files. There might be some way to identify Out and Err lines with patterns, but that would be highly data-dependent.

(c) Simplest: have the primary program write all three files itself.

(d) Even simpler: script -c yourcommand log.txt captures the output of yourcommand to log.txt.
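Option (b) could be sketched as follows, assuming the program has been changed to prefix every line with "OUT " or "ERR " and to write everything, line-buffered, to one stream. split_tagged is a hypothetical name and the prefixes are assumptions; the awk script strips the prefix and rebuilds the two separate files, sending untagged lines to other.txt.

```bash
# split_tagged [FILE...]: rebuild out.txt and err.txt from "OUT "/"ERR " prefixes.
# With no FILE it reads stdin, so it can also sit at the end of a tail -f.
split_tagged() {
  awk '
    /^OUT / { print substr($0, 5) > "out.txt"; next }
    /^ERR / { print substr($0, 5) > "err.txt"; next }
            { print > "other.txt" }   # lines without a recognised tag
  ' "$@"
}
```

To follow a Both file that is still being written, tail -n +1 -f both.txt | split_tagged could be used.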

Do you need stdout and stderr captured in separate files? Or can you parse the information out of log.txt?

Greetings all,

I got this response on the Linux Mint forums recently:

{ { command 2>&1 >&3 3>&- | tee stderr.txt; } 3>&1 >&2 | tee stdout.txt; } >combined.txt 2>&1

What do you think?

Test the response, and then you tell us what it is worth.

Greetings all,

I created a script with 100 records, with errors and outputs in random order, each with a number, so I would know whether they are in order in the output files:

$ head randomdata.script
echo error 1 >&2
echo output 1 >&1
echo error 2 >&2
echo error 3 >&2
echo output 2 >&1
echo error 4 >&2
echo output 3 >&1
echo error 5 >&2
echo output 4 >&1
echo output 5 >&1
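A script of this kind can be generated rather than written by hand. This is a hypothetical generator, not the one used here; RANDOM=42 just seeds bash's random numbers so that the script comes out the same on every run.

```bash
# Write 100 echo commands, each going randomly to stdout or stderr and
# numbered separately per stream, into randomdata.script.
RANDOM=42
out=0 err=0
{
  echo '#!/bin/sh'
  for ((i = 1; i <= 100; i++)); do
    if ((RANDOM % 2)); then
      echo "echo output $((++out)) >&1"
    else
      echo "echo error $((++err)) >&2"
    fi
  done
} > randomdata.script
chmod +x randomdata.script
```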

From that, I created a text file to use for comparison:

$ ./randomdata.script >randomdata.out 2>&1

$ head randomdata.out
error 1
output 1
error 2
error 3
output 2
error 4
output 3
error 5
output 4
output 5

I tested this:

{ { ./randomdata.script 2>&1 >&3 3>&- | tee stderr.txt; } 3>&1 >&2 | tee stdout.txt; } >combined.txt 2>&1

The stderr.txt and stdout.txt files are in order, but the “combined.txt” file is not:

$ head stderr.txt
error 1
error 2
error 3
error 4
error 5
error 6
error 7
error 8
error 9
error 10

$ head stdout.txt
output 1
output 2
output 3
output 4
output 5
output 6
output 7
output 8
output 9
output 10

$ diff combined.txt randomdata.out
1d0
< output 1
2a2
< output 1
5a6
< error 4
6a8
< error 5
9,11d10
< output 6
< error 4
< error 5
13a13
< output 6

I got this response on linuxquestions dot org. It works, as presented:

{ echo ok; this is an error; } 2> >(tee stderr.txt) 1> >(tee stdout.txt) | cat > both.txt

$ cat stderr.txt
Command 'this' not found, did you mean:
command 'thin' from deb thin (1.8.1-2ubuntu1)
Try: sudo apt install

$ cat stdout.txt
ok

$ cat both.txt
ok
Command 'this' not found, did you mean:
command 'thin' from deb thin (1.8.1-2ubuntu1)
Try: sudo apt install
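Why this version works is worth spelling out, since it also explains why the | cat matters. The comments describe bash's behaviour as I understand it; yourcommand is a hypothetical stand-in function.

```bash
# Hypothetical stand-in for the real program.
yourcommand() { echo ok; echo oops >&2; }

# The redirections are applied left to right inside the left-hand side of the
# pipeline, while its stdout is still the pipe into cat, so both tee processes
# inherit that pipe as their stdout. cat only sees end-of-file once the group
# and both tees have exited, so the files are complete when this line returns.
{ yourcommand; } 2> >(tee stderr.txt) 1> >(tee stdout.txt) | cat > both.txt
```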

However, when I substitute my script into the command, as follows:

{ ./randomdata.script } 2> >(tee stderr.txt) 1> >(tee stdout.txt) | cat > both.txt

I get a prompt:

>

and nothing happens. I don’t know what to do at this point. Can someone help?

Change the semi-colon to a hyphen, or enclose it in quotes. The shell is interpreting it as the end of a statement.

There must be a semicolon or a newline before the closing brace.

{ ./randomdata.script; } 2> >(tee stderr.txt) 1> >(tee stdout.txt) | cat > both.txt

But I also see an ambiguous redirection of stdout, regardless of whether there is a pipe to cat or not.

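The fundamental problem lies in the buffering of stdout and stderr. When stdout goes to a pipe or a file, a stdio-based program only writes it out when the buffered output reaches 4096 bytes (on almost any Linux system), or when the stream is flushed or closed. C stdio leaves stderr unbuffered by default, but some programs buffer it too, and block-buffering stdout alone is enough to lose the original order. That 4096 count completely ignores record boundaries (newlines), by the way, so rows will generally be split across blocks arbitrarily.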

If you only have 100 (short) records, neither buffer will ever fill. So when the command terminates, it will close stdout and write all those records, and then close stderr and write all those records. The tee and cat commands do their own (non stdio) buffering, which serves only to further obfuscate the problem.

Essentially, the outputs are pre-sorted by file descriptor. Any further downstream processes cannot determine the order in which the records were originally buffered. Nor can they reconstruct the records that will have been split across blocks. All that sequence data has been lost, long before the merge of the two streams happens.

To make a valid test case, you need a minimum of three blocks worth of data in each of stdout and stderr. My test (as described 15 days ago) sent 700,000 lines (68 MB) to stdout and 300,000 lines (22 MB) to stderr. The degree of chaos in the output is directly related to the volume of data.

I can upload my code that demonstrates a test with adequate data volumes, and verifies the results. It is too large to post here, I think.

Hi Paul,

It appears that what I want to do is impossible. That really sucks, because having stdout and stderr in separate files, and a single, correctly ordered file as well, would be extremely useful.

Maybe there is a way to prepend a timestamp to each entry in stdout and stderr. I've tried to pipe stdout and stderr separately, but 2>| doesn't work.

I'm not exactly sure what a socket is, but I think it routes what it receives as input into a specified program. I wonder if such could be used to prepend a maximum resolution timestamp to each stdout and stderr entry. I could then combine the two files and sort.

What do you think?

Adding time-stamps to the outputs and sorting by them cannot work, for precisely the same reasons that cause all the other attempted methods to fail.

The data can sit in the output buffers of the original “yourcommand” for an arbitrarily long time, until the respective buffer is full and gets auto-flushed. So the process inserting the timestamps can only timestamp the rows with the time at which it gets the 4KB block. It cannot know what period has elapsed since the text was originally appended to each buffer.

And all those late times will most likely be identical as well as deferred (because you got a whole lot of lines at once). So sorting them with an identical time key (where the sort generally sorts on the rest of the line by default) will actually unsort all the lines in that block.

Also, the 4KB block will have part-lines that overlap another block, so when the files are merged the file will still contain many discontinuities.

You might revisit my post #7, dated Aug 31. It is impossible (both theoretically and practically) to solve this problem unless you can line-buffer both stdout and stderr as written by “yourcommand”.

As I have no idea what “yourcommand” actually does, we do not know how to make it line-buffer. You would need to run that under strace, and experiment with stdbuf, and post the results. That is what I did for post #7.

(a) If you have the source and it is low-level (C or C++), it is a one-line code fix, such as calling setvbuf(stdout, NULL, _IOLBF, 0); at the start of main. I don’t know how you would fix it in Python or Java, for example.

(b) If it uses stdio for every write on stdout and stderr, and does not mess with the buffering itself, then it should be effective to run it under control of stdbuf -o L. Every other process on the path to both.txt also must be line-buffered, for exactly the same reasons.

(c) You could have “yourcommand” insert timestamps on every line it wrote to either stdout or stderr. Actually, you cannot sort that output, because a sort requires all the data, in case the lowest key happens to be the last output.

In our situation, we know the times are monotonically ascending, so we can merge the two streams in time order. But that solution also has a sting in the tail.
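If the program did write a sortable key at the start of every line of both files, the merge itself needs no more than sort -m. merge_by_key is a hypothetical name, and the key (a sequence number, or an epoch timestamp in nanoseconds, as a plain integer in field 1) is an assumption.

```bash
# merge_by_key OUTFILE ERRFILE: merge two files that are each already in
# ascending order of an integer key in field 1. sort -m only merges; it does
# not re-sort either file. -s turns off sort's last-resort whole-line comparison.
merge_by_key() {
  sort -m -s -n -k1,1 "$1" "$2"
}
```

On live streams, a merge cannot emit a line until it has seen the next line from the other input, and that is where the trouble starts.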

Suppose the stderr only gets two lines written to it: Started at ymdhms, and Ended at ymdhms. That buffer perhaps only gets flushed after a six-hour run.

As we have not seen the Started record, we don’t know its timestamp. So we have to hold back the entire stdout data (all 55GB of it) until we see the Started record; put that out first; see the Ended record; put out the 55GB (because all those timestamps are earlier than Ended); put out the Ended record.

OK, that is an extreme case, but it illustrates the difficulty. If we could line-buffer the outputs, we would be able to merge the streams: but then if we could line-buffer the outputs, we would not need the timestamps either.

I don’t know how to explain this any more clearly, so here is an analogy.

There is a school trip, and you arrange for them to walk in single file, girls in one line, boys in the other, and they form an orderly queue. That should be tidy.

Next time, it is raining, so you send them down in several school busses. And each bus unloads all its pupils at the door, one bus at a time. So the busses form the queue, and the order that the children enter the event is nothing like the order in which they originally got on the busses.