I knew there would be a better way, most likely awk.
Here, I try to explain how it works...
awk '
NR==FNR {_[NR]=$0}
# NR is total number of lines read in both files.
# FNR is number of lines read, reset for each file.
# _[NR] is an array named _ with NR as index.
# This loads array with contents of file1,
# and ignores file2.
NR!=FNR {print _[FNR] "\n" $0 "\n---"}
# This part ignores file1.
# For each line in file 2,
# Print the corresponding line from the array,
# using FNR as index.
# then a newline, then the line from file2,
# another newline, then the dashes, then another newline.
' file1 file2
Hopefully, this is a correct explanation.
I was initially confused by the _[NR] thing, thinking it was some kind of special awk reserved variable or something, but then I realized it's just an array named _
I guess that if file1 were huge, you could run out of memory, because the whole file gets loaded into an array. There's probably a way to do this in perl that can handle huge files without slurping everything into an array...
_ has no special meaning. It's just a fast and, I confess, obfuscated way of writing array names.
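For example, the same program behaves identically with the array spelled out as saved instead of _; this is just a sketch, with two tiny sample files created inline for the demo:

```shell
# Same interleave as above, but the array is named "saved" to show
# that "_" carries no special meaning in awk.
printf 'a\nb\n' > f1
printf 'x\ny\n' > f2
awk 'NR==FNR {saved[NR]=$0; next}   # load file1 into the array
     {print saved[FNR] "\n" $0 "\n---"}' f1 f2
rm -f f1 f2
```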
Well, maybe. However, awk never hit the memory barrier on my systems, even on large test files (a million lines and more). But you are probably right; you could do that in perl by looping through two file handles in parallel.
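If memory ever did become an issue, one sketch that avoids slurping is to stay in awk but stream the second file with getline instead of loading the first into an array. The file names here are just placeholders for the demo:

```shell
# Memory-friendly variant: read file2 one line at a time with getline,
# so neither file is held in memory (assumes equal-length files).
printf 'a\nb\n' > f1
printf 'x\ny\n' > f2
awk '{ print                              # line from file1
       if ((getline line < "f2") > 0)     # matching line from file2
           print line
       print "---" }' f1
rm -f f1 f2
```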
This may be conceptually simpler, but it requires a feature of the bash shell that is not often used. Essentially, we cause lines to be read in sets, one from each data file. The third "file" is not a file but a process, i.e. a running program:
#!/usr/bin/env bash
# @(#) s4 Demonstrate gathering, perfect shuffle, with paste.
echo
set +o nounset
LC_ALL=C ; LANG=C ; export LC_ALL LANG
echo "Environment: LC_ALL = $LC_ALL, LANG = $LANG"
echo "(Versions displayed with local utility \"version\")"
version >/dev/null 2>&1 && version "=o" $(_eat $0 $1) paste jot
set -o nounset
echo
FILE1=data1
FILE2=data2
echo
echo " Data file $FILE1:"
cat $FILE1
echo
echo " Data file $FILE2:"
cat $FILE2
echo
echo " Results:"
n=$( wc -l < $FILE1 )
paste -d '\n' $FILE1 $FILE2 <( jot -b "===" $n )
exit 0
Producing:
% ./s4
Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.11-x1, i686
Distribution : Xandros Desktop 3.0.3 Business
GNU bash 2.05b.0
paste (coreutils) 5.2.1
jot - ( /usr/bin/jot Aug 18 2003 )
Data file data1:
Aaa
Bbb
Ccc
Ddd
Eee
Fff
Data file data2:
Zzz
Yyy
Xxx
www
vvv
uuu
Results:
Aaa
Zzz
===
Bbb
Yyy
===
Ccc
Xxx
===
Ddd
www
===
Eee
vvv
===
Fff
uuu
===
The drawbacks are that a shell with that specific feature (process substitution) is needed, the utility jot is needed, and a file must be read just to get the line count. However, it may be easier than thinking about awk, perl, tcl, etc. ... cheers, drl
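If jot is not available, a rough equivalent of the separator stream can be built from yes and head, fed to paste on standard input via the - operand; this is only a sketch with sample files created inline (yes is common but not strictly POSIX):

```shell
# Perfect shuffle with a "===" separator, without jot or process
# substitution: generate the separator lines with yes + head.
printf 'Aaa\nBbb\n' > data1
printf 'Zzz\nYyy\n' > data2
n=$(wc -l < data1)                       # line count of the first file
yes '===' | head -n "$((n))" | paste -d '\n' data1 data2 -
rm -f data1 data2
```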
Here's a way to do this using only shell with no external commands.
It's nowhere near the elegance of the other solutions posted here. :o
It's an example of how to use file descriptors to read more than one file at a time.
If both files have the same number of lines...
#!/bin/sh
exec 3< file1.txt # Open file1.txt for input on file descriptor 3
exec 4< file2.txt # Open file2.txt for input on file descriptor 4
while : ; do
IFS= read -r line1 <&3 # read next line from file1.txt into line1
[ $? -ne 0 ] && break # If finished reading file1, break out of loop.
echo "$line1" # otherwise print the line.
IFS= read -r line2 <&4 # do the same for file2.txt
echo "$line2"
echo "-----"
done
exec 3<&- # Close the file descriptors
exec 4<&-
This will output everything to stdout. Can redirect to a file if desired.
If the files have an unequal number of lines and we want to print all lines of both files, we need a bit more code.
This will print the contents of both files, with the ----- separator, until the shorter file is finished.
Then it just prints the rest of the longer file, with no ----- separator.
#!/bin/sh
exec 3< file1.txt # Open file1.txt for input on file descriptor 3
exec 4< file2.txt # Open file2.txt for input on file descriptor 4
F1=0 F2=0 # Flags indicating file has more to read, 0 = true in shell.
while [ $F1 -eq 0 -o $F2 -eq 0 ]; do # At least one file has more to read
IFS= read -r line1 <&3 # read file1
F1=$? # Gets set to non-zero if finished reading
[ $F1 -eq 0 ] && echo "$line1" # don't print if we didn't read something.
IFS= read -r line2 <&4
F2=$?
[ $F2 -eq 0 ] && echo "$line2"
[ $F1 -eq 0 -a $F2 -eq 0 ] && echo "-----" # Print if both files are still reading
done
exec 3<&- # Close the file descriptors
exec 4<&-
There's probably a way to do this with fewer tests... Oh well...
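One possible way to trim the flag juggling is to move the reads into the while condition itself, so the loop runs as long as either read succeeds. This is just a sketch, with sample files of unequal length created inline for the demo:

```shell
#!/bin/sh
# Unequal-length merge driven directly by read's exit status: the loop
# condition performs both reads and continues while at least one works.
printf 'a\nb\nc\n' > file1.txt
printf 'x\n'       > file2.txt
exec 3< file1.txt 4< file2.txt
while IFS= read -r line1 <&3; ok1=$?
      IFS= read -r line2 <&4; ok2=$?
      [ $ok1 -eq 0 ] || [ $ok2 -eq 0 ]
do
    [ $ok1 -eq 0 ] && printf '%s\n' "$line1"
    [ $ok2 -eq 0 ] && printf '%s\n' "$line2"
    [ $ok1 -eq 0 ] && [ $ok2 -eq 0 ] && echo "-----"
done
exec 3<&- 4<&-
rm -f file1.txt file2.txt
```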