I knew there would be a better way, most likely awk.
Here, I try to explain how it works...
awk '
NR==FNR {_[NR]=$0}
# NR is total number of lines read in both files.
# FNR is number of lines read, reset for each file.
# _[NR] is an array named _ with NR as index.
# This loads array with contents of file1,
# and ignores file2.
NR!=FNR {print _[FNR] "\n" $0 "\n---"}
# This part ignores file1.
# For each line in file 2,
# Print the corresponding line from the array,
# using FNR as index.
# then a newline, then the line from file2,
# another newline, then the dashes, then another newline.
' file1 file2
Hopefully, this is a correct explanation.
I was initially confused by the _[NR] thing, thinking it was some kind of special awk reserved variable or something, but then I realized it's just an array named _
I guess that if file1 were huge, you could run out of memory, because the whole file gets loaded into an array. There's probably a way to do this in perl that can handle huge files without slurping everything into an array...
_ has no special meaning. It's just a fast and, I confess, obfuscated way of writing array names.
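For example, the same program behaves identically with the array spelled out as saved instead of _; this is just a sketch, with two tiny sample files created inline for the demo:

```shell
# Same interleave as above, but the array is named "saved" to show
# that "_" carries no special meaning in awk.
printf 'a\nb\n' > f1
printf 'x\ny\n' > f2
awk 'NR==FNR {saved[NR]=$0; next}   # load file1 into the array
     {print saved[FNR] "\n" $0 "\n---"}' f1 f2
rm -f f1 f2
```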
Well, maybe. However, awk never hit the memory barrier on my systems, even on large test files (a million lines and more). But you are probably right; you could do that in perl by looping through two file handles in parallel.
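If memory ever did become an issue, one sketch that avoids slurping is to stay in awk but stream the second file with getline instead of loading the first into an array. The file names here are just placeholders for the demo:

```shell
# Memory-friendly variant: read file2 one line at a time with getline,
# so neither file is held in memory (assumes equal-length files).
printf 'a\nb\n' > f1
printf 'x\ny\n' > f2
awk '{ print                              # line from file1
       if ((getline line < "f2") > 0)     # matching line from file2
           print line
       print "---" }' f1
rm -f f1 f2
```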
This may be conceptually simpler, but it requires a feature of the bash shell that is not often used. Essentially, we cause lines to be read in sets, one from each data file. The third "file" is not a file but a process, i.e. a running program:
#!/usr/bin/env bash
# @(#) s4 Demonstrate gathering, perfect shuffle, with paste.
echo
set +o nounset
LC_ALL=C ; LANG=C ; export LC_ALL LANG
echo "Environment: LC_ALL = $LC_ALL, LANG = $LANG"
echo "(Versions displayed with local utility \"version\")"
version >/dev/null 2>&1 && version "=o" $(_eat $0 $1) paste jot
set -o nounset
echo
FILE1=data1
FILE2=data2
echo
echo " Data file $FILE1:"
cat $FILE1
echo
echo " Data file $FILE2:"
cat $FILE2
echo
echo " Results:"
n=$( wc -l < $FILE1 )
paste -d '\n' $FILE1 $FILE2 <( jot -b "===" $n )
exit 0
Producing:
% ./s4
Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.11-x1, i686
Distribution : Xandros Desktop 3.0.3 Business
GNU bash 2.05b.0
paste (coreutils) 5.2.1
jot - ( /usr/bin/jot Aug 18 2003 )
Data file data1:
Aaa
Bbb
Ccc
Ddd
Eee
Fff
Data file data2:
Zzz
Yyy
Xxx
www
vvv
uuu
Results:
Aaa
Zzz
===
Bbb
Yyy
===
Ccc
Xxx
===
Ddd
www
===
Eee
vvv
===
Fff
uuu
===
The drawbacks are that a shell with that specific feature (process substitution) is needed, the utility jot is needed, and a file must be read just to get the line count. However, it may be easier than thinking about awk, perl, tcl, etc. ... cheers, drl
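If jot is not available, a rough equivalent of the separator stream can be built from yes and head, fed to paste on standard input via the - operand; this is only a sketch with sample files created inline (yes is common but not strictly POSIX):

```shell
# Perfect shuffle with a "===" separator, without jot or process
# substitution: generate the separator lines with yes + head.
printf 'Aaa\nBbb\n' > data1
printf 'Zzz\nYyy\n' > data2
n=$(wc -l < data1)                       # line count of the first file
yes '===' | head -n "$((n))" | paste -d '\n' data1 data2 -
rm -f data1 data2
```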
Here's a way to do this using only shell with no external commands.
It's nowhere near the elegance of the other solutions posted here. :o
It's an example of how to use file descriptors to read more than one file at a time.
If both files have the same number of lines...
#!/bin/sh
exec 3< file1.txt # Open file1.txt for input on file descriptor 3
exec 4< file2.txt # Open file2.txt for input on file descriptor 4
while : ; do
IFS= read -r line1 <&3 # read next line from file1.txt into line1
[ $? -ne 0 ] && break # If finished reading file1, break out of loop.
echo "$line1" # otherwise print the line.
IFS= read -r line2 <&4 # do the same for file2.txt
echo "$line2"
echo "-----"
done
exec 3<&- # Close the file descriptors
exec 4<&-
This will output everything to stdout. Can redirect to a file if desired.
If the files have an unequal number of lines and we want to print all lines of both files, we need a bit more code.
This will print the contents of both files, with the ----- separator, until the shorter file is finished.
Then it just prints the rest of the longer file, with no ----- separator.
#!/bin/sh
exec 3< file1.txt # Open file1.txt for input on file descriptor 3
exec 4< file2.txt # Open file2.txt for input on file descriptor 4
F1=0 F2=0 # Flags indicating file has more to read, 0 = true in shell.
while [ $F1 -eq 0 -o $F2 -eq 0 ]; do # At least one file has more to read
IFS= read -r line1 <&3 # read file1
F1=$? # Gets set to non-zero if finished reading
[ $F1 -eq 0 ] && echo "$line1" # don't print if we didn't read something.
IFS= read -r line2 <&4
F2=$?
[ $F2 -eq 0 ] && echo "$line2"
[ $F1 -eq 0 -a $F2 -eq 0 ] && echo "-----" # Print if both files are still reading
done
exec 3<&- # Close the file descriptors
exec 4<&-
There's probably a way to do this with fewer tests... Oh well...
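One possible way to trim the flag juggling is to move the reads into the while condition itself, so the loop runs as long as either read succeeds. This is just a sketch, with sample files of unequal length created inline for the demo:

```shell
#!/bin/sh
# Unequal-length merge driven directly by read's exit status: the loop
# condition performs both reads and continues while at least one works.
printf 'a\nb\nc\n' > file1.txt
printf 'x\n'       > file2.txt
exec 3< file1.txt 4< file2.txt
while IFS= read -r line1 <&3; ok1=$?
      IFS= read -r line2 <&4; ok2=$?
      [ $ok1 -eq 0 ] || [ $ok2 -eq 0 ]
do
    [ $ok1 -eq 0 ] && printf '%s\n' "$line1"
    [ $ok2 -eq 0 ] && printf '%s\n' "$line2"
    [ $ok1 -eq 0 ] && [ $ok2 -eq 0 ] && echo "-----"
done
exec 3<&- 4<&-
rm -f file1.txt file2.txt
```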