Comparing two .txt files in shell scripting...

prakash123 · June 30, 2008, 9:04am

Hi,
I have two big .txt files. I need to compare them and redirect the result into another file.
If anybody wants to help resolve this, I can send the two text files.
I need a quick response.

Thanks,
prakash

pharos467 · June 30, 2008, 9:19am

Please check "man cmp". It shows the manual page for the compare utility.

sysgate · June 30, 2008, 9:20am

Look at your man pages for "diff", but basically:

diff file1.txt file2.txt > someotherfile.txt
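
If the full diff listing is too large, one option is to keep only the lines that actually differ and drop the position markers. This is a minimal sketch, assuming a POSIX shell with diff and grep available; changed_lines is a hypothetical helper name and the sample files are made up for illustration:

```shell
#!/bin/sh
# changed_lines (hypothetical helper): print only the differing lines.
# "<" marks a line found only in the first file, ">" only in the second.
changed_lines() {
    diff "$1" "$2" | grep '^[<>]'
}

# Small demonstration with throwaway files.
demo_dir=$(mktemp -d)
printf 'a\nb\nc\n' > "$demo_dir/file1.txt"
printf 'a\nB\nc\n' > "$demo_dir/file2.txt"
changed_lines "$demo_dir/file1.txt" "$demo_dir/file2.txt"
rm -rf "$demo_dir"
```

This still reports whole lines, not individual words, so it only reduces the size of the output file.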

prakash123 · June 30, 2008, 9:50am

Hi,

Thanks for the quick reply. Those two commands do give the difference, but my files are very big and I need a comparison that shows only the part that differs. For example:

Test1 - Hi i am prakash
Test2 - Hi i am

The required shell script should then show the difference as "prakash". The diff command shows the lines from both files along with the comparison, and because my files are very big the redirected file is even bigger. Therefore I need a generalized shell script that will work.

Waiting for the response.

Thanks,
Prakash

vgersh99 · June 30, 2008, 10:58am

look into 'man comm'

awk · June 30, 2008, 11:06am

AIX and HP-UX have bdiff.

prakash123 · June 30, 2008, 11:06am

It is also not working. It is only showing the common parts.

vgersh99 · June 30, 2008, 11:11am

once again - go through 'man comm'.
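
For what it's worth, comm prints three columns (lines only in the first file, lines only in the second, lines in both) and expects sorted input. A rough sketch of hiding the common column, assuming a POSIX shell with sort, comm and mktemp; show_unique_lines is a hypothetical helper name and the fruit data is invented:

```shell
#!/bin/sh
# show_unique_lines (hypothetical helper): print lines that are not common
# to both files. comm requires sorted input, so sort copies first.
show_unique_lines() {
    sorted1=$(mktemp)
    sorted2=$(mktemp)
    sort "$1" > "$sorted1"
    sort "$2" > "$sorted2"
    # -3 suppresses column 3, the lines present in both files.
    comm -3 "$sorted1" "$sorted2"
    rm -f "$sorted1" "$sorted2"
}

# Small demonstration with throwaway files.
demo_dir=$(mktemp -d)
printf 'apple\nbanana\ncherry\n' > "$demo_dir/file1.txt"
printf 'banana\ncherry\ndate\n' > "$demo_dir/file2.txt"
show_unique_lines "$demo_dir/file1.txt" "$demo_dir/file2.txt"
rm -rf "$demo_dir"
```

Lines found only in the second file come out indented by one tab. Like diff, this works on whole lines, so it will not isolate a single differing word.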

prakash123 · June 30, 2008, 11:50am

Hi,
comm is not working properly. For example, take two small files like these:

file1.txt:
My name is x
My name is Y

file2.txt:
My name is

The command or shell script should give this result:

x
Y

Thanks,
Prakash

drl · June 30, 2008, 8:46pm

Hi.

wdiff and docdiff are often found on Linux systems, and you may be able to find them for other systems (I didn't see them on my Solaris system):

#!/bin/bash -

# @(#) s1       Demonstrate word differences.

echo
echo "(Versions displayed with local utility \"version\")"
version >/dev/null 2>&1 && version =o $(_eat $0 $1) wdiff docdiff lynx
echo

echo " Input file data1:"
cat data1

echo
echo " Input file data2:"
cat data2

echo
echo " Results with wdiff:"
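# -3 omits common words, -n keeps markers from wrapping across lines, and
# the empty -w/-x/-y/-z strings remove the delete/insert marker text.
wdiff -n -3 -w "" -x "" -y "" -z "" data1 data2 > t1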
lynx -force-html -dump t1

echo
echo " Results with docdiff:"
docdiff --digest --word data1 data2 > t1
lynx -force-html -dump t1

exit 0

Producing:

% ./s3

(Versions displayed with local utility "version")
Linux 2.6.11-x1
GNU bash 2.05b.0
GNU wdiff 0.5
0.3.2
Lynx Version 2.8.5rel.1 (04 Feb 2004)

 Input file data1:
My name is x
Now is the time
Ernie went for a walk in the garden

 Input file data2:
My name is y
Now in the time
Cookie left his house for a walk along the garden path

 Results with wdiff:

   ======================================================================
   x y
   ======================================================================
   is in
   ======================================================================
   Ernie went Cookie left his house
   ======================================================================
   in along
   ======================================================================
   path
   ======================================================================

 Results with docdiff:

       _____________________________________________________________

     * 1,1

     My name is [DEL: x :DEL] [INS: y :INS]
     Now
         _____________________________________________________________

     * 2,2

     Now [DEL: is :DEL] [INS: in :INS] the time
         _____________________________________________________________

     * 3,3

     the time
     [DEL: Ernie went :DEL] [INS: Cookie left his house :INS] for a walk
         _____________________________________________________________

     * 3,3

     for a walk [DEL: in :DEL] [INS: along :INS] the
         _____________________________________________________________

     * 3,3

     the [DEL: garden :DEL] [INS: garden path :INS]
         _____________________________________________________________

Shell scripting alone is unlikely to satisfy any requirement that has to do with processing individual lines in large files. If you don't have or don't like wdiff or docdiff, then I think you'll need to continue searching, or write something in awk or perl ...

cheers, drl
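
As a starting point for the awk route, here is a minimal sketch that prints every word of one file that never occurs in the other. It assumes a POSIX shell and awk; words_only_in is a hypothetical helper name, and the approach is a plain set difference on whitespace-separated words, not the positional comparison that wdiff does:

```shell
#!/bin/sh
# words_only_in (hypothetical helper): print each word of file "$1"
# that does not occur anywhere in file "$2".
words_only_in() {
    awk '
        # First pass (reference file): remember every word.
        FILENAME == ARGV[1] { for (i = 1; i <= NF; i++) seen[$i] = 1; next }
        # Second pass: print words never seen in the reference file.
        { for (i = 1; i <= NF; i++) if (!($i in seen)) print $i }
    ' "$2" "$1"
}

# Demonstration with the two small files from earlier in the thread.
demo_dir=$(mktemp -d)
printf 'My name is x\nMy name is Y\n' > "$demo_dir/file1.txt"
printf 'My name is\n' > "$demo_dir/file2.txt"
words_only_in "$demo_dir/file1.txt" "$demo_dir/file2.txt"
rm -rf "$demo_dir"
```

With that sample data it prints x and Y on separate lines. Only the distinct words of the reference file are held in memory, so the size of the other file should not matter much, but a word that appears anywhere in the reference file is treated as common no matter which line it is on.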

WebKruncher · June 30, 2008, 9:36pm

There are so many ways to skin this cat, but my favorite is the STL. If you use two custom streams and tweak your buffers to the right size, you can do just about any kind of file comparison you can imagine and get pretty good speed out of it.
I have a sample if you're interested. Here's an example of the streams (not completely tested).

WebKruncher.com/speedstreams.h - It's used like any typical file stream: just create it and call getline until EOF, or read in chunks for a binary comparison.
-Jmt