Search and compare files from two paths

Optimus81 · March 27, 2013, 10:30am

Hi All,

I have two paths. The oldfile path has several subfolders, each containing a config file (basically a text file). Likewise, there is a newfile path with subfolders, each containing a config file.

I need to read the files from the oldfile path subfolders and from the newfile path subfolders and run sdiff on each pair of files, i.e.:

sdiff /oldfile path/sub folderA/fileA.txt /newfile path/sub folderA/fileA.txt |egrep '>|<|\|' > folderA_fileA_result.txt

But before using sdiff, I need to sort both files, i.e.:

sort /oldfile path/sub folderA/fileA.txt > fileA.txt
sort /newfile path/sub folderA/fileA.txt > fileA.txt

The problem is that I am not sure how to search for the same subfolder name in both the oldfile path and the newfile path. If the folder names match, I basically need to do the following:

a. First, use the sort command to sort the contents of each file.

b. Compare the files using sdiff and save the sdiff result file in an output folder.

c. Archive the result files with some versioning, so that when new config files arrive, the old files from the archive folder can be compared with the new files.

Note: All the config files are basic text files with several lines.

---------- Post updated 27-03-13 at 09:29 AM ---------- Previous update was 26-03-13 at 03:50 PM ----------

OK, I tried comparing the same file names from two different directories, but I am getting the error below, even though I have kept the same file in both folders. I am not sure how to skip the . and .. entries.

 
# Check the number of input parameters. If two parameters are given go ahead, else exit
if [ $# -eq 2 ]
then
    OLDFILESDIR=$1
    NEWFILESDIR=$2
else
    echo "Usage: script.sh oldfilesdir newfilesdir"
    exit
fi
# Validate from_directory
if [ ! -d "${OLDFILESDIR}" ]
then
    echo "Directory ${OLDFILESDIR} does not exist!!"
    exit
fi
# Validate to_directory
if [ ! -d "${NEWFILESDIR}" ]
then
    echo "Directory ${NEWFILESDIR} does not exist!!"
    exit
fi
cd ${OLDFILESDIR}
#for i in `find . -type f`
for i in `find . -name '*.txt'`
do
    if [ ! -f ${NEWFILESDIR}/$i ]
    then
        echo "Same filename doesn't found in ${OLDFILESDIR}/$i and in ${NEWFILESDIR}/$i"
    else
        echo "Same filename found in ${OLDFILESDIR}/$i and in ${NEWFILESDIR}/$i"
        sort ${OLDFILESDIR}/$i
        sort ${NEWFILESDIR}/$i
        sdiff ${OLDFILESDIR}/$i ${NEWFILESDIR}/$i |egrep '>|<|\|' > resultfile.txt
    fi
done

Error:

Same filename doesn't found in oldfiles/./tcp.txt and in newfile/./tcp.txt

DGPickett · March 27, 2013, 5:35pm

Seems like deja vu. Compare the file lists of the two directories, and the file content, using something like this:

diff -U0 <(
  cd head1
  find * -type f | sort | xargs -r cksum ) <(
  cd head2
  find * -type f | sort | xargs -r cksum ) | while read  diff_ind  cksum  sz  path
do
 case "$diff_ind" in
 (-)
  echo "Deleted file '$path'."
  ;;
 (+)
  echo "New file '$path':"
  cat head2/$path
  ;;
 (*)
  echo "Changed file '$path':"
  sdiff head1/$path head2/$path
  ;;
 esac
done

For stricter delete/new checking, use 'comm' instead of 'diff -U0', and leave out the '| xargs -r cksum' until later, when you know both files are present (in comm output, no tab prefix means deleted, one tab means new, two tabs means both). You can report new/deleted files on stderr and pipe the others on stdout to cksum, then to another while read loop that compares the cksums before running sdiff.
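
On the "doesn't found" message from the first script: a likely cause is that the script does a cd into the old directory first, so a relative name such as newfile is then looked up inside oldfiles. The ./ prefix that find prints is harmless and does not need to be skipped. A minimal sketch of one way around it, assuming a POSIX shell; abs_dir and list_common are made-up helper names, not part of the original script:

```shell
#!/bin/sh
# Hypothetical helper: print the absolute path of a directory.
abs_dir() {
    ( cd "$1" 2>/dev/null && pwd )
}

# Hypothetical helper: list_common OLD NEW
# Prints the relative path of every *.txt file that exists in both trees.
list_common() {
    # Resolve both directories BEFORE changing directory, so that
    # relative arguments keep pointing at the right place.
    old=$(abs_dir "$1") || return 1
    new=$(abs_dir "$2") || return 1

    ( cd "$old" && find . -type f -name '*.txt' ) | while IFS= read -r rel
    do
        rel=${rel#./}                # strip the leading ./ (cosmetic only)
        if [ -f "$new/$rel" ]; then
            echo "$rel"
        fi
    done
}
```

Called as list_common oldfiles newfile from the parent directory, with tcp.txt present in both folders, this prints tcp.txt.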

Optimus81 · March 28, 2013, 7:54am

Thanks DGPickett. I am not sure what changes need to be made to the above script. I tried it by changing head1 to the oldfile path and head2 to the newfile path.

Could you please correct me?

diff -U0 <(
cd /usr/config_check/oldfiles/
find * -type f | sort ) <(
cd /usr/config_check/newfile/
find * -type f | sort ) | while read diff_ind cksum sz path
do
case "$diff_ind" in
(-)
echo "Deleted file '$path'."
;;
(+)
echo "New file '$path':"
cat head2/$path
;;
(*)
echo "Changed file '$path':"
sdiff /usr/config_check/oldfiles/$path /usr/config_check/newfile/$path
;;
esac
done

After running the above script, I get this error:

root@att02 # ./cfile.sh
Changed file '':
sdiff: Cannot open: /usr/config_check/oldfiles//newfile/

DGPickett · March 28, 2013, 3:17pm

Well, I do not have the test facility, so you need to check where the blank line is coming from. I assume diff -U0 will print only lines starting with +, -, |; so stick a tee after it, or "pg ;true |" and see what the first part delivers. Or put the word echo before sdiff to see what the whole command line is.

The idea is that the lists of files are identical, so any delete or add will be - or + and the checksum does not matter. If the files are identical, the checksum and size will be identical, and diff should toss them. If the files are different, sdiff should be able to show that.

The blank line from diff can be filtered if it is just a nuisance, using a case pattern such as (?*) to process only non-empty lines and ignoring the empty lines that fit ('').
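
A rough sketch of that filtering, assuming the input is diff -U0 output over two plain file lists (no cksum columns); classify_diff is a hypothetical name. Besides empty lines, it also skips the ---, +++ and @@ header lines that unified diffs carry, and it splits the +/- marker off the rest of the line, since diff prints the marker with no space after it:

```shell
#!/bin/sh
# Hypothetical filter: reads `diff -U0` output on stdin and prints
# "del NAME" or "add NAME" for real change lines only.
# Assumption: the compared lines are file names that do not themselves
# start with "-- " or "++ ", so header lines can be told apart.
classify_diff() {
    while IFS= read -r line
    do
        case "$line" in
            '')                     continue ;;   # empty line
            '--- '*|'+++ '*|'@@'*)  continue ;;   # file and hunk headers
            -*) echo "del ${line#-}" ;;           # only in the first list
            +*) echo "add ${line#+}" ;;           # only in the second list
        esac
    done
}
```

Usage would be along the lines of diff -U0 old.lst new.lst | classify_diff, where old.lst and new.lst are sorted file lists.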

The comm command is much stricter, as it is not designed for the eye but for bit by bit perfection. However, comm is only good for finding deleted and new files; getting from "both" to "changed" still needs a comparison, either of cksum results or with cmp. comm demands sorted inputs, and if the lists carried cksums, two versions of a file with different cksums would sort to non-adjacent places. So a comm of file names is nice, followed by a cmp of the files.

( export LC_ALL=C head1=... head2=... # LC_ALL controls sort order, some systems sort not binary by default
comm <(
  cd "$head1"
  find * -type f | sort
 ) <(
  cd "$head2"
  find * -type f | sort
 ) | sed '
  s/^\t\t/both /
  t
  s/^\t/add /
  t
  s/^/del /
 ' | while read -r stat fn
 do
  case $stat in
   (add)
    echo "New: $fn"
    cat "$head2/$fn"
    ;;
   (del)
    echo "Deleted: $fn"
    ;;
   (*)
    # cmp -s prints nothing and returns non-zero when the files differ
    if ! cmp -s "$head1/$fn" "$head2/$fn"
    then
     echo "Different: $fn"
     sdiff "$head1/$fn" "$head2/$fn" 2>&1
    fi
    ;;
   esac
  done
 )

The \t needs to be a real tab, above. comm unifies the two file name lists and tells you whether each line is in a only, b only, or both by tabbing the lines as if to put them in three columns. You can remove columns in comm using -1 (new and both), -2 (old and both), -23 (old only), -3 (old and new but no both), etc. It is robust set logic for shell scripting.
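
To make the column options concrete, here is a small sketch, assuming two list files that are already sorted; only_old, only_new and in_both are hypothetical wrapper names:

```shell
#!/bin/sh
# comm needs both inputs sorted in the same collation order.
LC_ALL=C
export LC_ALL

# Hypothetical wrappers; each takes two sorted list files: OLD_LIST NEW_LIST.
only_old() { comm -23 "$1" "$2"; }   # drop columns 2 and 3: lines only in OLD_LIST
only_new() { comm -13 "$1" "$2"; }   # drop columns 1 and 3: lines only in NEW_LIST
in_both()  { comm -12 "$1" "$2"; }   # drop columns 1 and 2: lines in both lists
```

With old.lst holding aa.txt and bb.txt and new.lst holding bb.txt and cc.txt, only_old prints aa.txt, only_new prints cc.txt and in_both prints bb.txt.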

Optimus81 · March 29, 2013, 7:33pm

Thanks DGPickett. I will certainly try out what you said.

I tried a very simple solution to compare *.txt files from two different directories. I was able to compare them and generate the sdiff result.

#!/bin/bash
# cmp_dir - program to compare two directories
# Check for required arguments
if [ $# -ne 2 ]; then
    echo "usage: $0 directory_1 directory_2" 1>&2
    exit 1
fi
# Make sure both arguments are directories
if [ ! -d $1 ]; then
    echo "$1 is not a directory!" 1>&2
    exit 1
fi
if [ ! -d $2 ]; then
    echo "$2 is not a directory!" 1>&2
    exit 1
fi
# Process each file in directory_1, comparing it to directory_2
find $1/ -name '*.txt' -print | while read src
do
for filename in $1/*.txt; do
    fn=$(basename "$filename")
    if [ -f "$filename" ]; then
        #if [ ! -f "$2/$fn" ]; then
            #echo "$fn is missing from $2"
            #missing=$((missing + 1))
        #fi
                sort $filename
                #echo $filename
                sort $2/$fn
                #echo $2/$fn
                sdiff $filename $2/$fn | egrep '>|<|\|' > resultfile.txt
    fi
done
done

When I execute the above script, i.e.,
./filecomp.sh oldfiles newfile

I get resultfile.txt, which has the sdiff output (filtered with egrep). Now my problem is that I am not sure how to create a separate result file for each file it reads.

I have two different folders:
a. oldfiles - contains several files (.txt)
b. newfile - contains several files (.txt)

Files in the two folders will have the same filenames, i.e.,
oldfiles folder -
aa.txt
bb.txt

newfile folder -
aa.txt
bb.txt

So what I am trying to do in the above script is read aa.txt from the oldfiles folder and aa.txt from the newfile folder, run the sort/sdiff commands, and then put the result file in an output folder with the filename aa_result.txt.

i.e., the output folder will contain the results:
aa_result.txt
bb_result.txt

This is where I am stuck: how to get a separate result file for each input file.

Any help would be greatly appreciated.

Optimus81 · April 2, 2013, 6:45am

Hi All,

Can anyone please give me an idea or a solution? I am still stuck, with no clue how to proceed.

DGPickett · April 5, 2013, 4:18pm

Does sdiff recurse like diff, for dirs ( sdiff -bw head1 head2 )?

Consider something very readable but not sdiff, like: diff -bwU99999 head1 head2
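
If sdiff per file is still what is wanted, the separate result files asked about above could be produced along these lines. This is only a sketch, assuming a flat folder of *.txt files on each side and that sdiff and mktemp are available; compare_dirs is a hypothetical name, and the output folder is whatever path is passed as the third argument:

```shell
#!/bin/sh
# Hypothetical function: compare_dirs OLD NEW OUT
# For every OLD/name.txt that also exists in NEW, sort both copies,
# run sdiff on the sorted copies and write OUT/name_result.txt.
compare_dirs() {
    old=$1 new=$2 out=$3
    mkdir -p "$out" || return 1
    tmp=$(mktemp -d) || return 1

    for f in "$old"/*.txt
    do
        [ -f "$f" ] || continue              # glob matched nothing
        fn=$(basename "$f")
        name=${fn%.txt}                      # aa.txt -> aa
        if [ ! -f "$new/$fn" ]; then
            echo "$fn is missing from $new" >&2
            continue
        fi
        # sort writes to stdout, so capture it instead of sorting in place
        sort "$f"       > "$tmp/old.sorted"
        sort "$new/$fn" > "$tmp/new.sorted"
        # keep only the lines sdiff marks as changed, added or removed
        sdiff "$tmp/old.sorted" "$tmp/new.sorted" | grep -E '[<>|]' \
            > "$out/${name}_result.txt"
    done
    rm -rf "$tmp"
}
```

For example, compare_dirs oldfiles newfile output would leave aa_result.txt and bb_result.txt in the output folder for the two-file layout described earlier. Deriving the result name from the input name is what keeps each comparison from overwriting the previous one.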