Help with selecting files from "diff" output

I have two directories Dir_A and Dir_A_Arc. Need help with a shell script.

The shell script needs to take the path to these two directories as parameters $1 and $2.

The script needs to check whether any files exist in these directories, and if either directory is empty it should exit normally.

If files are present in both directories, then run a diff, something like

diff -r -q /path/to/Dir_A /path/to/Dir_A_Arc

From the comparison result:

a. The files that are identical in both directories need to be deleted from Dir_A.

b. Files that are new in Dir_A and files that differ between the two directories need to be kept in Dir_A.

I have the script below so far.

#!/usr/bin/bash
echo "$1 is the root directory "

echo "$2 is the archive directory"

if [ -e $1/*.sql ] && [ -e $2/*.sql ]

then 

echo "files exists in $1 and $2"

exit 0

else 

echo "no files exists"
fi

diff -r -q $1 $2 ;

rm -r `diff -sq  $1 $2 | awk '/are identical$/{print $2}'` ;

The output for the above script is:

dir/Dir_A is the root directory
dir/Dir_A_arc is the archive directory
Only in dir/Dir_A/sp: dev.txt 
Only in dir/Dir_A/sp: dev_22.txt
Only in dir/Dir_A/sp: dev_33.txt
Files dir/Dir_A/spp/Document.txt and dir/Dir_A_arc/spp/Document.txt differ
Files dir/Dir_A/spp/text.txt and dir/Dir_A_arc/spp/text.txt differ
Only in dir/Dir_A_arc/sp: dev_444.txt

Only in dir/Dir_A_arc/sp: dev_555.txt
rm: missing operand
Try `rm --help' for more information

The files I have to retain in Dir_A are the ones reported above as "Only in dir/Dir_A/..." and the ones reported as differing.

I would use comm, sort, and find to compare the file lists of each subtree, sed to separate the files present in both trees from the a-only and b-only lists for further processing or reporting, and cmp to decide whether the files present in both differ:

comm <(
    cd xx
    export LC_ALL=C
    find * -type f|sort -u
  ) <(
    cd xx_Arc
    export LC_ALL=C
    find * -type f|sort -u
  )| sed '
    s/^\t\t//
    t
    s/^\t//
    t b
    w xx_only
    d
    :b
    w xx_Arc_only
    d
   ' | while read f
do
  if [ "" = "$(cmp xx/$f xx_Arc/$f 2>&1)" ]
  then
    ....
  fi
done

LC_ALL=C ensures a binary (byte-order) sort, which is what comm expects of its input. diff output is for humans, mostly. You need bash, or ksh on /dev/fd/# systems, to get the <() process substitution.
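
For the original request alone (delete from Dir_A whatever is identical in the archive, keep new and differing files), a simpler variant can skip comm and sed: walk Dir_A and let cmp decide for each file that also exists in the archive. This is only a minimal sketch under some assumptions: prune_identical is a made-up helper name, file names contain no newline characters, and "empty" is taken to mean "no regular files anywhere in the tree". Try it on a copy first, since it deletes files.

```shell
#!/bin/sh
# prune_identical DIR_A DIR_ARC   (hypothetical helper name)
# Deletes from DIR_A each regular file whose counterpart at the same relative
# path in DIR_ARC has identical contents. New and differing files are kept.
# Assumption: no newline characters in file names.
prune_identical() (
  # Drop one trailing slash so the prefix stripping below works.
  a=${1%/} arc=${2%/}

  # Exit normally when either tree holds no regular files.
  [ -n "$(find "$a" -type f | head -n 1)" ] || exit 0
  [ -n "$(find "$arc" -type f | head -n 1)" ] || exit 0

  # IFS= and -r keep leading blanks and backslashes in names intact.
  find "$a" -type f | while IFS= read -r p; do
    rel=${p#"$a"/}    # path relative to DIR_A
    # cmp -s prints nothing and returns 0 only for identical contents.
    if [ -f "$arc/$rel" ] && cmp -s "$p" "$arc/$rel"; then
      rm -- "$p"
    fi
  done
)

# Run only when called with two arguments.
if [ "$#" -eq 2 ]; then
  prune_identical "$1" "$2"
fi
```

Saved under some name such as prune.sh (again hypothetical), it would be called as: sh prune.sh /path/to/Dir_A /path/to/Dir_A_Arc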

@DGPickett Thanks a lot for helping me out! :)

I am having issues with filenames that have spaces in them. Can you help me out?

Sorry, that needed a more robust effort; put double quotes around the paths that use $f, like

if [ "" = "$(cmp "xx/$f" "xx_Arc/$f" 2>&1)" ]

Quoting starts afresh inside $( ), so the inner double quotes do not end the outer ones, and $f is still expanded inside them. Single quotes would not do here: inside $( ) they are live quotes, not literals, and would stop $f from being expanded. With the expanded $f in double quotes, spaces are OK unless the name gets passed through some shell again without proper quoting. For instance, a shell script should use its parameters as "$1", not barefoot, in case of meta-characters.
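
Quoting inside $( ) is easy to get wrong, so here is a small check that can be pasted into a shell. It is only an illustration: the file name is made up, and printf stands in for cmp so that nothing needs to exist on disk.

```shell
#!/bin/sh
# Made-up file name with a space in it.
f='my file.txt'

# Double quotes inside $( ) open a fresh quoting context, so they nest
# safely inside the outer double quotes and $f is expanded.
inner_double="$(printf '%s' "xx/$f")"

# Single quotes inside $( ) are live quotes, so $f stays literal.
inner_single="$(printf '%s' xx/'$f')"

printf '%s\n' "$inner_double"   # prints: xx/my file.txt
printf '%s\n' "$inner_single"   # prints: xx/$f
```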

Now, if you have a file with a quote character in the name, another flavor of dealing with it is to convert any meta-characters like space and quote into '?', a wild card that matches any single character but is not itself a space or quote. In this case, just stick a "| tr ' ' '?'" before the "| while read" and all the spaces become '?'. Just make sure that the last use of it is barefoot so the shell can expand it and virtually quote it as $1 or whatever to the C program. In actuality, by that time it has been converted to a null-terminated string pointed to by some member of the argv[] array of character pointers. Quoting becomes "start here and null-terminate there" in machine language. The C open() call and such can deal with embedded spaces just fine; it is the shell that divides the name into two or more arguments when it finds a $IFS character.
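
To see the splitting and the '?' trick in action, here is a small sketch. count_args is a made-up helper and the file name is just an example. One caveat: '?' matches any single character, so the pattern could also pick up a sibling such as dev_22.txt if one exists in the same directory.

```shell
#!/bin/sh
# count_args: hypothetical helper that reports how many arguments it received.
count_args() { echo "$#"; }

f='dev 22.txt'
unquoted=$(count_args $f)     # 2: the shell splits the name at the space
quoted=$(count_args "$f")     # 1: the quotes keep the name in one piece

# The '?' variant: replace the space, then use the pattern barefoot
# so the shell expands it back to the real file name.
d=$(mktemp -d)
: > "$d/$f"
pat=$(printf '%s\n' "$f" | tr ' ' '?')    # dev?22.txt
first_match=$(cd "$d" && for m in $pat; do printf '%s\n' "$m"; break; done)
rm -rf "$d"

echo "$unquoted $quoted $first_match"     # prints: 2 1 dev 22.txt
```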

Hi.

There are standard (Linux) utilities to compare directories. Whether they will fulfill all your desires is up to you to find out.

See the thread "find same size file" for a demonstration of fdupes and rdfind.

Best wishes ... cheers, drl