Bash script to diff two arrays

Hi,

I am quite scripting illiterate and have been trying to write a bash script to compare two sets of files, which I have loaded into two separate arrays as shown below. I have confirmed that all the files are loaded into the arrays.

IFS=$'\n'
filea=($(find /var/tmp/dir1 -type f -follow -print))
fileb=($(find /var/tmp/dir2 -type f -follow -print))
unset IFS

Both directories contain the same files. I want to loop over both arrays, run the following command on each pair of files, and put the output into a variable:

diff -a --suppress-common-lines -y $filea $fileb

Is this how I should be setting the variable, or should I be doing something different?

diffout=($(diff -a --suppress-common-lines -y $filea $fileb))

I would then run an if statement to check whether the variable is empty:

if [-n "$diffout" ];then
yes | cp -f $filea /var/tmp/dir3

My overall script so far:

IFS=$'\n'		
			filea=($(find /var/tmp/dir1 -type f -follow -print))
			fileb=($(find /var/tmp/dir2 -type f -follow -print))
		unset IFS
		
		for afile in "{$filea[@]}"
		do
		for bfile in "{$fileb[@]}"
		do
		diffout=($(diff -a --suppress-common-lines -y $filea $fileb))
				
		done
		
		if [-n "$diffout" ];then
			yes | cp -f $filea /var/tmp/destination	
		
		done

There are a few errors, some of them syntax:

  • $filea instead of the correct "$afile" (the name is wrong, and the quoting is essential)
  • {$ instead of ${
  • if, but no closing fi
  • [-n instead of [ -n (the space must be there)
  • diffout=(... creates an array, yet you test "$diffout", a plain variable
  • Why pipe yes into cp -f (which is force mode in itself)?
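
A minimal sketch of the loop with only the syntax points above corrected (the every-file-against-every-file pairing is left as it was). The scratch directories and the app.conf files are made-up sample data standing in for /var/tmp/dir1 and /var/tmp/dir2, and plain diff -a is used here; the -y --suppress-common-lines options from the question can be added back on GNU diff:

```bash
#!/bin/bash
# Scratch directories standing in for /var/tmp/dir1 and /var/tmp/dir2 (hypothetical sample data).
tmp=$(mktemp -d)
mkdir "$tmp/dir1" "$tmp/dir2"
printf 'port=80\n'   > "$tmp/dir1/app.conf"
printf 'port=8080\n' > "$tmp/dir2/app.conf"

IFS=$'\n'
filea=($(find "$tmp/dir1" -type f -print))
fileb=($(find "$tmp/dir2" -type f -print))
unset IFS

changed=0
for afile in "${filea[@]}"              # ${ ... }, not {$ ... }
do
    for bfile in "${fileb[@]}"
    do
        # Plain variable instead of an array; the loop variables are quoted.
        diffout=$(diff -a "$afile" "$bfile")
        if [ -n "$diffout" ]; then      # spaces after [ and before ]
            echo "differs: $afile $bfile"
            changed=$((changed + 1))
        fi                              # every if needs its fi
    done                                # every do needs its done
done
```

With one sample file on each side, the inner test runs once and reports the pair as different.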

Am I any closer?

Apologies, I am really still learning how to do scripting in general.

        IFS=$'\n'        
            filea=($(find /var/tmp/dir1 -type f -follow -print))
            fileb=($(find /var/tmp/dir2 -type f -follow -print))
        unset IFS
        
        for afile in "${filea[@]}"

        do

        for bfile in "${fileb[@]}"

        do
            diffout=($(diff -a --suppress-common-lines -y "$afile" "$bfile"))
                
        
        if [ ${#diffout[@]} -eq 0 ]; then

        fi
        
        else
            cp -f $afile /var/tmp/destination
        fi
        
        done

Still some syntax:

  • two fi for only one if
  • only one done for two do

Some semantics:

  • no statements in the then branch

Some logic:

  • the entire script fails if a single extra or missing file disturbs the sequence of files in either directory
  • Unless it is needed elsewhere, the diffout array doesn't make sense. Try if diff ... instead.

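
A small sketch of the "if diff ..." idea, assuming two made-up sample files in a scratch directory. diff exits with 0 when the files match and 1 when they differ, so the if can test the command itself and no variable is needed:

```bash
#!/bin/bash
# Hypothetical sample files and destination in a scratch directory.
tmp=$(mktemp -d)
printf 'port=80\n'   > "$tmp/a.conf"
printf 'port=8080\n' > "$tmp/b.conf"
mkdir "$tmp/destination"

# -q only reports whether the files differ; the report itself is discarded.
# Note that an exit status of 2 (trouble, e.g. a missing file) also lands in the else branch.
if diff -q "$tmp/a.conf" "$tmp/b.conf" > /dev/null
then
    echo "No difference in configuration detected"
else
    cp -f "$tmp/a.conf" "$tmp/destination"
fi
```
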
        IFS=$'\n'        
            filea=($(find /var/tmp/dir1 -type f -follow -print))
            fileb=($(find /var/tmp/dir2 -type f -follow -print))
        unset IFS
        
        for afile in "${filea[@]}"

        do

        for bfile in "${fileb[@]}"

        do
            diffout=($(diff -a --suppress-common-lines -y "$afile" "$bfile"))
                
        
        if [ ${#diffout[@]} -eq 0 ]; then

        echo "No Difference in Configuration Detected"

        fi
        
        else
            cp -f $afile /var/tmp/destination
        
        done
        done

Basically I'm looking for differences in configuration files, which are just text. I was hoping to use diffout to check whether the output was empty: if it is empty, go to the next file; otherwise copy $filea to the specified directory and, once copied, go to the next file.

Apologies if I am just not getting it, but like I said, I know just the basics of scripting and have yet to learn proper logic.

Not sure what you want to test: whether the files are empty, or whether the difference between two of them is empty?

The remaining fi needs to come after the else branch so that it closes the whole if.

Some plain demo syntax:

num=2
val=1
another_value=3

# Since we KNOW for 100% it's all numbers, no quotes are required around those variables.
# If there is the SLIGHTEST chance of anything else (parsing an unknown file, for example), ALL the variables holding content from that unknown file or its lines should be quoted!

if [ $num -eq $val ]
then
	echo "I'm in 'if condition'"
elif [ $num -eq $another_value ]
then
	echo "I'm in 'elif condition'"
else
	echo "I'm in 'else statement'"
fi

for ITEM in *;do
	echo "Found $ITEM in $PWD"
done

Example output from within my tempdir:

0 ~/tmp/build $ bash ../jlykke

I'm in 'else statement'
Found "kernel" in /home/sea/tmp/build

Hope this helps

In addition to what has already been said, you might also want to consider:

  1. There will never be any matches between the arrays created by the statements:
    filea=($(find /var/tmp/dir1 -type f -follow -print))
    fileb=($(find /var/tmp/dir2 -type f -follow -print))

    because every element in ${filea[@]} will start with /var/tmp/dir1 and every element in ${fileb[@]} will start with /var/tmp/dir2. With your nested for loops, you will be comparing every file in ${filea[@]} to every file in ${fileb[@]}. One would assume that you really just want to compare files with the same pathnames relative to /var/tmp/dir[12].
  2. And, if you're just trying to determine if two files have the same contents, checking the exit status of:
    cmp -s "$afile" "$bfile"

    would be much more efficient than using:
    diff -a --suppress-common-lines -y "$afile" "$bfile"

    and checking whether the output is an empty string.
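
As a rough sketch of both points, the loop below pairs files by their pathname relative to each top-level directory and lets the exit status of cmp -s decide. The scratch directories and file names are made-up sample data standing in for /var/tmp/dir1 and /var/tmp/dir2:

```bash
#!/bin/bash
# Hypothetical sample trees standing in for /var/tmp/dir1 and /var/tmp/dir2.
tmp=$(mktemp -d)
dir1=$tmp/dir1
dir2=$tmp/dir2
mkdir -p "$dir1/etc" "$dir2/etc"
printf 'same\n'     > "$dir1/etc/one.conf"
printf 'same\n'     > "$dir2/etc/one.conf"
printf 'edited\n'   > "$dir1/etc/two.conf"
printf 'original\n' > "$dir2/etc/two.conf"

differing=""
for afile in "$dir1"/etc/*.conf
do
    rel=${afile#"$dir1"/}      # pathname relative to dir1, e.g. etc/one.conf
    bfile=$dir2/$rel           # the same relative pathname under dir2
    # cmp -s prints nothing at all; only its exit status says whether the files match.
    if ! cmp -s "$afile" "$bfile"
    then
        differing="$differing$rel "
        echo "differs: $rel"
    fi
done
```

Because cmp -s is silent by design, printing its output will always show nothing; the test has to be on the exit status, as in the if above.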

Hi Don,

I am looking to compare/diff the contents of two files; one will be in an SVN repo and the other will be on a server.

Essentially, if someone edits the server config file, I want to test for any differences, and if one exists, no matter what it is, copy that file to the repository.

I am just struggling to get the loop to accept the values from both arrays as arguments for cmp or diff.

If I echo the values of $afile and $bfile they are printed on screen, but cmp/diff returns no output even when I know there are differences in the files.

I appreciate your response. It makes sense, but I know I still have a lot of learning to do!

Cheers
Justin

I see no reason for two arrays (or even one for that matter). Presumably you have one master source file hierarchy and you want to update copies of the regular files in that master source file hierarchy in a slave file hierarchy for cases where the corresponding master and slave files are different (or the slave file is missing). That could be done with something like:

#!/bin/bash
MASTER='/var/tmp/dir1'
SLAVE='/var/tmp/dir2'

cd "$MASTER" || exit 1
find . -type f -follow -print | while IFS= read -r file
do	if [ ! -e "$SLAVE/$file" ] || ! cmp -s "$file" "$SLAVE/$file"
	then	cp -f "$file" "$SLAVE/$file"
	fi
done

and on many systems could be replaced by a single invocation of rsync with appropriate arguments.
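
A possible rsync equivalent, offered as a sketch rather than a drop-in replacement. It assumes rsync is installed, and the scratch directories and app.conf are made-up sample data standing in for /var/tmp/dir1 and /var/tmp/dir2:

```bash
#!/bin/bash
# Hypothetical sample trees in a scratch directory.
tmp=$(mktemp -d)
MASTER=$tmp/dir1
SLAVE=$tmp/dir2
mkdir -p "$MASTER/etc" "$SLAVE"
printf 'port=8080\n' > "$MASTER/etc/app.conf"

if command -v rsync > /dev/null 2>&1
then
    # -r  recurse into directories
    # -L  copy what symlinks point to (similar in spirit to find -follow)
    # -c  decide by checksum rather than by size and timestamp
    # The trailing slashes copy the contents of MASTER into SLAVE.
    rsync -rLc "$MASTER"/ "$SLAVE"/
else
    echo "rsync is not installed" >&2
fi
```

Unlike the cp loop, rsync also creates any directories that are missing on the destination side.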


I had to muck around with sed to strip the "./" off the front and added $MASTER to the front of the other strings, just for my own sanity when troubleshooting and testing, but it all works as I want.

I just need to add some more script to automate the checkout and commit, but I have worked that out.

Really appreciate your help and keep up the good work! :slight_smile:

#!/bin/bash
MASTER='/var/tmp/dir1'
SLAVE='/var/tmp/dir2'

cd "$MASTER"
find . -type f -follow -print | sed 's/..//' | while read filea
do

if [ ! -e "$SLAVE/$filea" ] || ! cmp -s "$MASTER/$filea" "$SLAVE/$filea"

        then
        cp -f $MASTER/$filea $SLAVE/$filea
        fi

        done

I assume that you are aware that all of the following pathnames name exactly the same file when the current working directory is /var/tmp/dir1 :

/var/tmp/dir1/./file
/var/tmp/dir1//file
/var/tmp/dir1/file
./file
file
/var/tmp/dir1/././././././file
/var/tmp/dir1/.///////////file

So, you can include sed in that pipeline if you want to. It will give you exactly the same results, except it will run slower with the sed than without it.

And, you can change "$file" to "$MASTER/$file" in that loop if you want to. It will give you exactly the same results, except it will run slightly slower.
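
To see that equivalence for yourself, here is a small sketch using the -ef test, which is true when two pathnames resolve to the same file. The scratch directory is a made-up stand-in for /var/tmp/dir1:

```bash
#!/bin/bash
# Hypothetical scratch directory standing in for /var/tmp/dir1.
dir=$(mktemp -d)
touch "$dir/file"
cd "$dir" || exit 1

same=0
for path in "$dir/./file" "$dir//file" "$dir/file" ./file file "$dir/././file"
do
    # -ef is true when both pathnames refer to the same device and inode.
    if [ "$path" -ef "$dir/file" ]
    then
        same=$((same + 1))
    fi
done
echo "$same of 6 spellings name the same file"
```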

But, PLEASE, do not change:

        cp -f "$filea" "$SLAVE/$filea"

to:

        cp -f $MASTER/$filea $SLAVE/$filea

With your current filenames it happens to work, but those double quotes are there to protect you against the possibility of filenames containing <space>s and <tab>s. Get into the habit of quoting the expansion of any variables that contain user supplied strings and the expansion of any variables containing pathnames not explicitly created by your script. Adding the quotes will never hurt you. :slight_smile: Leaving out the quotes will eventually lead to you getting an irate call from a customer somewhere around midnight on a three day weekend. :mad:
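
A short sketch of what goes wrong without the quotes, assuming a made-up filename that contains a space and a scratch directory whose own pathname has no whitespace. The set -- lines simply count how many arguments a command such as cp would receive:

```bash
#!/bin/bash
# Hypothetical master/slave directories and a filename containing a space.
tmp=$(mktemp -d)
mkdir "$tmp/master" "$tmp/slave"
filea='my app.conf'
printf 'port=8080\n' > "$tmp/master/$filea"

# Unquoted: the shell splits the expansion at the space, so a command
# would receive two arguments (".../my" and "app.conf") instead of one.
set -- $tmp/master/$filea
unquoted_args=$#

# Quoted: the pathname stays a single argument.
set -- "$tmp/master/$filea"
quoted_args=$#

echo "unquoted: $unquoted_args argument(s), quoted: $quoted_args argument(s)"

# The quoted form copies the file correctly.
cp -f "$tmp/master/$filea" "$tmp/slave/$filea"
```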

Appreciate the help Don!