Chaining together exec within find

I need to do the following with a find command on my AIX box

Find all files that are -type f

Then do the following steps:-

  • Take a listing of them, and write them to a log in /directory/backup/log
  • Tar them up in /directory/backup/tar
  • and remove the files.

Here is what I have so far:

find /directory/files -type f -exec ls -latr >> /directory/backup/log/2.log \; -exec tar -cvf * /directory/backup/tar/testtar.tar * \; -exec rm -f {} \;

When I have three files-

 
 Testfile1.text
 Testfile2.text
 Testfile3.text
 

It does the following-

Removes the first file
Tars up the second two
Writes the second two to a log in the directory the find statement is being executed in

What could I be doing wrong?

Thanks in advance

What -exec tar -cvf * /something/ * does depends only on your current directory, because the *'s are expanded by the shell before find is ever run. -exec is not a shell and would not expand the *'s even if you escaped them.
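
A quick way to see this for yourself (the filenames below are hypothetical): let echo show what the *'s become after the shell expands them, before find or tar ever see them:

$ # suppose the current directory contains only a.txt and b.txt
$ echo tar -cvf * /directory/backup/tar/testtar.tar *
tar -cvf a.txt b.txt /directory/backup/tar/testtar.tar a.txt b.txt

find never sees a * at all; it is simply handed whatever the shell produced for the directory the command was typed in.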

What is that one in particular meant to do?

Thanks for the input.

I want it to tar up the results of the find command. There will be other parameters, such as size, age, etc., but for this example I only included -type f.

The >> /directory/backup/log/2.log redirection is likewise handled by the shell, not by -exec, but in this case it probably does close to what you want, except that it captures all of find's stdout, not just the output of ls. That may not matter if nothing else prints to stdout.
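
A small sketch of what that means (paths hypothetical): the shell attaches the redirection to find as a whole, so the output of every -exec ends up in the same log and nothing reaches the terminal:

$ find /tmp/demo -type f -exec ls -l '{}' \; -exec echo '{}' \; >> /tmp/demo.log
$ # /tmp/demo.log now holds the ls listings and the echoed pathnames, interleaved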

I still don't understand what that tar command is intended to do.

What filename is it supposed to create, and where?

Since tar may be run multiple times, you need to use the append option, not the create option.

How about:

$ tar -rf archive.tar # Create empty tar file to append to

$ find testout -type f -exec echo ls -latrd '{}' ';' -exec echo tar -rvf archive.tar '{}' ';' -exec echo rm '{}' ';'

ls -latrd testout/testfile1
tar -rvf archive.tar testout/testfile1
rm testout/testfile1
ls -latrd testout/testfile2
tar -rvf archive.tar testout/testfile2
rm testout/testfile2
ls -latrd testout/testfile3
tar -rvf archive.tar testout/testfile3
rm testout/testfile3

# remove echos to actually run these commands instead of printing them

If your find supports it (GNU find and any POSIX-conforming find do), you can use + instead of ; for increased efficiency, as it will bundle several files into each call:

$ find testout -type f -exec echo ls -latrd '{}' '+' -exec echo tar -rvf /absolute/path/to/archive.tar '{}' '+' -exec echo rm '{}' '+'

ls -latrd testout/testfile1 testout/testfile2 testout/testfile3
tar -rvf /absolute/path/to/archive.tar testout/testfile1 testout/testfile2 testout/testfile3
rm testout/testfile1 testout/testfile2 testout/testfile3

$

Here is what I am trying to do, and I could be taking the wrong approach.

I have a bunch of directories that need to have their files purged, based on certain criteria. For example:

/folder/one <- Delete files that are 30 days old and larger than 2 MB
/folder/two <- Delete files that are 7 days old and have the extension PDF
/folder/three <- Delete files that are 7 days old, have the extension PDF, and are larger than 2 MB

We have a script that runs a basic find with -exec rm -f, but we want to add logging, take a compressed backup of the files, and throw them into a preserve directory for X days until we need them.
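
For criteria like those, the matching find predicates might look roughly like this (a sketch only: POSIX -size counts 512-byte blocks, so 2 MB is about +4096, and the -name pattern may need adjusting for the extension's case):

$ find /folder/one   -type f -mtime +30 -size +4096
$ find /folder/two   -type f -mtime +7  -name '*.pdf'
$ find /folder/three -type f -mtime +7  -name '*.pdf' -size +4096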

Any suggestions? Am I taking the right approach?

Thanks

I take it this script would run automatically at intervals. Given that, I think you could make your approach work. I'd break the archiving and deleting into two steps, so you can bail in case of error before files are trashed, rather than after.

Also, once a tarball is created and compressed, it's essentially uneditable, so you have to compress it after you're finished appending to it, not during.
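
A quick illustration of that ordering (archive and file names are hypothetical):

$ tar -cf backup.tar file1        # create the archive
$ tar -rf backup.tar file2        # appending to a plain tar works
$ gzip backup.tar                 # now it is backup.tar.gz
$ tar -rf backup.tar.gz file3     # typically fails: tar cannot append to a compressed archive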

# Logfile for errors, >&2 and any errors printed by tar/gzip/etc
exec 2> /path/to/errorlog
# Logfile for files, captures default stdout
exec 1> /path/to/filelog

TSTAMP=$(date +%Y-%m-%d)
TARBALL=/path/to/folder/$TSTAMP-one.tar

echo "$(date '+%Y-%m-%d %H:%M:%S') $0 Beginning execution" >&2

echo "# Archiving to $TARBALL"

if [ -e "$TARBALL" ] || [ -e "$TARBALL".gz ]
then
        echo "$(date '+%Y-%m-%d %H:%M:%S') $TARBALL already exists, refusing to overwrite" >&2
        exit 1
fi

tar -rf "$TARBALL" # Create empty tar file to append to

if ! find one -type f -exec echo ls -latr '{}' '+' -exec echo tar -rvf "$TARBALL" '{}' '+'   # remove the echos to run for real
then
        echo "$(date '+%Y-%m-%d %H:%M:%S') Creating archive failed" >&2
        rm -f "$TARBALL"
        exit 1
fi

if ! gzip "$TARBALL"
then
        echo "$(date '+%Y-%m-%d %H:%M:%S') Couldn't compress $TARBALL" >&2
        exit 1
fi

if ! find one -type f -exec echo rm '{}' '+'
then
        echo "$(date '+%Y-%m-%d %H:%M:%S') Error removing files" >&2
        exit 1
fi

echo "$(date '+%Y-%m-%d %H:%M:%S') $0 completed successfully"

No. Do not use -exec ... + in cases like this. If there are enough files to trigger an invocation of one of these -exec primaries before find has processed the entire file hierarchy, the list of files processed by each -exec primary is likely to have a different set of operands than the other -exec primaries. For example, the 1st invocation of ls might process 100 files, the 1st invocation of tar might process 95 files, and the 1st invocation of rm might process 105 files. The 2nd invocations of ls and tar will then fail because the 1st invocation of rm will have removed some of the files before they were listed and archived.

If there aren't enough files in the file hierarchy being processed by find to trigger invocations of those three utilities until the entire file hierarchy has been traversed, all three utilities could be run in parallel, again allowing rm to remove some or all of the files before they are listed and archived.

I would like to chain everything together using one find command. Here is what I am attempting, and my output.

 
 find /directory/toscan -type f -exec ls -latr '{}' '+' -exec tar -rvf /directory/foroutput/archive.tar '{}' '+' -exec rm '{}' '+'
 

I am getting this -

 
 ls: illegal option -- v
usage: ls [-1ACFHLNRSabcdefgiklmnopqrstuxEUX] [File...]

 

I am on AIX -

6100-09-06-1543
find /directory/toscan -type f -exec ls -latr "{}" \; -exec tar -rvf /directory/foroutput/archive.tar "{}" \; -exec rm "{}" \;

I see that you have chosen to ignore the problems I mentioned in post #9 in this thread. You do so at your own peril!

From the error you have shown us, we might guess that one or more of the files you are processing has a hyphen as the first character of the pathname that is being passed to ls by find. But that can't be the case with the command line you have shown us, since every pathname that find would pass to ls would have to start with /directory/toscan, and the options you have find passing to ls do not include -v.

Are you absolutely positive that the diagnostic you have shown us from ls came from one of the invocations of ls in the find command above?

Then use one find command. But modify Corona's code as follows:

# Logfile for errors, >&2 and any errors printed by tar/gzip/etc
exec 2> /path/to/errorlog
# Logfile for files, captures default stdout
exec 1> /path/to/filelog

TSTAMP=$(date +%Y-%m-%d)
TARBALL=/path/to/folder/$TSTAMP-one.tar

echo "$(date '+%Y-%m-%d %H:%M:%S') $0 Beginning execution" >&2

echo "# Archiving to $TARBALL"

if [ -e "$TARBALL" ] || [ -e "$TARBALL".gz ]
then
        echo "$(date '+%Y-%m-%d %H:%M:%S') $TARBALL already exists, refusing to overwrite" >&2
        exit 1
fi

tar -rf "$TARBALL" # Create empty tar file to append to

echo ls -ltr "$@"   # remove the echos throughout to run for real

if ! echo tar -rvf "$TARBALL" "$@"
then
        echo "$(date '+%Y-%m-%d %H:%M:%S') Creating archive failed" >&2
        rm -f "$TARBALL"
        exit 1
fi

if ! gzip "$TARBALL"
then
        echo "$(date '+%Y-%m-%d %H:%M:%S') Couldn't compress $TARBALL" >&2
        exit 1
fi

if ! echo rm "$@"
then
        echo "$(date '+%Y-%m-%d %H:%M:%S') Error removing files" >&2
        exit 1
fi

echo "$(date '+%Y-%m-%d %H:%M:%S') $0 completed successfully"

and call it something like archive_and_delete. If AIX has xargs, use:

find .... -print | xargs archive_and_delete

Even better if you can use

find .... -print0 | xargs -0 archive_and_delete

If AIX does not have xargs use:

find .... -exec archive_and_delete {} '+' 

CAVEAT: I haven't tried the above script and I may even have introduced bugs into it with my edit. I am also assuming that you will continue to use the -type f directive to pass filenames rather than directory names.

Andrew

And this is a problem why?

Doesn't seem to work that way, and I can't imagine why it would. Why wouldn't all three execs get the exact same files?

Does this actually happen? find doesn't run things in parallel to my understanding.

This is a problem because rm may remove a file before it is listed by ls and archived by tar.

The -exec ... + primary gathers arguments for each invocation of the specified utility with the guarantee that the argument list used will not exceed the system's ARG_MAX limit. It does not use a fixed number of operands to be passed to a utility when it is invoked. Since the utility name and argument list for rm just includes rm before the list of pathname operands, the argument list for ls includes the utility name and the options (ls -latr) before the pathname operands, and the argument list for tar is even longer (tar -rvf /directory/foroutput/archive.tar), there is a chance that the number of pathnames given to tar may be less than the number of pathnames given to ls, which may also be less than the number of pathnames given to rm. Therefore, the first invocation of rm may remove one or more files before the second invocation of ls or tar has a chance to process them.
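
As a rough back-of-the-envelope sketch of how the batch boundaries can drift (every number below is made up purely for illustration): with about 2 MB of argument space and pathnames averaging 50 bytes plus a terminating NUL, tar's longer fixed prefix already costs it a pathname per batch compared with rm, and those small differences mean the batch boundaries for ls, tar, and rm won't line up:

$ echo $(( (2097152 - 3)  / 51 ))    # rm: short fixed prefix (~3 bytes)
41120
$ echo $(( (2097152 - 42) / 51 ))    # tar -rvf /directory/foroutput/archive.tar: longer prefix (~42 bytes)
41119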

I don't know whether or not the implementation of find on the original poster's system does this. The standard's description of -exec ... + clearly allows the utilities named in the three -exec primaries to be invoked in any order, and sequentially or in parallel, as long as any utility that needs to be invoked more than once completes processing an earlier set of pathnames for its -exec primary before it is invoked again on a later set of pathnames for that same primary.


Probably it should collect them in parallel but execute them from left to right.
I have found different implementations of {} +, and some are buggy. I suspect that AIX find is buggy, too.
--
A method to run an 'embedded' shell script

find /directory/toscan -type f -exec bash -c '
ls -ltar "$@"
tar -rvf /directory/foroutput/archive.tar "$@"
rm "$@"
' bash {} +

I haven't seen any reports about UNIX-branded implementations (including AIX) of find behaving contrary to the requirements of the standards in the last decade where the given command-line met the requirements stated by the standards. But, old systems and systems that aren't branded (or tested for conformance) do still exist.

On systems where find does meet the standard's requirements, your suggestion above looks like it should work as long as the extra bash argument before {} + is removed, noting of course that the list of files produced will not be sorted in its entirety if the list of pathnames to be processed is too long to invoke bash just once.

But, if a file can't be archived because tar can't read it, the file may still be removed even though it wasn't archived. If the original poster wants to keep files that couldn't be listed and archived, you would need something more like:

find /directory/toscan -type f -exec bash -c '
ls -ltr "$@" &&
tar -rvf /directory/foroutput/archive.tar "$@" &&
rm "$@"
' {} +

to keep an entire set of files when listing or archiving any file in the set fails, or one of the two following suggestions:

find /directory/toscan -type f -exec bash -c '
for path in "$@"
do	ls -ltr "$path" &&
	tar -rvf /directory/foroutput/archive.tar "$path" &&
	rm "$path"
done
' {} +

or:

find /directory/toscan -type f -exec ls -ltr {} \; -exec tar -rvf /directory/foroutput/archive.tar {} \; -exec rm {} \;

to only keep individual files that weren't successfully archived, but, of course, these will run MUCH slower than the other suggestions and the list of files produced by these will be in the order in which they are found in the searched file hierarchy; not in reverse time order (even in subgroups in the 1st suggestion of these last two).

Note that there is no need for the ls -a option when regular filenames are given as operands (even if their name does start with a <period> character).

Might it be simpler to have a function call? (Keeps your code cleaner?) You can use xargs to ensure that they get processed, something like this:

#!/bin/bash
function process_one_file ()
{
  ls -l "$@" &&
  tar -rvf /directory/foroutput/archive.tar "$@" &&
  rm "$@"
}

find /directory/toscan -type f | xargs process_one_file

Is that an option? It will ensure your files are sequentially processed but (might) avoid spawning a shell for each file found to run the commands.

I'm happy to be corrected if this has a flaw in it. One concern is how xargs would handle a file with spaces in the name.
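
A quick check of that concern (hypothetical filename): a default xargs splits its input on blanks as well as newlines, so one file with spaces arrives as several separate arguments:

$ touch 'file with spaces'
$ ls file* | xargs -n 1 echo
file
with
spaces

Which is why the find ... -print0 | xargs -0 form mentioned earlier, or -exec ... {} +, is the safer way to hand pathnames on.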

Robin

@Don, with -exec bash -c '...' you have to supply an argv[0] for the script interpreter. This is certainly true for Linux and Unix alike: bash -c takes the first argument after the script as $0; otherwise the reported script/process name would always be the interpreter (e.g. bash).
For demonstration:

$ mkdir newdir
$ touch newdir/file{1,2,3}
$ find newdir -type f -exec bash -c '
echo "$@"
' {} +
 newdir/file2 newdir/file3
$ find newdir -type f -exec bash -c '
echo "$@"
' bash {} +
newdir/file1 newdir/file2 newdir/file3
$ 

In the first case the first pathname, newdir/file1, was consumed as the process name ($0) and so was never echoed.


I don't think xargs works that way. It is not a shell built-in, so it can't run shell functions.
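
If the function approach is still wanted, one possible workaround (a sketch only, and bash-specific since it relies on export -f; a small standalone script is the portable alternative) is to export the function so that the bash started by xargs can see it:

# hypothetical helper; names and paths are illustrative
process_files() {
    ls -ltr "$@" &&
    tar -rvf /directory/foroutput/archive.tar "$@" &&
    rm "$@"
}
export -f process_files          # bash-only: pass the function to child shells via the environment

find /directory/toscan -type f | xargs bash -c 'process_files "$@"' bash
# note: a plain xargs still splits on whitespace; use find ... -print0 | xargs -0 ... where available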
