Cpio - input files (from list) are stored in different order inside cpio archive - why?

Due to budget constraints I have to reinvent an enterprise backup system in a SPARC (sun4v) Solaris estate (10 & 11). (Yep - reinventing the wheel: fun but time-consuming. Is this wise?! :confused: )

For each filesystem of interest, I try to place a 'catalog' at the front of the cpio archive (for an easy scripted restore system that I will write later). First I touch a file .${DIR}/.${OFILE}.fullscan that will hold a full filesystem scan. Then I run this to populate the first 'record' in the catalog file, so that the catalog file always sits at the head of the archive:

find .${DIR}/.${OFILE}.* -type f -mtime -1 -ls > .${DIR}/.${OFILE}.fullscan

This populates our file list:

find .${DIR} -xdev -local -ls >> .${DIR}/.${OFILE}.fullscan

This then runs the actual cpio operation:

awk '{$1=$2=$3=$4=$5=$6=$7=$8=$9=$10=""; print $0}' .${DIR}/.${OFILE}.fullscan |\
  cut -c11- | cpio -oc 2>>/dev/null | gzip -qc1 - > ${OUTFILE}

9 times out of 10 my ${OFILE}.fullscan appears in the first few files in the cpio archive. Occasionally it's a few files 'lower down' but always in the first 20. So far, so good.

Today, on a Solaris 11.2 system I found the fullscan file over 1000 files into one of the cpio archives and another one over 6400 files into the archive. (In another they appeared at the end but I'm still checking that's not a 'code' issue!) :eek: Why?! Help!

I checked the text file content to make sure the top record was as expected for the examples that put the fullscan file much farther down the archive.

My worst-case scenario is a filesystem of 57M files (yep - a source code repo), which generates a 9.5GB 'fullscan' file (taking over 11 hours). Due to other bits and pieces I need to do, I really, really don't want my catalog appearing halfway through that one. (This supersized filesystem backup will inevitably be broken up into smaller tasks but for now I'm just asking.)

Is this a multithreading effect? I don't believe awk or cut would reorder the list, so cpio should receive ${OFILE}.fullscan as the first name on its input. The file in question is being read while the archive is written, but that wouldn't generate an exclusive lock or anything like that to block another read process. Note: for what it's worth, these 2 anomalies occurred in /var of a guest Solaris zone that's visible from the global zone.

Can anyone think of a way to:

  1. Assuming there isn't a trivial answer to this - debug this easily for an explanation? (Bear in mind it's intermittent.)
  2. Work around this? I want the catalog to always be quickly and easily accessed from (the top of?) many 100GB+ gzipped cpio archives! How can we persuade cpio to load files into the archive in exactly the order its file list is fed to it?

Your thoughts much appreciated.
Alex

I think your problem is that find returns entries in the order they are stored in the directory.
In a simple file system, that order initially matches the order in which the files were created. But once files have been deleted, a new file can take the slot of a deleted entry.
More complex file systems that use a hash table make the order even less predictable.
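
If directory order is the cause, one possible workaround is to stop relying on find for the catalog's position and write its name to the list explicitly, before anything else. The following is only a sketch under assumptions: build_list and its two arguments are hypothetical names, the catalog file name is assumed to be unique within the tree (every file with that name is filtered out), and the Solaris-specific -local test is left out so the sketch also runs elsewhere.

```shell
#!/bin/sh
# Hypothetical helper (not from the original script):
#   build_list DIR CATALOG
# Prints the catalog path first, then every other regular file under DIR.
build_list() {
    dir=$1
    catalog=$2

    # 1. The catalog name is emitted first, independent of directory order.
    printf '%s\n' "$dir/$catalog"

    # 2. All other regular files follow in whatever order find returns them.
    #    Files named like the catalog are skipped so it is not listed twice
    #    (assumption: no other file in the tree shares that name).
    find "$dir" -xdev -type f ! -name "$catalog" -print
}
```

The output would then be piped into the existing pipeline in place of the awk and cut stage, for example `build_list ".${DIR}" ".${OFILE}.fullscan" | cpio -oc | gzip -qc1 > ${OUTFILE}`. Printing plain names with -print also avoids having to strip the -ls columns afterwards.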

What is your file system type?
Perhaps you do not need to reinvent the wheel. There is ufsdump/ufsrestore for UFS file systems, and there is certainly another method for ZFS file systems.
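
If the cpio approach is kept, it may help to find out at which stage the catalog loses its place: in the name list, or only in the archive. A minimal sketch, assuming a hypothetical helper called position_of that reports the line number of an exact name on its input. It uses awk -v, which on Solaris probably means nawk or /usr/xpg4/bin/awk rather than /usr/bin/awk.

```shell
#!/bin/sh
# Hypothetical helper (not from the original script):
#   position_of NAME < list
# Prints the 1-based line number of the first line equal to NAME,
# or 0 if NAME does not appear on standard input.
position_of() {
    awk -v name="$1" '
        $0 == name { print NR; found = 1; exit }
        END        { if (!found) print 0 }
    '
}

# Possible use (file names are illustrative):
#   position_of "./var/.var.fullscan" < names.list
#   gzip -dc "$OUTFILE" | cpio -ict | position_of "./var/.var.fullscan"
```

If the first command prints 1 and the second prints a large number, the order changed after the list was built. If the first command already prints a large number or 0, the list itself is the place to look. Keeping cpio's stderr instead of sending it to /dev/null may also show names it could not open.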