validate tar file on tape

dernsdorff · August 1, 2011, 5:19pm

I've got a KSH/AIX question that I haven't been able to figure out yet.

I've got a tape archive program that "tar's" data to a tape. After creating the archive, I'd like to somehow verify that the tape is actually good. So, what I'd like to do as a simple "sanity" check that I can read the tape is to view the first entry of the tar file listing, basically

tar -vtf/dev/rmt0 | head -1

If I can successfully read the first entry, I'll assume (hope) the whole archive is okay. If I get an error, then maybe the tape was bad or some other problem.

If I simply run "tar -vtf/dev/rmt0", I immediately see the first entry, as "stdout" is line buffered. However, when I pipe it to "head", stdout is fully buffered, so I don't see any output until the "tar" finishes processing the whole file.

I'd like to tar the file until I get the first entry (or an error) and then terminate the tar command.

I've tried numerous combinations of redirection, piping, background processes, coprocesses, etc., but I can't come up with anything that works for me. I can redirect stdout to stderr (which isn't buffered) so that I can see the output immediately, but then I can't read the output to be able to terminate the tar command after the first entry is read.

Any ideas?

Thanks!

Don

DGPickett · August 2, 2011, 4:01pm

Write 2 tapes? Go to disk? Tape may read ok and then not read the next time, when its time has come. Some drives do a read-after-write check, so for them it should be good, but I expect most have tossed that as they went to video scan style writing.

You can put a test file on the tar, or memorize the name of the first file, and extract just that file on your test pass, doing a 'cmp' on the file on stdout pipe. At least you know the heads started out, and are now again, clean.
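A rough sketch of that extract-and-compare check, for illustration only. The function name verify_member and its arguments are made up here, and it assumes "mktemp -d" is available and that the archive is given as an absolute path (a tape device such as /dev/rmt0 or a tar file). It extracts into a scratch directory rather than to stdout, since not every tar can write a member to stdout.

```shell
# Extract one known member into a scratch directory and compare it
# byte-for-byte with the original file on disk.
# Returns 0 if identical, 1 if different or missing, 2 if tar failed.
verify_member() {
    archive=$1      # absolute path: tape device or tar file
    member=$2       # member name exactly as stored in the archive
    original=$3     # absolute path of the original file
    workdir=$(mktemp -d) || return 2
    # Extract inside the scratch directory so nothing on disk is overwritten.
    ( cd "$workdir" && tar -xf "$archive" "$member" ) >/dev/null 2>&1 ||
        { rm -rf "$workdir"; return 2; }
    rc=0
    cmp -s "$workdir/$member" "$original" || rc=1
    rm -rf "$workdir"
    return $rc
}
```

It might be called as: verify_member /dev/rmt0 first_file /data/first_file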

dernsdorff · August 3, 2011, 12:28pm

Extracting one file, even if it's the first file, still runs through the whole tar file (in case there are multiple copies of that file in the tar). I suppose I could do that in the background, compare the sizes of the original and extracted file, and if/when they are the same, terminate the background process. Kind of messy...

There must be some way to do the listing, get the first entry, and then terminate the listing. I just can't come up with the right combination of pipes/coprocesses/etc... I figured somebody would know how to do this.

Thanks.

Don

DGPickett · August 3, 2011, 1:05pm

You could pipe only the first N bytes of the archive through head into "tar xf -", so that tar cannot search any farther.

head -262144c <tar_file | tar xvf - first_file

dernsdorff · August 3, 2011, 2:51pm

Hmmm, it took some trying, but I think I got that to work. I had to calculate how many blocks (default is 20 512-byte records per block) I'd need to "head" to contain the first file and its header. Then I used that block number * 10240 for the number of bytes to "head". That worked for me (at least on my various test tar files)! (If I just "head"ed a number larger than the file plus its header, it didn't always work. In fact, for one of my test tar files, with the first file being 49 bytes long, heading 5000 bytes of the tar file extracted the file but hung up for some reason.)
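The arithmetic described above could be sketched like this. The function name tar_head_bytes is hypothetical, and the sketch assumes the first member has exactly one 512-byte header record (formats that add extended headers for long names would need more) and the default blocking factor of 20.

```shell
# Number of bytes to read from the start of a tar archive so that the
# first member (one 512-byte header + data padded to 512-byte records)
# is covered, rounded up to a whole number of tar blocks.
tar_head_bytes() {
    size=$1                  # size of the first file, in bytes
    factor=${2:-20}          # blocking factor: 512-byte records per block
    block=$((factor * 512))  # 10240 bytes with the default factor
    records=$(( 1 + (size + 511) / 512 ))              # header + data records
    blocks=$(( (records * 512 + block - 1) / block ))  # round up to blocks
    echo $(( blocks * block ))
}
```

For the 49-byte first file mentioned above, "tar_head_bytes 49" prints 10240, i.e. one full block rather than the 5000 bytes that hung.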

Thanks DGPickett!! Guess I'll go with that! (I'd still rather do the other method somehow, but unless someone comes up with a solution, I guess this method will have to do.)

Don

DGPickett · August 3, 2011, 3:15pm

You gotta fill a lot of buffers in this life, eh?

Verify the whole thing if you have to move the tape. Get off of tape! I recommend a cheap PC with big, cheap drives on a back to back lan cable. Maybe even externalizing kits on cheap drives. Or Mozy -- off premises sooner!

If you could get a hierarchical file system set up right, the stuff would just duplicate itself over the net to places far enough away, and the local copy could be deleted if not in use yet be available. I keep working on how to have a fluid pool of systems providing redundant storage using a mix of compressed and uncompressed mirror copies and automatic data migration based on backup spec (keep 3 copies 50 miles apart) and speed/size of storage device (low use data seeks slower, bigger (thus more chance of a use) devices). High use data gets replicated everywhere it is used. Every change after quiescence is a version, also saved. If a box went down, all files not adequately backed up are copied by machines with space from other machines. Take any machine any time. Just add machines or disks anywhere if space is low. Disk and net and even computers are cheap.

methyl · August 3, 2011, 6:22pm

Here is a solution I used for backups in the 1980s, before there were much better commercial solutions.
Append archives on a tape by using the "no rewind" device. Make your first archive a simple text file containing the identity of the archive. Append your second and subsequent archives to the tape one-by-one with the "no rewind" device. Use the unix "mt" command to navigate the tape partitions. Read the first tape partition to check that you can access the tape and that it is the correct tape.

Personally I would never use "tar" for any serious backup (but it has a use for cross-platform file copies).

If you don't have "large files" the unix "dump" and "restore" programs are what you should use if you don't have a proper commercial backup solution. These commands append backups of disc partitions to tape and allow restore of a whole partition or individual files. You can still have the first partition containing a simple text file to identify the tape.

To answer your original question, you could use the unix "dd" command to read the first few blocks off the tape. The unix "head" command (on a tape device, not on the output from a "tar" archive contents list) is totally irrelevant because a tape device is not a text file.

Yeaboem · August 3, 2011, 10:03pm

Another tip from the 1980's ... adjust your file selection list fed to the tar command to include some "First" and "Last" file. You can then write a script to verify the integrity of your tapes (before sending off-site?) by reading back the "First" and "Last" files. If you accidentally overflowed your tape, or the tape is "bad", you'll be missing the "Last" file when your tar -x completes, prompting further investigation.
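A minimal sketch of such a verification script. The function name check_sentinels and the sentinel names are placeholders, and it uses a listing (tar -t) rather than the tar -x mentioned above, so it only confirms that tar can walk the whole archive and that both sentinels are where they should be.

```shell
# Check that the archive listing starts with the "First" sentinel and
# ends with the "Last" sentinel.
# Returns 0 if both match, 1 if either is wrong, 2 if tar cannot read
# the archive at all.
check_sentinels() {
    archive=$1   # tape device or tar file
    first=$2     # expected first member name
    last=$3      # expected last member name
    listing=$(tar -tf "$archive" 2>/dev/null) || return 2
    seen_first=$(printf '%s\n' "$listing" | sed -n '1p')
    seen_last=$(printf '%s\n' "$listing" | sed -n '$p')
    [ "$seen_first" = "$first" ] || return 1
    [ "$seen_last" = "$last" ] || return 1
    return 0
}
```

This is a full read pass over the tape, so it is the slow but thorough option: check_sentinels /dev/rmt0 FIRST LAST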

DGPickett · August 4, 2011, 5:15pm

I think the requirement was to avoid a second full pass on tape.

I figured head would stop tar and the tape when it wrote to EOF or got EPIPE: Not beautiful, but simple.

Yeaboem · August 4, 2011, 7:40pm

Sorry, I assumed the original poster wanted, in his own words, to verify that the tape is actually good. IMHO, reading the first few blocks for the tar header and perhaps a portion of the first file is not sufficient. If the purpose of writing the tape is to have the assurance of a restorable backup, the time spent performing a full read pass over the tape is a small price to pay for the peace of mind that actually knowing a backup is valid provides.

That being said, the quick and dirty kludge would be to use:

$ dd if=/dev/rmt0 bs=20b count=1 2>/dev/null | tar -tvf - 2>/dev/null | head -1

..to get the first filename off of his tape.
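To use that one-liner from a script, it could be wrapped so that an unreadable tape shows up as a non-zero exit status. This is only a sketch: the function name first_entry is made up, and it assumes the tape was written with the default blocking factor of 20 (bs=20b, i.e. 10240 bytes).

```shell
# Read one tar block from the start of the tape (or file), list it, and
# print the first member name. Returns 1 if no name could be read.
first_entry() {
    src=$1   # tape device or tar file
    # tar may complain about the truncated input; only the first line matters.
    name=$(dd if="$src" bs=20b count=1 2>/dev/null |
           tar -tf - 2>/dev/null | head -n 1)
    [ -n "$name" ] || return 1
    printf '%s\n' "$name"
}
```

For example: first_entry /dev/rmt0 || echo "cannot read first tar entry"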