Cpio backup on tape of a 400GB file

Is it possible to take a backup of a single 400GB file using the cpio command?

Yes.
Of course you can store a single file in a cpio archive.

Maybe a simple cp command would suffice?

A compression command can make sense:
gzip < /srcdir/filename > /destdir/filename.gz

Or you can pipe the cpio output to gzip.
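For example, something along these lines (the paths and archive name are placeholders):

find /srcdir -print | cpio -o | gzip > /destdir/archive.cpio.gz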

No, actually it's a single file of 400GB, so it is throwing the error "field width not sufficient"; apparently cpio has a 4GB limit when backing up a single file.
So what I want to know is: is there any other way in cpio to back up a large file?

The command that I am using:
find . -depth | cpio -ov | gzip > /dev/st0

Ah ok, cpio has a limit!
Then you cannot use it with your file :frowning:

Just for my curiosity, why -depth? I think without it a restore would be a little faster.

Then try tar instead.
Wikipedia says the GNU and pax formats have no file-size limit (plain ustar caps out at 8GB).

tar cvf - . | gzip > /dev/st0

List (table of contents):

gunzip < /dev/st0 | tar tvf -

Restore (extract):

gunzip < /dev/st0 | tar xvf -

Here is the difference that -depth makes to the find output:

$ find ./cache/
./cache/
./cache/dev
./cache/dev/example11n_1.fil
./cache/dev/example11n_2.fil
./cache/dev/example11n_3.fil
./cache/dev/example11n_4.fil
./cache/dev/renamer.cmd

$ find ./cache/ -depth
./cache/dev/example11n_1.fil
./cache/dev/example11n_2.fil
./cache/dev/example11n_3.fil
./cache/dev/example11n_4.fil
./cache/dev/renamer.cmd
./cache/dev
./cache/

Although I cannot think of a purpose for reversing the listing order of files/directories. Wouldn't it be better to get rid of all directory paths completely? As in "leave only file paths in the output; the directory structure will be reconstructed anyway" - or am I missing something?

Missing directories are created with some default owner/permission/time attributes.
The original attributes are archived with the explicit directory entries, and only then can they be restored.

Without -depth the directories come first, and are created with the original attributes immediately.
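
A minimal sketch of both orderings, using ./cache as an example tree (-m asks cpio to restore the saved modification times):

# With -depth: directories are archived after their contents, so their saved
# owner/permission/time attributes are applied last on restore.
find ./cache -depth -print | cpio -o > with_depth.cpio
cpio -idm < with_depth.cpio

# Without -depth: directories come first and are created with their original
# attributes immediately, before their contents are extracted.
find ./cache -print | cpio -o > without_depth.cpio
cpio -idm < without_depth.cpio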


Why would you use find for a single file? Just use the file name!

Previously I was backing up the whole directory; that's why I used find.

I understand that I can use tar, but I specifically want to know whether cpio has any option or format that can back up a single file which is > 4GB (e.g., 10GB).
Thanks

Well, Wikipedia says GNU cpio has a file size limit.


That's clear, although it seems to have two limits for different output formats: 2GB for "bin" format (31 bits, as in a signed int) and 8GB for "ustar" format (33 bits, as in what?). Neither of those matches the 4GB experienced by Anku. Although the error message "field width not sufficient" suggests that the limitation is a 10-digit fixed-length field in a text header, not a binary limitation. (Mind you, tar headers also have fixed-width text fields, IIRC. AIX used to store numeric values left-justified with trailing spaces, and Unix right-justified with leading zeroes.)

man cpio itself also clearly states these limits, so they are not exactly obscure. There are five different formats, with three different limits on the size of an individual archive member.

I guess in 1977 (first release) 4GB would be enough to back up a complete disk. At that time, my mainframe manufacturer made the jump from 30 MB to 60 MB media. The removable disc cartridge looked like a cake-tray in a posh cafe -- 11 platters, about 15 inches diameter and 9 inches high, and the drive mechanism the size of a washing machine (but much less reliable). Head crashes were monthly occurrences, and usually dug a spiral furrow across the whole surface: the Unload mechanism often forgot to lift the flying heads off the disk before it retracted the heads out of the can.

cpio is not even Posix, has a clunky option set, and has ridiculous size limits (commensurate with its 47-year-old specification). I also see:

newc: The new (SVR4) portable format, which supports file systems having more than 65536 i-nodes.

Who would have thought you could ever have more than 64K files in a file system?


You could use the split command to separate the 400 GB file into 100+ separate files, and then save those 4GB files as a multifile archive.

Of course, that would need another 400GB free disc space for the intermediate files. I thought you might compress the split files as you went, but split does not seem to have that option.

I have a split script somewhere that does not need that extra 400GB. It works by splitting off a specified size (e.g. 4GB) starting at the end of the file, and then truncating the original file to free up space for the next 4GB. That could easily have a compression stage added for each section.
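
This is only a sketch of that idea (not the original script), assuming bash plus GNU dd, stat and truncate; the file name and chunk size are examples. It peels a section off the end of the file, compresses it, then truncates the original so the space is reclaimed before the next pass:

FILE=myHugeFile
CHUNK=$((4 * 1000 * 1000 * 1000))            # 4GB sections
SIZE=$(stat -c %s "$FILE")                   # current file size in bytes
PART=$(( (SIZE + CHUNK - 1) / CHUNK ))       # number of sections we will produce
while [ "$SIZE" -gt 0 ]; do
    PART=$((PART - 1))
    OFFSET=$(( SIZE > CHUNK ? SIZE - CHUNK : 0 ))
    # Copy the tail section out, compressing on the fly.
    dd if="$FILE" bs=1M iflag=skip_bytes,count_bytes \
       skip="$OFFSET" count=$((SIZE - OFFSET)) |
        gzip > "$(printf 'Data_%03d.gz' "$PART")"
    # Truncate the original so the disc space is freed before the next pass.
    truncate -s "$OFFSET" "$FILE"
    SIZE=$OFFSET
done
# Restore later with: cat Data_*.gz | gunzip > myHugeRestore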

If cpio has limitations, and tar is out of your comfort zone, you could just use dd to write the 400GB to a tape. You probably want to experiment with that: you may have to deal with distinct tape devices for rewind and non-rewind modes, and add your own tape-marks and labels.
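
If you go that way, a minimal sketch, assuming Linux-style SCSI tape devices where /dev/st0 rewinds on close and /dev/nst0 does not (device names vary, so treat these as examples):

mt -f /dev/nst0 rewind                       # position at the start of the tape
gzip < myHugeFile | dd of=/dev/nst0 bs=1M    # write one compressed stream
mt -f /dev/nst0 weof 1                       # add a tape mark after the data
mt -f /dev/nst0 rewind

# Restore later:
dd if=/dev/nst0 bs=1M | gunzip > myHugeRestore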

If I am right, you are talking about this split:
split -b <size> filename

Yes.
But isn't it cumbersome?
GNU tar has built-in compression; the z option runs gzip/gunzip automatically.
Backup the current directory:

tar -cvzf /dev/st0 .

List (table of contents):

tar -tvzf /dev/st0

Restore (extract):

tar -xvzf /dev/st0

That is the standard command that comes up with man split. Of course, you never revealed what Distro you are using, or what hardware, so YMMV. I would also suggest you allow a margin on the size rather than use the precise limit in the man page. Maybe:

split --bytes=4GB -d --suffix-length=3 myHugeFile 'Data_'

which will generate names like Data_000 up to Data_115 or so. Note that 4GB means 4,000,000,000 bytes, whereas 4G means 4,294,967,296. Insanely, 4G would fail in cpio, because it is ONE byte over the 4,294,967,295 limit. The split files are numbered consecutively, so you can recombine them later with a wildcard that sorts nicely: cat Data_* > myHugeRestore

Remember, you need extra space for the new files, which all have to be created before you can run cpio. And all that costs extra runtime too, to read and write the temporary files, which you then have to delete safely.

I have to admit, I do not understand why you insist on using cpio, which is from the dark ages. And remember, if you want to recover this file, you will have to reload all the split fragments from the tape, cat them all together to rebuild the original file, and then remove the temporaries again. And you need a spare 400GB to do all that in, too.

OK, on second thoughts, you do not need 400GB of free disc to retrieve the data. You can just append each file and delete it as you go. Something like:

: > myHugeRestore        #.. Create or empty the file to be restored.
for Part in Data_*; do
    cat "${Part}" >> myHugeRestore
    rm "${Part}"
done

Consider this:

tar cvfz /dev/rmt0 myHugeFile

would create a tape copy of your file on your tape device (you need to find out what this is actually called on your hardware), compressed for size (which also improves speed -- tapes are slow), without any extra steps and without needing a spare 400GB. You might try a test save/restore first. I have not used a tape this century.


Maybe you can search the source code, find how/where the limit is set, and recompile with the limit changed or removed?
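
A rough sketch of that approach, assuming you have downloaded and unpacked a GNU cpio source tree (the version number is only an example; the grep string is simply the error text reported earlier):

cd cpio-2.14
grep -rn "field width not sufficient" src/   # locate where the size field is formatted
# Adjust the field width or switch the header format in that code, then rebuild:
./configure && make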

See for example:


I suspect the limit is in the representation of the file size in the archive member header. Typically, that would be useful to skip multiple tape blocks past a member that is not to be retrieved. Extending the field would then create a non-standard header that could only be read back by the same edited version of the cpio binary.
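
For what it is worth, here is one way to peek at such a header, assuming GNU cpio and the SVR4 "newc" output format (the file name is just an example). In newc the file size is stored as 8 ASCII hex digits, so the largest value it can hold is 0xFFFFFFFF, just under 4GiB, which would be consistent with the roughly 4GB limit reported above.

echo "just a small test file" > sample.txt
echo sample.txt | cpio -o -H newc | od -c | head -3
# The dump begins with the magic "070701", followed by fixed-width 8-hex-digit
# fields (inode, mode, uid, gid, nlink, mtime, filesize, ...), then the file name.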