Inserting Header to another file

I need your help adding a header (file1 contains the header) to the top of my file2. I am using ksh on AIX.
I know how to do it with a temporary file:

cat file1 > temp
cat file2 >> temp
mv temp file2

Is there a way to add the header directly to the file in ksh?

I can't find the sed -i option on my Unix flavor.

You can use awk, cat, ed, ex, sed (with or without -i), or lots of other methods. All of them are going to use a temp file to prepend data into an existing file. Some of them will make the temp file obvious; others will use the temp file under the covers.

Can you please share a method that does not use temp files? The problem is that my data files are huge (20GB). My process spends a lot of time moving 20GB of data into a temp file; I thought that adding a single line to 20GB of data would be much faster.

As I said before, there is no way to add data to the start of a file without copying the data at least once. If the file you're modifying has more than one link and you don't want to break the links or the temp file is on a different filesystem than the file you're modifying, you'll have to copy the data twice.
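A minimal sketch of the single-copy case, assuming file2 has only one link and the temp file can be created in the same directory as file2 (so that mv is a rename within one filesystem rather than a second copy). The demo data and the temp name file2.tmp.$$ are made up for illustration.

```shell
# Demo setup (made-up data); in real use file1 and file2 already exist.
workdir=${TMPDIR:-/tmp}/prepend_demo.$$
mkdir -p "$workdir" && cd "$workdir" || exit 1
printf 'ID,NAME\n' > file1
printf '1,alpha\n2,beta\n' > file2

# Create the temp file next to file2 so that mv is a rename within one
# filesystem and the data is copied only once.
tmp=file2.tmp.$$
if cat file1 file2 > "$tmp"; then
    mv "$tmp" file2
else
    # Leave the original untouched if the copy failed (e.g. disk full).
    rm -f "$tmp"
    echo "prepend failed; file2 left untouched" >&2
fi
```

If file2 has other hard links that must keep pointing at the data, the mv would have to be replaced with something like cat "$tmp" > file2 && rm "$tmp", which is the second copy described above.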

Even if you have more than 20GB of memory you can allocate to a program to edit your file, you still have to read the entire 20GB into memory and write the entire 20+GB of data back into your file.

Of course you could create a new filesystem type that allows you to add data to either end of a file without moving existing data and create a new system call to write data into a file at negative offsets. (But, before you ask, I'll warn you that I'm not going to volunteer to design either of those for you for free in a forum like this!)

Is that "Inserting Header to another file" an exercise on its own or is it done in preparation of another processing step? If the latter, can't you "add the header" in that step? e.g. awk '...' file1 file2 or cat file1 file2
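As a rough sketch of folding the header into the next processing step: the awk program below passes file1 through unchanged and then processes file2 in the same pass, so the 20GB file is never rewritten just to gain a header. The upper-casing is only a stand-in for whatever the real step does, and the demo data and the output name result are assumptions.

```shell
# Demo setup (made-up data); in real use file1 and file2 already exist.
workdir=${TMPDIR:-/tmp}/header_demo.$$
mkdir -p "$workdir" && cd "$workdir" || exit 1
printf 'ID,NAME\n' > file1
printf '1,alpha\n2,beta\n' > file2

# NR == FNR is true only while awk reads the first file (the header),
# so those lines are printed as-is. Every later line is data; the
# toupper() call is a placeholder for the real processing.
# Note: this test assumes file1 is not empty.
awk 'NR == FNR { print; next } { print toupper($0) }' file1 file2 > result
```

If the next step already reads standard input, cat file1 file2 piped into it achieves the same thing without awk.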

http://www.unix.com/answers-to-frequently-asked-questions/202403-edit-file-place-overwriting-original.html seems to be relevant here.

Now that you know sed -i would create an "invisible" temp file too, and your sed version does not support the -i option anyway, you might want to try the following Perl equivalent:

From perlfaq5 - perldoc.perl.org :

You can even add a line to the beginning of a file, since the current line prints at the end of the loop:

    perl -pi -e 'print "Put before first line\n" if $. == 1' inFile.txt

How about our good ole ex utility...

ex -sc '0r file1 | x' file2

AFAIK ed and ex would use an internal buffer instead of a temporary file...

Files do not work that way, you can't insert in the middle. That's why we have text editors to do it for us.

If whatever's using this huge file can read from standard input, you can skip editing the file, just add it to the stream -- print it first, then the rest of the file.

(       echo "header line"
        exec cat hugefile ) | programwhichuseshugefile

This is implementation dependent.

$ strace ed file.txt 2>trace
q
$ grep "/tmp/" trace
open("/tmp/tmpfpHh7ZE", O_RDWR|O_CREAT|O_EXCL, 0600) = 3
unlink("/tmp/tmpfpHh7ZE")               = 0
$

Personally I would expect it to at least sometimes use a temp file -- it's effectively memory if the system is unloaded, and it's either that or crash on large files...

Yes, that might be implementation dependent and also depend on the file size: big files might need an on-disk scratchpad, whereas small files could be loaded into one of the internal buffers...

Some implementations use a temp file; others use memory if the file will fit in memory and a temp file if it won't; and others use memory and die with an out-of-memory (ENOMEM) error if the file won't fit.

Even in cases where ed and ex edit a file entirely in memory, the entire 20GB file being discussed in this thread has to be read into memory and the updated file has to be written back to the file. There just plain is absolutely no way to add text to the start of a file without reading the entire file at least once and writing the entire contents of the updated file at least once with current BSD, UNIX, Linux, or Windows filesystems.