merging two files

Hi everyone,

I have two files which are exactly the same at first. After some time there will be inserts in one file. My problem is how to reflect those changes in the second file as well.

I found out that any compare-and-merge utility would do the job, like the GNU "sdiff" command. But the problem with sdiff is that it is an interactive tool: we have to tell the command what to do each time it finds a difference in the files (either merge the change or discard it by specifying the 'l', 'r', etc. options).

Does any one of you know how to automate this process, or is any other utility available for doing this?

(sort with the -m option is not what I want, because I don't want to lose the order of the lines in the files.)

Thanks in advance

cp firstfile secondfile
will ensure that any changes made to the first file are reflected in the second file.

hi perdarabo,

I am sorry I missed an important point above. My second file can also have inserts, but only at the end, not in the middle. So if we do cp, we will lose the second file's changes.

(To put it differently, I am looking at two files which are the same at first, but after some time both will change.)

cp firstfile secondfile

will "clobber" the 2nd file. (clobber = completely replace/overwrite everything in "secondfile").

If you want to append data from firstfile, try

firstfile >> secondfile

, which appends the contents of firstfile to secondfile. Make sure there are two ">>" because a single ">" overwrites secondfile just like the above cp command. This is explained in good shell programming books under "input/output redirection".
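A tiny illustration of the difference between the two redirection operators (demo.txt is just an example name):

```shell
echo "first"  > demo.txt   # a single '>' truncates: demo.txt now holds only this line
echo "second" >> demo.txt  # '>>' appends: demo.txt now holds both lines
```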

Surely all that is going to do is attach the entire content of firstfile onto the end of secondfile, rather than just what has been updated.

Doesn't he want to take two data files that start with the same content, update both files with different sets of data, and merge those two files into one file that contains all the data without duplicating anything? (If you follow what I mean...)

I'm not sure if there are any UNIX commands that could do it, but I would look at using Perl or similar...

You're right. I didn't read his post carefully enough. The rsync utility can help: http://rsync.samba.org (it is usually available in most distros as well). It works locally as well as remotely and can use ssh for added security.

Perhaps take a look at comm.

comm -23 prints only the lines that are in the first file but not in the second.

firstfile >> secondfile

That's going to do naff all unless firstfile is an executable, in which case the output (stdout) from firstfile will be appended to secondfile.

You'll need

cat firstfile >> secondfile

Cheers
ZB

Thanks guys for all of your help.

What PaulC said is exactly what I want... I am giving an example data set here. Ygor, your comm suggestion seems to point me in the right direction, but it's not what I want.

At first both files contain

this is mango
this is apple
this is orange

After some time the first file will have inserts anywhere in the file (middle, end, etc.) and the second file will have inserts only at the end of the file. So after some time the files might look like:

file1

this is mango
this is banana
this is apple
this is grape
this is pineapple
this is orange
this is watermelon

file2

this is mango
this is apple
this is orange
this is cherry
this is lemon

OUTPUT should be

this is mango
this is banana
this is apple
this is grape
this is pineapple
this is orange
this is watermelon
this is cherry
this is lemon

The comm command seems to work differently... moreover, it needs the data to be sorted, which I don't want to do.

So if any of you has any idea, please pass it on to me. Thanks once again, guys.

I think comm is what you want.

If you do the following

comm -12 firstfile secondfile >> thirdfile
comm -23 firstfile secondfile >> thirdfile
comm -13 firstfile secondfile >> thirdfile

That will give you lines common in both files, followed by lines
appearing only in the first file, followed by lines appearing only
in the second file.
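One caveat worth adding (a sketch using the thread's example data): comm expects both inputs to be sorted, so on unsorted files you would have to sort temporary copies first, which is exactly the reordering the original poster wants to avoid:

```shell
# sample unsorted inputs, shortened from the thread's example
printf 'this is mango\nthis is banana\nthis is apple\n' > firstfile
printf 'this is mango\nthis is apple\nthis is cherry\n' > secondfile

# comm requires sorted input, so work on sorted temporary copies
sort firstfile  > first.sorted
sort secondfile > second.sorted

comm -12 first.sorted second.sorted >  thirdfile   # lines common to both
comm -23 first.sorted second.sorted >> thirdfile   # lines only in firstfile
comm -13 first.sorted second.sorted >> thirdfile   # lines only in secondfile
```

The merged result contains every line exactly once, but in sorted-group order rather than the files' original order.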

Cheers
ZB

Perhaps use diff, but reformat the output...

diff file1 file2 | awk '/^[<>]/{print substr($0,3)}'
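For what it's worth, a quick sketch of what that diff-plus-awk reformatting produces (using awk's substr function to strip the "< " / "> " markers). Note that it emits only the differing lines, not the common ones, so it isn't the full merge by itself:

```shell
printf 'a\nb\nc\n' > file1
printf 'a\nc\nd\n' > file2

# print each side's differing lines without the "< " / "> " prefix
diff file1 file2 | awk '/^[<>]/{print substr($0,3)}'
```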

If you do not mind outfile being in sorted order, you can try:

cat file? | sort | uniq > outfile
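Equivalently (a small sketch with hypothetical file names), sort's -u flag folds the sort and uniq steps into one; the result is still sorted, which is the caveat here:

```shell
printf 'this is mango\nthis is apple\n'  > fileA
printf 'this is mango\nthis is cherry\n' > fileB

# merge, sort, and drop duplicate lines in one step
sort -u fileA fileB > outfile
```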

peace...

hello,

there are many ways to skin a cat. All good suggestions; however, the files cannot be sorted. One of the files will only have appended data, meaning the data is attached to the end of the file only. The other file will have data inserts that could appear anywhere in the file. From the example given:

file1

this is mango
this is banana
this is apple
this is grape
this is pineapple
this is orange
this is watermelon

file2

this is mango
this is apple
this is orange
this is cherry
this is lemon

OUTPUT should be

this is mango
this is banana
this is apple
this is grape
this is pineapple
this is orange
this is watermelon
this is cherry
this is lemon

----------------------------------

The one solution given, appending the second file, appears to be what you want, that is, if this example holds true. The implementation needs a little going over.

Only one file, file1, will have modified data inserted all over its contents. This can be your primary file.
The second file does not really need to start off with any content, since the only data modifications to be performed on it happen at the EOF: new data is appended to the back of the file.
If you want to keep a copy of the original file, then simply have three (3) files instead of two (2).

When the event happens that you merge these files, say at a specific time scheduled in cron, or by size, or whatever, you simply take the 3rd file's contents, which are the records originally appended to the back of the secondary file (which now is just a copy of the original file); the 3rd file has the new records.
Then simply append file2 to file1.

--------------------
original data
----
this is mango
this is apple
this is orange

--------------------
file 1 & file 2 look like

this is mango
this is apple
this is orange
-------------------------
file 3
null
-------------------------
file1 after inserts

this is mango
this is banana
this is apple
this is grape
this is pineapple
this is orange
this is watermelon

file2-----------------
simply a copy or dummy file showing appends

  • not really needed except for archive purposes

this is mango
this is apple
this is orange
[this is cherry]
[this is lemon]

-------------------------------------
3rd / ( or 2nd ) file
this is cherry
this is lemon

OUTPUT file ( cat file2 >> file1 )

this is mango
this is banana
this is apple
this is grape
this is pineapple
this is orange
this is watermelon
this is cherry
this is lemon
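A minimal sketch of that scheme (file names hypothetical, and shortened sample data): keep the records that get appended "to the back" in their own file, and at merge time simply append that file to the master:

```shell
# master copy that receives inserts anywhere in the file
printf 'this is mango\nthis is banana\nthis is apple\n' > file1

# appended-only records kept separately (the "3rd file" above)
printf 'this is cherry\nthis is lemon\n' > file3

# merge step (e.g. run from cron): append the new records to the master
cat file3 >> file1
```

No sorting is needed, so both the insert order in file1 and the append order in file3 survive the merge.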

:slight_smile:

Thanks a lot guys ... for so many replies.

Gary's solution is a good one. I think it works for now...

But I have a question: what if we have already been given files in that format, meaning we don't have control over the files from the start? How would anybody solve it then?
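One hedged possibility for that case, assuming every line is unique within each file: append to file1 every line of file2 that file1 doesn't already contain, in file2's original order, using grep's fixed-string whole-line matching (sketched here on the thread's example data):

```shell
# the two files exactly as given, with no control over how they were produced
printf 'this is mango\nthis is banana\nthis is apple\nthis is orange\n' > file1
printf 'this is mango\nthis is apple\nthis is orange\nthis is cherry\nthis is lemon\n' > file2

# -F fixed strings, -x whole-line match, -v invert, -f read patterns from file1:
# i.e. select the lines of file2 that are not in file1, keeping file2's order
grep -Fxvf file1 file2 >> file1
```

This never sorts anything, so file1 keeps its mid-file inserts and the appended lines stay in the order file2 had them.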

zazzybob's solution should also work hypothetically, but for some reason the comm command is behaving strangely... It's not giving what it should.

Once again thanks for all suggestions.

Hi,

You can simply do: cat firstfile >> secondfile; sort -u -o secondfile secondfile
That will copy all the contents of firstfile to secondfile and then eliminate any duplicate entries in secondfile that may have come from the first file.