merge 2 files (without repeating any lines)

I need to add the content of file1 to file2: all lines except those that already exist in file2, so a plain "cat file1 >> file2" doesn't work.

For example,
file1:
100 xxxxxx str1
102 xxxxxx str2

file2:
50 xxxxxxx xxx
30 xxxxxxxxxxx
102 xxxxxx str2 xxxx
......

the result:
50 xxxxxxx xxx
30 xxxxxxxxxxx
102 xxxxxx str2 xxxx
.....
100 xxxxxx str1

Also, the second line in file1 and the third line in file2 can either be completely the same or share the same pattern: starting with the same string and having another common string anywhere in the line.

Please help!
Thank you so much.
(It's the Bourne shell.)

Hi,
I think you can use cat first to join the two files together, then sort the result, and then use uniq to delete the duplicated lines.

input:

a>
1
2
3
4
5
6
b>
3
1
2
3
5
342
45
234
2
3

output:

1
2
234
3
342
4
45
5
6

code:

cat a b | sort | uniq

Hi summer_cherry

Theoretically, this would work. However, the files involved are system config files, with certain lines grouped together for a reason, so I'd hate to sort them. BTW, sorting also means all the empty lines, or otherwise-empty lines starting with #, between sections would be gone, as they would be meaningless anyway.

I was thinking of looping through file2 for each line from file1 to be added, but surely there's a better solution (and I don't know enough about the utilities to figure it out).

Thank you.

Hey,

You can take the first line from the first file and check whether it is already in the second file. If it is, don't append it; otherwise, append the line to the second file.

You can use grep to search for the line from the first file, and awk to use that same line as the pattern to be searched for in the second file.
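
The line-by-line check described above could be sketched roughly as follows. This is only a sketch under some assumptions: the function name append_missing is made up for illustration, and it relies on a POSIX sh and grep (-F for fixed strings, -x for whole-line matches, -q for quiet). It compares whole lines exactly, so the looser "same pattern" case from the original question is not handled.

```shell
#!/bin/sh
# append_missing SRC DEST   (hypothetical helper name)
# Append each line of SRC to DEST unless DEST already has an identical line.
append_missing() {
    src=$1
    dest=$2
    [ -f "$dest" ] || : > "$dest"    # make sure DEST exists before grepping it
    while IFS= read -r line; do
        # -F: fixed string, -x: match the whole line, -q: no output
        grep -F -x -q -e "$line" "$dest" || printf '%s\n' "$line" >> "$dest"
    done < "$src"
}
```

It would be called as "append_missing file1 file2". The order of file2 is left alone and new lines land at the end, but grep is run once per line of file1, so this is slow for large files.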

varungupta:
This is actually what I'm doing now: I add some special keywords in a comment line marking the start of file1's content, and before adding anything I check whether that line already exists in file2. However, I don't think it is safe, as nothing prevents that line from getting deleted from file2 over time...

summer_cherry's solution would work perfectly; however, re-ordering the file is not acceptable.

Perhaps I need a merge function, to do a diff and add lines from file1 that are not in file2?

It'd be tricky, though, to later remove from file2 the lines that have been added.

Any thoughts would be appreciated!

It is good practice to store configuration files that you change in RCS somewhere, so that if you need to backtrack or do problem determination you can refer to previous revisions.

Hi porter,
I do back up all the config files during the installation. The thing is, I can't simply restore those files during the uninstall, because some of the files may have been changed over time. The customers would lose their data if we restored the original files. If at uninstall time we remove exactly what we added during the install, then it'd be safe.

Bluemoon
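
One possible way to "remove exactly what we added" is to record the added lines in a separate file at install time and use that record at uninstall time. The sketch below is an assumption, not something from this thread: the names install_lines, uninstall_lines and the record file are all hypothetical, and it assumes a POSIX awk (for ARGV and the "in" operator).

```shell
#!/bin/sh
# Hypothetical helpers; ADDED is a record file kept by the installer.

# install_lines SRC DEST ADDED
# Append to DEST the lines of SRC it lacks, and save exactly those lines in ADDED.
install_lines() {
    # Lines from DEST (first operand) are only remembered; lines from SRC
    # are printed the first time they are seen.
    awk 'FILENAME == ARGV[1] { seen[$0] = 1; next }
         !seen[$0]++' "$2" "$1" > "$3"
    cat "$3" >> "$2"
}

# uninstall_lines DEST ADDED
# Drop from DEST every line that exactly matches a line recorded in ADDED.
uninstall_lines() {
    awk 'FILENAME == ARGV[1] { drop[$0] = 1; next }
         !($0 in drop)' "$2" "$1" > "$1.tmp" &&
    cat "$1.tmp" > "$1"      # copy back so DEST keeps its owner and mode
    rm -f "$1.tmp"
}
```

Note the limits: the match is on whole lines, so a recorded line that the customer later edited is left in place, and if a blank line was recorded in ADDED, every blank line in DEST would be dropped at uninstall.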

awk '!x[$0]++' file2 file1

Dear cfajohnson:

It works perfectly... would you elaborate a little on how it works? I did try, but as a newbie...
thanks a lot!

It is the same as:

awk '{
       if ( x[$0] == 0 )   # this exact line has not been seen yet
        {
           x[$0] = 1       # remember it
           print           # and print it
        }
      }' file2 file1

In other words, if the line has not been seen before, it will be printed.
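
Since awk only prints the merged result to standard output, file2 itself still has to be updated. A minimal sketch of one way to do that, assuming a temporary file next to file2 is acceptable (the function name merge_unique and the temp-file name are made up for illustration):

```shell
#!/bin/sh
# merge_unique SRC DEST   (hypothetical helper name)
# Rewrite DEST as: DEST's lines, then SRC's lines, each distinct line once.
merge_unique() {
    tmp=$2.tmp.$$
    # DEST is listed first so its lines keep their original order.
    awk '!x[$0]++' "$2" "$1" > "$tmp" &&
    cat "$tmp" > "$2"        # copy back so DEST keeps its owner and mode
    rm -f "$tmp"
}
```

One thing to be aware of: the one-liner drops every repeated line, not only those coming from file1, so repeated blank or comment lines already inside file2 are collapsed to their first occurrence as well.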