Match and Merge two files

Hi All,

I have two files. I need to merge them based on a match.

File 1:

Column1 
column2
column3
column4

File 2:

column3
column5

I need to combine the two files based on a match, which in my case is column3, and produce the combined file shown below.

Combined file

Column1 
column2
column3
column4
column5

I may have non-matching records on column3 between file1 and file1. I need to ignore those and combine only the matched records.

Any help, please.

Thanks in advance.

I have read through post #1 in this thread several times and still have no idea what you are trying to do.

You have one line in your two input files that matches (the line in each file that contains the text column3). And it appears that when you find that those lines do match, you ignore that fact completely. Then you say:

but there is only one column in file1 and what is a non match between a single file and itself?

If you are trying to produce a sorted list of the lines in two files and remove duplicates, try:

sort -u file1 file2

That produces the output you said you want, but it doesn't fit the description of the problem you presented.

Hi,
If I have understood your post, you may try this:

sort -m file1 file2 > combined_file

It is the merge option of the sort command. It gives the desired output for the particular case indicated.

Regards!

Not quite. That does not give the output requested in post #1; the merge option to sort does not remove duplicates.
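
A minimal sketch of that point, assuming the two files contain exactly the lines listed in post #1, one per line. The scratch directory and the variable names are hypothetical, and LC_ALL=C is set only to make the ordering predictable:

```shell
# Recreate the two files from post #1 in a scratch directory (hypothetical location).
dir=$(mktemp -d)
printf '%s\n' Column1 column2 column3 column4 > "$dir/file1"
printf '%s\n' column3 column5 > "$dir/file2"

# Merge the two (already sorted) files without -u.
merged=$(LC_ALL=C sort -m "$dir/file1" "$dir/file2")
printf '%s\n' "$merged"

# Count how many times the shared line survives the merge.
dup_count=$(printf '%s\n' "$merged" | grep -c '^column3$')
echo "column3 appears $dup_count times"

rm -rf "$dir"
```

The line column3 is present in both inputs, so it shows up twice in the merged output.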

Yes, I agree. The command must be

sort -mu file1 file2 > combined_file

I wonder whether that is always the same as

sort -u file1 file2 > combined_file

Regards.

P.S.:

Perhaps the difference lies in performance. While sort -mu only merges files that are already sorted, sort -u re-sorts the union of all the input, so it does extra work.

If file1 is in sorted order and file2 is in sorted order, then, and only then, the output from the commands:

sort -mu file1 file2 > combined_file
sort -u file1 file2 > combined_file

will be identical, but the first will run faster. If the input files are not both in sorted order, the first command may produce unsorted output, may produce output containing duplicated lines, and/or may terminate with a diagnostic message instead of producing the desired output.
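
One way to act on this is to check the inputs with sort -c before choosing between the two commands. The following is only a sketch: the combine function, the sample files, and the variable names are hypothetical, and LC_ALL=C is an assumption made so that the check and the merge use the same collation order.

```shell
# combine FILE1 FILE2: use the cheaper merge only when both inputs are sorted.
combine() {
    if LC_ALL=C sort -c "$1" 2>/dev/null && LC_ALL=C sort -c "$2" 2>/dev/null; then
        LC_ALL=C sort -mu "$1" "$2"   # both inputs sorted: merging is enough
    else
        LC_ALL=C sort -u "$1" "$2"    # otherwise fall back to a full sort
    fi
}

# Hypothetical sample data: two sorted files and one unsorted file.
dir=$(mktemp -d)
printf '%s\n' a c e > "$dir/s1"
printf '%s\n' b c d > "$dir/s2"
printf '%s\n' c a e > "$dir/unsorted"

# On sorted inputs, sort -mu and sort -u agree.
sorted_mu=$(LC_ALL=C sort -mu "$dir/s1" "$dir/s2")
sorted_u=$(LC_ALL=C sort -u "$dir/s1" "$dir/s2")
[ "$sorted_mu" = "$sorted_u" ] && echo "sorted inputs: identical output"

# With an unsorted input, combine falls back to sort -u.
from_unsorted=$(combine "$dir/unsorted" "$dir/s2")
printf '%s\n' "$from_unsorted"

rm -rf "$dir"
```

The check reads each file once more, so it only pays off when the inputs are large enough for the difference between merging and sorting to matter.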

Yes, I agree, the merge option requires previously sorted files.

Regards.

Sorry if it was confusing. Let me explain with a scenario.

You need to clarify your own purposes.

Just tell us how you proceeded manually to get the third file.

Regards.

The scenario doesn't help. I see absolutely no correlation between the example you showed us in post #1 and the example you showed us in post #8.

Instead of just showing us two input files and an output file, explain in English what in your two input files is being matched, merged, sorted, joined, pasted, or whatever it is that you are doing, and explain how those operations are being used to create the output that you want.

Define the terms that you use with extremely clear examples. The term merge can mean very different things depending on the context. The same applies to other terms you may be used to using.
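
To show how far apart two readings of "merge" can be, here is a sketch run on made-up data. It assumes that the "columns" in post #1 are whitespace-separated fields on each line and that both files are sorted on the key field. Neither assumption is confirmed anywhere in this thread, and the file contents and variable names are hypothetical.

```shell
# Hypothetical sample data: the key is field 3 of file1 and field 1 of file2.
dir=$(mktemp -d)
printf '%s\n' 'a1 b1 k1 d1' 'a2 b2 k2 d2' 'a3 b3 k3 d3' > "$dir/file1"
printf '%s\n' 'k1 e1' 'k3 e3' 'k9 e9' > "$dir/file2"

# Reading 1: "merge" as a line-level merge of two sorted files.
line_merge=$(LC_ALL=C sort -mu "$dir/file1" "$dir/file2")

# Reading 2: "merge" as a field-level join on the key, keeping only the keys
# present in both files and printing file1's four fields plus file2's field 2.
field_join=$(LC_ALL=C join -1 3 -2 1 -o 1.1,1.2,1.3,1.4,2.2 "$dir/file1" "$dir/file2")

echo "--- line-level merge ---"
printf '%s\n' "$line_merge"
echo "--- field-level join ---"
printf '%s\n' "$field_join"

rm -rf "$dir"
```

The first reading keeps all six input lines, while the second yields only the two records whose key (k1 and k3) occurs in both files. That is why a precise description of the intended operation matters.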

Hmm - throwing dice - mumble mumble - throwing bones - waffle waffle - consulting crystal ball - cogitate - reading coffee grounds ...
How about (applying utter fantasy)

awk '{getline TMP < FN; if (substr(TMP, 1, 1) == substr ($0, length)) print $0 substr (TMP, 2) }' FN="file2" file1
1234
1237

If you consider the complex heuristic process above, how about considering and applying Don Cragun's hints?