using join command for files

Hi,
I have to join 2 files based on two key field columns.
But join command allows to give only one key column for fetching.So kindly update me whether i can compare 2 files based on two key columns or more than one key column.If so what is the syntax to compare it.
With thanks in advance,
subra.

You can compare two files using diff, comm, cmp commands...
use man pages for further details...

if your file is large then u can also use bdiff..

Is this what you are looking for???

-Nisha

I don�t think you�re right, nisha, cause he like to join two files based on two fields and not only want�s to see the difference between those files for what you�re using the diff/compare command.
But I�ve no idea how to help - probably awk or sed should help let me have a look :slight_smile:

BRGDS

Frank

Hi,
I am specifically looking for the join command only...I've got the results from the diff utility and now i want to seperate the results obtained from it based on some conditions..So i want to know whether join can be used for more the one key column ...
with thanks
subra

Hi,
I think i did not state my query properly.This is the actual scenario....
Consider that there are two files. Each file contains data stored in a table in a database. Assume that the respective tables have primary key defined on atleast two columns. Is it possible to join the two files on these primary key columns?
with thanks in advance,
subra.

Join can only be used for only one key column - so we have to find out an other solution.

greetz,
frank

Hi,
Thanks to everyone for ur prompt replies.Can u suggest me the other solution or kindly post a reply to this query when u come across an alternative to this...
Thanks in advance,
subra

Hi Subra,

could you post back the content of the files you like to join.

TIA

Frank

You can acheive this with a combination of awk and join.

i.e.
File_one has 4 columns "first name, lastname, age, height"
File_two has 3 columns "first name, lastname, weight"

You want the output to have "first name, lastname, age, height, weight" - for only those records with a weight present - and your key is first and last name.

awk '
BEGIN { OFS = "," } { print $1 $2 , $1, $2, $3 ,$4 } 
' File_one > temp_File_one 
#
awk '
BEGIN { OFS = "," } { print $1 $2 , $1, $2, $3 } 
' File_two> temp_File_two
#
join -t ","  -j1 1 -j2 1 -o 1.2 1.3 1.4 1.5 2.4  temp_File_one temp_File_two > File_three

Peter,

The above code is working perfectly...

I have heard of IFS(Internal Field Seperator)... Could you please tell me what OFS is????

And also i am new to this join command....

can u explain this to me....

join -t "," -j1 1 -j2 1 -o 1.2 1.3 1.4 1.5 2.4 temp_File_one temp_File_two

The join command works even like this..

join -t "," -o 1.2 1.3 1.4 1.5 2.4 temp_File_one temp_File_two

A few other questions...

  1. Why do i need two temp files at all??
  2. Why can't I directly join the file_one and file_two?
  3. Why should there be a statement like this ?
    BEGIN { OFS = "," } { print $1 $2, $1, $2, $3 ,$4 }
    instead of
    BEGIN { OFS = "," } { print $1, $2, $3 ,$4 }

Thanks,
Nisha

Hi Nisha,

one of your other questions I think I�m able to answer.

  1. Why can't I directly join the file_one and file_two?

You can Join them directly but then you can only join them by one column.

So what Peter did was to concatenate the "firstname" & "lastname" column for both files

(And this explains the first of your other questions

  1. Why do i need two temp files at all?? )

The third questionI �ll try to answer as well
3. Why should there be a statement like this ?
BEGIN { OFS = "," } { print $1 $2, $1, $2, $3 ,$4 }

this lines output will be

BEGIN { OFS = "," } { print $1, $2, $3 ,$4 }

|firstnamelastname|firstname|lastname|age|hight|

After that he was able to join those both files based on two columns, whats not exactly correct, cause he joined them based on one column in what the keycolumns has benn concatenated.

the Output of the second statement is
|firstname|lastname|age|hight|

Did I get that right?

Directly from the man page

OFS output field separator (default blank)

The reason for the { print $1 $2, $1,$2,$3,$4 } and the reason for having 2 temp files is the same. Join will only allow you to join on one field. In this instance we want to join on two fields.

So we make a composite key by concatenating the two fields into one string and apending that at the start of the each line - we do this for both files.

Then we can join using one field - being the composite key.

i.e. To join where column 1 and 5 are the same in both files

File1 before is
"aaa,bbb,111,222,333".
"aaa,bbb,111,222,xxx".

File2 before is "
"aaa,cat,dog,hen,333"
"123,345,234,123,345"

The files with the {print $1 $5, $0} (you know what $0 is too...)
become

File1 before is
"aaa333,aaa,bbb,111,222,333".
"aaaxxx,aaa,bbb,111,222,xxx".

File2 before is "
"aaa333,aaa,cat,dog,hen,333"
"123345,123,345,234,123,345"

So we have the composite key matching the aaa333 in both files - and we can join on this with and make sure in our output there's not 1.1 or 2.1 (which would be the first fields from each file - as they are the fields we created and don't want to see again)

Just seen your reply while entering mine isacs!

Thanks Peter and Isacs for the replies...