using join command for files

subra · August 13, 2002, 5:47am

Hi,
I have to join 2 files based on two key field columns.
But join command allows to give only one key column for fetching.So kindly update me whether i can compare 2 files based on two key columns or more than one key column.If so what is the syntax to compare it.
With thanks in advance,
subra.

Nisha · August 13, 2002, 5:58am

You can compare two files using diff, comm, cmp commands...
use man pages for further details...

if your file is large then u can also use bdiff..

Is this what you are looking for???

-Nisha

isacs · August 13, 2002, 6:07am

I don�t think you�re right, nisha, cause he like to join two files based on two fields and not only want�s to see the difference between those files for what you�re using the diff/compare command.
But I�ve no idea how to help - probably awk or sed should help let me have a look

BRGDS

Frank

subra · August 13, 2002, 6:11am

Hi,
I am specifically looking for the join command only...I've got the results from the diff utility and now i want to seperate the results obtained from it based on some conditions..So i want to know whether join can be used for more the one key column ...
with thanks
subra

subra · August 13, 2002, 6:17am

Hi,
I think i did not state my query properly.This is the actual scenario....
Consider that there are two files. Each file contains data stored in a table in a database. Assume that the respective tables have primary key defined on atleast two columns. Is it possible to join the two files on these primary key columns?
with thanks in advance,
subra.

isacs · August 13, 2002, 6:21am

Join can only be used for only one key column - so we have to find out an other solution.

greetz,
frank

subra · August 13, 2002, 6:44am

Hi,
Thanks to everyone for ur prompt replies.Can u suggest me the other solution or kindly post a reply to this query when u come across an alternative to this...
Thanks in advance,
subra

isacs · August 13, 2002, 7:19am

Hi Subra,

could you post back the content of the files you like to join.

TIA

Frank

peter.herlihy · August 13, 2002, 6:19pm

You can acheive this with a combination of awk and join.

i.e.
File_one has 4 columns "first name, lastname, age, height"
File_two has 3 columns "first name, lastname, weight"

You want the output to have "first name, lastname, age, height, weight" - for only those records with a weight present - and your key is first and last name.

awk '
BEGIN { OFS = "," } { print $1 $2 , $1, $2, $3 ,$4 } 
' File_one > temp_File_one 
#
awk '
BEGIN { OFS = "," } { print $1 $2 , $1, $2, $3 } 
' File_two> temp_File_two
#
join -t ","  -j1 1 -j2 1 -o 1.2 1.3 1.4 1.5 2.4  temp_File_one temp_File_two > File_three

Nisha · August 14, 2002, 1:56am

Peter,

The above code is working perfectly...

I have heard of IFS(Internal Field Seperator)... Could you please tell me what OFS is????

And also i am new to this join command....

can u explain this to me....

join -t "," -j1 1 -j2 1 -o 1.2 1.3 1.4 1.5 2.4 temp_File_one temp_File_two

The join command works even like this..

join -t "," -o 1.2 1.3 1.4 1.5 2.4 temp_File_one temp_File_two

A few other questions...

Why do i need two temp files at all??
Why can't I directly join the file_one and file_two?
Why should there be a statement like this ?
BEGIN { OFS = "," } { print $1 $2, $1, $2, $3 ,$4 }
instead of
BEGIN { OFS = "," } { print $1, $2, $3 ,$4 }

Thanks,
Nisha

isacs · August 14, 2002, 3:24am

Hi Nisha,

one of your other questions I think I�m able to answer.

Why can't I directly join the file_one and file_two?

You can Join them directly but then you can only join them by one column.

So what Peter did was to concatenate the "firstname" & "lastname" column for both files

(And this explains the first of your other questions

Why do i need two temp files at all?? )

The third questionI �ll try to answer as well
3. Why should there be a statement like this ?
BEGIN { OFS = "," } { print $1 $2, $1, $2, $3 ,$4 }

this lines output will be

BEGIN { OFS = "," } { print $1, $2, $3 ,$4 }

After that he was able to join those both files based on two columns, whats not exactly correct, cause he joined them based on one column in what the keycolumns has benn concatenated.

Did I get that right?

peter.herlihy · August 14, 2002, 3:42am

Directly from the man page

OFS output field separator (default blank)

The reason for the { print $1 $2, $1,$2,$3,$4 } and the reason for having 2 temp files is the same. Join will only allow you to join on one field. In this instance we want to join on two fields.

So we make a composite key by concatenating the two fields into one string and apending that at the start of the each line - we do this for both files.

Then we can join using one field - being the composite key.

i.e. To join where column 1 and 5 are the same in both files

File1 before is
"aaa,bbb,111,222,333".
"aaa,bbb,111,222,xxx".

File2 before is "
"aaa,cat,dog,hen,333"
"123,345,234,123,345"

The files with the {print $1 $5, $0} (you know what $0 is too...)
become

File1 before is
"aaa333,aaa,bbb,111,222,333".
"aaaxxx,aaa,bbb,111,222,xxx".

File2 before is "
"aaa333,aaa,cat,dog,hen,333"
"123345,123,345,234,123,345"

So we have the composite key matching the aaa333 in both files - and we can join on this with and make sure in our output there's not 1.1 or 2.1 (which would be the first fields from each file - as they are the fields we created and don't want to see again)

Just seen your reply while entering mine isacs!

Nisha · August 14, 2002, 4:09am

Thanks Peter and Isacs for the replies...