I want to join two files, using column 3 of file 1 and column 1 of file 2 as the key.
The join command behaves erratically for some reason. File 2 is a master file containing all the names, and file 1 has some values. I want to add the names from file 2 to file 1. If I use the original master file, some of the names are missing from the output.
For example, Medtr1g004990 exists in the master file, but its name does not appear in the output.
However, if I use a truncated master file with only a few records (including Medtr1g004990), the output seems to be correct.
I have also tried sorting the files first, with the same result. Please help me solve this; if join doesn't work, please let me know whether a similar command would.
I have attached the original master file and pasted the truncated one.
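One commonly used alternative to join for this kind of lookup is an awk table, which does not depend on sort order at all. Below is a minimal sketch, assuming the key is the first blank-separated field of the names file, the third field of the values file, and that everything after the key in the names file is the description. `add_names` is a hypothetical helper name, not an existing command. Unlike join, it keeps each line of file 1 in its original order and appends the description at the end instead of moving the key to the front.

```sh
#!/bin/sh
# add_names (hypothetical helper): add_names VALUES_FILE NAMES_FILE
# Appends the description from NAMES_FILE (key in column 1) to each line of
# VALUES_FILE whose column 3 matches. Lines without a match (or whose match
# has an empty description) are printed unchanged, similar to join -a1.
# Neither file needs to be sorted.
add_names() {
    awk '
        FILENAME == ARGV[1] {             # first file read: the names file
            desc = $0
            # strip the key and the blanks after it; keep the rest as-is
            if (sub(/^[ \t]*[^ \t]+[ \t]+/, "", desc))
                name[$1] = desc
            else
                name[$1] = ""
            next
        }
        ($3 in name) && name[$3] != "" { print $0, name[$3]; next }
        { print }
    ' "$2" "$1"
}
```

With the files from the question this would be called as `add_names Medtr1g006600.exp mt4.genenames.txt`.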
File 1
# more Medtr1g006600.exp
XLOC_000005 XLOC_000005 Medtr1g004990 chr1:35909-40554 q1 q2 OK 0.520378 6.91484 3.73206 6.85797 5e-05 0.000126299 yes down
XLOC_000006 XLOC_000006 Medtr1g006490 chr1:44429-46280 q1 q2 OK 16.1083 122.606 2.92814 10.2969 5e-05 0.000126299 yes down
XLOC_000008 XLOC_000008 Medtr1g006600 chr1:51360-54977 q1 q2 OK 6.94505 3.84361 -0.853525 -2.49824 0.0001 0.000244358 yes up
XLOC_000010 XLOC_000010 Medtr1g006660 chr1:70777-71741 q1 q2 OK 1.15476 2.47771 1.10142 2.07776 0.0045 0.00841718 yes down
XLOC_000014 XLOC_000014 Medtr1g006975 chr1:129007-136403 q1 q2 OK 0.389401 0.166262 -1.2278 -2.00092 0.0017 0.00343409 yes up
File 2 (truncated from the attached file)
# more Medtr1g006600.annot
Medtr1g004990 casein kinase
Medtr1g006490 major intrinsic protein %28MIP%29 family transporter
Medtr1g006590 tonoplast intrinsic protein
Medtr1g006600 exostosin family protein
Medtr1g006605 hypothetical protein
Medtr1g006660 AP2 domain class transcription factor
Command and output with the master file
# join -a1 -1 3 -2 1 Medtr1g006600.exp mt4.genenames.txt
Medtr1g004990 XLOC_000005 XLOC_000005 chr1:35909-40554 q1 q2 OK 0.520378 6.91484 3.73206 6.85797 5e-05 0.000126299 yes down
Medtr1g006490 XLOC_000006 XLOC_000006 chr1:44429-46280 q1 q2 OK 16.1083 122.606 2.92814 10.2969 5e-05 0.000126299 yes down
Medtr1g006600 XLOC_000008 XLOC_000008 chr1:51360-54977 q1 q2 OK 6.94505 3.84361 -0.853525 -2.49824 0.0001 0.000244358 yes up exostosin family protein
Medtr1g006660 XLOC_000010 XLOC_000010 chr1:70777-71741 q1 q2 OK 1.15476 2.47771 1.10142 2.07776 0.0045 0.00841718 yes down AP2 domain class transcription factor
Medtr1g006975 XLOC_000014 XLOC_000014 chr1:129007-136403 q1 q2 OK 0.389401 0.166262 -1.2278 -2.00092 0.0017 0.00343409 yes up disease resistance protein %28CC-NBS-LRR class%29 family protein
Command and output with the truncated file
# join -a1 -1 3 -2 1 Medtr1g006600.exp Medtr1g006600.annot
Medtr1g004990 XLOC_000005 XLOC_000005 chr1:35909-40554 q1 q2 OK 0.520378 6.91484 3.73206 6.85797 5e-05 0.000126299 yes down casein kinase
Medtr1g006490 XLOC_000006 XLOC_000006 chr1:44429-46280 q1 q2 OK 16.1083 122.606 2.92814 10.2969 5e-05 0.000126299 yes down major intrinsic protein %28MIP%29 family transporter
Medtr1g006600 XLOC_000008 XLOC_000008 chr1:51360-54977 q1 q2 OK 6.94505 3.84361 -0.853525 -2.49824 0.0001 0.000244358 yes up exostosin family protein
Medtr1g006660 XLOC_000010 XLOC_000010 chr1:70777-71741 q1 q2 OK 1.15476 2.47771 1.10142 2.07776 0.0045 0.00841718 yes down AP2 domain class transcription factor
Medtr1g006975 XLOC_000014 XLOC_000014 chr1:129007-136403 q1 q2 OK 0.389401 0.166262 -1.2278 -2.00092 0.0017 0.00343409 yes up
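For comparison, join itself assumes both inputs are already sorted on their join fields, in the same collation order that join uses; sorting on the whole line, or sorting under a different locale than the one join runs in, can make it silently skip matches. A minimal sketch of a keyed sort before joining, assuming a POSIX shell (`sorted_join` is a hypothetical helper name). Whether this recovers the missing names depends on why they are missing, which the outputs above do not show; it only rules out sort order as the cause.

```sh
#!/bin/sh
# sorted_join (hypothetical helper): sorted_join VALUES_FILE NAMES_FILE
# Sorts each file on its own join field only (-k3b,3 and -k1b,1, ignoring
# leading blanks), then runs the same join as in the question.
# LC_ALL=C gives sort and join the same byte-wise ordering.
sorted_join() {
    tmp=$(mktemp -d) || return 1
    LC_ALL=C sort -k3b,3 "$1" > "$tmp/values.sorted"
    LC_ALL=C sort -k1b,1 "$2" > "$tmp/names.sorted"
    LC_ALL=C join -a1 -1 3 -2 1 "$tmp/values.sorted" "$tmp/names.sorted"
    status=$?
    rm -rf "$tmp"
    return "$status"
}
```

Usage with the files from the question: `sorted_join Medtr1g006600.exp mt4.genenames.txt`. Note that a plain `sort file` orders file 1 by its first column (the XLOC IDs), not by column 3, which is why the sort key is given explicitly here.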