Alternative to join command

Ubuntu, Bash 4.3.48

Hi,

I have several files and I want to join them line by line wherever the lines start with the same field (like an ID).

INPUT FILE 1 (tab delimited)

aa_12_12_v_c aaa,asf,afgas,eg
bb_12_43_a_d dad,ada,adaf,afa
cc_56_75_d_f asd,thh,ert,rtertet

INPUT FILE 2 (tab delimited)

aa_12_12_v_c 1:1:1:1:1
cc_56_75_d_f 2:2:2:2:2

INPUT FILE 3 (tab delimited)

bb_12_43_a_d 3:3:3:3:3

Using join

join -t "`echo -e "\t"`" -a1  FILE1 FILE2 > OUTPUT1

OUTPUT1 (tab delimited)

aa_12_12_v_c aaa,asf,afgas,eg 1:1:1:1:1
bb_12_43_a_d dad,ada,adaf,afa
cc_56_75_d_f asd,thh,ert,rtertet 2:2:2:2:2

Since -e ND doesn't work in my case :confused:, I have to do this:

awk 'FNR==NR{if(m<NF)m=NF;next}{for(i=NF;i<m;i++)$(i+1)="ND"}1' OUTPUT1 OUTPUT1 > XFILE; sed 's/ /\t/g' XFILE > OUTPUT2

OUTPUT2 (tab delimited)

aa_12_12_v_c aaa,asf,afgas,eg 1:1:1:1:1
bb_12_43_a_d dad,ada,adaf,afa ND
cc_56_75_d_f asd,thh,ert,rtertet 2:2:2:2:2

Then for the 3rd file...

join -t "`echo -e "\t"`" -a1  OUTPUT2 FILE3 > OUTPUT3

OUTPUT3 (tab delimited)

aa_12_12_v_c aaa,asf,afgas,eg 1:1:1:1:1 
bb_12_43_a_d dad,ada,adaf,afa ND 3:3:3:3:3
cc_56_75_d_f asd,thh,ert,rtertet 2:2:2:2:2

Again, since -e ND doesn't work in my case :confused:, I have to do this:

awk  'FNR==NR{if(m<NF)m=NF;next}{for(i=NF;i<m;i++)$(i+1)="ND"}1'  OUTPUT3 OUTPUT3 > XFILE; sed 's/ /\t/g' XFILE > OUTPUT4

OUTPUT4 (tab delimited)

aa_12_12_v_c aaa,asf,afgas,eg 1:1:1:1:1 ND
bb_12_43_a_d dad,ada,adaf,afa ND 3:3:3:3:3
cc_56_75_d_f asd,thh,ert,rtertet 2:2:2:2:2 ND
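A note on the padding step: the sed pass is only needed because awk rebuilds each modified line with its default output separator, a space, and it would also turn any space inside a field into a tab. A minimal sketch that keeps tabs throughout is shown here, assuming tab-delimited input with no empty trailing fields; the function name pad_fields is made up for illustration.

```shell
# pad_fields FILE
# Pads every line of a tab-delimited FILE with "ND" up to the widest line.
# The file is read twice: pass 1 finds the maximum field count,
# pass 2 appends the missing fields.
pad_fields() {
    awk -F '\t' -v OFS='\t' -v fill=ND '
        NR == FNR { if (NF > max) max = NF; next }   # pass 1: widest line
        {
            for (i = NF + 1; i <= max; i++) $i = fill  # pass 2: pad short lines
            print
        }
    ' "$1" "$1"
}
```

With this, `pad_fields OUTPUT1 > OUTPUT2` would replace the awk and sed pair, and no XFILE is needed.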

--- --- ---

The point is that my code seems a little complicated. Also, often but not always, I have problems with sorting: sometimes I get errors about sorting when I run the join command. I read that if I'm sure my files are sorted I can bypass join's sort check, but I want new code that runs without warnings...

Do you know any other command? Any help is welcome: commands, code, scripts :slight_smile:
Since I have N files, I want to do this in a loop...

Many thanks!
echo manolis

How about

join -t$'\t' -a1 -o"1.1 1.2 1.3 2.2" -eND <(join -t$'\t' -a1 -o"1.1 1.2 2.2" -eND file[12]) file3
aa_12_12_v_c	aaa,asf,afgas,eg	1:1:1:1:1	ND
bb_12_43_a_d	dad,ada,adaf,afa	ND	3:3:3:3:3
cc_56_75_d_f	asd,thh,ert,rtertet	2:2:2:2:2	ND
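The reason -e ND has an effect here is the explicit -o list: without an output format, join prints an unpaired line exactly as it is, so there is no missing field to fill. For N files the nested command gets unwieldy, so the same idea can be sketched as a loop. This is only a sketch under some assumptions: GNU join (for -o auto, which takes the field count from the first line of each file), every line within a file having the same number of fields, and a made-up function name join_all. Each file is sorted on field 1 in the C locale first, so that sort and join agree on the order.

```shell
# join_all FILE1 FILE2 ...
# Left-joins FILE2..FILEN onto FILE1 on the first tab-delimited field,
# filling missing values with ND. Runs in a subshell so variables stay local.
join_all() (
    tab=$(printf '\t')
    acc=$(mktemp) && new=$(mktemp) && srt=$(mktemp) || exit 1

    # Start the accumulator with FILE1, sorted on field 1 only.
    LC_ALL=C sort -t "$tab" -k1,1 "$1" > "$acc"
    shift

    for f in "$@"; do
        LC_ALL=C sort -t "$tab" -k1,1 "$f" > "$srt"
        # -a1 keeps unpaired lines of the accumulator; -o auto makes -e apply.
        LC_ALL=C join -t "$tab" -a1 -e ND -o auto "$acc" "$srt" > "$new"
        cat "$new" > "$acc"
    done

    cat "$acc"
    rm -f "$acc" "$new" "$srt"
)
```

Called as `join_all FILE1 FILE2 FILE3 > OUTPUT4`, this should give the same three lines as the nested command above. The output comes out in sorted key order, not in the original order of FILE1.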

Thank you Rudic!

But I get the same sorting error with my original files. The files are sorted!!!

echo manolis

---------- Post updated at 09:21 AM ---------- Previous update was at 08:56 AM ----------

I saw several awk solutions... I have to put my data in an array keyed by ID and match the lines that have the same ID... and also add the lines that don't match to the output.

Any help?

Please, could someone let me know!

Best
echo manolis
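The array approach described in this post could look roughly like the sketch below. It does not use join at all, so the input files do not need to be sorted and no sort-order warning can appear. It rests on assumptions the thread does not confirm: IDs are unique within each file, every file after the first has exactly two tab-delimited fields (ID and value), and no input file is empty. The function name merge_by_id is made up for illustration.

```shell
# merge_by_id FILE1 FILE2 ...
# Keeps every line of FILE1 in its original order and appends, for each
# further file, the value stored under the same ID, or ND when there is none.
merge_by_id() {
    awk -F '\t' -v OFS='\t' -v fill=ND '
        FNR == 1    { nfile++ }                      # a new input file starts
        nfile == 1  { order[++n] = $1; row[$1] = $0; next }
        ($1 in row) { val[$1, nfile] = $2 }          # IDs unknown to FILE1 are ignored
        END {
            for (i = 1; i <= n; i++) {
                id = order[i]
                line = row[id]
                for (f = 2; f <= nfile; f++)
                    line = line OFS (((id, f) in val) ? val[id, f] : fill)
                print line
            }
        }
    ' "$@"
}
```

For the sample data, `merge_by_id FILE1 FILE2 FILE3` should print the same lines as OUTPUT4. Lines whose ID appears only in a later file are dropped, which mirrors what join -a1 does.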

When you tell us you have a sorting problem with your input files and tell us that your input files are sorted, you leave us wondering:

  1. What is your sorting problem?
  2. Why do you think there is a sorting problem?
  3. What output did you get from your attempts to use join that led you to believe that you had a sorting problem?
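One way to start answering these questions is to ask sort itself whether each file is in the order join expects. The sketch below is a guess at a useful check, not a diagnosis: the thread never shows the actual error message. It assumes tab-delimited files joined on field 1 and compares in the C locale, because a file sorted under one locale (or sorted on the whole line instead of on field 1) can look unsorted to a join running under another. The function name check_join_order is made up for illustration.

```shell
# check_join_order FILE...
# Reports whether each FILE is sorted on its first tab-delimited field
# in the C locale. Exit status is 0 only if every file is in order.
check_join_order() (
    tab=$(printf '\t')
    status=0
    for f in "$@"; do
        # -c only checks, -s ignores the order of lines with equal keys.
        if LC_ALL=C sort -c -s -t "$tab" -k1,1 "$f" 2>/dev/null; then
            printf '%s: sorted on field 1\n' "$f"
        else
            printf '%s: NOT sorted on field 1\n' "$f"
            status=1
        fi
    done
    exit "$status"
)
```

If a file is reported as not sorted, sorting it with `LC_ALL=C sort -t "$(printf '\t')" -k1,1` and running join with `LC_ALL=C` as well keeps both commands on the same collation rules.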

As with all threads in this forum, you know that knowing what operating system you're using and what shell you're using helps us help you. And, you have not told us either of these key bits of information.

If echo -e doesn't work on your system (or in your shell), why not just use a literal tab character when specifying the field delimiter in your join commands? If you're afraid that someone reading your code won't be able to tell the difference between a <space> and a <tab>, why not include a comment in your code explaining that the delimiter is a <tab> character entered literally? If writing comments is unacceptable for you for some reason, why not use a command substitution that is portable:

join -t "$(printf '\t')" -a1  OUTPUT2 FILE3 > OUTPUT3

or, if you're using a pure Bourne shell:

join -t "`printf '\t'`" -a1  OUTPUT2 FILE3 > OUTPUT3

instead of using echo -e which is clearly not portable?

Why wait until you update post #3 in this thread to tell us what your real requirements are? Why not spend the time when you first started your thread to explain what you were trying to do? Why when you changed your requirements in post #3 didn't you include sample input and output that would help us understand what you're trying to do?

When you don't tell us what OS and shell you're using, don't clearly explain what you're trying to do, and don't show us sample input and corresponding output for the problem you're trying to solve, you make it hard for anyone to get interested in trying to help you.

When you give us details about your environment, give us a clear specification of what you're trying to do, show us sample inputs and outputs that correspond to that specification, and show us code that you have attempted to use to solve your problem on your own, you will be much more likely to get help.