join two files

Hi,

File1.txt

SNP2 1
SNP3 1
SNP5 1
SNP4 1

File2.txt

SNP3 2
SNP1 2
SNP5 2
SNP4 2
SNP2 2

I want output like this.

SNP1 NA 2
SNP2 1 2
SNP3 1 2
SNP4 1 2
SNP5 1 2

Ive used join -a1 -a2 file1 and file2, but it doesnt quite do the job.
help me :slight_smile: thanks!

Usually Join command is used to merge the similar items from the two tables.. So probably it will not work to get this output!!

The files need to be sorted first before they can be passed to join...

yeah ive sorted them

---------- Post updated at 03:07 PM ---------- Previous update was at 03:06 PM ----------

i just dont know how to get NA

In that case it should be straightforward...

join -a1 -a2 -e "NA" -o 2.1 1.2 2.2 file1sorted file2sorted

what does -e "NA" -o mean?

awk 'NR==FNR{a[$1]=$2;next;}{a[$1]=(a[$1]==""?"NA":a[$1]) " " $2;}END{for (i in a) print i,a}' file1 file2
2 Likes

this works! wonderful!

Lookup the man page of join...

-e s        Replace empty output fields by string s.

-o list     Each output line comprises the fields specified in list,
            each element of which has the form n.m, where n is a file
            number and m is a field number.  The common field is not
            printed unless specifically requested.

if i had more than 2 files, 6 files in my case
I used

join -a1 -a2 -e "NA" -o 2.1 1.2 2.2 ${metabo}/p.adjust_${i}_toast1 ${metabo}/p.adjust_${i}_toast2 \
  | join -a1 -a2 -e "NA" -o 2.1 1.2 2.2 - ${metabo}/p.adjust_${i}_toast3 \
  | join -a1 -a2 -e "NA" -o 2.1 1.2 2.2 - ${metabo}/p.adjust_${i}_toast4 \
  | join -a1 -a2 -e "NA" -o 2.1 1.2 2.2 - ${metabo}/p.adjust_${i}_ICH \
  | join -a1 -a2 -e "NA" -o 2.1 1.2 2.2 - ${metabo}/p.adjust_${i}_ICH_SV > ${metabo}/BH_${i}

and it doesnt work.
ignore ${i

this command will miss NA output if the record is in file1, but not in file2.