Join multiple files

Hi there,

I am trying to join 24 files (i showed example of 3 files below). They all have 2 columns. The first columns is common to all. The files are tab delimited eg

file 1

rs0001      100e-34
rs0003      2.8e-01
rs008        1.9e-90

[/CODE][/CODE]

file 2

rs0001      1.98e-22
rs0004      3.77e-10
rs0003      2.8e-01
rs008        1.9e-90

[/CODE]

file 3

rs0001      81.98e-22
rs0003      42.8e-01

Desire Output

                 file1              file2             file3
rs0001      100e-34         1.98e-22       81.98e-22
rs0004       NA                3.77e-10       NA
rs0003      2.8e-01          2.8e-01        42.8e-01
rs008        1.9e-90          1.9e-90         NA

I have tried the script but something is definetly wrong

awk '{ a[$1]=$2; s[$1]=$1; next } {
  s[$1] = s[$1] " | " $2 - a[$1]; a[$1]=$2
}
END{for(i in s) {print s}}' file*.txt

Thanks for your help

Try

$ awk '{A[$1] = A[$1] ? A[$1] OFS $2 : $2}END{for(i in A)print i, A}' file*  | sort

--edit--

Oops looks like desired o/p modified :slight_smile:

$ awk 'BEGIN{printf "\t"}{A[$1] = A[$1] ?  A[$1] OFS $2 : $2}FNR==1{printf FILENAME OFS}END{printf RS; for(i in A)print i, A}' OFS='\t' file* | sort

          file1      file2       file3    
rs0001    100e-34    1.98e-22    81.98e-22
rs0003    2.8e-01    2.8e-01     42.8e-01
rs0004    3.77e-10
rs008     1.9e-90    1.9e-90
1 Like

Nice one Akshay.

I made a modification to allow sorting within the awk statement:

awk '{A[$1]=A[$1] ? A[$1] OFS $2 : $2}END{for(i in A)print i, A | "sort"}' file*
rs0001 100e-34 1.98e-22 81.98e-22
rs0003 2.8e-01 2.8e-01 42.8e-01
rs0004 3.77e-10
rs008 1.9e-90 1.9e-90
1 Like

Hello,

Just some modification to Akshay's code, thanks Akshay for great code.

awk 'FNR==1{j=j OFS FILENAME} NR==FNR{a[$1];} {a[$1]=a[$1]?a[$1] OFS $2:$2} END{gsub(/^ /,X,j); print j;{for(i in a){print i OFS a}}}' check_file_check_file121213*

Output will be as follows.

check_file_check_file1212134 check_file_check_file1212135 check_file_check_file1212136
rs0001 100e-34 1.98e-22 81.98e-22
rs0003 2.8e-01 2.8e-01 42.8e-01
rs0004 3.77e-10
rs008 1.9e-90 1.9e-90

Note: Where files check_file_check_file1212134, check_file_check_file1212135 and check_file_check_file1212136 are files file1, file2 and file3 respectively.

Thanks,
R. Singh

import os
import collections
d=collections.defaultdict(list)
cnt=0
for f in sorted(os.listdir('dir')): 
 with open('dir/'+f) as file:
  for line in file:
   t=line.split(" ")
   if t[0] not in d and cnt==1:
    d[t[0]]=['NA']*cnt    
   d[t[0]].append(t[1]) 
 cnt+=1

for i in sorted(d):
 print(i," ".join(map(lambda x: x.replace("\n",""), d)),end=" ")
 if len(d)<cnt:
  for i in range(cnt-len(d)):
   print('NA',end="")
 print("")