Append data by looking up 2 tables for multiple files

ritakadm · March 13, 2015, 9:17am

I want to lookup values from two different tables based on common columns and append. The trick is the column to be looked up is not fixed and varies , so it has to be detected from the header. How can I achieve this at once, for multiple data files, but lookup tables fixed.

The two lookup tables are ftable and ttable. For ftable I want cols 2 and 3 by matching 1.

For ttable, I want just col 2 based on col1.

ftable

fin,evalue,count2
abc,e1,7
xyz,e7,2

ttable

tin,count
ghi,11
fgr,22

in the input files , we must detect the column with header rid , that matches with ftable and the column header tid which matches with ttable

1st file from Loc B

mid,rid,tid,xid
1,abc,23,ghi,d
2,abc,,fgr,t

2nd File from expA

fid,mid,rid,tid,xid
001,1,abc,23,ghi,d
002,2,xyz,,ghi,t
003,3,xyz,1,fgr,34

Desired outputs
1st file from Loc B

mid,rid,tid,xid,evalue,count2,count
1,abc,23,ghi,d,e1,7,11
2,abc,,fgr,t,e1,7,22

2nd File from expA

fid,mid,rid,tid,xid,evalue,count2,count
001,1,abc,23,ghi,d,e1,7,11
002,2,xyz,,ghi,t,e7,2,11
003,3,xyz,1,fgr,34,e7,2,22

My try only for the 1st lookup

awk -F"," 'NR==FNR{f[$1]=$2","$3;next} NR==1 { (for i=1;i<=NF;i++) { if (i=="rid" ) { ftab=i } } print $0",evalue,count2" } NR!=1 && $ftab in f {print $0","f[$ftab]}' ftable  "1st file from Loc B"

vgersh99 · March 13, 2015, 9:32am

Hmmm... this is a bit confusing to parse this explanation.
Could you try it one more time with just ONE example and a desired output, pls.
I'm failing to see what keys you're using to find a match between 2 files...

ritakadm · March 13, 2015, 9:57am

Sorry about the confusion.

Round 2

The keys are rid with ttable (tin or col1) , and fid with ftable (fin or col1) ...the issue is we have to detect which columns rid and tid are in the data since, the column numbers are not fixed in the data.

The keys in the lookup tables are always in column 1.

Another example,

ttable (with key in col 1)

tin,evalue,count2
rty,e8,9
wer,e9,23

ftable (with key in col 1)

fin,count
sdf,1
fds,2

Input data with key fields rid and fid as col2 and col3 respectively. However in another input rid and fid can be in colx and coly. But they will still be called rid and tid.
So when we read the header line of the input file, we can tell which column is rid and which column is tid.

mid,rid,fid
1,rty,sdf
2,rty,fds
3,wer,fds

Desired output

mid,rid,fid,evalue, count2, count
1,rty,sdf,e8,9,1
2,rty,fds,e8,9,2
3,wer,fds,e9,23,2

Please let me know if this makes better sense.

RudiC · March 13, 2015, 11:00am

It is difficult to match column names "fid" with "fin" and almost impossible to match "rid" with "tin". So I discontinued that.
Given the column names match (fid=fid, rid=rid), try

awk     'FILENAME == "file1" ||
         FILENAME == "file2"    {X=$1; sub ($1 FS, ""); TR[X]=$0; next}

         FNR==1         {n=split (ID, TMP); for (i=1; i<=n; i++) IDar[TMP]
                         for (n=1; n<=NF; n++) if ($n in IDar) IDCOL[n]
                        }

                        {printf "%s", $0; for (i in IDCOL) printf ",%s", TR[$i]; printf "\n"}

        ' ID="rid,fid"  FS=","  file1 file2 file4
mid,rid,fid,evalue,count2,count
1,rty,sdf,e8,9,1
2,rty,fds,e8,9,2
3,wer,fds,e9,23,2

vgersh99 · March 13, 2015, 11:05am

this is still confusing....
Looks like you're using 4 diff names: tin/fin for the first 2 tables AND rid/fid for the last table.
I must be thick-skulled today, but I don't get it.
Sorry, maybe somebody else could help you.

ritakadm · March 13, 2015, 11:31am

those columns are present as different names in different files..that is the confusion...thank you, I will try to work with the solution provided.