Matrix multiplication

Akang · March 3, 2016, 9:24pm

I have two files. The row IDs in File1 match the column IDs in File2 (starting from column 7), except for the last 2 characters. File1 has 50 rows and File2 has 56 columns. If an ID matches, I want to multiply the entire matching column in File2 by the value in column 3 of File1. In the final output, print only column 2 and columns 7 onwards from File2. Any awk or R suggestions?

File1
P1  A   -0.468018   -3.49806    
P2  A   0.0903727   0.675471    
P3  C   0.441187    3.29752 
P4  C   0.240075    1.79437 


File2
ID1 ID2 ID3 ID4 ID5 ID6 P1_A P2_A P3_C........
0 A01 0 0 0 0 0 2 1 
0 A04 0 0 0 0 1 1 0 
0 E05 0 0 0 0 0 1 2 
0 G06 0 0 0 0 2 0 2 

Output (I need the products themselves printed to the final output file. For example, in row 2, column 2 of the output file, 0*-0.468018=0, so I want 0 printed there, and so on.)
ID2  P1 P2 P3........
A01  0*-0.468018 2*0.0903727 ....
A04  1*-0.468018 1*0.0903727...
E05  0*-0.468018 1*0.0903727....
G06  2*-0.468018 0*0.0903727...
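Despite the title, the operation asked for is column scaling: each matched column of File2 is multiplied by one number from File1. In matrix terms, that is the count matrix times a diagonal matrix built from File1's column 3. The small sketch below (plain nested lists, using only the sample values shown above) checks that both views give the same numbers.

```python
# Column scaling viewed as a matrix product: out = counts @ diag(weights).
# Values are the P1..P3 columns of the sample File2 and column 3 of File1.

def diag(weights):
    """Square diagonal matrix (list of lists) with `weights` on the diagonal."""
    n = len(weights)
    return [[weights[i] if i == j else 0.0 for j in range(n)] for i in range(n)]

def matmul(a, b):
    """Plain nested-list matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

counts = [[0, 2, 1],   # A01
          [1, 1, 0],   # A04
          [0, 1, 2],   # E05
          [2, 0, 2]]   # G06
weights = [-0.468018, 0.0903727, 0.441187]  # File1 column 3 for P1, P2, P3

via_matmul = matmul(counts, diag(weights))
# Scaling each column directly gives the same values without building diag().
direct = [[c * w for c, w in zip(row, weights)] for row in counts]
print(via_matmul == direct)
```

In practice the direct form is what you want: it never builds the 50x50 diagonal matrix.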

This is what I have tried in R, but it doesn't give the desired output. I'd appreciate any help. TIA!

for(i in 1:nrow(file2)){
file2[i,2:6]<-file2[,2:6]*file1[match(substr(colnames(file2),1,2),file1[,1]),3]
}
file2
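
For reference, here is what the lookup inside that loop, match(substr(colnames(file2),1,2),file1[,1]), evaluates to, mimicked in Python on the sample headers. R's match() returns the 1-based position of the first match in the table, or NA when there is none; None stands in for NA here. The six ID columns all come back as NA, so only columns 7 onwards have a multiplier, whereas the loop above multiplies columns 2:6.

```python
def r_match(x, table):
    """Mimic R's match(): 1-based position of the first match, None for NA."""
    positions = {}
    for pos, value in enumerate(table, start=1):
        positions.setdefault(value, pos)  # first occurrence wins, as in R
    return [positions.get(v) for v in x]

file2_colnames = ["ID1", "ID2", "ID3", "ID4", "ID5", "ID6", "P1_A", "P2_A", "P3_C"]
file1_ids = ["P1", "P2", "P3", "P4"]

prefixes = [name[:2] for name in file2_colnames]  # substr(..., 1, 2)
idx = r_match(prefixes, file1_ids)
print(idx)  # [None, None, None, None, None, None, 1, 2, 3]
```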

RudiC · March 4, 2016, 1:57am

What's wrong with the solution presented here? After a SMALL adaptation, it produces:

ID2 P1_A P2_A P3_C........
A01 0 0.180745 0.441187
A04 -0.468018 0.0903727 0
E05 0 0.0903727 0.882374
G06 -0.936036 0 0.882374
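Why the output shows 0 rather than -0, and 0.180745 rather than 0.1807454: awk converts an integral number to a string as an integer, and any other number through CONVFMT, whose default is "%.6g". The function below is a simplified emulation of that rule (an assumption for illustration, not awk itself) and reproduces the four rows above from the sample data.

```python
def awk_str(x: float) -> str:
    """Simplified awk number-to-string rule: integers as-is, else "%.6g"."""
    if x == int(x):
        return str(int(x))  # 0 * -0.468018 gives -0.0, printed as "0"
    return "%.6g" % x

weights = [-0.468018, 0.0903727, 0.441187]  # File1 column 3 for P1, P2, P3
counts = {"A01": [0, 2, 1], "A04": [1, 1, 0], "E05": [0, 1, 2], "G06": [2, 0, 2]}

lines = [" ".join([rid] + [awk_str(c * w) for c, w in zip(row, weights)])
         for rid, row in counts.items()]
print("\n".join(lines))
```

If more digits are needed, awk's CONVFMT variable can be set (for example -v CONVFMT="%.10g").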

Akang · March 4, 2016, 2:06am

@RudiC Nothing wrong, but I am pretty new to programming, so I'm struggling to adapt the awk script to my current needs.

RudiC · March 4, 2016, 2:10am

Any attempts/ideas/thoughts? What logic would be needed?

Akang · March 4, 2016, 2:19am

1. Partial ID matching, like I tried in R using match(substr)
2. If the ID matches, store the value of file1$3
3. Multiply F[file1]*C[file2]
4. Print
5. Repeat till the end of both files
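The five steps above can be sketched end to end like this. It is a minimal Python sketch, assuming whitespace-separated files with a header line in File2, and doing the partial match by dropping the last 2 characters of each header as described in the question ("P1_A" becomes "P1"). The function name scale_columns is hypothetical.

```python
import io

# Sample data copied from the question.
FILE1 = """\
P1  A   -0.468018   -3.49806
P2  A   0.0903727   0.675471
P3  C   0.441187    3.29752
P4  C   0.240075    1.79437
"""
FILE2 = """\
ID1 ID2 ID3 ID4 ID5 ID6 P1_A P2_A P3_C
0 A01 0 0 0 0 0 2 1
0 A04 0 0 0 0 1 1 0
0 E05 0 0 0 0 0 1 2
0 G06 0 0 0 0 2 0 2
"""

def scale_columns(file1, file2, first_col=7, key_col=2):
    """Rows of File2's key column plus each matched column scaled by File1 column 3."""
    # Steps 1-2: remember File1 column 3 under the bare row id (P1, P2, ...).
    factor = {}
    for line in file1:
        fields = line.split()
        if fields:
            factor[fields[0]] = float(fields[2])

    rows = iter(file2)
    header = next(rows).split()
    # Partial ID match: drop the last 2 characters of each header ("P1_A" -> "P1").
    keep = []  # (0-based field index, multiplier, output label)
    for i in range(first_col - 1, len(header)):
        pid = header[i][:-2]
        if pid in factor:
            keep.append((i, factor[pid], pid))

    out = [[header[key_col - 1]] + [label for _, _, label in keep]]
    # Steps 3-5: multiply every data row until File2 runs out.
    for line in rows:
        fields = line.split()
        if fields:
            out.append([fields[key_col - 1]] + [float(fields[i]) * f for i, f, _ in keep])
    return out

result = scale_columns(io.StringIO(FILE1), io.StringIO(FILE2))
for row in result:
    print(*row)
```

With the real files, pass open file objects instead, e.g. `with open("File1") as f1, open("File2") as f2: scale_columns(f1, f2)`. File1 is read completely first, which is the same order the awk FNR == NR idiom relies on.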

That's what I can think of.

RudiC · March 4, 2016, 2:32am

I was asking for ideas on how to modify that quoted script.
Use $1 without suffix for indexing T in the first file, and use a substr of $i in the second.

Akang · March 4, 2016, 3:55am

I changed T[$1 SFX] = $3 to T[$1] = $3, but I don't understand what you mean by "use a substr of $i in the second".

RudiC · March 4, 2016, 12:16pm

As you don't want to use suffixes, and your headers/categories/whatever you call them are 2 chars in length, for the second file create a variable, say, X = substr($i, 1, 2), and use that for indexing. That's what I did, and you can see the results above.
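One caveat about the 2-character substr: File1 has 50 rows, so the IDs presumably go beyond P9 (an assumption; the sample only shows P1 to P4). In that case the first two characters of P10_A are "P1", which would silently pick up P1's value. Dropping the last 2 characters, as the question describes, keeps the full ID. In awk that would be X = substr($i, 1, length($i) - 2). The comparison below uses hypothetical headers P10_A and P50_C.

```python
def first_two(header: str) -> str:
    """Mirrors substr($i, 1, 2)."""
    return header[:2]

def drop_suffix(header: str) -> str:
    """Mirrors substr($i, 1, length($i) - 2)."""
    return header[:-2]

headers = ["P1_A", "P9_C", "P10_A", "P50_C"]  # P10_A, P50_C are hypothetical
for h in headers:
    print(h, first_two(h), drop_suffix(h))
# P1_A P1 P1
# P9_C P9 P9
# P10_A P1 P10
# P50_C P5 P50
```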

Akang · March 6, 2016, 10:21pm

@RudiC Is this what you mean?

awk '
FNR == NR {T[$1] = $3
next
}
{printf "%s", $COL
}
X = substr($i, 1, 2)
FNR == 1 {for (X=COL+1; X<=NF; X++)
{if ($X in T) {C[++CNT] = X
F[CNT] = T[$X]
printf " %s", $X
}               
}
printf RS
next
}   
{for (i=1; i<=CNT; i++) printf "%s%s", FS, $C * F[X]
print ""  
}' COL=2 File2 File1

RudiC · March 7, 2016, 3:12am

Right direction. A bit of overkill.

This should do:

awk '
FNR == NR       {T[$1] = $3
                 next      
                }
                {printf "%s", $COL
                }
FNR == 1        {for (i=COL+1; i<=NF; i++)      {X = substr ($i,1,2)
                                                 if (X in T)   {C[++CNT] = i
                                                                 F[CNT] = T[X]
                                                                 printf " %s", $i
                                                                }
                                                }
                 printf RS
                 next
                }
                {for (i=1; i<=CNT; i++) printf "%s%s", FS, $C[i] * F[i]
                 print ""
                }
' COL=2  file1 file2

Akang · March 7, 2016, 3:51am

I think substr is not working. In the final result I only get the row names and column names.

RudiC · March 7, 2016, 4:02am

A bold statement. Hint: you can test it with a simpler script and limited data.

And the above DOES work, as shown in post #2.