Thank you for letting me join this forum; it looks like there are lots of learning opportunities here.
I am a biologist and very new to Unix, so please excuse any incorrect terminology. I am using Cygwin on Windows, which can run perl, awk, sed, etc.
I have two files. The first, a sample sheet, tells which parent or child line is in which sample. Parents are represented as p1, p2, p3, and the corresponding groups of children are represented as p1/p2, p2/p3, etc.
index,line,sample
1,p1,s1
2,p2,s2
3,p1/p2,s3
4,p1/p2,s4
5,p1/p2,s5
6,p1/p2,s6
7,p1/p3,s7
8,p1/p3,s8
9,p1/p3,s9
10,p1/p3,s10
11,p2/p3,s11
12,p2/p3,s12
13,p2/p3,s13
14,p2/p3,s14
15,p3,s15
The second file contains the data: sample name, variable name and value. Parent values are always one of aa, tt, gg or cc (the same character repeated twice).
sample,var,value
s1,v1,aa
s1,v2,tt
s1,v3,aa
s1,v4,gg
s2,v1,tt
s2,v2,aa
s2,v3,aa
s2,v4,gg
s3,v1,at
s3,v3,aa
s3,v4,tt
s4,v1,tt
s4,v2,at
s4,v3,aa
s4,v4,gt
s5,v1,aa
s5,v2,tt
s5,v3,aa
s5,v4,gt
s6,v1,aa
s6,v2,aa
s6,v3,aa
s6,v4,tt
s7,v1,aa
s7,v2,aa
s7,v3,at
s7,v4,ag
s8,v1,aa
s8,v2,tt
s8,v3,at
s8,v4,ag
s9,v1,aa
s9,v2,at
s9,v3,tt
s9,v4,gg
s10,v1,aa
s10,v2,at
s10,v3,aa
s10,v4,ag
s11,v1,aa
s11,v2,aa
s11,v3,tt
s11,v4,gg
s12,v1,tt
s12,v2,tt
s12,v3,tt
s12,v4,ag
s13,v1,aa
s13,v2,at
s13,v3,aa
s13,v4,ag
s14,v1,at
s14,v2,aa
s14,v3,at
s14,v4,aa
s15,v1,aa
s15,v2,aa
s15,v3,tt
s15,v4,aa
I am only interested in variables for which the two parents of a pair do not match. If both parents have the same value, that variable is left out of the output; likewise, if one or both parents have no value for a variable, I don't want to consider it.
What I need to do is create a new file for each set of children with the same parents, and code each variable value as a (matches the first parent), b (matches the second parent) or m (a mixture of both). If a child has no data for a variable, a hyphen (-) can be used.
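To make the rule concrete, here is a rough sketch of it in Python (an assumption on my side: Python is not in the list of tools above, but it can be installed in Cygwin). The function names `is_informative` and `code_value` are made up for illustration. The sketch also assumes that a "mixture" means one character from each parent, in either order, and it marks any other unexpected child value with `?`, which is my own placeholder and not part of the desired output.

```python
def is_informative(parent_a, parent_b):
    """True when both parent values are present and differ from each other."""
    return bool(parent_a) and bool(parent_b) and parent_a != parent_b


def code_value(parent_a, parent_b, child):
    """Code one child value against its two parents.

    Assumes the variable is informative, i.e. both parents are present,
    homozygous (such as "aa" or "tt") and different from each other.
    """
    if not child:
        return "-"  # no data for this child
    if child == parent_a:
        return "a"  # matches the first parent
    if child == parent_b:
        return "b"  # matches the second parent
    # Assumed meaning of "mixture": one character from each parent, any order.
    if sorted(child) == sorted(parent_a[0] + parent_b[0]):
        return "m"
    return "?"  # hypothetical marker for a value that fits none of the cases
```

For variable v1 of the p1/p2 group, the parents are aa (s1) and tt (s2), so the child value at in s3 would be coded m and tt in s4 would be coded b.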
So my desired output is 3 files, all in matrix form.
file p1_p2
s3 s4 s5 s6
v1 m b a a
v2 - m a b
file p1_p3
s7 s8 s9 s10
v2 b a m m
v3 m m b a
v4 m m a m
file p2_p3
s11 s12 s13 s14
v1 b a b m
v3 b b a m
v4 a m m b
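As a starting point for discussion, the matrices above could be produced roughly as follows. This is only a sketch in Python (assumed to be available in Cygwin), not a finished solution. The function names `read_rows`, `code`, `build_matrices` and `format_matrix` are invented for illustration. It assumes each parent line appears exactly once in the sample sheet, that both files are comma-separated with the header lines shown above, and that a "mixture" is one character from each parent; any other unexpected child value is marked `?`, which is my own placeholder.

```python
import csv
import io
from collections import defaultdict


def read_rows(text):
    """Parse comma-separated text with a header line into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def code(a, b, child):
    """Code a child value against parent values a and b (present, different)."""
    if not child:
        return "-"
    if child == a:
        return "a"
    if child == b:
        return "b"
    # Assumed meaning of "mixture": one character from each parent.
    return "m" if sorted(child) == sorted(a[0] + b[0]) else "?"


def build_matrices(sheet_rows, data_rows):
    """Return {"p1_p2": [sample_row, variable_row, ...], ...}, one per group."""
    values = defaultdict(dict)  # values[sample][var] -> "aa", "at", ...
    variables = []              # variable names in order of first appearance
    for row in data_rows:
        values[row["sample"]][row["var"]] = row["value"]
        if row["var"] not in variables:
            variables.append(row["var"])

    parent_sample = {}            # "p1" -> "s1"
    children = defaultdict(list)  # "p1/p2" -> ["s3", "s4", ...]
    for row in sheet_rows:
        if "/" in row["line"]:
            children[row["line"]].append(row["sample"])
        else:
            parent_sample[row["line"]] = row["sample"]

    matrices = {}
    for cross, samples in children.items():
        first, second = cross.split("/")
        a_vals = values[parent_sample[first]]
        b_vals = values[parent_sample[second]]
        rows = [list(samples)]
        for var in variables:
            a, b = a_vals.get(var), b_vals.get(var)
            if not a or not b or a == b:
                continue  # a parent is missing or both parents match: skip
            rows.append([var] + [code(a, b, values[s].get(var)) for s in samples])
        matrices[cross.replace("/", "_")] = rows
    return matrices


def format_matrix(rows):
    """Join each row with single spaces, one row per line."""
    return "".join(" ".join(row) + "\n" for row in rows)
```

To use it, the two files would be read into strings and passed through `read_rows`, and each entry of the dictionary returned by `build_matrices` would be written to a file named after its key (p1_p2, p1_p3, p2_p3) using `format_matrix`.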
I'm ready to answer any questions you may have. Please guide me on how to achieve this output.