Hi All,
I need to convert an exel spreadsheet into a SAS dataset,
and the following format change is needed. Please help, this is too complex
for a biologist.
Let me describe the input.
1st row is generation.1st column in keyword 'generation', starting 2nd column there are 5 generations in my actual data,
I have shown 3 here, namely G1,G2 and PAR.
2nd row is family name,1st column is keyword 'gene', from 2nd column format is NameParent1-NameParent2-RandomNumber_Replicate.
If 'Parent' is present in the name, then format is NameParent-Parent-RandomNumber_Replicate and it should be included in the output.
The characters upto the second '-' is the family name, replicate (can be 1,2 or 3) is the number after ''.
Random number (between second '-' and '') can be ignored.
Starting 3rd row are gene names(column 1) and values(column 2 through 9).
There are 30000 genes in my actual data set.
The gene values need to repeat according to nested format shown below,
Only if the generation is PAR (stands for Parent), replace the generation 'PAR' by name of the parent
specified by the characters upto first '-'. For example if P1-Parent-9_1 is the column name and the
corresponding generation is PAR, the output should have 'P1' instead of 'PAR' in the generations column.
Sample input
generation G1 G1 G1 G2 G2 G2 PAR PAR PAR PAR PAR PAR
gene P1-P2-24_3 P1-H-56_2 P1-P2-84_1 P1-P2-34_3 P1-P2-33_1 P1-P2-99_2 P1-Parent-9_1 P1-Parent-43_2 P1-Parent-45_3 P2-Parent-62_1 P2-Parent-43_2 P2-Parent-11_3
gene1 0 0 0 0 0 0 0 0 0 3 3 7
gene2 2 1 1 2 6 4 6 7 8 67 6 66
Expected Output
family rep generation gene value
P1-P2 3 G1 gene1 0
P1-P2 2 G1 gene1 0
P1-P2 1 G1 gene1 0
P1-P2 3 G2 gene1 0
P1-P2 1 G2 gene1 0
P1-P2 2 G2 gene1 0
P1-P2 1 P1 gene1 0
P1-P2 2 P1 gene1 0
P1-P2 3 P1 gene1 0
P1-P2 1 P2 gene1 3
P1-P2 2 P2 gene1 3
P1-P2 3 P2 gene1 7
P1-P2 3 G1 gene2 2
P1-P2 2 G1 gene2 1
P1-P2 1 G1 gene2 1
P1-P2 3 G2 gene2 2
P1-P2 1 G2 gene2 6
P1-P2 2 G2 gene2 4
P1-P2 1 P1 gene2 6
P1-P2 2 P1 gene2 7
P1-P2 3 P1 gene2 8
P1-P2 1 P2 gene2 67
P1-P2 2 P2 gene2 6
P1-P2 3 P2 gene2 66
Thanks