I have a genotype.bim file that contains information about SNPs and their alleles. As a hypothetical example, let's say:
genotype.bim
snp1 ... A G
snp2 ... G T
snp3 ... G T
snp4 ... G A
...
snpN ... C G
where the first column identifies each SNP and the 5th and 6th columns hold its two alleles.
The first step is to take the first allele of each SNP in the bim file and recode it as 0, then recode the second allele as 1.
So for snp1, A=0 and G=1; for snp2, G=0 and T=1; for snp3, G=0 and T=1; and so forth.
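A minimal Python sketch of this first step, assuming whitespace-separated columns with the two alleles in the 5th and 6th columns as described above. It returns one dictionary per SNP, in file order, mapping the first allele to "0" and the second to "1". The function name `allele_codes_from_bim` and the filler values in the sample lines (standing in for the elided `...` columns) are hypothetical.

```python
from typing import Iterable


def allele_codes_from_bim(lines: Iterable[str]) -> list[dict[str, str]]:
    """Map each SNP's first allele to "0" and its second allele to "1".

    Assumes whitespace-separated columns with the two alleles in the
    5th and 6th columns (indices 4 and 5), one SNP per line.
    """
    codes = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        allele1, allele2 = fields[4], fields[5]
        codes.append({allele1: "0", allele2: "1"})
    return codes


# Hypothetical bim lines; "1 0 100" is filler for the elided middle columns.
bim_lines = [
    "snp1 1 0 100 A G",
    "snp2 1 0 200 G T",
]
print(allele_codes_from_bim(bim_lines))
# [{'A': '0', 'G': '1'}, {'G': '0', 'T': '1'}]
```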
Then we apply these designations to the genotype.ped file:
genotype.ped
id1 id1 A A G T T G G A C C
id2 id2 A G T T G T G A G C
..
idN idN A A T T G T G A G G
The first two columns are ID numbers (they are identical).
The remaining columns come in pairs: columns 3 and 4 correspond to snp1, columns 5 and 6 to snp2, and so on; each SNP occupies two columns of genotype information in the ped file.
Now I want to recode the alleles in the ped file using the same per-SNP codes derived from the bim file.
For snp1, A=0 and G=1, so the 3rd and 4th columns of the first row become 0 0 (A A),
and the 5th and 6th columns become 0 1 (G T) because for snp2, G=0 and T=1.
The desired output would then look like this:
id1 id1 0 0 0 1 1 0 0 1 0 0
id2 id2 0 1 1 1 0 1 0 1 1 0
..
idN idN 0 0 1 1 0 1 0 1 1 1
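Applying the codes to a single ped row can be sketched in Python as follows, under the same whitespace-separated assumption. Here `codes` is a list with one allele-to-code dictionary per SNP, in bim order; the function name `recode_ped_row` and the `n_id_cols` parameter are hypothetical. The example above has 2 leading ID columns, whereas a standard PLINK .ped file has 6 (family ID, individual ID, father, mother, sex, phenotype), so the parameter makes that count explicit. An allele that is not listed for its SNP in the bim file (for example a missing-genotype code) raises a KeyError instead of being silently miscoded.

```python
def recode_ped_row(line: str, codes: list[dict[str, str]], n_id_cols: int = 2) -> str:
    """Recode one ped row to 0/1 using per-SNP allele codes.

    n_id_cols: number of leading ID columns copied unchanged
    (2 in the example above; a standard PLINK .ped file has 6).
    """
    fields = line.split()
    ids, alleles = fields[:n_id_cols], fields[n_id_cols:]
    if len(alleles) != 2 * len(codes):
        raise ValueError("number of allele columns does not match number of SNPs")
    recoded = []
    for i, snp_codes in enumerate(codes):
        # Columns 2*i and 2*i + 1 (after the IDs) hold the two alleles of SNP i.
        recoded.append(snp_codes[alleles[2 * i]])
        recoded.append(snp_codes[alleles[2 * i + 1]])
    return " ".join(ids + recoded)


codes = [{"A": "0", "G": "1"}, {"G": "0", "T": "1"}]
print(recode_ped_row("id1 id1 A A G T", codes))
# id1 id1 0 0 0 1
```

With the codes for all five example SNPs, this reproduces the rows shown above.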
Any ideas on how to write a general script for this problem (I have thousands of SNPs and individuals) would be really appreciated.
Thanks in advance!
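For thousands of SNPs and individuals, a minimal end-to-end Python sketch could stream the ped file one row at a time. It builds one lookup table per ped genotype column from the bim file (each SNP's table is reused for both of its columns), so each row is recoded with a simple `zip` and only one row is held in memory at a time. The function name `recode_files` and its parameters are hypothetical, and the same column-layout assumptions apply (alleles in bim columns 5 and 6, `n_id_cols` leading ID columns in the ped file).

```python
def recode_files(bim_path: str, ped_path: str, out_path: str, n_id_cols: int = 2) -> None:
    """Recode ped alleles to 0/1 using the allele order in the bim file.

    Assumes whitespace-separated files, alleles in bim columns 5 and 6,
    and n_id_cols leading ID columns in the ped file.
    """
    # One lookup table per ped genotype column: each SNP's table is
    # appended twice, once for each of its two allele columns.
    column_codes = []
    with open(bim_path) as bim:
        for line in bim:
            fields = line.split()
            if fields:
                snp_codes = {fields[4]: "0", fields[5]: "1"}
                column_codes.extend([snp_codes, snp_codes])

    # Stream the ped file so memory use does not grow with the number of individuals.
    with open(ped_path) as ped, open(out_path, "w") as out:
        for line_no, line in enumerate(ped, start=1):
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            ids, alleles = fields[:n_id_cols], fields[n_id_cols:]
            if len(alleles) != len(column_codes):
                raise ValueError(
                    f"{ped_path}, line {line_no}: expected "
                    f"{len(column_codes)} allele columns, got {len(alleles)}"
                )
            # An allele missing from a SNP's table raises KeyError.
            recoded = [table[a] for table, a in zip(column_codes, alleles)]
            out.write(" ".join(ids + recoded) + "\n")
```

For the example files, a call such as `recode_files("genotype.bim", "genotype.ped", "genotype_recoded.txt")` would write rows in the format shown above; the output filename is only a placeholder.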