Hello, I am trying to find a solution to a problem that's proving to be beyond my newbie skills. The files below come from a genetics study. File 1 describes a position on the genome; file 2 does the same but is formatted differently and has more information. I am trying to match every line in file 1 with the right line in file 2, and then print the full line from file 2. The first column in file 1 corresponds to the first and third column in file 2. File 1 uses a colon as its separator, whereas file 2 uses a tab.
I tried to come up with a grep solution that would not require reformatting the files. I also tried some easy manipulation of file 1 or file 2 to make the matching easier. Both attempts failed. Any suggestions would be more than welcome.
Edit: Sorry if I made the first example look confusing. There are matches. Please consider the below example instead:
The first column in file 1 corresponds to the first and third column in file 2.
I don't see the correlation in your sample files...
I don't follow - can you elaborate?
And maybe provide a desired result based on the samples provided.
Thanks, this produces exactly what I was looking for. Next, I'll try to understand the code and customize it for other similar data. You've given me a great starting point.
Example output below:
1:40370176:G_GT g gt -0.0103 0.0222 0.6434 ???????+??+- rs564192510
19:4197562:A_AGAAT a agaat 0.1321 0.0121 1.019e-27 ?++++??+++-+ rs79305507
19:4210401:C_CA ca c 0.2934 0.0137 3.00e-101 ?++++??+++++ rs6683453
19:4259184:T_TTC t ttc 0.5397 0.0186 3.27e-185 ++++????++?? rs12092368
19:4268341:A_G a g -0.6392 0.0106 1.95e-798 ----?------- rs12090408
4:38765867:A_G a g -0.0641 0.0116 3.516e-08 ------------ rs1778050
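
For anyone adapting this to similar data, the matching asked about in the question can be sketched in Python as a simple lookup: collect a key from every line of file 1, then keep each line of file 2 whose key is in that set. This is only a rough sketch, not the solution posted in this thread. The exact layout of the two files is not shown here, so the two key functions (`key_from_file1`, `key_from_file2`) and the toy rows are hypothetical: they assume the first colon-separated field of file 1 is file 2's first and third tab-separated columns joined by an underscore. Change them to fit the real files.

```python
def key_from_file1(line):
    # Hypothetical layout: the first colon-separated field is the key, e.g. "1_100".
    return line.split(":")[0]


def key_from_file2(line):
    # Hypothetical layout: the key is column 1 and column 3 (tab-separated)
    # joined by an underscore.
    cols = line.split("\t")
    return f"{cols[0]}_{cols[2]}"


def match_lines(file1_lines, file2_lines):
    """Return the full lines of file 2 whose key also occurs in file 1."""
    wanted = {
        key_from_file1(line.rstrip("\n"))
        for line in file1_lines
        if line.strip()
    }
    matches = []
    for line in file2_lines:
        line = line.rstrip("\n")
        if line.strip() and key_from_file2(line) in wanted:
            matches.append(line)
    return matches


# Made-up rows for illustration only, not data from the study.
file1 = ["1_100:x", "2_200:y"]
file2 = ["1\tfoo\t100\textra", "3\tbar\t300\textra", "2\tbaz\t200\tmore"]

for hit in match_lines(file1, file2):
    print(hit)
```

With real files, passing the open file objects in place of the two lists should work the same way, since both are iterated line by line. Building the set from file 1 first means file 2 is read only once, and its matching lines come out in file 2's original order.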