Dear community, I am facing a problem and would kindly ask for your help:
I have 4 different datasets that come from 3 different array types.
In each file, column 1 is the chromosome, column 2 is the SNP id, etc. Let's say I have the following (bim) datasets:
x2014:
1 rs3094315 0 752566 G A
1 rs3131972 0 752721 G A
... (550,000 more rows)
x2016:
0 200610-10 0 0 G A
0 200610-108 0 0 G A
...
x2017:
0 200610-10 0 0 G A
0 200610-108 0 0 G A
...
x2018:
0 200610-10 0 0 G A
0 200610-108 0 0 G A
... (550K more rows)
How can I merge all the files together without any duplicate entries in the 2nd column (rs_id)?
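
To make the question concrete, here is a minimal sketch in Python of the kind of merge being asked about. It assumes the files are whitespace-separated text with the SNP id in column 2, and that the first row seen for each id should be kept. The function name merge_bim and the file names x2014.bim ... x2018.bim are hypothetical. It only touches the .bim text, so it would not keep the matching genotype (.bed) files in sync.

```python
from pathlib import Path


def merge_bim(paths, out_path):
    """Concatenate .bim-style files, keeping the first row seen for each SNP id.

    Assumption: whitespace-separated columns with the SNP id in column 2.
    Returns the number of rows written.
    """
    seen = set()  # SNP ids (column 2) that have already been written
    kept = 0
    with open(out_path, "w") as out:
        for path in paths:
            with open(path) as handle:
                for line in handle:
                    fields = line.split()
                    if len(fields) < 2:
                        continue  # skip blank or malformed lines
                    snp_id = fields[1]
                    if snp_id in seen:
                        continue  # duplicate id: keep only the first occurrence
                    seen.add(snp_id)
                    out.write("\t".join(fields) + "\n")
                    kept += 1
    return kept


if __name__ == "__main__":
    # Hypothetical file names, used only for illustration.
    files = ["x2014.bim", "x2016.bim", "x2017.bim", "x2018.bim"]
    if all(Path(f).exists() for f in files):
        print(merge_bim(files, "merged.bim"), "unique SNP ids written")
```

Because the first occurrence wins, the order of the input list decides which row survives when the same id appears with different chromosome or position values (for example a mapped row from x2014 versus an unmapped "0 ... 0 0" row from a later file).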