Kindly help me with this problem:
My data looks like this:
SNPfile.txt
CHR_A BP_A SNP_A CHR_B BP_B SNP_B R2 p-SNP_A p-SNP_B
4 172575323 rs17056855 4 172601079 rs11945883 0.119414 0.049972656 0.031050345
4 172575323 rs17056855 4 172603701 rs7662060 0.11978 0.049972656 0.034664046
4 172575323 rs17056855 4 172604186 rs17056919 0.382896 0.049972656 0.037136752
4 172575323 rs17056855 4 172604350 rs4692884 0.11925 0.049972656 0.034632158
4 172575323 rs17056855 4 172605017 rs17056925 0.373343 0.049972656 0.037902134
5 148767855 rs12652986 5 148773386 rs353285 0.0727606 0.049972812 0.01748838
5 148767855 rs12652986 5 148781294 rs353299 0.0692342 0.049972812 0.018621813
5 148767855 rs12652986 5 148781964 rs353297 0.0324627 0.049972812 0.033439399
5 148767855 rs12652986 5 148831342 rs414582 0.00996126 0.049972812 0.035560327
5 148767855 rs12652986 5 148854047 rs10057083 0.00890368 0.049972812 0.03527856
5 148767855 rs12652986 5 148866716 rs3756505 0.0106361 0.049972812 0.042519464
21 19804594 rs7280435 21 19816817 rs456482 0.0613113 0.049972845 0.040731593
21 19804594 rs7280435 21 19833893 rs128365 0.347322 0.049972845 0.01136657
21 19804594 rs7280435 21 19865079 rs2226396 0.0193921 0.049972845 0.012638692
21 19804594 rs7280435 21 19865343 rs2825653 0.0189789 0.049972845 0.018289668
6 9690024 rs17797675 6 9703199 rs7749303 0.369563 0.049973276 0.018423278
1 220966050 rs17532708 1 220970921 rs4240934 0.270007 0.049975232 0.010602066
1 220966050 rs17532708 1 220972176 rs2378605 0.270007 0.049975232 0.010644871
18 32233667 rs8092959 18 32264921 rs8087319 0.00315182 0.049975447 0.043287658
12 2440796 rs4765937 12 2456906 rs2239087 0.416573 0.049978648 0.005179932
- I want to sort this first on col8 and then col9 in ascending order
- I also want to filter this list by taking the first two occurrence of row-value in col3 and write to another file with tab delimited format.
Output
CHR_A BP_A SNP_A CHR_B BP_B SNP_B R2 p-SNP_A p-SNP_B
4 172575323 rs17056855 4 172601079 rs11945883 0.119414 0.049972656 0.031050345
4 172575323 rs17056855 4 172604350 rs4692884 0.11925 0.049972656 0.034632158
5 148767855 rs12652986 5 148773386 rs353285 0.0727606 0.049972812 0.01748838
5 148767855 rs12652986 5 148781294 rs353299 0.0692342 0.049972812 0.018621813
21 19804594 rs7280435 21 19833893 rs128365 0.347322 0.049972845 0.01136657
21 19804594 rs7280435 21 19865079 rs2226396 0.0193921 0.049972845 0.012638692
6 9690024 rs17797675 6 9703199 rs7749303 0.369563 0.049973276 0.018423278
1 220966050 rs17532708 1 220970921 rs4240934 0.270007 0.049975232 0.010602066
1 220966050 rs17532708 1 220972176 rs2378605 0.270007 0.049975232 0.010644871
18 32233667 rs8092959 18 32264921 rs8087319 0.00315182 0.049975447 0.043287658
12 2440796 rs4765937 12 2456906 rs2239087 0.416573 0.049978648 0.005179932
What I would have done if I wanted a unique list is to take only the col3 and then sort for uniq. However, I need two values per instance and I don't know how to do that.
In some cases, the values are present only once, adding to the problem.
I am looking for a solution in awk, since I am learning as it goes by posting some of the issues here.
I also want to thank many of you, who have helped me so far.
Sincere thanks
~GH