Inconsistent results using sort function

Could you please advise on the following: I have two space-delimited files with 9 and 10 columns, respectively, with exactly the same values in column 1. However, the order of column 1 differs between the two files, so I want to sort both files by column 1, so that I can align them and concatenate them into a 19 column file.

If I want to sort by column 1, I usually use "sort -k 1,1 -g". I have done this hundreds of times and I have never had a problem with it.
This is the first time ever that the sort function has given a different output for the two files, despite using identical commands:

sort -k 1,1 -g file1.txt | head

rs1000000 12 126890980 G A 0.772687 0.999152 -6.53289e-05 0.000341777  
rs10000003 4 57561647 A G 0.298872 0.997534 -0.000308206 0.000313536 
rs10000005 4 85161558 G A 0.468352 0.994261 0.000392384 0.000287513 
rs10000010 4 21618674 T C 0.517001 0.986406 -0.000387116 0.000288364  
rs10000011 4 138223055 C T 0.957162 0.987603 -0.000466108 0.000710431  
rs10000012 4 1357325 C G 0.85952 0.999131 -0.000544182 0.000412222  
rs10000017 4 84778125 C T 0.777348 0.989758 0.00024644 0.000345697  
rs10000018 4 100458448 A G 0.707724 0.999129 -5.96813e-05 0.000315027  
rs10000021 4 159441457 G T 0.185355 0.99682 0.000127756 0.000369005  
rs1000002 3 183635768 C T 0.513401 1 -0.000269255 0.000286993 3.5E-01

and

sort -k 1,1 -g file2.txt | head 

rs10000003 G A 0.707825 1.010846 0.015580 0.980310 1.042333 0.490663
rs10000005 A G 0.550104 0.988740 0.014283 0.960744 1.017551 0.439681
rs1000000 G A 0.780117 0.987172 0.017380 0.953108 1.022454 0.471168 
rs10000010 C T 0.503288 1.009101 0.014611 0.980464 1.038574 0.537391 
rs10000011 C T 0.950554 0.997444 0.026380 0.945740 1.051976 0.924913 
rs10000012 C G 0.866931 0.966905 0.021645 0.924482 1.011276 0.141498 
rs10000017 C T 0.791953 1.003966 0.019870 0.965021 1.044483 0.844517 
rs10000018 A G 0.699162 1.006137 0.014434 0.977846 1.035245 0.674194 
rs10000021 T G 0.827782 0.991092 0.021206 0.949529 1.034474 0.682292 
rs10000023 T G 0.579281 1.024738 0.014014 0.997270 1.052962 0.077937

Why is this happening despite identical commands? I'm especially puzzled because I have never encountered this before.

Thank you for any advice.

aberg

You're not telling sort what your delimiter is. sort -t ' ' ...

Thanks. I've just tried that:

sort -t ' ' -k 1,1 -g file2.txt

but still getting the same output...
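
One possible explanation, offered as an assumption rather than anything confirmed in this thread: -g tries to read the key as a floating-point number, and a key such as "rs1000000" does not parse as one. Every key then compares equal, sort falls back to comparing whole lines, and the other columns (which differ between the two files) decide the order. A minimal sketch with hypothetical two-line samples, using a plain byte-wise comparison on field 1 instead of -g so that both files come out in the same order:

```shell
# Hypothetical miniature versions of file1.txt and file2.txt (not the real data).
tmp=$(mktemp -d)
printf 'rs10000003 4\nrs1000000 12\n' > "$tmp/a.txt"
printf 'rs10000003 G\nrs1000000 G\n' > "$tmp/b.txt"

# LC_ALL=C forces byte-wise comparison; -t ' ' -k 1,1 restricts the key to field 1,
# so the remaining columns cannot influence the order of distinct IDs.
ids_a=$(LC_ALL=C sort -t ' ' -k 1,1 "$tmp/a.txt" | cut -d ' ' -f 1)
ids_b=$(LC_ALL=C sort -t ' ' -k 1,1 "$tmp/b.txt" | cut -d ' ' -f 1)

printf '%s\n' "$ids_a"
printf '%s\n' "$ids_b"
rm -r "$tmp"
```

If both lists match, the two sorted files can be aligned line by line.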

You need to sort the first field numerically, starting with the 3rd character:
sort -k 1.3,1 -g myFile
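
A small sketch of what that key does, assuming every ID starts with the two letters "rs" followed only by digits. The sample file and its contents are hypothetical:

```shell
# Hypothetical three-line sample standing in for myFile.
tmp=$(mktemp -d)
printf 'rs10000003 G\nrs1000000 G\nrs15 C\n' > "$tmp/sample.txt"

# -k 1.3,1 starts the key at the 3rd character of field 1 (just past "rs")
# and ends it at the end of field 1; -g compares that remainder as a number.
sorted=$(sort -k 1.3,1 -g "$tmp/sample.txt")

printf '%s\n' "$sorted"
rm -r "$tmp"
```

The lines come out ordered by the numeric part of the ID (15, 1000000, 10000003), not by the ID as text.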


Thanks, but that's just done something really weird:

sort -k 1.3,1 -g file2.txt

Output:

rs3 13 32446842 C T 0.942913 0.998618 0.000435162 0.000617777 4.8E-01
rs4 13 32447222 A G 0.942913 0.998636 0.000435401 0.000617771 4.8E-01
rs15 7 11602932 C T 0.23134 0.997915 -0.000409121 0.000339871 2.3E-01
rs16 7 11602899 T C 0.416404 0.998601 0.000120855 0.000290625 6.8E-01
rs18 7 11597475 A G 0.44335 0.999215 0.000197021 0.000288103 4.9E-01
rs19 7 11597156 A G 0.48013 1 0.000283608 0.000286386 3.2E-01

Why? It appears to be sorted by column 1.
What's wrong?
What's your original file?

First, you haven't mentioned which OS you are using. Second, I'm beginning to think your file has some unprintable characters in it. Can you please pipe the first couple of lines of your file through either hd or cat -vet?
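
For illustration, this is roughly what such a check looks like. The sample line is hypothetical and deliberately ends in a carriage return, as a file edited on Windows might:

```shell
# Hypothetical one-line sample with a CR before the newline (not the real file).
tmp=$(mktemp -d)
printf 'rs3 13 C T\r\n' > "$tmp/sample_crlf.txt"

# cat -v shows control characters in ^X notation, -e marks each line end with $,
# and -t shows tabs as ^I, so stray characters become visible.
shown=$(head -n 2 "$tmp/sample_crlf.txt" | cat -vet)

printf '%s\n' "$shown"
rm -r "$tmp"
```

A carriage return shows up as ^M just before the $ that marks the end of the line; a clean line ends with $ alone.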

Andrew