Why is sort not working properly here?

Platform: RHEL 5.4
The text file below contains strings like the following:

$ cat /tmp/mytextfile.txt
DISK1
DISK10
DISK101
DISK102
DISK103
DISK104
DISK105
DISK106
DISK107
DISK108
DISK109
DISK110
DISK111
DISK112
DISK113
DISK114
DISK115
.
.
.
<output snipped for better readability>
.
.
DISK459
DISK46
DISK460
DISK461
DISK462
DISK47
DISK48
DISK49
DISK5
DISK50
DISK51
DISK52
DISK53
DISK54
DISK55
DISK56
DISK57
DISK58
DISK6
DISK7
DISK8
DISK9

I want to sort the lines by the number after the 'DISK' string. My expected output:

DISK1
DISK2
DISK3
DISK4
DISK5

But when I use sort, the output looks like this:

$ sort /tmp/mytextfile.txt
DISK1
DISK10
DISK101
DISK102
DISK103
DISK104
DISK105
DISK106
DISK107
DISK108
DISK109
DISK110
.
.
.
<output snipped for better readability>

# Trying with the -n option, but I still get the same output as above.

$ sort -n /tmp/mytextfile.txt
DISK1
DISK10
DISK101
DISK102
DISK103
DISK104
DISK105
DISK106
DISK107
DISK108
DISK109
DISK110

How can I get the expected output using sort or any other utility?

Use:

# sort -n -k 1.5 /tmp/mytextfile.txt
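A quick way to check the suggestion on a small sample. This is a sketch assuming GNU coreutils sort and a POSIX shell; the temporary file is a hypothetical stand-in for /tmp/mytextfile.txt and holds only a handful of the thread's names.

```shell
# Hypothetical sample file standing in for /tmp/mytextfile.txt.
sample=$(mktemp)
printf '%s\n' DISK1 DISK10 DISK101 DISK46 DISK460 DISK5 DISK9 > "$sample"

# -k 1.5 : the key starts at character 5 of field 1 (right after "DISK")
#          and, with no end position given, runs to the end of the line.
# -n     : compare that key as a number instead of as text.
sorted=$(LC_ALL=C sort -n -k 1.5 "$sample")
printf '%s\n' "$sorted"

rm -f "$sample"
```

The output runs DISK1, DISK5, DISK9, DISK10, DISK46, DISK101, DISK460.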

Thank you. It works! But how does this work?
What is 1.5?

This is what sort's man page says about -k:

 
-k, --key=POS1[,POS2]
              start a key at POS1, end it at POS2 (origin 1)
 

I couldn't understand how the -k option works from the man page's explanation.

---------- Post updated at 10:50 AM ---------- Previous update was at 08:56 AM ----------

Anyone?

Do not bump up posts.

If someone doesn't answer your post immediately, wait! We are not on call.

-k tells sort which column (field), and which characters within that column, to sort on. There's more detail about what counts as a column further down the man page, since it's actually a bit complicated.
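A small sketch of a field.character key, using a hypothetical two-column input (host name, then disk name) and assuming GNU sort. By default sort counts the blanks in front of a field as part of that field, which shifts the character positions, so this sketch uses -t ' ' to make the single space a plain separator.

```shell
# Hypothetical two-column input: host name, then disk name.
input=$(printf '%s\n' 'node2 DISK10' 'node1 DISK9' 'node3 DISK100')

# -t ' '   : fields are separated by a single space, so field 2 is exactly "DISK10" etc.
# -k2.5,2n : key = field 2 from its 5th character to the end of field 2, compared numerically
sorted=$(printf '%s\n' "$input" | LC_ALL=C sort -t ' ' -k2.5,2n)
printf '%s\n' "$sorted"
```

This prints node1 DISK9, node2 DISK10, node3 DISK100.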

I think what does most of the work here is -n ("numeric sort").
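To see why -n on its own changes nothing, here is a sketch assuming GNU sort, which documents this fallback: every line starts with "DISK", so -n finds no leading number, treats every key as 0, and then breaks the tie by comparing whole lines as text. The -s option turns that last-resort comparison off, which makes the tie visible.

```shell
input=$(printf '%s\n' DISK10 DISK9 DISK1)

# Plain -n: no line starts with a number, so all keys count as 0 and tie;
# the last-resort whole-line comparison then gives ordinary text order.
plain=$(printf '%s\n' "$input" | LC_ALL=C sort -n)

# -s (stable) disables the last-resort comparison, so tied lines keep input order.
stable=$(printf '%s\n' "$input" | LC_ALL=C sort -s -n)

# Starting the key after "DISK" gives -n an actual number to compare.
keyed=$(printf '%s\n' "$input" | LC_ALL=C sort -n -k1.5)

echo "sort -n:       " $plain
echo "sort -s -n:    " $stable
echo "sort -n -k1.5: " $keyed
```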

See the man page, roughly 35 lines below the part you quoted.

Since DISK is always four characters long, you want to key the sort on the first field, from character 5 onward, with numeric ordering. In the old days this was written '-n +0.4 -0.99', but now it is -k1.5,1.99n, because field and character offsets in the +/- syntax are zero-based, while in -k they are one-based.

If you rename your volumes with leading zeros, a plain sort with no options works fine.
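A sketch of why padding helps, using hypothetical three-digit names: once every number has the same width, comparing characters left to right gives the same order as comparing the numbers, so no options are needed.

```shell
# Hypothetical zero-padded names: every number is exactly 3 digits wide.
input=$(printf '%s\n' DISK010 DISK009 DISK101 DISK001 DISK046)

# Equal-width numbers make text order and numeric order the same,
# so a plain sort is enough.
sorted=$(printf '%s\n' "$input" | LC_ALL=C sort)
printf '%s\n' "$sorted"
```

This prints DISK001, DISK009, DISK010, DISK046, DISK101.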


Yes, you save a lot of trouble if you just rename with leading zeros. Of course, you might start with DISK001 and then, oops, you need a DISK1000, so leave enough digits for growth.
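One way to size the padding is to take the width from the largest number currently in use. The sketch below only prints what the padded names would be; it does not rename any volumes, and the awk programs are an illustration under the assumption that every name is "DISK" followed by digits.

```shell
# Hypothetical input, including one number wider than the rest.
input=$(printf '%s\n' DISK1 DISK46 DISK1000 DISK9)

# Width = number of digits in the largest number after "DISK".
width=$(printf '%s\n' "$input" |
  awk 'BEGIN { max = 0 } { n = substr($0, 5) + 0; if (n > max) max = n } END { print length(max) }')

# Re-print each name with its number zero-padded to that width (e.g. DISK%04d).
padded=$(printf '%s\n' "$input" |
  awk -v w="$width" 'BEGIN { fmt = "DISK%0" w "d\n" } { printf fmt, substr($0, 5) + 0 }')

# The padded names now sort correctly with a plain sort.
sorted=$(printf '%s\n' "$padded" | LC_ALL=C sort)
printf 'width=%s\n%s\n' "$width" "$sorted"
```

Here the width comes out as 4, and the sorted names are DISK0001, DISK0009, DISK0046, DISK1000.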

Although the .99 will work for most fields, the standards say that -k1.5,1.99n specifies a sort key performing a numeric sort on the 5th through the 99th characters of the 1st field. The standard way to specify performing a numeric sort starting with the 5th character of the 1st field through the end of the 1st field would be -k1.5,1n or -k1.5,1.0n . And, for the record, the sort key -k1.5n specifies a numeric sort key starting with the 5th character of the 1st field and continuing through the end of the line.
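A sketch, assuming GNU sort, that runs the key forms mentioned above on a few of the thread's names. For this data all four give the same order, since a numeric key only looks at the leading number; the end position matters more for text keys, where characters after the number would also be compared.

```shell
input=$(printf '%s\n' DISK46 DISK5 DISK460 DISK1 DISK101)

# Standard forms: the key ends at the end of field 1.
a=$(printf '%s\n' "$input" | LC_ALL=C sort -k1.5,1n)
b=$(printf '%s\n' "$input" | LC_ALL=C sort -k1.5,1.0n)
# No end position: the key runs to the end of the line.
c=$(printf '%s\n' "$input" | LC_ALL=C sort -k1.5n)
# The ".99" hack: characters 5 through 99 of field 1.
d=$(printf '%s\n' "$input" | LC_ALL=C sort -k1.5,1.99n)

echo $a
echo $b
echo $c
echo $d
```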

Yes, 99 was a hack meaning "as many digits as sort can find". Since sort probably does something like atof(), it skips leading white space, reads the number, and stops at the first non-numeric character.
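A sketch of that parsing rule as GNU sort -n behaves (whether it literally calls atof() is an implementation detail this does not check): leading blanks are skipped, an optional minus sign and digits are read, and anything after that is ignored. A line with no leading number counts as 0.

```shell
# Values seen by -n: "  10abc" -> 10, "9xyz" -> 9, "abc" -> 0, "-3z" -> -3
input=$(printf '%s\n' '  10abc' '9xyz' 'abc' '-3z')
sorted=$(printf '%s\n' "$input" | LC_ALL=C sort -n)
printf '%s\n' "$sorted"
```

The lines come out in the order -3z, abc, 9xyz, then "  10abc".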

Remember that without -n, sort compares characters, so you could encode numbers in hex, or even use 0-9A-Z as the digits. As long as $LC_ALL is C, sort works in single-byte ASCII/ISO 8859-1 territory, where 0-9 sort before A-Z.

I even came up with a special code that merged visually similar characters (0 and O, S and 5, 1 and I, 2 and Z, 8 and B) to get a 5-bit digit, but that left me one value short, so choose a font with a small top on the 8 and keep 8 and B separate (10 + 26 = 36, minus 4 merged pairs = 32). I was working on tape identity numbers, where the OS allowed 5 characters but the customer had well over 100K volumes. Five 5-bit digits give 32^5 = 33,554,432 values. If you add lower case and symbols, you might get to 6 bits, again eliminating the visually similar characters.
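Two quick checks of the ordering and arithmetic in that post, assuming a POSIX shell and GNU sort: in the C locale digits sort before capital letters, and five 5-bit digits give 32^5 = 2^25 possible identifiers.

```shell
# Character order in the C locale: digits come before upper-case letters.
chars=$(printf '%s\n' B Z 8 A 0 | LC_ALL=C sort)
echo $chars      # 0 8 A B Z

# Five base-32 digits (5 bits each): 32^5 distinct identifiers.
combos=$((32 * 32 * 32 * 32 * 32))
echo "$combos"   # 33554432
```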