Sort command question

I have a question about what the sort command is doing.

Here is some sample data:

348201310013RVE         2
600201310013GFJ        70
3302013020101NS        40
600201309013GFJ        70

The sort command that is running is as follows:

sort -k 1,3 -k 12,4 input.txt > output.txt

I think it is sorting by the first 3 bytes, and then by column 12 for 4 bytes? Is that correct? I'm a little confused about the syntax.

Thanks for all help.

Columns 1 through 3, then columns 12 through 4. I'm not sure that command actually makes sense for the data as given.

Are you saying that -k 4,12 would do the same thing? Would that make more sense?

I'm saying that the data as given doesn't even have 12 columns, so I'm unsure what this statement's for.
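A quick way to check the field count is something like the sketch below. It uses awk's default whitespace splitting as a stand-in for how sort separates fields when no -t option is given; the variable names are only for illustration.

```shell
# Sample data copied from the question.
sample='348201310013RVE         2
600201310013GFJ        70
3302013020101NS        40
600201309013GFJ        70'

# NF is awk's count of whitespace-separated fields on each line.
field_counts=$(printf '%s\n' "$sample" | awk '{ print NF }')
printf '%s\n' "$field_counts"
```

Every line reports 2 fields, so a key that starts at field 12 has nothing to compare.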

Ok. I think it is trying to sort byte 1 for 3 bytes, then byte 12 for 4 bytes. How would that statement work?

The sort utility's key specifiers -k 1,3 and -k 12,4 specify ranges of fields (not output print columns). In the form -kF.C, F is a field number and C is a character position within that field. To sort on the first three characters of the line (still not print columns) as the primary sort key, and on the 12th through 15th characters of the 1st field as the secondary sort key, the way to specify it would be:

sort -k1.1,1.3 -k1.12,1.15 input.txt >output.txt

which would save:

3302013020101NS        40
348201310013RVE         2
600201309013GFJ        70
600201310013GFJ        70

in output.txt for the given sample input. Note that when two or more lines compare equal on every given sort key (as the last two lines do here), the tie is broken by using the entire line as a final sort key in increasing order.
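If you want to try this yourself, here is a minimal sketch that rebuilds the sample input and runs the corrected command. The temporary directory and the LC_ALL=C setting are my additions, not part of the original command; LC_ALL=C just forces plain byte-order comparison so the result does not depend on your locale.

```shell
# Work in a scratch directory so no existing files are overwritten.
workdir=$(mktemp -d)

# Recreate the sample input from the question.
cat > "$workdir/input.txt" <<'EOF'
348201310013RVE         2
600201310013GFJ        70
3302013020101NS        40
600201309013GFJ        70
EOF

# Key 1: field 1, characters 1 through 3.
# Key 2: field 1, characters 12 through 15.
# LC_ALL=C is an assumption added here for reproducible ordering.
LC_ALL=C sort -k1.1,1.3 -k1.12,1.15 "$workdir/input.txt" > "$workdir/output.txt"

sorted=$(cat "$workdir/output.txt")
printf '%s\n' "$sorted"
```

The two lines starting with 600 have the same value for both keys (600 and 3GFJ), so their relative order comes from the whole-line comparison.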

PS Note also that sort works on text, not binary data. It sorts characters, not bytes. If the file you're sorting is ASCII, it might not matter, but if your text contains UTF-8 multibyte characters, it makes a big difference.

Great. That makes a lot more sense to me now. Not sure what they were trying with the previous code.

Thanks for the help.