Sort command - strange behaviour

Hi guys,

I have the following example data:

A;00:00:19
B;00:01:02
C;00:00:13
D;00:00:16
E;00:02:27
F;00:00:12
G;00:00:21
H;00:00:19
I;00:00:13
J;00:13:22

I run the following sort against it, yet the output is as follows:

sort -t";" +1 -nr example_data.dat

A;00:00:19
B;00:01:02
C;00:00:13
D;00:00:16
E;00:02:27
F;00:00:12
G;00:00:21
H;00:00:19
I;00:00:13
J;00:13:22

I'd expect it to recognise the field delimiter, skipping field 1 and then sorting numerically on the second field (i.e. to put the longest time first and the shortest last). Any ideas please? Is the ":" in this field causing an issue?

I've proven it is recognising the second field as the sort key by changing to a dictionary-based sort too:

sort -t"," +1 -dr example_data.dat

J,00:13:22
E,00:02:27
B,00:01:02
G,00:00:21
A,00:00:19
H,00:00:19
D,00:00:16
C,00:00:13
I,00:00:13
F,00:00:12

This produces the desired output against this subset of data, but when I run it against the 'live' data (of much larger volume) it isn't any use as it will start with all times beginning with '9' first and then in descending order, it has to be a numeric-based sort ultimately.

Thanks in advance,

Mark

sort -t";" -rk2,2 example_data.dat

Thanks for the uber-fast reply Krish. I looked at the key definition thing (k switch) but it didn't seem to to work either. Using what you gave does the right thing, only, when I transpose that command to use on my live data, it doesn't. Here's an example (first 10 lines out of the newly sorted file):

sort -t";" rk2,2 mwreport_joined.txt > mwreport_sorted.txt

GLMLRP_ComparisonJob;989:13:42
GLMLRP_Diff_HighlighterJob;989:08:56
AD046;988:44:15
GleamMIPostCanadaExtractJob;9196:53:12
GleamMIAGREERepAllBackOutJob;9025:39:12
GleamMIAGREEProdFacilCombJob;9025:29:36
GleamMIAGREEExcRateHistExtractJob;9025:21:26
GleamMIAGREEDynamicParamJob;9025:19:10
GleamMIAGREEClassExtractJob;9025:11:35
GleamMIAGREEClassPODLoadJob;9025:09:43

As you can see above, the "9196:53:12" value in the fourth record should be top of the list as it is the largest numerically

Is this what you want?

sort -t";" -rn -k2,2 mwreport_joined.txt

GleamMIPostCanadaExtractJob;9196:53:12
GleamMIAGREERepAllBackOutJob;9025:39:12
GleamMIAGREEProdFacilCombJob;9025:29:36
GleamMIAGREEExcRateHistExtractJob;9025:21:26
GleamMIAGREEDynamicParamJob;9025:19:10
GleamMIAGREEClassPODLoadJob;9025:09:43
GleamMIAGREEClassExtractJob;9025:11:35
GLMLRP_Diff_HighlighterJob;989:08:56
GLMLRP_ComparisonJob;989:13:42
AD046;988:44:15

That's closer yes... although, I've highlighted above where records are out of line:

9025:11:35 - this should sit between 9025:19:10 and 9025:09:43

989:13:42 - this should be above 989:08:56

Wrote nonsense - will come up with a better idea, brb :wink:

Here it is:

awk -F ";" '{print $2}' mwreport_joined.txt| sort -t ";" -rn -k 1,1 -k 2,2 -k 3,3 | xargs -I {} grep {} mwreport_joined.txt

Maybe not nice but works on Debian Linux. Not sure about the -I {} on xargs for other OS'es. On AIX I usually just leave it out iirc.

Cheers again :slight_smile: That seems to work although it returns each line multiple times, meaning the output file becomes much larger than the original

Using option g?

sort -t";" -k 2,2rg file.to.sort

@ripat
Nope doesn't work - check the second column when having the ":" as delimeter. It only sorts the 1st number in front of the 1st ":".

@miwinter
Hm, I tested it with mawk on Debian Linux and on AIX with awk and no duplicate output. Not sure what you have to add to supress the duplicates... if you find no option to tell it awk, maybe pipe it into "uniq" at the end.

Or split the input to make it easier for sort to parse the fields you want.

vnix$*awk -F';' '{ split($2,t,":"); print t[1] ";" t[2] ";" t[3] ";" $0 }' <<HERE |
> GleamMIPostCanadaExtractJob;9196:53:12
> GleamMIAGREERepAllBackOutJob;9025:39:12
> GleamMIAGREEProdFacilCombJob;9025:29:36
> GleamMIAGREEExcRateHistExtractJob;9025:21:26
> GleamMIAGREEDynamicParamJob;9025:19:10
> GleamMIAGREEClassPODLoadJob;9025:09:43
> GleamMIAGREEClassExtractJob;9025:11:35
> GLMLRP_Diff_HighlighterJob;989:08:56
> GLMLRP_ComparisonJob;989:13:42
> AD046;988:44:15
> HERE
> sort -t ';' -rn | cut -d ';' -f4-
GleamMIPostCanadaExtractJob;9196:53:12
GleamMIAGREERepAllBackOutJob;9025:39:12
GleamMIAGREEProdFacilCombJob;9025:29:36
GleamMIAGREEExcRateHistExtractJob;9025:21:26
GleamMIAGREEDynamicParamJob;9025:19:10
GleamMIAGREEClassExtractJob;9025:11:35
GleamMIAGREEClassPODLoadJob;9025:09:43
GLMLRP_ComparisonJob;989:13:42
GLMLRP_Diff_HighlighterJob;989:08:56
AD046;988:44:15

Take away the final cut to see what it's really doing.

sort has a strange behavior. When you pipe him the second field using awk, the sort is ok:

$ awk -F";" '{print $2}' f | sort -rn
9196:53:12
9025:39:12
9025:29:36
9025:21:26
9025:19:10
9025:11:35
9025:09:43
989:13:42
989:08:56
988:44:15

But when you ask sort to do the very same i.e to split on the ; and sort on the second key, the output is, indeed, not as expected:

$ sort -t";" -k2rn f
GleamMIPostCanadaExtractJob;9196:53:12
GleamMIAGREEClassExtractJob;9025:11:35
GleamMIAGREEClassPODLoadJob;9025:09:43
GleamMIAGREEDynamicParamJob;9025:19:10
GleamMIAGREEExcRateHistExtractJob;9025:21:26
GleamMIAGREEProdFacilCombJob;9025:29:36
GleamMIAGREERepAllBackOutJob;9025:39:12
GLMLRP_ComparisonJob;989:13:42
GLMLRP_Diff_HighlighterJob;989:08:56
AD046;988:44:15

Very strange.

And one more...
Forgot about that you can use more than one Field Separator - so here a more compact version (ok, some cheating on the delimiters with sed in the end):

awk -F ";|:" 'BEGIN{OFS=":"} {print $1,$2,$3,$4}' infile| sort -t ":" -rn -k 2,2 -k 3,3 -k 4,4 | sed -n 's/\:/\;/p'

@era
Btw, funny awards on your site :slight_smile:

Got it! I had to read all possible options, even the one that are not well documented!

$ sort -t";" -s -k2,2rn f
GleamMIPostCanadaExtractJob;9196:53:12
GleamMIAGREERepAllBackOutJob;9025:39:12
GleamMIAGREEProdFacilCombJob;9025:29:36
GleamMIAGREEExcRateHistExtractJob;9025:21:26
GleamMIAGREEDynamicParamJob;9025:19:10
GleamMIAGREEClassExtractJob;9025:11:35
GleamMIAGREEClassPODLoadJob;9025:09:43
GLMLRP_ComparisonJob;989:13:42
GLMLRP_Diff_HighlighterJob;989:08:56
AD046;988:44:15

Option -s did the trick! Found solution by reading this man page:

   Finally, as a last resort when all keys compare equal (or if no
   ordering options were specified at all), `sort' compares the entire
   lines.  The last resort comparison honors the `-r' global option.  The
   `-s' (stable) option disables this last-resort comparison so that lines
   in which all fields compare equal are left in their original relative
   order.  If no fields or global options are specified, `-s' has no
   effect.

sort Linux Commands: Sort text files

Nice solution! :slight_smile:

Edit:
Just saw that sort does not have that switch on AIX (didn't look for a similar one) but on Linux, nice though :wink:

Aargh! Why on Earth does AIX not provide this option! It's just what I'm after

If you split on the colons as well, sort should work. (We assume you don't really require a stable sort; if two lines have exactly the same sort key, their order should not matter; correct?) Did you try the awk script I posted before?

Sorry era, yep, I have gone with that solution and it looks to be working a dream on my entire live data (about 43,000 rows). Thanks very much to all who have contributed in this thread, your help is much appreciated. I thought I was going mad when I saw the odd behaviour we've witnessed with sort in AIX here! :slight_smile: