Hi,
I am using some code that has been ported from Unix to Linux, and now the sorting no longer produces the desired ordering. I'm hoping to find a way to mimic the Unix sort command on Linux. The input file is structured as follows:
$> cat file.txt
US;KSU1;00;LHZ;2006-07-26;17:41:00;2008-12-31;23:59:59
US;KSU1;10;LH2;2006-07-26;17:41:00;2999-12-31;23:59:59
US;KSU1; ;LHZ;2008-12-31;17:41:00;2999-12-31;23:59:59
US;KSU1;HR;LHZ;2006-06-01;00:00:00;2999-12-31;23:59:59
US;KSU1;XX;LH1;2006-07-26;17:41:00;2999-12-31;23:59:59
US;KSU1;HR;LH2;2006-06-01;00:00:00;2999-12-31;23:59:59
US;KSU1;HR;LH1;2006-06-01;00:00:00;2999-12-31;23:59:59
US;BSU2;00;LHZ;2006-07-26;17:41:00;2008-12-31;23:59:59
US;BSU2;10;LHN;2006-07-26;17:41:00;2999-12-31;23:59:59
US;BSU2; ;LHZ;2008-12-31;17:41:00;2999-12-31;23:59:59
US;BSU2;HR;LHZ;2006-06-01;00:00:00;2999-12-31;23:59:59
US;BSU2;XX;LHE;2006-07-26;17:41:00;2999-12-31;23:59:59
US;BSU2;HR;LHN;2006-06-01;00:00:00;2999-12-31;23:59:59
US;BSU2;HR;LHE;2006-06-01;00:00:00;2999-12-31;23:59:59
It is semicolon-separated (although that doesn't particularly matter). Please note that in the 3rd and 10th rows, column three appears to be "missing" a value. It isn't: it is simply two blanks, "<space><space>", and this is a real entry in this file. The output should be sorted in a specific order, keyed on each column in turn. On Unix, we have always used the default sort command (with -u to remove duplicate lines). The result is:
unix> cat file.txt | sort -u
US;BSU2; ;LHZ;2008-12-31;17:41:00;2999-12-31;23:59:59
US;BSU2;00;LHZ;2006-07-26;17:41:00;2008-12-31;23:59:59
US;BSU2;10;LHN;2006-07-26;17:41:00;2999-12-31;23:59:59
US;BSU2;HR;LHE;2006-06-01;00:00:00;2999-12-31;23:59:59
US;BSU2;HR;LHN;2006-06-01;00:00:00;2999-12-31;23:59:59
US;BSU2;HR;LHZ;2006-06-01;00:00:00;2999-12-31;23:59:59
US;BSU2;XX;LHE;2006-07-26;17:41:00;2999-12-31;23:59:59
US;KSU1; ;LHZ;2008-12-31;17:41:00;2999-12-31;23:59:59
US;KSU1;00;LHZ;2006-07-26;17:41:00;2008-12-31;23:59:59
US;KSU1;10;LH2;2006-07-26;17:41:00;2999-12-31;23:59:59
US;KSU1;HR;LH1;2006-06-01;00:00:00;2999-12-31;23:59:59
US;KSU1;HR;LH2;2006-06-01;00:00:00;2999-12-31;23:59:59
US;KSU1;HR;LHZ;2006-06-01;00:00:00;2999-12-31;23:59:59
US;KSU1;XX;LH1;2006-07-26;17:41:00;2999-12-31;23:59:59
There are two main entries, as determined by columns one and two: "US BSU2" and "US KSU1". For each of these, the "blank" in column three is sorted first, then the numeric values in order, followed by the alphabetical values. This is the correct ordering for this file. However, if I run the same command on Linux, the output is quite different.
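The order described above (blank, then digits, then letters) is what a plain byte-by-byte (ASCII) comparison gives: a space is 0x20, the digits are 0x30 to 0x39 and the uppercase letters are 0x41 to 0x5A. A quick way to inspect the byte values is `od`:

```shell
# Print the byte value (in hex) of a space, '0', '1', 'H' and 'X'.
# Compared as raw bytes these order as 20 < 30 < 31 < 48 < 58, i.e.
# blank < 00 < 10 < HR < XX, matching column three of the Unix output.
printf ' 01HX' | od -An -tx1
```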
linux$> cat file.txt | sort -t';' -u
US;BSU2;00;LHZ;2006-07-26;17:41:00;2008-12-31;23:59:59
US;BSU2;10;LHN;2006-07-26;17:41:00;2999-12-31;23:59:59
US;BSU2;HR;LHE;2006-06-01;00:00:00;2999-12-31;23:59:59
US;BSU2;HR;LHN;2006-06-01;00:00:00;2999-12-31;23:59:59
US;BSU2;HR;LHZ;2006-06-01;00:00:00;2999-12-31;23:59:59
US;BSU2; ;LHZ;2008-12-31;17:41:00;2999-12-31;23:59:59
US;BSU2;XX;LHE;2006-07-26;17:41:00;2999-12-31;23:59:59
US;KSU1;00;LHZ;2006-07-26;17:41:00;2008-12-31;23:59:59
US;KSU1;10;LH2;2006-07-26;17:41:00;2999-12-31;23:59:59
US;KSU1;HR;LH1;2006-06-01;00:00:00;2999-12-31;23:59:59
US;KSU1;HR;LH2;2006-06-01;00:00:00;2999-12-31;23:59:59
US;KSU1;HR;LHZ;2006-06-01;00:00:00;2999-12-31;23:59:59
US;KSU1; ;LHZ;2008-12-31;17:41:00;2999-12-31;23:59:59
US;KSU1;XX;LH1;2006-07-26;17:41:00;2999-12-31;23:59:59
In this case, the rows with the "blanks" are no longer placed first and instead fall between HR and XX.
Is there a way to emulate the behaviour of the Unix sort command on Linux? I imagine there is a difference in the precedence of the characters, but I don't know how the <space><space> ends up being placed between HR and XX.
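A likely explanation, worth confirming by running `locale` on the Linux machine: GNU sort compares lines using the collation rules of the current locale (`LC_COLLATE`, typically something like `en_US.UTF-8`), and those rules largely ignore punctuation and spaces on the first comparison pass. `US;BSU2;  ;LHZ;2008...` is then compared roughly as `USBSU2LHZ2008...`, which falls after `USBSU2HRLHE...` but before `USBSU2XXLHE...`, which is where the blank rows ended up. Setting `LC_ALL=C` makes sort compare raw bytes instead. The sketch below assumes the Unix machine was effectively sorting in the C/POSIX locale, and uses a few sample lines adapted from file.txt (the HR line is deliberately repeated to show `-u`):

```shell
# Minimal sketch: compare lines byte by byte (C locale) instead of using
# the locale's collation rules.
# Assumption: the old Unix sort was effectively running in the C/POSIX locale.
# Column three of the second sample line is two spaces.
printf '%s\n' \
  'US;BSU2;HR;LHZ;2006-06-01;00:00:00;2999-12-31;23:59:59' \
  'US;BSU2;  ;LHZ;2008-12-31;17:41:00;2999-12-31;23:59:59' \
  'US;BSU2;XX;LHE;2006-07-26;17:41:00;2999-12-31;23:59:59' \
  'US;BSU2;00;LHZ;2006-07-26;17:41:00;2008-12-31;23:59:59' \
  'US;BSU2;HR;LHZ;2006-06-01;00:00:00;2999-12-31;23:59:59' |
  LC_ALL=C sort -u   # LC_ALL=C applies to this one command only
```

This should print the blank row first, then 00, HR and XX, with the duplicate HR line removed. On the real data it would be `LC_ALL=C sort -u file.txt`; neither `cat` nor `-t';'` is needed, since without a `-k` option the whole line is the sort key and the field separator has no effect on the ordering. Exporting `LC_ALL=C` at the top of the ported scripts would apply the same rule to every `sort` they call, but it also changes other locale-dependent behaviour; `LC_COLLATE=C` is narrower, though it is overridden if `LC_ALL` is set.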
Thanks for any help.