Sort Command

Theo_Score · February 7, 2017, 9:32pm

Hi All,

I used the command sort -k1 -n data.txt > output.txt on a large text data file with over 1,000,000 rows. The command sorted the data, but lines with the same value in the first column did not stay in the order in which they occur in the file. Below are the first five lines of the data I need to sort:

1 1 -0.0506691 0.301248 -0.0540098 0 0 0 0 -1 -0 0 0 0 0 0 0 0.015 0 
7 1 0.0119942 0.300662 -0.0584242 0 0 0 0 -1 -0 0 0 0 0 0 0 0.015 0 
6 1 0.0589997 0.30511 -0.0540171 0 0 0 0 -1 -0 0 0 0 0 0 0 0.015 0 
3 1 -0.0512266 0.330591 -0.0441473 0 0 0 0 -1 -0 0 0 0 0 0 0 0.015 0 
16 2 -0.0118166 0.320646 -0.046286 0 0 0 0 -1 -0 0 0 0 0 0 0 0.015 0

The first column has repeated numbers at different places in the file. I want the sort to order the lines by the first column (1, 2, 3, ..., n, n+1) while keeping lines that share the same first-column value in their order of occurrence within the file. At the moment the data is sorted, but it may, for example, take a line that begins with 1 from line 147 and place it on line 2, even though an earlier line, say line 51, also begins with 1.
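A minimal reproduction of the symptom, sketched with a small made-up input (the file name demo.txt and its contents are hypothetical, not the poster's data). The two lines that begin with 1 come out in a different order from the one they have in the file:

```shell
#!/bin/sh
# Hypothetical four-line input: "1 zeta" occurs before "1 alpha".
printf '%s\n' '3 x' '1 zeta' '2 y' '1 alpha' > demo.txt

# Same form of command as in the question. LC_ALL=C makes the
# tie-breaking whole-line comparison a plain byte comparison.
LC_ALL=C sort -k1 -n demo.txt
```

This prints 1 alpha before 1 zeta, even though 1 zeta comes first in the file: both lines have the numeric key 1, so sort falls back to comparing the whole lines.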

I would appreciate further help with this.

Don_Cragun · February 7, 2017, 10:42pm

It would help if you would show us some sample input that your sort command is not sorting the way you want.

Note that the sort key specification -k1 tells sort to use a key that starts at the first character of the first field and continues to the end of the line. I would suggest changing your sort command to:

sort -k1,1 -n data.txt > output.txt

and see if that works the way you want it to. The standards don't require lines that have equal key fields to be sorted in a stable manner: lines whose keys compare equal are ordered by comparing the entire lines, not by their position in the input. If a stable order is absolutely required, you can add line numbers to the file, sort on your original sort keys (adjusted for the additional field) with the line number as the last sort key, and then strip off the line number field:

nl -ba data.txt | sort -k2,2n -k1,1n | cut -f2- > output.txt
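A sketch of that pipeline run on a small made-up file (demo.txt, stable_nl.txt and stable_s.txt are hypothetical names), next to the -s (stable) option of GNU coreutils sort for comparison. The -s option is an extension rather than part of POSIX, so that second command assumes GNU sort or another sort that supports it:

```shell
#!/bin/sh
# Hypothetical input: two lines start with 1 and two with 2;
# within each group, the "zeta" line occurs first in the file.
printf '%s\n' '2 zeta' '1 zeta' '2 alpha' '1 alpha' > demo.txt

# Portable approach from the thread:
# nl -ba numbers every line and separates the number from the text
# with a tab; sort orders by the original first field (now field 2),
# then by line number; cut -f2- drops the line number field again.
nl -ba demo.txt | sort -k2,2n -k1,1n | cut -f2- > stable_nl.txt

# GNU sort alternative (assumption: -s is available): it disables the
# last-resort whole-line comparison, so equal keys keep input order.
sort -s -k1,1n demo.txt > stable_s.txt

cat stable_nl.txt
```

Both files should contain 1 zeta, 1 alpha, 2 zeta, 2 alpha, in that order. The nl pipeline only uses POSIX utilities, which makes it the safer choice when the available sort has no -s option.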

Theo_Score · February 7, 2017, 10:58pm

Thank you so much, Don. Your second suggestion was what I was looking for.