Sort strings with numbers

aydj · August 23, 2015, 2:24pm

I want to sort my data first by the 2nd field, then by the 1st field, with the numeric parts compared as numbers (so KT2 comes before KT10).
I can't use

sort -V

because I don't have GNU sort and cannot install it.
How do I go about this?

Input:

G456 KT1 34
K234 KT10 45
L2 KT2 26
H5 LAF2 28
F3 LAF2 36

Output:

G456 KT1 34
L2 KT2 26
K234 KT10 45
F3 LAF2 36
H5 LAF2 28

Don_Cragun · August 23, 2015, 4:13pm

There are some data-specific ways to pre-process your data into fields that a standard sort utility can process, sort it, and then post-process the results to get back your original data in your desired sorted order.

For example, with your sample data (which has single spaces as field separators, and whose 1st two fields each start with a string of one or more alphabetic characters followed by a string of one or more decimal digits), you could add a space before the first digit in the 1st and 2nd fields, sort with options -k3,3 -k4,4n -k1,1 -k2,2n, and then remove the 3rd and 1st spaces from the sorted output.
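The pre-process, sort, post-process steps described above can be sketched in POSIX shell as follows. It makes the same assumptions as the sample data (single-space separators; fields 1 and 2 are letters followed by digits). The function name sort_split_fields is hypothetical, LC_ALL=C is an addition for a predictable byte-order comparison of the letter parts, and the sketch has not been tested on Solaris, where nawk (rather than the old /usr/bin/awk) would likely be needed for match().

```shell
#!/bin/sh
# Sketch of decorate / sort / undecorate for the sample data.
# Assumes single spaces between fields and that fields 1 and 2 are
# letters followed by digits. sort_split_fields is a hypothetical name.
#   1. awk inserts a space before the first digit of fields 1 and 2,
#      so "G456 KT1 34" becomes "G 456 KT 1 34".
#   2. sort orders by the letters of field 2, its digits numerically,
#      then the letters of field 1 and its digits numerically.
#   3. sed deletes the 3rd space, then the 1st, restoring the line.
sort_split_fields() {
  awk '{
    for (i = 1; i <= 2; i++)
      if (match($i, /[0-9]/))
        $i = substr($i, 1, RSTART - 1) " " substr($i, RSTART)
    print
  }' |
  LC_ALL=C sort -k3,3 -k4,4n -k1,1 -k2,2n |
  sed -e 's/ //3' -e 's/ //1'
}

printf '%s\n' 'G456 KT1 34' 'K234 KT10 45' 'L2 KT2 26' 'H5 LAF2 28' 'F3 LAF2 36' |
  sort_split_fields
```

With the sample input this should print the lines in the Output order given in the question. Deleting the 3rd space before the 1st keeps the counting simple, because removing the 1st space first would renumber the later ones.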

If your data isn't as simple as shown in your sample (for example, data in the 1st two fields with no letters or no digits, numbers with a leading decimal point, numbers containing more than one decimal point, or more than one string of letters with numbers interspersed), then the pre-processing and post-processing steps would be correspondingly more complex.

RudiC · August 24, 2015, 3:53am

Without any claim to generality (see Don Cragun's caveats) and tailored to your sample, this would satisfy your request:

sed -r 's/(^| )([[:alpha:]]+)([[:digit:]]+)/\1\2 \3/g' file | sort -k3,3 -k4,4n -k1,1 -k2,2n | sed -r 's/([[:alpha:]]+) ([[:digit:]]+)/\1\2/g'

drl · August 24, 2015, 12:13pm

Hi.

The msort utility can handle all of this internally:

       msort provides twelve types of key comparison: lexicographic,  numeric,
       numeric  string, hybrid, by string length, by angle, by date, by domain
       name, by time, by ISO8601 date/time stamp, by month name, and random.
-- man msort

#!/usr/bin/env bash

# @(#) s1	Demonstrate ordering of hybrid strings, msort.
# Requires: libc6 (>= 2.7-1), libgmp3c2, libicu38 (>= 3.8-5), libtre4, libuninum5
# If not in repository, see:
# homepage: http://www.billposer.org/Software/msort.html

# Utility functions: print-as-echo, print-line-with-visual-space, debug.
# export PATH="/usr/local/bin:/usr/bin:/bin"
LC_ALL=C ; LANG=C ; export LC_ALL LANG
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
C="$HOME/bin/context" && [ -f "$C" ] && "$C"

FILE=${1-data1}

pl " Input data file $FILE:"
cat "$FILE"

pl " Expected results:"
cat expected-results.txt

pl " Results:"
msort -q -l -n2,2 -chybrid -n1,1 -clexicographic "$FILE"

exit 0

producing:

$ ./s1

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian 5.0.8 (lenny, workstation) 
bash GNU bash 3.2.39

-----
 Input data file data1:
G456 KT1 34
K234 KT10 45
L2 KT2 26
H5 LAF2 28
F3 LAF2 36

-----
 Expected results:
G456 KT1 34
L2 KT2 26
K234 KT10 45
F3 LAF2 36
H5 LAF2 28

-----
 Results:
G456 KT1 34
L2 KT2 26
K234 KT10 45
F3 LAF2 36
H5 LAF2 28

Best wishes ... cheers, drl

aydj · August 24, 2015, 6:11pm

@RudiC, I get this error:

sed: illegal option -- r

Can awk be used?

Don_Cragun · August 24, 2015, 6:25pm

What operating system are you using? It is a good idea to always give us this information (and the shell you're using) when you ask for help so we can choose utilities and options that will work in your environment when we make suggestions.

aydj · August 24, 2015, 6:45pm

OS = Solaris 10
Shell = ksh

Scrutinizer · August 24, 2015, 10:40pm

With awk you could try:

nawk '{p=$0;$1=$2 FS $1; gsub(/[0-9]+/," &",$1)}{print $1,p}' file | sort -n | nawk '{print $5,$6,$7}'

RudiC · August 25, 2015, 4:55am

I don't have access to a Solaris system. Does its sed support the -E option (which is equivalent to -r)?

aydj · August 25, 2015, 5:25am

@Scrutinizer, it seems not to be working. This is the output I got:

G456 KT1 34
K234 KT10 45
L2 KT2 26
F3 LAF2 36
H5 LAF2 28

No -E option:

sed: illegal option -- E
sed: illegal option -- E

RudiC · August 25, 2015, 6:47am

Try:

sed  's/\(^\| \)\([[:alpha:]]\+\)\([[:digit:]]\+\)/\1\2 \3/g' file | sort -k3,3 -k4,4n -k1,1 -k2,2n | sed 's/\([[:alpha:]]\+\) \([[:digit:]]\+\)/\1\2/g'

aydj · August 25, 2015, 9:01am

Not working; this is the output:

L2 KT2 26
H5 LAF2 28
G456 KT1 34
F3 LAF2 36
K234 KT10 45

RudiC · August 25, 2015, 9:11am

This is becoming a bit confusing. Try

sed  's/\([[:alpha:]][[:alpha:]]*\)\([[:digit:]][[:digit:]]*\)/\1 \2/g' file | sort -k3,3 -k4,4n -k1,1 -k2,2n | sed 's/\([[:alpha:]][[:alpha:]]*\) \([[:digit:]][[:digit:]]*\)/\1\2/g'

Scrutinizer · August 25, 2015, 12:20pm

Yes, the sort wasn't right. Try:

nawk '{p=$0;$1=$2 FS $1; gsub(/[0-9]+/," &",$1)}{print $1,p}' file | sort -k1,1 -k2,2n -k3,3 -k4,4n | nawk '{print $5,$6,$7}'
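For readers wondering how this works, here is the same pipeline with each step commented, wrapped in a shell function (the name sort_by_f2_f1 is made up for illustration). It uses plain awk so it runs on non-Solaris systems; on Solaris, substitute nawk as in the command above. LC_ALL=C on sort is an addition for a predictable byte-order comparison.

```shell
#!/bin/sh
# Commented version of the nawk | sort | nawk pipeline above.
# Plain awk is used so it runs off Solaris; on Solaris use nawk instead.
# sort_by_f2_f1 is a hypothetical name; LC_ALL=C is an addition.
sort_by_f2_f1() {
  awk '{
    p = $0                     # remember the original line
    $1 = $2 FS $1              # field 1 becomes "KT1 G456" (field 2 first)
    gsub(/[0-9]+/, " &", $1)   # split digit runs off: "KT 1 G 456"
    print $1, p                # four key fields, then the original line
  }' |
  LC_ALL=C sort -k1,1 -k2,2n -k3,3 -k4,4n |
  awk '{ print $5, $6, $7 }'   # keep only the original three fields
}

printf '%s\n' 'G456 KT1 34' 'K234 KT10 45' 'L2 KT2 26' 'H5 LAF2 28' 'F3 LAF2 36' |
  sort_by_f2_f1
```

The first version failed because sort -n compares each whole line as one number; every decorated line starts with letters, so the numeric keys all tie and sort falls back to a plain text comparison, putting KT10 before KT2. Naming each key with -k, and marking the digit keys with n, fixes that.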

Regarding RudiC's sed command: on Solaris one needs to use /usr/xpg4/bin/sed to get POSIX character classes such as [[:alpha:]] and [[:digit:]].
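One way to apply that note is a sketch like the following, which picks /usr/xpg4/bin/sed when it exists and otherwise falls back to the sed on PATH, then runs RudiC's BRE-only pipeline. The variable SED and the function name sort_mixed are illustrative, LC_ALL=C on sort is an addition, and the sketch has not been tested on Solaris itself.

```shell
#!/bin/sh
# Prefer the XPG4 sed when present (Solaris, per the note above);
# otherwise use whatever sed is on PATH. SED is an illustrative name.
SED=sed
if [ -x /usr/xpg4/bin/sed ]; then
  SED=/usr/xpg4/bin/sed
fi

# RudiC's pipeline using only basic regular expressions:
# split letters from digits, sort on the pieces, then rejoin them.
# sort_mixed is a hypothetical name; it reads standard input.
sort_mixed() {
  "$SED" 's/\([[:alpha:]][[:alpha:]]*\)\([[:digit:]][[:digit:]]*\)/\1 \2/g' |
  LC_ALL=C sort -k3,3 -k4,4n -k1,1 -k2,2n |
  "$SED" 's/\([[:alpha:]][[:alpha:]]*\) \([[:digit:]][[:digit:]]*\)/\1\2/g'
}

printf '%s\n' 'G456 KT1 34' 'K234 KT10 45' 'L2 KT2 26' 'H5 LAF2 28' 'F3 LAF2 36' |
  sort_mixed
```

Writing x\{1,\} as xx* avoids the GNU-only \+ used in the earlier attempt, and the -x test keeps the same script usable on systems without /usr/xpg4/bin.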

aydj · August 25, 2015, 6:23pm

@Scrutinizer, thanks, it works!