Sort strings with numbers

I want to sort my data first by the 2nd field then by the first field.
I can't use

sort -V

because I don't have GNU sort and cannot install it.
How do I go about this?

Input:

G456 KT1 34
K234 KT10 45
L2 KT2 26
H5 LAF2 28
F3 LAF2 36

Output:

G456 KT1 34
L2 KT2 26
K234 KT10 45
F3 LAF2 36
H5 LAF2 28

There are some data-specific ways to pre-process your data into fields that a standard sort utility can handle, sort it, and then post-process the results to get back your original data in the desired order.

For example, with your sample data (which has single spaces as field separators and the first two fields each starting with a string of one or more alphabetic characters followed by a string of one or more decimal digits), you could add a space before the first digit in the 1st and 2nd fields, sort with options -k3,3 -k4,4n -k1,1 -k2,2n, and then remove the 3rd and 1st spaces from each line of the sorted output.
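The three steps described above can be sketched as follows, with each intermediate result kept in a variable so it can be inspected. This is only a sketch tailored to the sample data: the variable names are made up for illustration, the bracket ranges assume plain ASCII letters and digits, and LC_ALL=C is set on sort as an assumption to keep the ordering predictable.

```shell
# Sample data from the question, held in a variable for illustration.
input='G456 KT1 34
K234 KT10 45
L2 KT2 26
H5 LAF2 28
F3 LAF2 36'

# Step 1: put a space between the letters and the digits of fields 1 and 2,
# e.g. "K234 KT10 45" becomes "K 234 KT 10 45".
split=$(printf '%s\n' "$input" |
  sed 's/\([A-Za-z][A-Za-z]*\)\([0-9][0-9]*\)/\1 \2/g')

# Step 2: sort on the old field 2 (letters, then number), then the old field 1.
sorted=$(printf '%s\n' "$split" | LC_ALL=C sort -k3,3 -k4,4n -k1,1 -k2,2n)

# Step 3: remove the 3rd space, then the 1st, to restore the original fields.
result=$(printf '%s\n' "$sorted" | sed -e 's/ //3' -e 's/ //')

printf '%s\n' "$result"
```

Printing "$split" or "$sorted" instead of "$result" shows what sort actually sees, which helps when the keys need adjusting for other data.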

If your data isn't as simple as shown in your sample (some data in the first two fields with no letters, no digits, some numbers with a leading decimal point, numbers containing more than one decimal point, more than one string of letters with numbers interspersed, etc.), then the pre-processing and post-processing steps would be correspondingly more complex.
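Before relying on the simple approach, it may be worth checking whether the real data actually has the shape assumed above. A minimal sketch, assuming ASCII letters and digits only: awk prints every line whose 1st or 2nd field is not letters followed by digits. The last two input lines here are hypothetical, added only to show what a mismatch looks like. On Solaris 10, nawk or /usr/xpg4/bin/awk would be used in place of awk.

```shell
# The first three lines come from the question; the last two are
# hypothetical malformed lines added for illustration.
input='G456 KT1 34
K234 KT10 45
L2 KT2 26
7up KT2 11
H5 LAF 28'

# Report the line number and text of every line whose 1st or 2nd field
# does not consist of one or more letters followed by one or more digits.
bad=$(printf '%s\n' "$input" |
  awk '$1 !~ /^[A-Za-z]+[0-9]+$/ || $2 !~ /^[A-Za-z]+[0-9]+$/ { print NR ": " $0 }')

printf '%s\n' "$bad"
```

If this prints nothing for the real file, the simple split, sort, and rejoin approach should apply as is.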

Without loss of generality (esp. concerning Don Cragun's comments) and tailored to your sample, this would satisfy your request:

sed -r 's/(^| )([[:alpha:]]+)([[:digit:]]+)/\1\2 \3/g' file | sort -k3,3 -k4,4n -k1,1 -k2,2n | sed -r 's/([[:alpha:]]+) ([[:digit:]]+)/\1\2/g'

Hi.

Utility msort can handle all this internally:

       msort provides twelve types of key comparison: lexicographic,  numeric,
       numeric  string, hybrid, by string length, by angle, by date, by domain
       name, by time, by ISO8601 date/time stamp, by month name, and random.
-- man msort

#!/usr/bin/env bash

# @(#) s1	Demonstrate ordering of hybrid strings, msort.
# Requires: libc6 (>= 2.7-1), libgmp3c2, libicu38 (>= 3.8-5), libtre4, libuninum5
# If not in repository, see:
# homepage: http://www.billposer.org/Software/msort.html

# Utility functions: print-as-echo, print-line-with-visual-space, debug.
# export PATH="/usr/local/bin:/usr/bin:/bin"
LC_ALL=C ; LANG=C ; export LC_ALL LANG
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
# Redefine db as a no-op to silence debug output; remove this line to enable it.
db() { : ; }
C=$HOME/bin/context && [ -f $C ] && $C

FILE=${1-data1}

pl " Input data file $FILE:"
cat $FILE

pl " Expected results:"
cat expected-results.txt

pl " Results:"
msort -q -l -n2,2 -chybrid -n1,1 -clexicographic $FILE

exit 0

producing:

$ ./s1

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian 5.0.8 (lenny, workstation) 
bash GNU bash 3.2.39

-----
 Input data file data1:
G456 KT1 34
K234 KT10 45
L2 KT2 26
H5 LAF2 28
F3 LAF2 36

-----
 Expected results:
G456 KT1 34
L2 KT2 26
K234 KT10 45
F3 LAF2 36
H5 LAF2 28

-----
 Results:
G456 KT1 34
L2 KT2 26
K234 KT10 45
F3 LAF2 36
H5 LAF2 28

Best wishes ... cheers, drl

@RudiC, I get this error:

sed: illegal option -- r

Can awk be used instead?

What operating system are you using? It is a good idea to always give us this information (and the shell you're using) when you ask for help so we can choose utilities and options that will work in your environment when we make suggestions.

OS = Solaris 10
Shell = ksh

With awk you could try:

nawk '{p=$0;$1=$2 FS $1; gsub(/[0-9]+/," &",$1)}{print $1,p}' file | sort -n | nawk '{print $5,$6,$7}'

I don't have access to a Solaris system. Does its sed support the -E option (which is equivalent to -r)?

@Scrutinizer, it doesn't seem to be working. This is the output I got:

G456 KT1 34
K234 KT10 45
L2 KT2 26
F3 LAF2 36
H5 LAF2 28


No -E option:

sed: illegal option -- E
sed: illegal option -- E

Try:

sed  's/\(^\| \)\([[:alpha:]]\+\)\([[:digit:]]\+\)/\1\2 \3/g' file | sort -k3,3 -k4,4n -k1,1 -k2,2n | sed 's/\([[:alpha:]]\+\) \([[:digit:]]\+\)/\1\2/g'

That doesn't work either. This is the output:

L2 KT2 26
H5 LAF2 28
G456 KT1 34
F3 LAF2 36
K234 KT10 45

This is becoming a bit confusing. Try

sed  's/\([[:alpha:]][[:alpha:]]*\)\([[:digit:]][[:digit:]]*\)/\1 \2/g' file | sort -k3,3 -k4,4n -k1,1 -k2,2n | sed 's/\([[:alpha:]][[:alpha:]]*\) \([[:digit:]][[:digit:]]*\)/\1\2/g'

Yes, the sort wasn't right. Try:

nawk '{p=$0;$1=$2 FS $1; gsub(/[0-9]+/," &",$1)}{print $1,p}' file | sort -k1,1 -k2,2n -k3,3 -k4,4n | nawk '{print $5,$6,$7}'
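To see why the sort keys are -k1,1 -k2,2n -k3,3 -k4,4n, it can help to look at what the first awk stage emits for a single line. The sketch below uses plain awk rather than nawk so it also runs on systems without nawk. The variable names are only for illustration, and LC_ALL=C on sort is an added assumption.

```shell
# Decorate one line: the split copy of "field2 field1" is prepended
# to the untouched original line.
decorated=$(printf '%s\n' 'K234 KT10 45' |
  awk '{p=$0;$1=$2 FS $1; gsub(/[0-9]+/," &",$1)}{print $1,p}')
printf '%s\n' "$decorated"

# Full pipeline on the sample data: decorate, sort on the four leading
# key fields, then keep only the original line (fields 5 to 7).
input='G456 KT1 34
K234 KT10 45
L2 KT2 26
H5 LAF2 28
F3 LAF2 36'

result=$(printf '%s\n' "$input" |
  awk '{p=$0;$1=$2 FS $1; gsub(/[0-9]+/," &",$1)}{print $1,p}' |
  LC_ALL=C sort -k1,1 -k2,2n -k3,3 -k4,4n |
  awk '{print $5,$6,$7}')
printf '%s\n' "$result"
```

The decorated line has seven fields: the letters and number of the old 2nd field, the letters and number of the old 1st field, and then the three original fields, which is why the final stage prints $5, $6 and $7.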

---

On Solaris one needs to use /usr/xpg4/bin/sed to use POSIX character classes.
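One way to keep the same command working on both Solaris and other systems is to pick the sed binary at run time. This is only a sketch: the SED variable name is made up for illustration, and it assumes the XPG4 sed, where present, lives at /usr/xpg4/bin/sed as stated above.

```shell
# Prefer the XPG4 sed where it exists (Solaris); otherwise use the sed on PATH.
if [ -x /usr/xpg4/bin/sed ]; then
  SED=/usr/xpg4/bin/sed
else
  SED=sed
fi

# Sample data from the question, held in a variable for illustration.
input='G456 KT1 34
K234 KT10 45
L2 KT2 26
H5 LAF2 28
F3 LAF2 36'

# Same split, sort, and rejoin pipeline as above, run through the chosen sed.
result=$(printf '%s\n' "$input" |
  "$SED" 's/\([[:alpha:]][[:alpha:]]*\)\([[:digit:]][[:digit:]]*\)/\1 \2/g' |
  LC_ALL=C sort -k3,3 -k4,4n -k1,1 -k2,2n |
  "$SED" 's/\([[:alpha:]][[:alpha:]]*\) \([[:digit:]][[:digit:]]*\)/\1\2/g')

printf '%s\n' "$result"
```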


Thanks, it works!