Sort strings containing numbers

How can I sort this, first by the 2nd field and then by the 1st field?

I tried:

sort -b -k 2,2

Input:

AS11 AB1
BD34 AB10
AF12 AC2
A345 AB10
R134 AB2
456  AC10
TTT2 BD12

Desired output:

AS11 AB1
R134 AB2
A345 AB10
BD34 AB10
AF12 AC2
456  AC10
TTT2 BD12

For your sample input, you want an alphabetic sort on the first two characters of the 2nd field as the primary key, a numeric sort on the remaining characters of the 2nd field as the secondary key, and an alphanumeric sort on the 1st field as the tertiary key. The following should work:

sort -k2.1b,2.2b -k2.3bn,2 -k1,1 Input
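As a rough illustration of how the three keys in that command map onto the sample data, here is a minimal sketch. The function name sort_by_field2_then_field1 is made up for this example, and LC_ALL=C is an added assumption so that the alphabetic comparisons are plain byte comparisons regardless of locale. The b modifier matters because, with the default field separator, the blanks in front of a field count as part of that field; without b, character positions in the 2nd field would be counted from those blanks.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): sort stdin by the letters
# of field 2, then by the number in field 2, then by field 1.
sort_by_field2_then_field1() {
    # -k2.1b,2.2b : characters 1-2 of field 2, counted after leading blanks
    # -k2.3bn,2   : character 3 to the end of field 2, compared numerically
    # -k1,1       : field 1 as the final tie-breaker
    # LC_ALL=C is an assumption added here for reproducible ordering.
    LC_ALL=C sort -k2.1b,2.2b -k2.3bn,2 -k1,1
}

# Feed the sample input from the question through the helper.
printf '%s\n' 'AS11 AB1' 'BD34 AB10' 'AF12 AC2' 'A345 AB10' \
              'R134 AB2' '456  AC10' 'TTT2 BD12' |
    sort_by_field2_then_field1
```

With the sample input this should reproduce the desired output shown in the question; it only works because every 2nd field starts with exactly two letters.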

If the alphabetic strings at the start of the 2nd field vary in length, or if you also want to split the 1st field into alpha and numeric portions and sort them as separate keys as well, the following should work:

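#!/bin/ksh
# Temporary file named after this script and the process ID of the shell.
TMPF=${0##*/}.$$
awk -v tmpf="$TMPF" '
# Each input line is rewritten as: alpha1,num1,alpha2,num2,NR
# and sorted by field 2 (alpha, then numeric), then field 1 (alpha, then numeric).
BEGIN {	sort = "sort -t, -k3,3 -k4b,4bn -k1,1 -k2b,2bn > \"" tmpf "\""
}
{	match($1, /[0-9]/)	# RSTART = position of the first digit in field 1
	printf("%s,%s,", substr($1, 1, RSTART - 1), substr($1, RSTART)) | sort
	match($2, /[0-9]/)	# RSTART = position of the first digit in field 2
	printf("%s,%s,%d\n", substr($2, 1, RSTART - 1), substr($2, RSTART), NR) | sort
	l[NR] = $0		# remember the original line
}
END {	close(sort)		# wait until sort has finished writing tmpf
	FS = ","
	while((getline < tmpf) == 1)
		print l[$5]	# field 5 is the original line number
	close(tmpf)
}' Input
rm -f "$TMPF"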

This was written and tested using a Korn shell, but should work with any POSIX-conforming shell. If you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk or nawk.
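To make the intermediate step of the script above easier to follow, here is a minimal sketch that performs only the splitting and prints the comma-separated records that the script hands to sort. The function name decorate is hypothetical, and like the script it assumes every field contains at least one digit.

```shell
#!/bin/sh
# Hypothetical helper (illustration only): split each of the first two fields
# at its first digit and append the input line number, as the script above does.
decorate() {
    awk '{
        match($1, /[0-9]/)                    # first digit in field 1
        a1 = substr($1, 1, RSTART - 1)        # letters before it (may be empty)
        n1 = substr($1, RSTART)               # the digit and everything after it
        match($2, /[0-9]/)                    # first digit in field 2
        a2 = substr($2, 1, RSTART - 1)
        n2 = substr($2, RSTART)
        printf("%s,%s,%s,%s,%d\n", a1, n1, a2, n2, NR)
    }'
}

# Two lines from the sample input:
printf '%s\n' 'AS11 AB1' '456  AC10' | decorate
# prints:
# AS,11,AB,1,1
# ,456,AC,10,2
```

The sort keys in the script then refer to these comma-separated columns: 3 and 4 come from the 2nd input field, 1 and 2 from the 1st, and column 5 is used afterwards to look up the original line.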

The above should work on any system. Some versions of sort have simpler ways of sorting alphabetic and numeric parts of individual fields and some versions of awk have built-in sort features; but since you didn't bother telling us what operating system you're using, I limited my response to more portable code.
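As one example of such a simpler way, GNU sort has a version-sort ordering (the V key modifier, or --version-sort) that compares runs of digits inside a string numerically. The sketch below assumes GNU coreutils sort; it is not portable to every system, and the function name sort_with_version_key is made up for this example. LC_ALL=C is again an added assumption.

```shell
#!/bin/sh
# Hypothetical helper, assuming GNU sort: use version ordering on field 2
# so that AB2 sorts before AB10, then fall back to field 1.
sort_with_version_key() {
    # -k2bV,2 : field 2, leading blanks skipped, version (natural) ordering
    # -k1,1   : field 1 as the tie-breaker
    LC_ALL=C sort -k2bV,2 -k1,1
}

printf '%s\n' 'AS11 AB1' 'BD34 AB10' 'AF12 AC2' 'A345 AB10' \
              'R134 AB2' '456  AC10' 'TTT2 BD12' |
    sort_with_version_key
```

The b modifier is kept on the key because the line starting with 456 has two blanks in front of its 2nd field, and those blanks would otherwise take part in the comparison.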


Hi.

The msort utility recognizes hybrid strings (mixed letters and digits), as in this example:

#!/usr/bin/env bash

# @(#) s1       Demonstrate comparing hybrid strings, msort.

# Utility functions: print-as-echo, print-line-with-visual-space, debug.
# export PATH="/usr/local/bin:/usr/bin:/bin"
LC_ALL=C ; LANG=C ; export LC_ALL LANG
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f $C ] && $C msort pass-fail

FILE=${1-data1}

pl " Input data file $FILE:"
cat $FILE

pl " Expected output:"
cat expected-output.txt

pl " Results:"
msort -qj --line -n 2,2 --comparison-type hybrid -n 1,1 --comparison-type hybrid $FILE |
tee f1

pass-fail f1 expected-output.txt

exit 0

producing:

$ ./s1

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 3.16.0-4-amd64, x86_64
Distribution        : Debian 8.4 (jessie) 
bash GNU bash 4.3.30
msort 8.53
pass-fail - ( local: RepRev 1.2, ~/bin/pass-fail, 2012-06-14 )

-----
 Input data file data1:
AS11 AB1
BD34 AB10
AF12 AC2
A345 AB10
R134 AB2
456  AC10
TTT2 BD12

-----
 Expected output:
AS11 AB1
R134 AB2
A345 AB10
BD34 AB10
AF12 AC2
456  AC10
TTT2 BD12

-----
 Results:
AS11 AB1
R134 AB2
A345 AB10
BD34 AB10
AF12 AC2
456  AC10
TTT2 BD12

-----
 Comparison of 7 created lines with 7 lines of desired results:
 Succeeded -- files have same content.

(Note that pass-fail is a local command; replace it with cmp, diff, etc.)
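If you do not have pass-fail, a small stand-in built on cmp and diff could look like the sketch below. The function name compare_results and its messages are made up for this example and do not reproduce the real pass-fail output.

```shell
#!/bin/sh
# Hypothetical replacement for the local pass-fail command:
# report whether two files have identical content.
compare_results() {
    if cmp -s "$1" "$2"; then
        echo "same content"
    else
        echo "files differ"
        diff "$1" "$2"      # show what differs
        return 1
    fi
}

# Usage with the file names from the script above, if they exist.
if [ -f f1 ] && [ -f expected-output.txt ]; then
    compare_results f1 expected-output.txt
fi
```

cmp -s stays silent and only sets the exit status, so the function can also be used directly in an if or && chain.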

The msort code was in the Debian GNU/Linux repository, as well as in Fedora, Ubuntu, and macOS (port), to mention a few. It is also available at MSORT.

Best wishes ... cheers, drl