Some versions of awk provide a asort() and asorti() functions to sort data.
This example uses the sort utility to sort data and should work whether or not your version of awk has those functions. This would probably also be easier if it read in the three column input file rather than the unsorted matrix input, but I didn't want to take the time to add the entries you listed as ... to test the program. This example also automatically adjusts for varying input field widths:
#!/bin/ksh
# Usage: sortmatrix [unsorted_matrix_file]
# The following awk program reads in a matrix with unsorted row and
# column headings and writes the same data rearranged with sorted row and
# column headings. The output column widths automatically adjust for
# input data width. If this script is invoked with no operands, it will
# read unsorted matrix information from a file named "input".
#
# This script is written using ksh and awk on OS X. Other than changing
# the #! line above, any shell supporting basic POSIX shell syntax
# should work. If you're using a Solaris system, change "awk" to
# "/usr/xpg4/bin/awk" (nawk might also work, but I don't remember if it
# was updated to support "%*s" to specify a runtime settable field
# width in awk printf() statements).
#
# awk variables name key:
# c0w row heading Column Width
# cc data Column Count (not counting row heading)
# cw[] data Column Width
# dbg if non-0/non-null log DeBuGging info into the file named by dlf
# dlf name of Debugging Log File
# i loop control
# ich[] unsorted Input Column Headers
# och[] sorted Output Column Headers
# sc Sort Command used to execute to sort column and row headings
# tmpfile TeMPFILE to hold results of sorting row and column headings
tmpfile="sortmatrix.order"
awk -v dbg=0 -v tmpfile="$tmpfile" 'BEGIN{
sc="sort -o " tmpfile
if(dbg) dlf = "debug.out"
}
dbg{ printf("input: %s\n", $0) > dlf
}
NR == 1{# Read and sort data oolumn headings from the 1st input line
for(i = 1; i <= NF; i++) {
if(dbg) print $i > dlf
ich = $i
print $i | sc
}
close(sc)
# Produce the list of sorted output column headings
cc = 0
while((getline och[++cc] < tmpfile) == 1) { # read the sorted headings
cw[och[cc]] = length(och[cc]) # set initial column width
if(dbg) printf("och[%d]=%s, cw[%s]=%d\n", cc, och[cc], och[cc],
cw[och[cc]]) > dlf
}
close(tmpfile)
# Verify that the sort worked
if(cc != NF) {
printf("sortmatrix: %d columns read, but sort returned %d\n",
NF, cc)
exit 1
}
next
}
{ # Process remaining input lines
# Feed the row heading into sort
print $1 | sc
if(length($1) > c0w) {
# Increase output column 0 width to match row heading length
c0w = length($1)
if(dbg) printf("c0w increeased to %d\n", c0w) > dlf
}
for(i = 2; i <= NF; i++) {
# Save the data in an array indexed by row heading and
# column heading.
data[$1,ich[i - 1]] = $i
if(dbg) printf("data[%s,%s]=%s\n", $1, ich[i-1], $i) > dlf
if(cw[ich[i - 1]] < length($i)) {
# A data field is wider than the column heading,
# adjust the column width.
cw[ich[i - 1]] = length($i)
if(dbg) printf("cw[%s] increased to %d\n",
$i, length($i)) > dlf
}
}
}
END{ # Finish the row headings sort
close(sc)
# Print the column headings
printf("%*s", c0w, "")
for(i = 1; i <= cc; i++) {
cw[och]++ # Add room to put a space between output columns.
printf("%*s", cw[och], och)
}
printf("\n")
# Read the sorted row headings and print the data
rowcnt = 1
while((getline row < tmpfile) == 1) {
printf("%-*s", c0w, row)
for(i = 1; i <= cc; i++)
printf("%*s", cw[och], data[row,och])
printf("\n")
rowcnt++
}
# Verify that the sort worked
if(rowcnt != NR) {
printf("sortmatrix: Read %d data rows, sort only returned %d\n",
NR - 1, rowcnt - 1)
exit 2
}
}' ${1:-input}
rm $tmpfile
When the above is saved in a file named sortmatrix and made executable with chmod +x sortmatrix
run with a file named input that contains:
AB AE AC AD AA AF AC12 AClonger
SA 3 4 5 6 4 6 121 long1
SC 5 7 2 8 4 3 123 long3
SD 4 6 5 3 8 3 124 long4
SE 45 34 5 12 09 1 125 long5
SB 34 33 34 45 67 2 122 long2
more m1 m2 m3 m4 m5 m6 m7-xxx m8
it produces the following output:
AA AB AC AC12 AClonger AD AE AF
SA 4 3 5 121 long1 6 4 6
SB 67 34 34 122 long2 45 33 2
SC 4 5 2 123 long3 8 7 3
SD 8 4 5 124 long4 3 6 3
SE 09 45 5 125 long5 12 34 1
more m5 m1 m3 m7-xxx m8 m4 m2 m6
Note that if you change -v dbg=0
to -v dbg=1
, this script will produce a debugging log file providing data that may be useful if you need to make changes to some part of the program's logic.