If I use ls to list all the files in a folder, is there a way to sort them by roman numeral?
I am thinking about a result like:
benjamin_I.wmv
benjamin_II.wmv
benjamin_III.wmv
benjamin_IV.wmv
benjamin_V.wmv
benjamin_VI.wmv
benjamin_VII.wmv
benjamin_VIII.wmv
benjamin_IX.wmv
The roman numerals are always preceded by an underscore.
#!/usr/bin/perl
use strict;
use warnings;
use Roman; #CPAN module that provides arabic()
sub romanSort { #custom sorting definition
my ($romA) = $a =~ /^.*_([MDCLXVI]+)\..*/; #capture all roman numerals after underscore
my ($romB) = $b =~ /^.*_([MDCLXVI]+)\..*/; #repeat with second input
#convert captured roman numbers to arabic (e.g. XIV --> 14 ) and compare numerically
return arabic($romA) <=> arabic($romB);
}
my @data = <>; #slurp the whole input into one array
print sort romanSort @data; #print sorted array using custom routine romanSort
Egads... I'd translate the numbers from roman numerals into normal numbers, sort, then change them back.
# roman.awk
# Adapted from a clever converter found here
# http://scripts.mit.edu/~yfarjoun/homepage/index.php?title=Code_Snippets
BEGIN {
R["I"]=1; R["V"]=5; R["X"]=10; R["L"]=50;
R["C"]=100; R["D"]=500; R["M"]=1000;
E["iv"]="IIII"; E["ix"]="VIIII";
E["xl"]="XXXX"; E["xc"]="LXXXX";
E["cd"]="CCCC"; E["cm"]="DCCCC";
E["iix"]="VIII"; E["xxc"]="LXXX";
E["ccm"]="DCCC"; E["vl"]="XXXXV";
E["ld"]="CCCCL";
}
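# SUM and K are declared as extra parameters so they stay local.
function roman_arabic(RN,    SUM, K)
{
SUM=0;
RN=tolower(RN);
# Substitute roman numeral forms into things we can count.
# The input is lower case and the replacements are upper case,
# so substitutions don't happen twice by accident.
# Note: the order of for(K in E) is unspecified, so a non-standard
# form such as "iix" may be hit by the shorter "ix" rule first.
for(K in E) while(sub(K, E[K], RN));
# Convert anything that didn't get substituted to uppercase.
RN=toupper(RN);
for(K in R) while(sub(K, "", RN)) SUM+=R[K];
return(SUM);
}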
{
split($0, ARR, /[_.]/);
if($0 ~ /_[iIvVlLxXcCdDmM]+/)
{
# print ARR[1], ARR[2], ARR[3];
printf("%s<!--%d-->_%s.%s\n", ARR[1], roman_arabic(ARR[2]), ARR[2], ARR[3]);
}
}
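The script above only does the first step (tagging each name with its number). As a rough sketch of the whole translate, sort, strip idea in one go, something like the following might work. Note that roman_sort is just a name I made up, the conversion uses the plain "smaller before larger is subtracted" rule rather than the substitution tables above, and it assumes no tabs or newlines in the file names.

```shell
#!/bin/sh
# Sketch only: roman_sort is a hypothetical helper, not a standard tool.
# It prefixes each line with a numeric key, sorts on it, then drops the key.
roman_sort() {
    awk '
    BEGIN {
        V["I"]=1; V["V"]=5; V["X"]=10; V["L"]=50
        V["C"]=100; V["D"]=500; V["M"]=1000
    }
    # Convert a roman numeral to a number. A smaller value written
    # before a larger one is subtracted (IV = 4, IX = 9).
    function arabic(rn,    i, cur, nxt, sum) {
        rn = toupper(rn)
        sum = 0
        for (i = 1; i <= length(rn); i++) {
            cur = V[substr(rn, i, 1)] + 0
            nxt = V[substr(rn, i + 1, 1)] + 0
            sum += (cur < nxt) ? -cur : cur
        }
        return sum
    }
    {
        # Keep the text between the last underscore and the next dot.
        name = $0
        sub(/^.*_/, "", name)
        sub(/\..*$/, "", name)
        key = (name ~ /^[IVXLCDMivxlcdm]+$/) ? arabic(name) : 0
        printf "%d\t%s\n", key, $0
    }' | sort -k1,1n | cut -f2-
}

# Example run on a few names from the question.
printf '%s\n' benjamin_IX.wmv benjamin_II.wmv benjamin_V.wmv | roman_sort
```

In a real folder you would feed it with ls, e.g. "ls | roman_sort". Names without a numeral after the underscore get key 0 and sort first.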
...but beware of all the valid words that can be made from roman numerals:
$ egrep -i "^[IVXLCDM]{3}[IVXLCDM]*$" /usr/share/dict/cracklib-small
cdc
civic
civil
did
dill
dim
icc
iii
ill
lid
lim
livid
mid
mild
mill
mimi
mimic
mix
vii
viii
vivid
$
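One way to cut down on those false hits is to accept only well-formed numerals instead of any run of the seven letters. Here is a minimal sketch, assuming numerals from I to MMMCMXCIX; is_roman is a name I picked for illustration. It will not catch everything, since a word like "mix" is also a perfectly good numeral (1009).

```shell
#!/bin/sh
# Sketch only: is_roman is a hypothetical helper. It succeeds for
# well-formed roman numerals (I to MMMCMXCIX), ignoring case.
is_roman() {
    # The pattern below also matches the empty string, so reject it first.
    [ -n "$1" ] || return 1
    printf '%s\n' "$1" |
        grep -Eiq '^M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$'
}

# Try it on a few words from the dictionary list.
for word in viii civil did mix; do
    if is_roman "$word"; then
        echo "$word: roman numeral"
    else
        echo "$word: not a roman numeral"
    fi
done
```

Here "civil" and "did" are rejected, while "viii" and "mix" pass, so you still need to know what your file names look like.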
% ./s2
Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.32-5-686, i686
Distribution : Debian GNU/Linux 6.0
GNU bash 4.1.5
msort 8.53
-----
Data file data3:
file ii suffix
file i suffix
file iv suffix
file iii suffix
file c suffix
-----
Results, msort on roman numerals, field 2:
file i suffix
file ii suffix
file iii suffix
file iv suffix
file c suffix
If you do not use Debian, see the msort home page as noted in the script.
Best wishes ... cheers, drl
PS. I had trouble with older versions of msort on 64-bit Debian (lenny), but the version of msort on the current stable edition (squeeze) seems to work correctly, as noted above.