Sort roman numerals

If I use ls to print all the files of a folder, is there a way to sort using roman numerals?

I am thinking about a result like:
benjamin_I.wmv
benjamin_II.wmv
benjamin_II.wmv
benjamin_III.wmv
benjamin_IV.wmv
benjamin_V.wmv
benjamin_VI.wmv
benjamin_VII.wmv
benjamin_VIII.wmv
benjamin_IX.wmv

The roman numerals are always preceded by an underscore.

Your best bet is probably Perl's Roman module.
Download Roman.pm from CPAN (http://search.cpan.org/~chorny/Roman-1.23/lib/Roman.pm)
and copy it into /usr/lib/perl5/site_perl/5.8.8/ (adjust the path to your Perl version).

Create a perl script sortRoman.pl:

#!/usr/bin/perl -w

use strict;
use Roman;

# Return the arabic value of the roman numeral found between the last
# underscore and a dot (e.g. b_XIV.wmv --> 14), or 0 if there is none.
sub romanValue {
    my ($name) = @_;
    return 0 unless $name =~ /^.*_([MDCLXVI]+)\..*/;  #capture all roman numerals after underscore
    return arabic($1) || 0;   #convert captured roman number to arabic; 0 if it is not a valid numeral
}

sub romanSort {  #custom sorting definition
    romanValue($a) <=> romanValue($b);  #numeric comparison between the converted numbers
}

my @data = (<>);  #slurp the whole input into one array

print sort romanSort @data;  #print sorted array using custom routine romanSort

Make it executable:

chmod 754 sortRoman.pl

and try it out:

$ cat testdata
b_II.wmv
b_III.wmv
b_IX.wmv
b_IV.wmv
b_VI.wmv
b_V.wmv
b_VII.wmv
b_CXLIV.wmv
b_CXIV.wmv
b_CXII.wmv
A_B_XIV.wmv

$ ./sortRoman.pl testdata
b_II.wmv
b_III.wmv
b_IV.wmv
b_V.wmv
b_VI.wmv
b_VII.wmv
b_IX.wmv
A_B_XIV.wmv
b_CXII.wmv
b_CXIV.wmv
b_CXLIV.wmv

Egads... I'd translate the numbers from roman numerals into normal numbers, sort, then change them back.

# roman.awk
# Adapted from a clever converter found here
# http://scripts.mit.edu/~yfarjoun/homepage/index.php?title=Code_Snippets

BEGIN	{
		R["I"]=1;	R["V"]=5;	R["X"]=10;	R["L"]=50;
		R["C"]=100;	R["D"]=500;	R["M"]=1000;

		E["iv"]="IIII";		E["ix"]="VIIII";
		E["xl"]="XXXX";		E["xc"]="LXXXX";
		E["cd"]="CCCC";		E["cm"]="DCCCC";
		E["iix"]="VIII";	E["xxc"]="LXXX";
		E["ccm"]="DCCC";	E["vl"]="XXXXV";
		E["ld"]="CCCCL";
	}

	function roman_arabic(RN)
	{
		SUM=0;
		RN=tolower(RN);

		# Expand subtractive forms (iv, ix, ...) into additive forms we can count.
		# The keys are lower case and the replacements upper case, so a
		# replacement can never be matched and substituted a second time.
		for(K in E)	while(sub(K, E[K], RN));

		# Convert anything that didn't get substituted to uppercase.
		RN=toupper(RN);

		for(K in R) while(sub(K, "", RN)) SUM+=R[K];

		return(SUM);
	}

	{
		split($0, ARR, /[_.]/);
		if($0 ~ /_[iIvVlLxXcCdDmM]+/)
		{
#			print ARR[1], ARR[2], ARR[3];
			printf("%s<!--%d-->_%s.%s\n", ARR[1], roman_arabic(ARR[2]), ARR[2], ARR[3]);
		}
	}
$ awk -f roman.awk < list
benjamin<!--1-->_I.wmv
benjamin<!--8-->_VIII.wmv
benjamin<!--4-->_IV.wmv
benjamin<!--5-->_V.wmv
benjamin<!--2-->_II.wmv
benjamin<!--2-->_II.wmv
benjamin<!--9-->_IX.wmv
benjamin<!--7-->_VII.wmv
benjamin<!--3-->_III.wmv
benjamin<!--6-->_VI.wmv
$ awk -f roman.awk < list | sort
benjamin<!--1-->_I.wmv
benjamin<!--2-->_II.wmv
benjamin<!--2-->_II.wmv
benjamin<!--3-->_III.wmv
benjamin<!--4-->_IV.wmv
benjamin<!--5-->_V.wmv
benjamin<!--6-->_VI.wmv
benjamin<!--7-->_VII.wmv
benjamin<!--8-->_VIII.wmv
benjamin<!--9-->_IX.wmv
$ awk -f roman.awk < list | sort | sed -r 's#<!--.*-->##g'
benjamin_I.wmv
benjamin_II.wmv
benjamin_II.wmv
benjamin_III.wmv
benjamin_IV.wmv
benjamin_V.wmv
benjamin_VI.wmv
benjamin_VII.wmv
benjamin_VIII.wmv
benjamin_IX.wmv
$
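
One caveat with the pipeline above: plain sort compares the <!--N--> tag as text, so once the numerals go past IX a value of 10 would land before 2. Here is a minimal sketch of one way around that. It assumes input lines shaped like roman.awk's output and file names that contain no tab characters; sort_tagged is a hypothetical helper name, not part of the original script.

```shell
#!/bin/sh
# sort_tagged (hypothetical helper): read lines shaped like
# prefix<!--N-->rest, order them by prefix and then by N as a number,
# and print them with the tag removed.
sort_tagged() {
  tab=$(printf '\t')
  awk -v OFS='\t' '
    match($0, /<!--[0-9]+-->/) {
      prefix = substr($0, 1, RSTART - 1)
      value  = substr($0, RSTART + 4, RLENGTH - 7)   # digits between <!-- and -->
      rest   = substr($0, RSTART + RLENGTH)
      # Emit: text key, numeric key, undecorated line.
      print prefix, value + 0, prefix rest
    }' |
  sort -t "$tab" -k1,1 -k2,2n |   # field 1 as text, field 2 as a number
  cut -f3-                        # keep only the undecorated line
}

# Sample lines made up for illustration:
printf '%s\n' 'benjamin<!--10-->_X.wmv' 'benjamin<!--2-->_II.wmv' \
              'benjamin<!--9-->_IX.wmv' | sort_tagged
```

With the three sample lines this prints benjamin_II.wmv, benjamin_IX.wmv, benjamin_X.wmv in that order. It would slot in as awk -f roman.awk < list | sort_tagged, replacing the sort and sed steps.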

...but beware of all the valid words that can be made from roman numerals:

$ grep -Ei "^[IVXLCDM]{3}[IVXLCDM]*$" /usr/share/dict/cracklib-small
cdc
civic
civil
did
dill
dim
icc
iii
ill
lid
lim
livid
mid
mild
mill
mimi
mimic
mix
vii
viii
vivid
$
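
A stricter check narrows this down quite a bit. The sketch below tests whether a string is a well-formed numeral in the usual 1 to 3999 range, rather than just a run of roman letters. is_roman is a hypothetical helper name, and the pattern is the commonly used one for standard subtractive notation, so forms like IIII are rejected.

```shell
#!/bin/sh
# is_roman (hypothetical helper): succeed only if $1 is a well-formed
# roman numeral in standard notation (1..3999), in either case.
is_roman() {
  [ -n "$1" ] &&
  printf '%s\n' "$1" |
    grep -Eqi '^M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$'
}

for w in civil mix viii vivid; do
  if is_roman "$w"; then
    echo "$w: roman"
  else
    echo "$w: not roman"
  fi
done
```

Of these four, only mix and viii pass. So a strict pattern throws out words like civil and vivid, but a real word such as mix (1009) still gets through, which is why anchoring on the underscore and the dot in the file name matters.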

What does arabic($1) stand for?

@centurion_13: arabic() is a function defined in the Roman.pm module that converts a roman number to arabic, e.g.:

$a = "XLVII";
$b = arabic($a);  # b==47

That's what that Roman.pm module is made for.

$1 refers to whatever the first pair of parentheses captured in the most recent successful regex match.
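
To make the conversion itself concrete, here is a rough sketch of what a roman-to-arabic function does. It is not Roman.pm's implementation, just the usual left-to-right scan written in shell and awk: a symbol is subtracted when a larger one follows it, and added otherwise. roman_to_arabic is a hypothetical name, and the sketch assumes its argument is already a well-formed numeral, since it does no validation.

```shell
#!/bin/sh
# roman_to_arabic (hypothetical helper): print the value of the roman
# numeral given as $1. Assumes well-formed input; nothing is validated.
roman_to_arabic() {
  printf '%s\n' "$1" | awk '
    BEGIN {
      v["I"] = 1;   v["V"] = 5;   v["X"] = 10;  v["L"] = 50
      v["C"] = 100; v["D"] = 500; v["M"] = 1000
    }
    {
      s = toupper($0); total = 0
      for (i = 1; i <= length(s); i++) {
        cur = v[substr(s, i, 1)]
        nxt = (i < length(s)) ? v[substr(s, i + 1, 1)] : 0
        # A smaller symbol in front of a larger one is subtracted (IV = 5 - 1).
        total += (cur < nxt) ? -cur : cur
      }
      print total
    }'
}

roman_to_arabic XLVII   # prints 47
roman_to_arabic CXLIV   # prints 144
```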

I added comments to my original reply.


Hi.

The utility msort is in the Debian GNU/Linux repositories. It often makes life easy for complex sorting situations:

#!/usr/bin/env bash

# @(#) s2	Demonstrate sort of roman numerals, msort.
# http://freshmeat.net/projects/msort

# Utility functions: print-as-echo, print-line-with-visual-space, debug.
pe() { for i;do printf "%s" "$i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
db() { ( printf " db, ";for i;do printf "%s" "$i";done; printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f $C ] && . $C ;msort
msort --version | head -1

FILE=${1-data3}
pl " Data file $FILE:"
cat $FILE

pl " Results, msort on roman numerals, field 2:"
msort --quiet --line --position 2 --comparison-type numeric --number-system roman $FILE

exit 0

producing:

% ./s2

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.32-5-686, i686
Distribution        : Debian GNU/Linux 6.0 
GNU bash 4.1.5
msort 8.53

-----
 Data file data3:
file ii suffix
file i suffix
file iv suffix
file iii suffix
file c suffix

-----
 Results, msort on roman numerals, field 2:
file i suffix
file ii suffix
file iii suffix
file iv suffix
file c suffix

If you do not use Debian, see the msort home page as noted in the script.

Best wishes ... cheers, drl

PS. I had trouble with older versions of msort on 64-bit Debian (lenny), but the version of msort on the current stable edition (squeeze) seems to work correctly, as noted above.


...and in case it's unclear, "arabic" numerals are normal numbers, digits 0 through 9 etc.