Skip to content

Mixed sorting/ordering of strings acknowledging roman numerals?

2 messages · Henrik Bengtsson, David Winsemius

#
Hi,

does anyone know of an implementation/function that sorts strings that
*contain* roman numerals (I, II, III, IV, V, ...) which are treated as
numbers.  In 'gtools' there is mixedsort() which does this for strings
that contains (decimal) numbers.  I'm looking for a "mixedsortroman()"
function that does the same but with roman numbers, e.g.

## DECIMAL NUMBERS
[1] "chr 12" "chr 11" "chr 10" "chr 9"  "chr 8"
 [6] "chr 7"  "chr 6"  "chr 5"  "chr 4"  "chr 3"
[11] "chr 2"  "chr 1"
[1] "chr 1"  "chr 10" "chr 11" "chr 12" "chr 2"
 [6] "chr 3"  "chr 4"  "chr 5"  "chr 6"  "chr 7"
[11] "chr 8"  "chr 9"
[1] "chr 1"  "chr 2"  "chr 3"  "chr 4"  "chr 5"
 [6] "chr 6"  "chr 7"  "chr 8"  "chr 9"  "chr 10"
[11] "chr 11" "chr 12"


## ROMAN NUMBERS
[1] "chr XII"  "chr XI"   "chr X"    "chr IX"
 [5] "chr VIII" "chr VII"  "chr VI"   "chr V"
 [9] "chr IV"   "chr III"  "chr II"   "chr I"
[1] "chr I"    "chr II"   "chr III"  "chr IV"
 [5] "chr IX"   "chr V"    "chr VI"   "chr VII"
 [9] "chr VIII" "chr X"    "chr XI"   "chr XII"
[1] "chr I"    "chr II"   "chr III"  "chr IV"
 [5] "chr V"    "chr VI"   "chr VII"  "chr VIII"
 [9] "chr IX"   "chr X"    "chr XI"   "chr XII"

The latter is what I'm looking for.

Before hacking together something myself (e.g. identify roman numerals
substrings, translate them to decimal numbers, use gtools::mixedsort()
to sort them and then translate them back to roman numbers), I'd like
to hear if someone already has this implemented/know of a package that
does this.

Thanks,

Henrik
#
On Aug 26, 2014, at 5:24 PM, Henrik Bengtsson wrote:

            
It's pretty easy to sort something you know to be congruent with the existing roman class:

romanC <- as.character( as.roman(1:3899) )
match(c("I", "II", "III","X","V"), romanC)
#[1]  1  2  3 10  5

But I guess you already know that, so you want a regex approach to parsing. Looking at the path taken by Warnes, it would involve doing something like his regex based insertion of a delimiter for "Roman numeral" but simpler because he needed to deal with decimal points and signs and exponent notation, none of which you appear to need. If you only need to consider character and Roman, then this hack of Warnes tools succeeds:

 mixedorderRoman <- function (x) 
{
    if (length(x) < 1) 
        return(NULL)
    else if (length(x) == 1) 
        return(1)
    if (is.numeric(x)) 
        return(order(x))
    delim = "\\$\\@\\$"
    roman <- function(x) {
        suppressWarnings(match(x, romanC))
    }
    nonnumeric <- function(x) {
        suppressWarnings(ifelse(is.na(as.numeric(x)), toupper(x), 
            NA))
    }
    x <- as.character(x)
    which.nas <- which(is.na(x))
    which.blanks <- which(x == "")
    if (length(which.blanks) > 0) 
        x[which.blanks] <- -Inf
    if (length(which.nas) > 0) 
        x[which.nas] <- Inf
    delimited <- gsub("([IVXCL]+)", 
        paste(delim, "\\1", delim, sep = ""), x)
    step1 <- strsplit(delimited, delim)
    step1 <- lapply(step1, function(x) x[x > ""])
    step1.roman <- lapply(step1, roman)
    step1.character <- lapply(step1, nonnumeric)
    maxelem <- max(sapply(step1, length))
    step1.roman.t <- lapply(1:maxelem, function(i) sapply(step1.roman, 
        function(x) x[i]))
    step1.character.t <- lapply(1:maxelem, function(i) sapply(step1.character, 
        function(x) x[i]))
    rank.roman <- sapply(step1.roman.t, rank)
    rank.character <- sapply(step1.character.t, function(x) as.numeric(factor(x)))
    rank.roman[!is.na(rank.character)] <- 0
    rank.character <- t(t(rank.character) + apply(matrix(rank.roman), 
        2, max, na.rm = TRUE))
    rank.overall <- ifelse(is.na(rank.character), rank.numeric, 
        rank.character)
    order.frame <- as.data.frame(rank.overall)
    if (length(which.nas) > 0) 
        order.frame[which.nas, ] <- Inf
    retval <- do.call("order", order.frame)
    return(retval)
}

y[mixedorderRoman(y)]
 [1] "chr I"    "chr II"   "chr III"  "chr IV"   "chr IX"  
 [6] "chr V"    "chr VI"   "chr VII"  "chr VIII" "chr X"   
[11] "chr XI"   "chr XII"