Skip to content
Prev 343442 / 398506 Next

Mixed sorting/ordering of strings acknowledging roman numerals?

On Aug 26, 2014, at 5:24 PM, Henrik Bengtsson wrote:

            
It's pretty easy to sort something you know to be congruent with the existing roman class:

romanC <- as.character( as.roman(1:3899) )
match(c("I", "II", "III","X","V"), romanC)
#[1]  1  2  3 10  5

But I guess you already know that, so you want a regex approach to parsing. Looking at the path taken by Warnes, it would involve doing something like his regex based insertion of a delimiter for "Roman numeral" but simpler because he needed to deal with decimal points and signs and exponent notation, none of which you appear to need. If you only need to consider character and Roman, then this hack of Warnes tools succeeds:

 mixedorderRoman <- function (x) 
{
    if (length(x) < 1) 
        return(NULL)
    else if (length(x) == 1) 
        return(1)
    if (is.numeric(x)) 
        return(order(x))
    delim = "\\$\\@\\$"
    roman <- function(x) {
        suppressWarnings(match(x, romanC))
    }
    nonnumeric <- function(x) {
        suppressWarnings(ifelse(is.na(as.numeric(x)), toupper(x), 
            NA))
    }
    x <- as.character(x)
    which.nas <- which(is.na(x))
    which.blanks <- which(x == "")
    if (length(which.blanks) > 0) 
        x[which.blanks] <- -Inf
    if (length(which.nas) > 0) 
        x[which.nas] <- Inf
    delimited <- gsub("([IVXCL]+)", 
        paste(delim, "\\1", delim, sep = ""), x)
    step1 <- strsplit(delimited, delim)
    step1 <- lapply(step1, function(x) x[x > ""])
    step1.roman <- lapply(step1, roman)
    step1.character <- lapply(step1, nonnumeric)
    maxelem <- max(sapply(step1, length))
    step1.roman.t <- lapply(1:maxelem, function(i) sapply(step1.roman, 
        function(x) x[i]))
    step1.character.t <- lapply(1:maxelem, function(i) sapply(step1.character, 
        function(x) x[i]))
    rank.roman <- sapply(step1.roman.t, rank)
    rank.character <- sapply(step1.character.t, function(x) as.numeric(factor(x)))
    rank.roman[!is.na(rank.character)] <- 0
    rank.character <- t(t(rank.character) + apply(matrix(rank.roman), 
        2, max, na.rm = TRUE))
    rank.overall <- ifelse(is.na(rank.character), rank.numeric, 
        rank.character)
    order.frame <- as.data.frame(rank.overall)
    if (length(which.nas) > 0) 
        order.frame[which.nas, ] <- Inf
    retval <- do.call("order", order.frame)
    return(retval)
}

y[mixedorderRoman(y)]
 [1] "chr I"    "chr II"   "chr III"  "chr IV"   "chr IX"  
 [6] "chr V"    "chr VI"   "chr VII"  "chr VIII" "chr X"   
[11] "chr XI"   "chr XII"