sorting variable names containing digits
Dear Gabor,

Thanks for this -- I was unaware of mixedsort(). As you point out, however, mixedsort() doesn't cover all of the cases in which I'm interested and which are handled by mysort().

Regards,
John

On Sun, 21 Dec 2008 20:51:17 -0500 "Gabor Grothendieck" <ggrothendieck at gmail.com> wrote:

mixedsort in gtools will give the same result as mysort(s) but differs in the case of t.

On Sun, Dec 21, 2008 at 8:33 PM, John Fox <jfox at mcmaster.ca> wrote:

Dear r-helpers,

I'm looking for a way of sorting variable names in a "natural" order, when the names are composed of digits and other characters. I know that this is a vague idea, and that sorting character strings is a complex topic, but perhaps a couple of examples will clarify what I mean:

s <- c("x1b", "x1a", "x02b", "x02a", "x02", "y1a1", "y10a2",
       "y10a10", "y10a1", "y2", "var10a2", "var2", "y10")
sort(s)
 [1] "var10a2" "var2"    "x02"     "x02a"    "x02b"    "x1a"
 [7] "x1b"     "y10"     "y10a1"   "y10a10"  "y10a2"   "y1a1"
[13] "y2"
mysort(s)
 [1] "var2"    "var10a2" "x1a"     "x1b"     "x02"     "x02a"
 [7] "x02b"    "y1a1"    "y2"      "y10"     "y10a1"   "y10a2"
[13] "y10a10"
t <- c("q10.1.1", "q10.2.1", "q2.1.1", "q10.10.2")
sort(t)
[1] "q10.1.1"  "q10.10.2" "q10.2.1"  "q2.1.1"
mysort(t)
[1] "q2.1.1"   "q10.1.1"  "q10.2.1"  "q10.10.2"

Here, sort() is the standard R function and mysort() is a replacement, which sorts the names into the order that seems natural to me, at least in the cases that I've tried:

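mysort <- function(x){
    # sort.helper() returns a list of sort keys: one (prefix, number)
    # pair per level of recursion
    sort.helper <- function(x){
        # text before the first digit ("" if there is none)
        prefix <- strsplit(x, "[0-9]")
        prefix <- sapply(prefix, "[", 1)
        prefix[is.na(prefix)] <- ""
        # second piece after splitting at every non-digit character,
        # converted to a number; -Inf when it is missing or empty
        suffix <- strsplit(x, "[^0-9]")
        suffix <- as.numeric(sapply(suffix, "[", 2))
        suffix[is.na(suffix)] <- -Inf
        # drop the first non-digit run and the first digit run
        remainder <- sub("[^0-9]+", "", x)
        remainder <- sub("[0-9]+", "", remainder)
        if (all (remainder == "")) list(prefix, suffix)
        else c(list(prefix, suffix), Recall(remainder))
    }
    # order() breaks ties in earlier keys using later keys
    ord <- do.call("order", sort.helper(x))
    x[ord]
}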
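
The slicing that mysort() performs can also be written without recursion. What follows is a minimal sketch, not code from this thread: natural_sort() is a hypothetical name, and it assumes the names contain no NA values. It splits each name into alternating non-digit and digit runs, pads missing pieces with "" and -Inf as mysort() does, and passes the resulting keys to order(). Because each digit run is extracted directly, the numeric keys do not depend on how long the non-digit prefix is.

```r
# Hypothetical helper (not from the thread): natural-order sort of names
# made up of alternating non-digit and digit runs. Assumes no NA values.
natural_sort <- function(x) {
    x <- as.character(x)
    # non-digit runs: what is left after splitting on runs of digits
    text_runs <- strsplit(x, "[0-9]+")
    # digit runs, as numbers, so that "02" compares equal to 2 and 10 > 2
    digit_runs <- lapply(regmatches(x, gregexpr("[0-9]+", x)), as.numeric)
    n_pairs <- max(lengths(text_runs), lengths(digit_runs), 0)
    if (n_pairs == 0) return(x)
    keys <- list()
    for (i in seq_len(n_pairs)) {
        # i-th non-digit run of every name, "" where a name has none
        txt <- vapply(text_runs, `[`, character(1), i)
        txt[is.na(txt)] <- ""
        # i-th number of every name, -Inf where a name has none
        num <- vapply(digit_runs, `[`, numeric(1), i)
        num[is.na(num)] <- -Inf
        keys <- c(keys, list(txt, num))
    }
    # later keys break ties in earlier keys
    x[do.call(order, keys)]
}
```

For the vectors s and t defined earlier, natural_sort(s) and natural_sort(t) give the same order as the mysort() output shown above.
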

I have a couple of applications in mind, one of which is recognizing repeated-measures variables in "wide" longitudinal datasets, which often are named in the form x1, x2, ... , xn.

mysort(), which works by recursively slicing off pairs of non-digit and digit strings, seems more complicated than it should have to be, and I wonder whether anyone has a more elegant solution. I don't think that efficiency is a serious issue for the applications I'm considering, but of course a more efficient solution would be of interest.

Thanks,
John

------------------------------
John Fox, Professor
Department of Sociology
McMaster University
Hamilton, Ontario, Canada
web: socserv.mcmaster.ca/jfox
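
The repeated-measures application mentioned above might look roughly like the following. This is an illustrative sketch only: the variable names in vars are made up, and it assumes each repeated measure is named as a non-digit stem followed by a single trailing integer (x1, x2, ..., xn).

```r
# Made-up variable names from a hypothetical "wide" longitudinal dataset
vars <- c("x10", "x2", "x1", "age", "y2", "y1")

# keep only names of the form <non-digit stem><integer>
is_indexed <- grepl("^[^0-9]+[0-9]+$", vars)
indexed <- vars[is_indexed]

# separate the stem from the occasion number
stem <- sub("[0-9]+$", "", indexed)
index <- as.numeric(sub("^[^0-9]+", "", indexed))

# sort by stem, then by occasion number, and group the names by stem
ord <- order(stem, index)
groups <- split(indexed[ord], stem[ord])
```

Here groups$x holds x1, x2, x10 in numeric order and groups$y holds y1, y2; age is left out because it has no trailing index.
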