sorting variable names containing digits

John Fox

Sun, Dec 21, 2008 6:57 PM

Dear Gabor,

Thanks for this -- I was unaware of mixedsort(). As you point out,
however, mixedsort() doesn't cover all of the cases in which I'm
interested and which are handled by mysort().

Regards,
 John

On Sun, 21 Dec 2008 20:51:17 -0500
"Gabor Grothendieck" <ggrothendieck at gmail.com> wrote:

mixedsort in gtools will give the same result as mysort(s) but
differs in the case of t.

On Sun, Dec 21, 2008 at 8:33 PM, John Fox <jfox at mcmaster.ca> wrote:

Dear r-helpers,

I'm looking for a way of sorting variable names in a "natural" order,
when the names are composed of digits and other characters. I know that
this is a vague idea, and that sorting character strings is a complex
topic, but perhaps a couple of examples will clarify what I mean:

s <- c("x1b", "x1a", "x02b", "x02a", "x02", "y1a1", "y10a2",
  "y10a10", "y10a1", "y2", "var10a2", "var2", "y10")

sort(s)

 [1] "var10a2" "var2"    "x02"     "x02a"    "x02b"    "x1a"
 [7] "x1b"     "y10"     "y10a1"   "y10a10"  "y10a2"   "y1a1"
[13] "y2"

mysort(s)

 [1] "var2"    "var10a2" "x1a"     "x1b"     "x02"     "x02a"
 [7] "x02b"    "y1a1"    "y2"      "y10"     "y10a1"   "y10a2"
[13] "y10a10"

t <- c("q10.1.1", "q10.2.1", "q2.1.1", "q10.10.2")

sort(t)

[1] "q10.1.1"  "q10.10.2" "q10.2.1"  "q2.1.1"

mysort(t)

[1] "q2.1.1"   "q10.1.1"  "q10.2.1"  "q10.10.2"

Here, sort() is the standard R function and mysort() is a replacement,
which sorts the names into the order that seems natural to me, at least
in the cases that I've tried:

mysort <- function(x){
 sort.helper <- function(x){
   # leading run of non-digit characters (may be empty)
   prefix <- sub("[0-9].*$", "", x)
   # first run of digits after that prefix; -Inf when there is none
   suffix <- as.numeric(sub("^[^0-9]*([0-9]*).*$", "\\1", x))
   suffix[is.na(suffix)] <- -Inf
   # strip the prefix/number pair just consumed and recurse on the rest
   remainder <- sub("^[^0-9]*[0-9]*", "", x)
   if (all (remainder == "")) list(prefix, suffix)
   else c(list(prefix, suffix), Recall(remainder))
   }
 # order by (prefix, number) pairs, taken from left to right
 ord <- do.call("order", sort.helper(x))
 x[ord]
  }

I have a couple of applications in mind, one of which is recognizing
repeated-measures variables in "wide" longitudinal datasets, which often
are named in the form x1, x2, ... , xn.
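
As an illustration of the repeated-measures application, here is a minimal sketch of one way such variables might be picked out of a set of names. The helper group_repeated() and the example names are hypothetical, not something from the thread; the sketch assumes that a repeated-measures variable is a non-digit stem followed by a trailing integer, and that a stem counts only if it occurs with more than one index.

```r
# Hypothetical helper (not from the thread): group names of the form
# <stem><index>, e.g. x1, x2, ..., xn, and order each group by index.
group_repeated <- function(nms) {
  # keep only names that end in a run of digits
  has_index <- grepl("[0-9]+$", nms)
  named <- nms[has_index]
  # stem = name with the trailing digits removed
  stem <- sub("[0-9]+$", "", named)
  # index = the trailing digits, compared as numbers rather than text
  index <- as.numeric(regmatches(named, regexpr("[0-9]+$", named)))
  ord <- order(stem, index)
  groups <- split(named[ord], stem[ord])
  # a stem seen only once is not treated as a repeated measure
  groups[lengths(groups) > 1]
}

nms <- c("id", "x10", "x2", "x1", "age", "y2", "y1")
group_repeated(nms)
```

Because the index is compared numerically, x10 falls after x2 within its group. Names with several embedded numbers (such as q10.2.1) would need a fuller natural ordering than this sketch provides.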

mysort(), which works by recursively slicing off pairs of non-digit and
digit strings, seems more complicated than it should have to be, and I
wonder whether anyone has a more elegant solution. I don't think that
efficiency is a serious issue for the applications I'm considering, but
of course a more efficient solution would be of interest.
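
One non-recursive way to get the same kind of ordering is sketched below, purely as an illustration. The function natural_sort() is a hypothetical name, not part of the thread or of any package. It assumes that names should be compared run by run, with runs of digits compared as numbers and everything else compared as text, and it uses only base R (regmatches(), gregexpr(), order()).

```r
# Hypothetical helper (not from the thread): natural-order sort that
# tokenizes each name once instead of recursing.
natural_sort <- function(x) {
  x <- as.character(x)
  # split each name into maximal runs of digits or of non-digits
  tokens <- regmatches(x, gregexpr("[0-9]+|[^0-9]+", x))
  n_max <- max(lengths(tokens), 0L)
  keys <- list()
  for (i in seq_len(n_max)) {
    # i-th token of every name; "" where a name has fewer tokens
    tok <- vapply(tokens,
                  function(tk) if (length(tk) >= i) tk[i] else "",
                  character(1))
    is_num <- grepl("^[0-9]+$", tok)
    # numeric key: the value of a digit run, -Inf for text or missing tokens
    num <- rep(-Inf, length(tok))
    num[is_num] <- as.numeric(tok[is_num])
    # text key: the token itself, "" for digit runs or missing tokens
    txt <- ifelse(is_num, "", tok)
    keys <- c(keys, list(txt, num))
  }
  if (length(keys) == 0L) return(x)
  x[do.call(order, keys)]
}

natural_sort(c("q10.1.1", "q10.2.1", "q2.1.1", "q10.10.2"))
natural_sort(c("var10", "var2"))
```

In this sketch, digit runs that differ only in leading zeros (for example x02 and x2) compare as equal and keep their input order. Text runs are compared by order(), so the result for non-ASCII or mixed-case names can depend on the locale.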

Thanks,
 John

------------------------------
John Fox, Professor
Department of Sociology
McMaster University
Hamilton, Ontario, Canada
web: socserv.mcmaster.ca/jfox

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.