Inspired by the exchange between Rolf Turner and Wacek Kusnierczyk, I
thought I'd clear up for myself the exact relationship among the
various sequence concepts in R, including not only generic vectors
(lists) and atomic vectors, but also pairlists, factor sequences,
date/time sequences, and difftime sequences.
I tabulated type of sequence vs. property to see if I could make sense
of all this. The properties I looked at were the predicates
is.{vector,list,pairlist}; whether various sequence operations (c,
rev, unique, sort, rle) can be used on objects of the various types,
and if relevant, whether they preserve the type of the input; and what
the length of class( as.XXX (1:2) ) is.
Here are the results (code to reproduce at end of email):
             numer  list   plist  fact   POSIXct difft
is.vector    TRUE   TRUE   FALSE  FALSE  FALSE   FALSE
is.list      FALSE  TRUE   TRUE   FALSE  FALSE   FALSE
is.pairlist  FALSE  FALSE  TRUE   FALSE  FALSE   FALSE
c_keep?      TRUE   TRUE   FALSE  FALSE  TRUE    FALSE
rev_keep?    TRUE   TRUE   FALSE  TRUE   TRUE    TRUE
unique_keep? TRUE   TRUE   "Err"  TRUE   TRUE    FALSE
sort_keep?   TRUE   "Err"  "Err"  TRUE   TRUE    TRUE
rle_len      2      "Err"  "Err"  "Err"  "Err"   "Err"
Alas, this tabulation, rather than clarifying things for me, just
confused me more -- the diverse treatment of sequences by various
operations is all rather bewildering.
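To make a couple of the table cells concrete, here is a minimal sketch that computes them by hand for the factor column. The variable names (f, rev_keeps, rle_res, runs) are made up for illustration, and converting to character before calling rle() is just one possible workaround, not something the table itself tests.

```r
# A factor built the same way as the "fact" column in the table
f <- as.factor(1:2)

# rev_keep? cell: does rev() return an object of the same class?
rev_keeps <- identical(class(rev(f)), class(f))

# rle_len cell: rle() refuses a factor, so capture the error
rle_res   <- try(rle(f), silent = TRUE)
rle_fails <- inherits(rle_res, "try-error")

# Possible workaround (an assumption, not part of the table):
# run-length encode the labels instead of the factor itself
runs <- rle(as.character(f))$lengths

print(c(rev_keeps = rev_keeps, rle_fails = rle_fails))
print(runs)
```

The factor survives rev() with its class intact, yet rle() rejects it outright, which is the kind of inconsistency the table is meant to show.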
Wouldn't it be easier to teach, learn, and use R if there were more
consistency in the treatment of sequences? I understand that in
long-running projects like S/R, there is an accumulation of
contributions by a variety of authors, but perhaps the time has come
for some cleanup at least for the base library?
-s
# generic outer: for generic vectors and non-vectorized functions
gouter <-
  function(x,y,f,...)
    matrix( mapply( f,
                    rep(x,length(y)),
                    rep(y,each = length(x)),
                    SIMPLIFY = FALSE ),  # don't coerce booleans to numerics
            length(x), length(y),
            dimnames = list( names(x), names(y) ) )
# if arg evaluation gives error, return "Err", else its value
if_err <-
  function(expr)
  { res <- try(expr,silent = TRUE)
    # inherits() is safe even when the result has a multi-element class
    if (inherits(res,"try-error")) "Err"
    else res }
# {} needed so else will parse properly
# does f(x) have the same class as x?
# identical() compares the whole class vector, with no recycling
keep_class <-
  function(f)
    function(x)
      if_err( identical(class(x), class(f(x))) )
seqtest <- function(seq)
{
  lseq <- length(seq)
  gouter(
    list(
      is.vector = is.vector,
      is.list = is.list,
      is.pairlist = is.pairlist,
      `c_keep?` = keep_class(c),
      `rev_keep?` = keep_class(rev),
      `unique_keep?` = keep_class(unique),
      ## Beware: unique prints an error message for bad args
      ## even within try(...,silent=TRUE)
      `sort_keep?` = keep_class(sort),
      ## spell out $lengths rather than rely on partial matching
      rle_len = function(a) if_err(length(rle(a)$lengths))
    ),
    list(
      numer = as.numeric(seq),
      list = as.list(seq),
      plist = as.pairlist(seq),
      fact = as.factor(seq),
      POSIXct = as.POSIXct(seq,origin = '1970-1-1'),
      difft = as.difftime(seq,units = 'secs')
    ),
    function(f,a)f(a)
  )
}
print(seqtest(1:2))
# This starts by printing [[1]] [1]...
# because of the bug in unique mentioned above
Semantics of sequences in R
3 messages · Stavros Macrakis, Duncan Murdoch, Raubertas, Richard
I think this was posted to the wrong list, so my followup is going to R-devel.
On 22/02/2009 3:42 PM, Stavros Macrakis wrote:
> Alas, this tabulation, rather than clarifying things for me, just
> confused me more -- the diverse treatment of sequences by various
> operations is all rather bewildering.
But you are asking lots of different questions, so of course you should get different answers. For example, the first three rows are behaving exactly as documented. (Perhaps the functions should have been designed differently, but a pretty-looking matrix isn't an argument for that. Give some examples of how the documented behaviour is causing problems.) I think some of the operations in the later rows are undocumented (generally pairlists tend not to be documented, even if in some cases they are supported), and it might make sense to make them more consistent in the undocumented cases. But it may make more sense to completely hide pairlists, for instance, and then several more of the examples are behaving as documented. (BTW, your description of your last row doesn't match what you did, as far as I can see.)
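As a small illustration of the documented behaviour behind the first three rows: is.vector() is TRUE only when the object has no attributes other than names, and is.list() is TRUE for both lists and non-empty pairlists. The sketch below assumes nothing beyond base R; the objects x, f and pl are made up for illustration.

```r
x <- c(a = 1, b = 2)
v_named <- is.vector(x)               # names are the one tolerated attribute

attr(x, "note") <- "extra attribute"
v_attr <- is.vector(x)                # any other attribute makes this FALSE

v_stripped <- is.vector(as.vector(x)) # as.vector() drops attributes of atomic vectors

f  <- factor(c("u", "v"))             # carries "levels" and "class" attributes
pl <- pairlist(1, 2)

print(c(v_named = v_named, v_attr = v_attr, v_stripped = v_stripped))
print(c(factor_is_vector   = is.vector(f),
        pairlist_is_list   = is.list(pl),
        pairlist_is_vector = is.vector(pl)))
```

Seen this way, the FALSE entries for factors, POSIXct and difftime in the is.vector row follow from their class attributes rather than from anything specific to those types.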
> Wouldn't it be easier to teach, learn, and use R if there were more consistency in the treatment of sequences?
Which ones in particular should change? What should they change to? What will break when you do that?
> I understand that in long-running projects like S/R, there is an accumulation of contributions by a variety of authors, but perhaps the time has come for some cleanup at least for the base library?
Generally R core members are reluctant to take on work just because someone else thinks it would be nice if they did. If you want to do this, that's one thing, but if you are just saying that it would be nice if someone else did it, then it's much less likely to get done. To get someone else to do it you need to convince them that it's a valuable use of their time, and I don't see that yet.
Duncan Murdoch
1 day later
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Duncan Murdoch
> Sent: Sunday, February 22, 2009 4:13 PM
> I think this was posted to the wrong list, so my followup is going to R-devel.
> On 22/02/2009 3:42 PM, Stavros Macrakis wrote:
> > Alas, this tabulation, rather than clarifying things for me, just
> > confused me more -- the diverse treatment of sequences by various
> > operations is all rather bewildering.
> But you are asking lots of different questions, so of course you should get different answers. For example, the first three rows are behaving exactly as documented. (Perhaps the functions should have been designed differently, but a pretty-looking matrix isn't an argument for that. Give some examples of how the documented behaviour is causing problems.) I think some of the operations in the later rows are undocumented (generally pairlists tend not to be documented, even if in some cases they are supported), and it might make sense to make them more consistent in the undocumented cases. But it may make more sense to completely hide pairlists, for instance, and then several more of the examples are behaving as documented. (BTW, your description of your last row doesn't match what you did, as far as I can see.)
> > Wouldn't it be easier to teach, learn, and use R if there were more consistency in the treatment of sequences?
> Which ones in particular should change? What should they change to? What will break when you do that?
Okay, here is one that should change: 'c()' should do something useful with factors, for example return a factor whose levels are the union of the levels of the arguments. Note that precedent for this already exists in base R:
    f1 <- factor(letters[1:3])
    f2 <- factor(letters[3:5])
    c(f1, f2)
    [1] 1 2 3 1 2 3
    str(rbind(data.frame(f=f1), data.frame(f=f2)))
    'data.frame': 6 obs. of 1 variable:
     $ f: Factor w/ 5 levels "a","b","c","d",..: 1 2 3 3 4 5
So the code and documentation already exist in 'rbind.data.frame'. As for what would break, well, it is hard to imagine any possible use for the current behavior, or who could have made use of it. But you never know I guess ...
Rich Raubertas
Merck & Co.
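The behaviour proposed above (a factor whose levels are the union of the arguments' levels) can be sketched in a few lines. The helper name combine_factors is hypothetical and not part of base R, and this is only one way to do it, not the code used by 'rbind.data.frame'. For reference, R 4.1.0 and later ship a c() method for factors along these lines, so on those versions plain c(f1, f2) may already return a factor.

```r
# Hypothetical helper (not in base R): concatenate factors,
# taking the union of their levels in order of first appearance.
combine_factors <- function(...) {
  fs <- list(...)
  stopifnot(all(vapply(fs, is.factor, logical(1))))
  lev <- unique(unlist(lapply(fs, levels)))        # union of levels
  factor(unlist(lapply(fs, as.character)), levels = lev)
}

f1  <- factor(letters[1:3])
f2  <- factor(letters[3:5])
f12 <- combine_factors(f1, f2)

print(levels(f12))      # "a" "b" "c" "d" "e"
print(as.integer(f12))  # 1 2 3 3 4 5
```

Going through the labels rather than the integer codes is what avoids the "[1] 1 2 3 1 2 3" result shown above, where the codes of f2 are reused as if they referred to the levels of f1.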