vctrs: a type system for the tidyverse

Hadley,

This looks interesting and like a fun project from what you said in the email
(I don't have time right now to dig deep into the README). A few thoughts.

First off, you are using the word "type" throughout this email; you seem to
mean class (judging by your Date and factor examples, and the fact you
mention S3 dispatch) as opposed to type in the sense of what is returned by
R's typeof() function. I think it would be clearer if you called it class
throughout unless that isn't actually what you mean (in which case I would
have other questions...).
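
To make the distinction concrete, here is a minimal base R sketch (my own
illustration, not taken from the vctrs README; the variable names d and f are
arbitrary). For both of the examples you mention, typeof() reports the
underlying storage mode, while class() reports the attribute that S3 dispatch
uses:

```r
# A Date and a factor, the two examples mentioned in the original email
d <- as.Date("2018-08-06")
f <- factor(c("a", "b"))

# typeof() reports the underlying storage mode
type_d <- typeof(d)   # "double"
type_f <- typeof(f)   # "integer"

# class() reports what S3 dispatch actually looks at
class_d <- class(d)   # "Date"
class_f <- class(f)   # "factor"

print(c(type_d, class_d, type_f, class_f))
```

So a Date and a plain double have the same type in the typeof() sense, which
is why I read your "type" as meaning class.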

More thoughts inline.
On Mon, Aug 6, 2018 at 9:21 AM, Hadley Wickham <h.wickham at gmail.com> wrote:

            
What is the reasoning behind taking the union of the levels here? I'm not
sure that is actually the behavior I would want if I have a factor and I try
to append some new data to it. I might want/expect to retain the existing
levels and get either NAs or an error if the new data has (present) levels
not in the first data. The behavior as above doesn't seem in line with what
I understand the purpose of factors to be (explicit restriction of possible
values).

I guess what I'm saying is that while I agree associativity is good for
most things, it doesn't seem like the right behavior to me in the case of
factors.
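
As a rough illustration of the two behaviors I am contrasting, here is a
base R sketch. It is only my own example and does not use vctrs; the names
old, new, combined_union and combined_strict are made up for the purpose:

```r
old <- factor(c("a", "b"))
new <- factor(c("b", "c"))   # "c" is not a level of `old`

# Go through character so the result does not depend on how c() treats factors
values <- c(as.character(old), as.character(new))

# Union behavior: the result accepts every level seen in either input
combined_union <- factor(values, levels = union(levels(old), levels(new)))

# Strict behavior: keep only the existing levels; unseen values become NA
combined_strict <- factor(values, levels = levels(old))

print(levels(combined_union))   # "a" "b" "c"
print(combined_strict)          # a b b <NA>, with levels a b
```

The strict version keeps the restriction on possible values intact, at the
cost of depending on which input comes first. An error instead of NA would be
the other reasonable choice.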

Also, while we're on factors, what does

vec_type2(factor("a"), "a")

return, character or factor with levels "a"?

Why is this not a list? Do you have the additional constraint that vec_type2
must return the class of one of its operands? If so, what is the
justification for that? Are you not counting list as a "type of vector"?

I must admit I'm a bit surprised here. rbind is one of the few places that
immediately come to mind where R takes a fail-early-and-loud approach to
likely errors (as opposed to the more permissive
do-something-that-could-be-what-they-meant approach of, e.g., out-of-bounds
indexing). Are we sure we want rbind to get less strict with respect to
compatibility of the data.frames being combined? Another "permissive" option
would be to return a data.frame which has only the intersection of the
columns. There are certainly times when that is what I want (rather than
columns with tons of NAs in them) and it would be convenient not to need to
do the column subsetting myself. This behavior would also meet your design
goals of associativity and commutativity.
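
For reference, a small base R sketch of what I mean. The data frames df1 and
df2 are invented for illustration, and the intersection approach is just the
manual subsetting I would currently do by hand, not anything vctrs provides
as far as I know:

```r
df1 <- data.frame(x = 1:2, y = c("a", "b"))
df2 <- data.frame(x = 3L, z = TRUE)

# Out-of-bounds indexing is permissive: it quietly returns NA
oob <- (1:3)[5]

# Base rbind fails early when the columns do not match
res <- tryCatch(rbind(df1, df2), error = function(e) e)
print(inherits(res, "error"))   # TRUE

# Intersection alternative: keep only the columns every input has
common <- intersect(names(df1), names(df2))
out <- rbind(df1[common], df2[common])
print(out)
```

Here out has the single column x and three rows, with no NA-filled columns.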

I want to be clear: I think what you describe is a useful operation, if it
is what is intended, but perhaps it should have a different name rather than
being called rbind? Maybe vec_rcbind, to indicate that both rows and columns
are potentially being added to any given individual input.

Best,
~G