[Bioc-devel] Interoperability between DataFrame and dplyr?
On Fri, Apr 24, 2015 at 7:42 AM, Jim Hester <james.f.hester at gmail.com> wrote:
dplyr internally converts all `data.frame` objects to its `tbl_df` class and most dplyr methods operate on the `tbl` superclass, see ( https://github.com/hadley/dplyr/blob/master/R/tbl-df.r, https://github.com/hadley/dplyr/blob/master/R/tbl.r).
I hope you're speaking only of the data frame implementation here.
The most direct route to getting `DataFrame` objects working would be to just provide a method that converts the `DataFrame` objects to `data.frame`, then call `tbl_df()` on that.
That coercion already exists, of course, and it's via the S3 as.data.frame, so it should work already.
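For illustration, the conversion route described above might look roughly like the following. This is only a sketch: it assumes the S4Vectors, tibble, and dplyr packages are installed, the column names are made up, and it uses `tibble::as_tibble()` because `tbl_df()` is deprecated in recent dplyr releases.

```r
# Sketch only: assumes S4Vectors, tibble and dplyr are installed.
# The column names (id, score) are hypothetical example data.
library(S4Vectors)

df <- DataFrame(id = 1:3, score = c(0.5, 1.5, 2.5))

# Coerce to a plain data.frame using the existing as.data.frame method.
plain <- as.data.frame(df)

# Wrap as a tibble (the modern equivalent of tbl_df()) and use a dplyr verb.
# Calls are namespaced to avoid masking between S4Vectors and dplyr.
tb  <- tibble::as_tibble(plain)
res <- dplyr::filter(tb, score > 1)
```

This only works when every column coerces cleanly to an ordinary vector; columns holding more complex `Vector` subclasses would need their own handling.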
However this would copy the data multiple times, so probably the best option would be to create a new `tbl_DF` class to handle `DataFrame` objects directly.
It doesn't copy the data, outside of the list of pointers (so it's pretty much instantaneous), but yea, I agree a new implementation is the way to go.
You can look in the various tbl-*.r files at (https://github.com/hadley/dplyr/blob/master/R/) to see what methods should be implemented.
On Fri, Apr 24, 2015 at 10:16 AM, Michael Lawrence <lawrence.michael at gene.com> wrote:
Sure, but the way DataFrame is flexible is by relying on two abstractions in base R. Just length() and '['. If dplyr does the same thing, which seems totally reasonable, everything should work the same.
On Thu, Apr 23, 2015 at 4:32 PM, Vincent Carey <stvjc at channing.harvard.edu> wrote:
Seems to me that DataFrame is too flexible -- you can have very complex objects in the columns (anything that inherits from Vector) with which, in its current state, dplyr would not work too naturally. You would wind up doing a fair amount of coercion of such entities, so it seems to me that arranging a coercion of DataFrames satisfying specific conditions to data.frame would be a path of low resistance. Ready to be corrected of course.
On Thu, Apr 23, 2015 at 7:06 PM, Ryan C. Thompson <rct at thompsonclan.org> wrote:
Hi all,
So, dplyr is a pretty cool thing, but it currently works with data.frame and data.table, but not S4Vectors::DataFrame. I'd like to change that if possible, and I assume that this would "simply" involve writing some glue code. However, I'm not really sure where to start, and I expect things might be complicated because dplyr uses S3 and S4Vectors uses S4. Can anyone offer any pointers?
-Ryan
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel