
[R-pkg-devel] tibbles are not data frames

3 messages · Patrick Perry, Hadley Wickham

#
On Tue, Sep 26, 2017 at 9:22 AM, Patrick Perry <patperry at gmail.com> wrote:
They can currently rely on x[[1]] always returning a vector and
x[, 1, drop = FALSE] always returning a data frame, whether x is a tibble
or a data frame. I personally don't believe that an additional approach
would help.
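For illustration, a minimal base-R sketch of the two access patterns mentioned above. It uses only a plain data.frame (the column values are made up), so it runs without the tibble package and does not itself exercise tibble behaviour.

```r
# The two access patterns named in the message, shown on a base data.frame.
# tibble is not loaded, so only data.frame behaviour is exercised here.
df <- data.frame(a = 1:3, b = c("x", "y", "z"))

col_vec <- df[[1]]                 # extracts column 1 as a plain vector
col_df  <- df[, 1, drop = FALSE]   # keeps a one-column data frame

is.vector(col_vec)      # TRUE
is.data.frame(col_df)   # TRUE
```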
As I've said elsewhere in the thread, that would effectively render
tibbles useless because they wouldn't work with many functions.

Hadley
#
Pro of ignoring x[, 1, drop = TRUE]:
(1) it forces users to write consistent code for extracting a vector 
from a data frame

Con:
(1) functions that accept both matrices and data frames might break 
(x[[j]][i] doesn't work for a matrix)
(2) functions that use the access pattern x[i,j,drop = TRUE] will break
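A small base-R illustration of Con (1), using a made-up matrix and its as.data.frame() copy. On a matrix, x[[j]] picks a single element in column-major order rather than column j, so x[[j]][i] means something different there, whereas x[i, j, drop = TRUE] agrees for both.

```r
# Con (1): x[[j]][i] acts as a column accessor only for data frames.
m  <- matrix(1:6, nrow = 2)   # columns are 1:2, 3:4, 5:6
df <- as.data.frame(m)        # same values, columns named V1, V2, V3

df[[2]][1]              # 3: first element of column 2
m[[2]][1]               # 2: m[[2]] is the 2nd element in column-major order
m[1, 2, drop = TRUE]    # 3: the [i, j] form agrees for both
df[1, 2, drop = TRUE]   # 3
```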

Most of the breakages for Con (2) can be fixed by changing to x[[j]][i], 
but not all of them:

 > x <- data.frame(V=1:26, row.names = letters)
 > x[c("a","e","i","o","u"), "V", drop = TRUE]
[1]  1  5  9 15 21
 > x[["V"]][c("a","e","i","o","u")]
[1] NA NA NA NA NA

To me, the Cons outweigh the Pro, but I understand that the tidyverse 
puts a heavy weight on "one way to do things".

Perhaps a bigger issue with tibbles is that they don't let you index 
with row names:

 > y <- tibble(x = letters)
 > rownames(y)
 [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10" "11" "12" "13" "14" "15"
[16] "16" "17" "18" "19" "20" "21" "22" "23" "24" "25" "26"
 > y[rownames(y)[c(1,5,9,15,21)],]
# A tibble: 5 x 1
      x
  <chr>
1  <NA>
2  <NA>
3  <NA>
4  <NA>
5  <NA>

If you want to write code that supports both tibbles and data frames,
then you either have to avoid row names and drop = TRUE, or else you
have to call `as.data.frame` on the input. This goes the other way, too.
If you want to write a tidyverse function that also accepts data.frames,
then you should call `as_tibble` on the input; otherwise your function
will break when you index the input like x[, 1], which returns a vector
for a data.frame but a one-column tibble for a tibble.
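A minimal sketch of the `as.data.frame` route described above. The helper name `select_values` is hypothetical, and the sketch is exercised on a base data.frame only (reusing the row-name example from earlier in this message), so it runs without tibble installed.

```r
# Hypothetical helper: values of column `j` at rows `i`, accepting a base
# data.frame or anything as.data.frame() can coerce (such as a tibble).
# Coercing first means `[` follows base data.frame rules, e.g. drop = TRUE
# and character row indices matched against the row names.
select_values <- function(x, i, j) {
  x <- as.data.frame(x)
  x[i, j, drop = TRUE]
}

x <- data.frame(V = 1:26, row.names = letters)
select_values(x, c("a", "e", "i", "o", "u"), "V")
#> [1]  1  5  9 15 21
```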


Patrick

#
On Tue, Sep 26, 2017 at 12:15 PM, Patrick Perry <pperry at stern.nyu.edu> wrote:
I generally think that it's better to keep matrices and data frames
completely separate, but point taken.
This seems pretty rare, and I don't think anyone has complained about it yet.

I don't love adding support for drop = TRUE because it makes [.tibble
type-unstable, but maybe it's reasonable to do so in order to slightly
improve backward compatibility. I've filed an issue so that we consider
it for the next major release:
https://github.com/tidyverse/tibble/issues/311
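To illustrate what "type-unstable" means, here is base data.frame `[` with its default drop behaviour: the class of the result depends on how many columns are selected. This is a base-R sketch with made-up data; it does not show how `[.tibble` would behave if drop = TRUE were supported.

```r
# Base data.frame `[`: with i missing, drop defaults to TRUE, so the
# result type depends on the number of columns selected.
df <- data.frame(a = 1:3, b = 4:6)

class(df[, "a"])                  # "integer"    (one column -> vector)
class(df[, c("a", "b")])          # "data.frame" (two columns -> data frame)
class(df[, "a", drop = FALSE])    # "data.frame" (drop = FALSE is stable)
```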
I'd argue that this is not as big an issue, as I have no
recollection of anyone complaining about it.

Hadley