
Odd behaviour of mean() with a numeric column in a tibble

On Dec 10, 2016 4:59 PM, "Chris Evans" <chrishold at psyctc.org> wrote:
Thanks to both Jeff and Ista for your inputs some days back.  I confess I
was _indeed_ too tired to be thinking well and laterally, and even to be
copying things into Emails successfully.

I have since had more sleep (!) and I have read ?`[[`, gone back to the
pertinent parts of "Introduction to R" and generally pondered all this.  I
confess I had always avoided [[ and only ever used it for lists that were
not data frames.  I can now see just how badly I was misguessing its
behaviour: apologies, I should have realised that I needed to go right back
to basics.

I _can_ see that there are things in the behaviour of data frames that are
not that obvious but I had become very used to them.  I can see values in
converting to using tibbles instead of data frames and may try to do that.

However, I think the documentation for tibble would be improved for people
like myself if it started with something that made it even clearer that
tibbles are lists, just as data frames are, but that whereas a data frame
has a single class(df) of "data.frame", class(tibble) is:
c("tbl_df","tbl","data.frame").
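A minimal sketch of the class difference described above, assuming the tibble package is installed; the column name x and its values are illustrative only:

```r
library(tibble)  # assumption: the tibble package is installed

df <- data.frame(x = c(1, 2, 3))
tb <- tibble(x = c(1, 2, 3))

class(df)    # "data.frame"
class(tb)    # "tbl_df" "tbl" "data.frame"

# Both are lists underneath, with one element per column
is.list(df)  # TRUE
is.list(tb)  # TRUE
```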

I can now see that what I get from ?tibble, i.e. "tibble is a trimmed down
version of data.frame" is probably technically true though I'd describe it
as a rationalised or even a beefed up version of data.frame.  I can also
now see that what I find in
https://cran.r-project.org/web/packages/tibble/tibble.pdf:

"[ Never simplifies (drops), so always returns data.frame"

is true, but only to the extent that any tibble is still a data.frame but
with "data.frame" moved to the third position in the classes of the tibble
where it would be the first and only class were it a pure data.frame.  I
can also see now that that is not really inconsistent with what I get in
https://github.com/tidyverse/tibble:

"Tibbles also clearly delineate [ and [[: [ always returns another tibble,
[[ always returns a vector. No more drop = FALSE!"

However, I think it would be better if the tibble.pdf document said:

"[ Never simplifies (drops), so always returns tibble" even though "[ Never
simplifies (drops), so always returns data.frame" is technically true, up
to and including passing is.data.frame().
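The behaviour under discussion can be reproduced with a small example. This is a sketch with made-up data, assuming the tibble package is installed; it shows why mean() on a column selected with [ , j] works for a plain data frame but not for a tibble:

```r
library(tibble)  # assumption: the tibble package is installed

df <- data.frame(x = c(1, 2, 3), y = c(4, 5, 6))
tb <- as_tibble(df)

# Plain data frame: a single column selected with [ , j] is dropped to a vector
is.numeric(df[, 1])     # TRUE
mean(df[, 1])           # 2

# Tibble: the same call returns a one-column tibble, which is still a data.frame
one_col <- tb[, 1]
is.data.frame(one_col)  # TRUE
is.numeric(one_col)     # FALSE

# mean() on a non-numeric argument warns and returns NA
res <- suppressWarnings(mean(one_col))
is.na(res)              # TRUE

# [[ extracts the column vector from either kind of object
mean(tb[[1]])           # 2
```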

Finally, I think I can see that if I want various functions I have written
to keep working on tibbles, functions that worked fine on data frames but
which depended on indexing or subsetting those data frames using [,i] or
sometimes [,i:j] to select vectors or matrices, then I will have to modify
them so they test whether the input is a simple data frame or a data frame
that is also a tibble.


Only if you relied on [.data.frame returning a vector for length-one j.
Just use [[ (or always pass a drop argument) for that case and your
indexing code will work the same on pure data.frames and tbl_dfs.
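A sketch of that advice, assuming the tibble package is installed. The helper col_mean() and the data are hypothetical, made up for illustration:

```r
library(tibble)  # assumption: the tibble package is installed

# Hypothetical helper: mean of column j, using [[ so that a vector is
# extracted whether d is a plain data frame or a tibble
col_mean <- function(d, j) {
  stopifnot(is.data.frame(d))
  mean(d[[j]])
}

df <- data.frame(a = c(1, 2, 3), b = c(10, 20, 30))
tb <- as_tibble(df)

col_mean(df, "b")  # 20
col_mean(tb, "b")  # 20

# Alternative: pass drop explicitly instead of relying on the default
mean(df[, "b", drop = TRUE])  # 20
mean(tb[, "b", drop = TRUE])  # 20
```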

--Ista

  I guess that I could have trapped things had my functions (where
appropriate) had an is.numeric() input check ... and that I have to use an
is.tibble() check, not an is.data.frame() check to distinguish the two!
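The two checks mentioned above might look like the following sketch, which assumes the tibble package is installed. The helper safe_mean() is hypothetical. inherits(x, "tbl_df") is used because it needs only base R; later tibble releases prefer is_tibble() over is.tibble(), so which spelling is available may depend on the installed version:

```r
library(tibble)  # assumption: the tibble package is installed

# Hypothetical helper: refuse anything that is not a numeric vector
safe_mean <- function(x) {
  if (!is.numeric(x)) {
    stop("x must be numeric, got class: ", paste(class(x), collapse = "/"))
  }
  mean(x)
}

df <- data.frame(x = c(1, 2, 3))
tb <- as_tibble(df)

# is.data.frame() cannot tell the two apart
is.data.frame(df)       # TRUE
is.data.frame(tb)       # TRUE

# Checking for the "tbl_df" class can
inherits(df, "tbl_df")  # FALSE
inherits(tb, "tbl_df")  # TRUE

safe_mean(tb[[1]])      # 2

# A one-column tibble is rejected with an error instead of a silent NA
err <- try(safe_mean(tb[, 1]), silent = TRUE)
inherits(err, "try-error")  # TRUE
```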


Ah well, even after years of part-time use of R, I guess it's been good for
my soul and my deeper and wider understanding of R to go right back to the
basics.

Thanks again to you both.  I am posting here to convey thanks and in case
this is useful to anyone like myself who benefits from a bit more narrative
than is usually offered by R definitions and help entries.

Chris


______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.