From: "Jeff Newmiller" <jdnewmil at dcn.davis.ca.us>
To: "Chris Evans" <chrishold at psyctc.org>, "r-help at r-project.org" <r-help at r-project.org>
Sent: Tuesday, 6 December, 2016 23:23:28
Subject: Re: [R] Odd behaviour of mean() with a numeric column in a tibble
You really need sleep. Then you need to read
?`[[`
and in particular read about the second argument to the `[[` function, since you
don't seem to understand what it is for. Maybe reread the Introduction to R
document that comes with R.
The simplest solution is to treat `[[` as supporting one index and `[` as
supporting either one or two.
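A minimal sketch of that rule of thumb, using a base data frame. The name dta and its columns are made up for illustration and are not the poster's data:

```r
# Hypothetical example data; the name dta and its columns are illustrative only.
dta <- data.frame(ID = letters, num = 1:26, stringsAsFactors = FALSE)

v  <- dta[["num"]]                   # `[[` with one index: the column itself, a vector
d1 <- dta["num"]                     # `[` with one index: a one-column data frame
d2 <- dta[1:3, "num", drop = FALSE]  # `[` with two indices: rows first, then columns

mean(v)                              # works because v is a plain numeric vector
```

The drop = FALSE in the last subset is only needed for a base data frame; a tibble keeps the result as a tibble without it.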
As for expecting any form of row indexing of data frames or tibbles to return a
vector, that is hopeless because each column can have a different type. dta[1, ]
returns exactly what it has to return to avoid losing fidelity. If you
really need row indexing to return a vector you should be using a matrix.
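To illustrate the point about fidelity, here is a small sketch with a mixed-type data frame. Again dta and its columns are hypothetical names chosen for the example:

```r
# Hypothetical mixed-type data; names are illustrative only.
dta <- data.frame(ID = letters[1:3], num = 1:3, num2 = (1:3) / 10,
                  stringsAsFactors = FALSE)

row1 <- dta[1, ]          # still a data frame: one row, each column keeps its own type

# Converting everything to a matrix forces a single type for all cells.
allChar <- as.matrix(dta) # the character column turns every cell into a string

# Restricting to the numeric columns gives a numeric matrix,
# and row indexing on that matrix does return a plain vector.
m <- as.matrix(dta[, c("num", "num2")])
firstRow <- m[1, ]
```

Here firstRow is a named numeric vector, while row1 remains a one-row data frame.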
--
Sent from my phone. Please excuse my brevity.
On December 6, 2016 2:10:15 PM PST, Chris Evans <chrishold at psyctc.org> wrote:
{{SIGH}}
You are absolutely right.
I wonder if I am losing some cognitive capacities that are needed to be
part of the evolving R community. It seems to me that if a tibble is
designed to be an enhanced replacement for a dataframe then it
shouldn't quite so radically change things.
I notice that the documentation on tibble says "[ Never simplifies
(drops), so always returns data.frame"
That is much less explicit than I would have liked and actually doesn't
seem to be true. In fact, as you rightly say, it generally, but not
quite always, returns a tibble. In fact it can be fooled into a vector
of length 1.
Error in `[[.data.frame`(tmpTibble, 1, ) :
argument "..2" is missing, with no default
# A tibble: 26 × 1
ID
<chr>
1 a
2 b
3 c
4 d
5 e
6 f
7 g
8 h
9 i
10 j
# ... with 16 more rows
# A tibble: 26 × 1
ID
<chr>
1 a
2 b
3 c
4 d
5 e
6 f
7 g
8 h
9 i
10 j
# ... with 16 more rows
Error in `[<-.data.frame`(`*tmp*`, , value = list(ID = c("a", "a", "a", :
replacement element 3 is a matrix/data frame of 26 rows, need 1
In addition: Warning messages:
1: In `[<-.data.frame`(`*tmp*`, , value = list(ID = c("a", "a", "a", :
replacement element 1 has 26 rows to replace 1 rows
2: In `[<-.data.frame`(`*tmp*`, , value = list(ID = c("a", "a", "a", :
replacement element 2 has 26 rows to replace 1 rows
Error: Invalid column indexes: 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,
15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26
Error in col[[i, exact = exact]] :
attempt to select more than one element in vectorIndex
So [[a,b]] works if a and b are legal for the dimensions of the tibble
and if a is of length 1, but it returns NOT a tibble but a vector of length
1 (I think). I can see that's logical, but it is not what it says in the
documentation.
[[a]] and [[,a]] return the same result; that seems excessively
tolerant to me.
[[a,b:c]] actually returns [[a,c]] and again as a single value, NOT a
tibble.
And row subsetting/indexing has gone.
Why create a replacement for a data frame that has no row indexing and so
radically redefines column indexing, in fact redefines the whole of
indexing and subsetting?
OK. I will go to sleep now and hope to feel less dumb(ed) when I wake.
Perhaps Prof. Wickham or someone can spell out a bit less tersely, and
I think incompletely, than the tibble documentation does, why all this
is good.
Thanks anyway Ista, you certainly hit the issue!
Very best all,
Chris
From: "Ista Zahn" <istazahn at gmail.com>
To: "Chris Evans" <chrishold at psyctc.org>
Cc: "r-help at r-project.org" <r-help at r-project.org>
Sent: Tuesday, 6 December, 2016 21:40:41
Subject: Re: [R] Odd behaviour of mean() with a numeric column in a tibble
Not at a computer to check right now, but I believe single bracket
indexing of a tibble always returns a tibble. To extract a vector use [[
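A short sketch of that suggestion, reusing the tmpTibble definition from the original post. It assumes the tibble package is installed, and the column is selected by name here rather than by position:

```r
library(tibble)

tmpTibble <- tibble(ID = letters, num = 1:26)

# `[[` (or `$`) extracts the column as a plain vector, so the summaries work.
mean(tmpTibble[["num"]])
median(tmpTibble$num)

# The same extraction gives a vector that can be stored as a new column.
tmpTibble$newNum <- tmpTibble[["num"]] / 10
```

By contrast, tmpTibble[, 2] stays a one-column tibble, which is why mean() and median() reject it.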
On Dec 6, 2016 4:28 PM, "Chris Evans" < chrishold at psyctc.org > wrote:
I hope I am obeying the list rules here. I am using a raw R IDE for
running 3.3.2 (2016-10-31) on x86_64-w64-mingw32/x64 (64-bit)
Here is a reproducible example. Code only first
require(tibble)
tmpTibble <- tibble(ID=letters,num=1:26)
min(tmpTibble[,2]) # fine
max(tmpTibble[,2]) # fine
median(tmpTibble[,2]) # not fine
mean(tmpTibble[,2]) # not fine
newMeanFun <- function(x) {mean(as.numeric(unlist(x)))}
newMeanFun(tmpTibble[,2]) # solved problem but surely shouldn't be
newMedianFun <- function(x) {median(as.numeric(unlist(x)))}
newMedianFun(tmpTibble[,2]) # ditto
str(tmpTibble[,2])
### then I tried this to make sure it wasn't about having fed in
tmpTibble2 <- tibble(ID=letters,num=1:26,num2=(1:26)/10)
tmpTibble2
mean(tmpTibble2[,3]) # not fine, not about integers!
### before I just created tmpTibble2 I found myself trying to add a
tmpTibble
tmpTibble$newNum <- tmpTibble[,2]/10 # NO!
tmpTibble[["newNum"]] <- tmpTibble[,2]/10 # NO!
### and oddly enough ...
add_column(tmpTibble,newNum = tmpTibble[,2]/10) # NO!
Now here it is with the output:
Loading required package: tibble
tmpTibble <- tibble(ID=letters,num=1:26)
min(tmpTibble[,2]) # fine
max(tmpTibble[,2]) # fine
median(tmpTibble[,2]) # not fine
Error in median.default(tmpTibble[, 2]) : need numeric data
mean(tmpTibble[,2]) # not fine
[1] NA
Warning message:
In mean.default(tmpTibble[, 2]) :
argument is not numeric or logical: returning NA
newMeanFun <- function(x) {mean(as.numeric(unlist(x)))}
newMeanFun(tmpTibble[,2]) # solved problem but surely shouldn't
newMedianFun <- function(x) {median(as.numeric(unlist(x)))}
newMedianFun(tmpTibble[,2]) # ditto
Classes 'tbl_df', 'tbl' and 'data.frame': 26 obs. of 1 variable:
$ num: int 1 2 3 4 5 6 7 8 9 10 ...
### then I tried this to make sure it wasn't about having fed in
tmpTibble2 <- tibble(ID=letters,num=1:26,num2=(1:26)/10)
tmpTibble2
# A tibble: 26 × 3
ID num num2
<chr> <int> <dbl>
1 a 1 0.1
2 b 2 0.2
3 c 3 0.3
4 d 4 0.4
5 e 5 0.5
6 f 6 0.6
7 g 7 0.7
8 h 8 0.8
9 i 9 0.9
10 j 10 1.0
# ... with 16 more rows
mean(tmpTibble2[,3]) # not fine, not about integers!
[1] NA
Warning message:
In mean.default(tmpTibble2[, 3]) :
argument is not numeric or logical: returning NA
### before I just created tmpTibble2 I found myself trying to add
tmpTibble
tmpTibble$newNum <- tmpTibble[,2]/10 # NO!
tmpTibble[["newNum"]] <- tmpTibble[,2]/10 # NO!
### and oddly enough ...
add_column(tmpTibble,newNum = tmpTibble[,2]/10) # NO!
Error: Each variable must be a 1d atomic vector or list.
Problem variables: 'newNum'
I discovered this when I hit odd behaviour after using read_spss() from the
haven package for the first time, as it seemed to be offering a step up
over good old read.spss() from the excellent foreign package. I am posting
here, not directly to Prof. Wickham, as the issues seem rather general, though
I am guessing that it needs to be fixed with a fix to tibble. Or perhaps I have
completely missed something.