On 09/07/2021 6:44 p.m., Bert Gunter wrote:
OK, I stand somewhat chastised.
But my point still is that what you get when you "extract" depends on
how you define "extract." Do note that ?"[" yields a help file titled
"Extract or Replace Parts of an object"; and afaics, the term "subset"
is not explicitly used as Duncan prefers.
?"[[" gives you the same page, but I agree: this part of the
documentation isn't written very clearly. The "Introduction to R" manual
uses the terms I used (see section 2.7, "Index vectors; selecting and
modifying subsets of a data set"), as does the source code (and the R
Language Definition manual, though it's not as clear as the Intro).
But the point isn't to chastise you, it's to educate you (and the OP).
Thinking of [] as subsetting is more helpful than thinking of it as
extraction. That way the result of x[c(1,2)] makes sense. It's a
little bit more of a stretch, but the result of x[[c(1,2)]] also makes
sense when you think of it as extraction.
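
A minimal sketch of the subsetting/extraction distinction on a list, in base R. The list x and its element names are made up for illustration and do not come from the thread.

```r
# Hypothetical example list; the names a, b, c are illustrative only.
x <- list(a = list(10, 20), b = "z", c = 3)

# Subsetting: the result is still a list, holding the first two elements.
sub <- x[c(1, 2)]
is.list(sub)
length(sub)

# Extraction with a vector index is applied recursively:
# x[[c(1, 2)]] means "element 2 of element 1", i.e. x[[1]][[2]].
el <- x[[c(1, 2)]]
identical(el, x[[1]][[2]])
```

Note that the recursive form of [[ only makes sense when the first extracted element can itself be indexed; on a plain atomic vector an index of length two is an error.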
Duncan Murdoch
The relevant part of the
Help file for "[" says, for recursive objects: "Indexing by [ is
similar to atomic vectors and selects a list of the specified
element(s)." That a data.frame is a list is explicitly stated, as I
noted; that lists are in fact vectors is also explicitly stated (?list
says: "Almost all lists in R internally are Generic Vectors") but then
one is stuck with: a data.frame is a list and therefore a vector, but
is.vector(d3) is FALSE. The explanation is explicit again in
?is.vector ("is.vector returns TRUE if x is a vector of the specified
mode having no attributes other than names. It returns FALSE
otherwise."). But I would say these issues are sufficiently murky that
my warning to be precise is not entirely inappropriate;
I may have made them more so. Sigh....
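
The is.vector() puzzle can be checked directly. This is only a sketch in base R; the data frame d mirrors the one used elsewhere in the thread, and the "units" attribute is an arbitrary illustration.

```r
d <- data.frame(col1 = 1:3, col2 = letters[1:3])

is.list(d)             # a data frame is a list ...
is.vector(d)           # ... but it carries attributes besides names
names(attributes(d))   # shows which attributes are present

# as.list() on a data frame drops class and row.names, leaving only names,
# so the is.vector() rule quoted from ?is.vector is then satisfied.
is.vector(as.list(d))

# The same rule applies to atomic vectors: any extra attribute flips the answer.
x <- 1:3
before <- is.vector(x)
attr(x, "units") <- "cm"   # arbitrary attribute, for illustration only
after <- is.vector(x)
c(before = before, after = after)
```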
Cheers,
Bert
Bert Gunter
"The trouble with having an open mind is that people keep coming
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
On Fri, Jul 9, 2021 at 3:05 PM Duncan Murdoch
<murdoch.duncan at gmail.com> wrote:
On 09/07/2021 5:51 p.m., Jeff Newmiller wrote:
"Strictly speaking", Greg is correct, Bert.
Lists in R are vectors. What we colloquially refer to as "vectors"
are more precisely referred to as "atomic vectors". And without a
doubt, this "vector" nature of lists is a key underlying concept that
explains why adding a dim attribute creates a matrix that can hold data
frames. It is also a stumbling block for programmers from other
languages that have things like linked lists.
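
As a rough illustration of that point (the object name cells and its contents are invented here, not taken from the thread), giving a list a dim attribute yields a matrix whose cells can hold arbitrary objects, including a data frame:

```r
# A list of four unrelated objects; the last one is a data frame.
cells <- list(1L, "a", TRUE, data.frame(x = 1:2))

# Adding a dim attribute turns the list into a 2 x 2 matrix of mode "list".
dim(cells) <- c(2, 2)

is.matrix(cells)
typeof(cells)

# Matrices are filled column by column, so the fourth element sits at [2, 2].
class(cells[[2, 2]])
```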
I would also object to v3 (below) as "extracting" a column from d.
"d[2]" doesn't extract anything, it "subsets" the data frame, so the
result is a data frame, not what you get when you extract something from
a data frame.
People don't realize that "x <- 1:10; y <- x[[3]]" is perfectly legal.
That extracts the 3rd element (the number 3). The problem is that R has
no way to represent a scalar number, only a vector of numbers, so x[[3]]
gets promoted to a vector containing that number when it is returned and
assigned to y.
Lists are vectors of R objects, so if x is a list, x[[3]] is something
that can be returned, and it is different from x[3].
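
A small sketch of this in base R; the list l is a made-up example. For an unnamed atomic vector the two forms are indistinguishable, while for a list they are not:

```r
x <- 1:10
y <- x[[3]]
length(y)                  # a vector of length one, not a true scalar
identical(x[[3]], x[3])    # no visible difference for an unnamed atomic vector

# Hypothetical list, for illustration only.
l <- list(1, "two", 3)
is.list(l[3])              # subsetting keeps the list wrapper
is.numeric(l[[3]])         # extraction returns the element itself
identical(l[[3]], l[3])
```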
Duncan Murdoch
On July 9, 2021 2:36:19 PM PDT, Bert Gunter
<bgunter.4567 at gmail.com> wrote:
"1. a column, when extracted from a data frame, *is* a vector."
Strictly speaking, this is false; it depends on exactly what is meant by
"extracted":
d <- data.frame(col1 = 1:3, col2 = letters[1:3])
v1 <- d[,2] ## a vector
v2 <- d[[2]] ## the same, i.e.
identical(v1,v2)
[1] TRUE
v3 <- d[2] ## a data.frame
v1
[1] "a" "b" "c" ## a character vector
which is simply explained in ?data.frame (where else?!) by:
"A data frame is a **list** [emphasis added] of variables of the same
number of rows with unique row names, given class "data.frame". If no
variables are included, the row names determine the number of rows."
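
To round out the comparison above, here is a sketch of two further forms, drop = FALSE and $. It assumes R 4.0 or later, where data.frame() keeps character columns as character rather than converting them to factors.

```r
d <- data.frame(col1 = 1:3, col2 = letters[1:3])

# Matrix-style indexing drops to a plain vector by default ...
is.data.frame(d[, 2])

# ... unless drop = FALSE is given, in which case a data frame is kept.
is.data.frame(d[, 2, drop = FALSE])

# $ and [[ ]] by name both return the column itself.
identical(d$col2, d[["col2"]])
```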
"2. maybe your question is "is a given function for a vector, or
for a data frame/matrix/array?". if so, i think the only way is reading
the help information (?foo)."
Indeed! Is this not what the Help system is for?! But note also that
the S3 class system may somewhat blur the issue: foo() may work
appropriately and differently for different (S3) classes of objects. A
detailed explanation of this behavior can be found in appropriate
resources or (more tersely) via ?UseMethod .
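
A minimal sketch of S3 dispatch, to make the "works differently for different classes" point concrete. The generic describe() and its methods are hypothetical names invented for this illustration; only UseMethod() is the real mechanism.

```r
# Hypothetical generic: UseMethod() picks a method based on class(x).
describe <- function(x, ...) UseMethod("describe")

# Fallback used when no class-specific method exists.
describe.default <- function(x, ...) {
  paste("a", class(x)[1], "of length", length(x))
}

# Method chosen automatically for objects of class "data.frame".
describe.data.frame <- function(x, ...) {
  paste("a data frame with", nrow(x), "rows and", ncol(x), "columns")
}

d <- data.frame(col1 = 1:3, col2 = letters[1:3])
describe(d)        # dispatches to describe.data.frame
describe(d[[2]])   # a bare column falls through to describe.default
```

The same call therefore behaves differently depending on whether it is handed the data frame or a column extracted from it.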
"you might find reading ?"[" and ?"[.data.frame" useful"
Not just "useful" -- **essential** if you want to work in R, unless
one gets this information via any of the numerous online tutorials,
courses, or books that are available. The Help system is accurate and
authoritative, but terse. I happen to like this mode of documentation,
but others may prefer more extended expositions. I stand by this
even if one chooses to use the "Tidyverse", data.table package, or
other alternative frameworks for handling data. Again, others may
disagree, but R is structured around these basics, and imo one is
ignorant of them at their peril.
Cheers,
Bert
On Fri, Jul 9, 2021 at 11:57 AM Greg Minshall <minshall at umich.edu>
wrote:
one more question, how can I know if the function is for column
manipulations or for vector?
i still stumble around R code. but, i'd say the following (and look
forward to being corrected! :):
1. a column, when extracted from a data frame, *is* a vector.
2. maybe your question is "is a given function for a vector, or for a
data frame/matrix/array?". if so, i think the only way is reading
the help information (?foo).
3. sometimes, extracting the column as a vector from a data
object might be non-intuitive. you might find reading ?"[" and
?"[.data.frame" useful (as well as ?"[.data.table" if you use that
package). also, the str() command can be helpful in understanding
what is happening. (the lobstr:: package's sxp() function, as well
as the more verbose .Internal(inspect()), can also give you insight.)
with the data.table:: package, for example, if "DT" is a data.table
object, with "x2" as a column, adding or leaving off quotes
for the column name can make all the difference between ending up
with a vector, or with a (much reduced) data table:
----
str(DT[, x2])
num [1:9] 32 32 32 32 32 32 32 32 32
str(DT[, "x2"])
Classes 'data.table' and 'data.frame': 9 obs. of 1 variable:
$ x2: num 32 32 32 32 32 32 32 32 32
- attr(*, ".internal.selfref")=<externalptr>
----
a second level of indexing may or may not help, mostly depending on
the use of '[' versus '[['. this can sometimes cause confusion
when you are learning the language.
----
str(DT[, "x2"][1])
Classes 'data.table' and 'data.frame': 1 obs. of 1 variable:
$ x2: num 32
- attr(*, ".internal.selfref")=<externalptr>
str(DT[, "x2"][[1]])
num [1:9] 32 32 32 32 32 32 32 32 32
----
the tibble:: package (used in, e.g., the dplyr:: package) also
(always?) returns a single column as a non-vector. again, a
second indexing with double '[[]]' can produce a vector.
----
DP <- tibble(DT)
is.vector(DP[, "x2"])
[1] FALSE
is.vector(DP[, "x2"][[1]])
[1] TRUE
----
but, note that a list of lists is also a vector:
is.vector(list(list(1), list(1,2,3)))
[1] TRUE
str(list(list(1), list(1,2,3)))
List of 2
$ :List of 1
..$ : num 1
$ :List of 3
..$ : num 1
..$ : num 2
..$ : num 3
etc.
hth. good luck learning!
cheers, Greg