problem for strsplit function

On Fri, Jul 9, 2021 at 11:57 AM Greg Minshall <minshall at umich.edu> wrote:
Kai,

"one more question, how can I know if the function is for column
manipulations or for vector?"

i still stumble around R code.  but, i'd say the following (and look
forward to being corrected! :):

1.  a column, when extracted from a data frame, *is* a vector.

2.  maybe your question is "is a given function for a vector, or for a
    data frame/matrix/array?".  if so, i think the only way is reading
    the help information (?foo).

3.  sometimes, extracting the column as a vector from a data frame-like
    object might be non-intuitive.  you might find reading ?"[" and
    ?"[.data.frame" useful (as well as ?"[.data.table" if you use that
    package).  also, the str() command can be helpful in understanding
    what is happening.  (the lobstr:: package's sxp() function, as well
    as more verbose .Internal(inspect()) can also give you insight.)

    with the data.table:: package, for example, if "DT" is a data.table
    object, with "x2" as a column, adding or leaving off quotation marks
    for the column name can make all the difference between ending up
    with a vector, or with a (much reduced) data table:
----
is.vector(DT[, x2])
[1] TRUE
str(DT[, x2])
 num [1:9] 32 32 32 32 32 32 32 32 32
is.vector(DT[, "x2"])
[1] FALSE
str(DT[, "x2"])
Classes 'data.table' and 'data.frame':  9 obs. of  1 variable:
 $ x2: num  32 32 32 32 32 32 32 32 32
 - attr(*, ".internal.selfref")=<externalptr>
----

    a second level of indexing may or may not help, mostly depending on
    the use of '[' versus '[['.  this can sometimes cause confusion
    when you are learning the language.
----
str(DT[, "x2"][1])
Classes 'data.table' and 'data.frame':  1 obs. of  1 variable:
 $ x2: num 32
 - attr(*, ".internal.selfref")=<externalptr>
str(DT[, "x2"][[1]])
 num [1:9] 32 32 32 32 32 32 32 32 32
----
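
    The transcripts above never show how DT was built, so here is a
    minimal sketch that should reproduce them. It assumes the data.table
    package is installed; the construction of DT (and the extra column
    x1) is an assumption made only for illustration.

```r
# Assumption: data.table is installed. DT is a stand-in for the
# undefined DT in the transcripts (nine rows, x2 always 32).
library(data.table)

DT <- data.table(x1 = 1:9, x2 = rep(32, 9))

v_bare   <- DT[, x2]     # unquoted name in j is evaluated: a plain vector
dt_quote <- DT[, "x2"]   # quoted name in j selects columns: a data.table
v_dbl    <- DT[["x2"]]   # '[[' extracts the column itself
v_dollar <- DT$x2        # '$' does the same

is.vector(v_bare)             # TRUE
is.data.table(dt_quote)       # TRUE
identical(v_bare, v_dbl)      # TRUE
identical(v_bare, v_dollar)   # TRUE
```

    When a plain vector is wanted, '[[' or '$' says so explicitly and
    does not depend on whether the column name is quoted.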

    the tibble:: package (used in, e.g., the dplyr:: package) also
    (always?) returns a single column as a non-vector.  again, a
    second indexing with double '[[]]' can produce a vector.
----
DP <- tibble(DT)
is.vector(DP[, "x2"])
[1] FALSE
is.vector(DP[, "x2"][[1]])
[1] TRUE
----
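
    A comparable sketch for tibbles, assuming the tibble package is
    installed. DP here is built directly rather than from DT, and the
    base data.frame DF is added only for comparison; both are
    illustrative stand-ins.

```r
# Assumption: tibble is installed. DP and DF are hypothetical examples.
library(tibble)

DP <- tibble(x1 = 1:9, x2 = rep(32, 9))

sub_single <- DP[, "x2"]   # '[' keeps the tibble class, even for one column
col_dbl    <- DP[["x2"]]   # '[[' extracts the column
col_dollar <- DP$x2        # so does '$'

is_tibble(sub_single)            # TRUE
is.vector(col_dbl)               # TRUE
identical(col_dbl, col_dollar)   # TRUE

# A base data.frame, by contrast, drops a single column to a vector
# unless drop = FALSE is given.
DF <- as.data.frame(DP)
is.vector(DF[, "x2"])                     # TRUE
is.data.frame(DF[, "x2", drop = FALSE])   # TRUE
```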

    but, note that a list of lists is also a vector:
----
is.vector(list(list(1), list(1,2,3)))
[1] TRUE
str(list(list(1), list(1,2,3)))
List of 2
 $ :List of 1
  ..$ : num 1
 $ :List of 3
  ..$ : num 1
  ..$ : num 2
  ..$ : num 3
----

    etc.

hth.  good luck learning!

cheers, Greg

On July 9, 2021 2:36:19 PM PDT, Bert Gunter <bgunter.4567 at gmail.com> wrote:
"1.  a column, when extracted from a data frame, *is* a vector."
Strictly speaking, this is false; it depends on exactly what is meant
by "extracted." e.g.:

d <- data.frame(col1 = 1:3, col2 = letters[1:3])
v1 <- d[,2] ## a vector
v2 <- d[[2]] ## the same, i.e.
identical(v1,v2)
[1] TRUE
v3 <- d[2] ## a data.frame
v1
[1] "a" "b" "c"  ## a character vector
v3
  col2
1    a
2    b
3    c
is.vector(v1)
[1] TRUE
is.vector(v3)
[1] FALSE
class(v3)  ## data.frame
[1] "data.frame"
## but
is.list(v3)
[1] TRUE

which is simply explained in ?data.frame (where else?!) by:
"A data frame is a **list** [emphasis added] of variables of the same
number of rows with unique row names, given class "data.frame". If no
variables are included, the row names determine the number of rows."
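
The "data frame is a list" point can be checked directly in base R. The following is only a sketch reusing the data frame d from the example above; the character column assumes R >= 4.0.0, where data.frame() no longer converts strings to factors by default.

```r
d <- data.frame(col1 = 1:3, col2 = letters[1:3])

# A data frame is a list whose elements are its columns.
is.list(d)     # TRUE
length(d)      # 2, the number of columns
names(d)       # "col1" "col2"

# List-style tools therefore work column by column.
col_classes <- vapply(d, function(col) class(col)[1], character(1))
col_classes    # col1 "integer", col2 "character" (assuming R >= 4.0.0)

# One-index '[' keeps the data.frame wrapper; '[[', '$' and
# matrix-style d[, j] return the column itself.
class(d["col2"])                      # "data.frame"
identical(d[["col2"]], d$col2)        # TRUE
identical(d[, "col2"], d[["col2"]])   # TRUE
class(d[, "col2", drop = FALSE])      # "data.frame"
```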

"2.  maybe your question is "is a given function for a vector, or for a
    data frame/matrix/array?".  if so, i think the only way is reading
    the help information (?foo)."

Indeed! Is this not what the Help system is for?! But note also that
the S3 class system may somewhat blur the issue: foo() may work
appropriately and differently for different (S3) classes of objects. A
detailed explanation of this behavior can be found in appropriate
resources or (more tersely) via ?UseMethod .
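
As a rough illustration of that S3 behavior, here is a minimal sketch. The generic describe() and its two methods are hypothetical names invented for this example; they are not part of base R or any package.

```r
# 'describe' is a hypothetical generic, defined only for illustration.
describe <- function(x, ...) UseMethod("describe")

# Fallback used when no class-specific method exists.
describe.default <- function(x, ...) {
  paste0(class(x)[1], " of length ", length(x))
}

# Method chosen when the object's class attribute includes "data.frame".
describe.data.frame <- function(x, ...) {
  paste0("data frame with ", nrow(x), " rows and ", ncol(x), " columns")
}

d <- data.frame(col1 = 1:3, col2 = letters[1:3])

describe(d)        # "data frame with 3 rows and 2 columns"
describe(d$col1)   # "integer of length 3"
```

The same call does different things depending on the class of its argument, which is why ?foo may need to be followed by the help page of a specific method; methods(summary), for instance, lists the classes that summary() has methods for.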

"you might find reading ?"[" and  ?"[.data.frame" useful"

Not just "useful" -- **essential** if you want to work in R, unless
one gets this information via any of the numerous online tutorials,
courses, or books that are available. The Help system is accurate and
authoritative, but terse. I happen to like this mode of documentation,
but others may prefer more extended expositions. I stand by this claim
even if one chooses to use the "Tidyverse", data.table package, or
other alternative frameworks for handling data. Again, others may
disagree, but R is structured around these basics, and imo one remains
ignorant of them at their peril.

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
On 09/07/2021 5:51 p.m., Jeff Newmiller wrote:
"Strictly speaking", Greg is correct, Bert.

https://cran.r-project.org/doc/manuals/r-release/R-lang.html#List-objects

Lists in R are vectors. What we colloquially refer to as "vectors" are more precisely referred to as "atomic vectors". And without a doubt, this "vector" nature of lists is a key underlying concept that explains why adding a dim attribute creates a matrix that can hold data frames. It is also a stumbling block for programmers from other languages that have things like linked lists.
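
A small base-R sketch of the atomic-versus-list distinction and of the dim-attribute remark; the objects atomic, lst, d and m are made up for the illustration.

```r
atomic <- c(1.5, 2, 3)
lst    <- list(1.5, "a", TRUE)

is.vector(atomic)               # TRUE
is.atomic(atomic)               # TRUE
is.vector(lst)                  # TRUE: a list is a (generic) vector
is.atomic(lst)                  # FALSE
is.vector(lst, mode = "list")   # TRUE

# Give a four-element list a dim attribute and it becomes a matrix
# whose cells can hold arbitrary objects, data frames included.
d <- data.frame(col1 = 1:3, col2 = letters[1:3])
m <- list(1L, "two", d, TRUE)
dim(m) <- c(2, 2)

is.matrix(m)       # TRUE
is.list(m)         # TRUE
class(m[[1, 2]])   # "data.frame" (third element, column-major order)
```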
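On Friday, July 9, 2021, 03:06:11 PM PDT, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
I would also object to v3 (in Bert's example) as "extracting" a column from d.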
"d[2]" doesn't extract anything, it "subsets" the data frame, so the 
result is a data frame, not what you get when you extract something from 
a data frame.

People don't realize that "x <- 1:10; y <- x[[3]]" is perfectly legal. 
That extracts the 3rd element (the number 3).  The problem is that R has 
no way to represent a scalar number, only a vector of numbers, so x[[3]] 
gets promoted to a vector containing that number when it is returned and 
assigned to y.

Lists are vectors of R objects, so if x is a list, x[[3]] is something 
that can be returned, and it is different from x[3].
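
The difference can be seen in a few lines of base R. This is only a sketch; x follows the example in the text and xl is a made-up list.

```r
x <- 1:10
y <- x[[3]]
y                         # 3
length(y)                 # 1: a "scalar" is just a length-one vector
identical(x[[3]], x[3])   # TRUE for an unnamed atomic vector

xl <- list(1:3, "a", c(TRUE, FALSE))
xl[[3]]                   # the logical vector itself
xl[3]                     # a list of length one containing that vector

identical(xl[[3]], c(TRUE, FALSE))         # TRUE
identical(xl[3], list(c(TRUE, FALSE)))     # TRUE
is.list(xl[3])                             # TRUE
is.list(xl[[3]])                           # FALSE
```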

Duncan Murdoch

OK, I stand somewhat chastised.

But my point still is that what you get when you "extract" depends on
how you define "extract." Do note that ?"[" yields a help file titled
"Extract or Replace Parts of an object"; and afaics, the term "subset"
is not explicitly used as Duncan prefers. The relevant part of the
Help file for "[" says, for recursive objects: "Indexing by [ is
similar to atomic vectors and selects a list of the specified
element(s)."  That a data.frame is a list is explicitly stated, as I
noted; that lists are in fact vectors is also explicitly stated (?list
says: "Almost all lists in R internally are Generic Vectors") but then
one is stuck with: a data.frame is a list and therefore a vector, but
is.vector(v3) is FALSE. The explanation is explicit again in
?is.vector ("is.vector returns TRUE if x is a vector of the specified
mode having no attributes other than names. It returns FALSE
otherwise."). But I would say these issues are sufficiently murky that
my warning to be precise is not entirely inappropriate; unfortunately,
I may have made them more so. Sigh....
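
The ?is.vector rule quoted above can be checked with a short base-R sketch; x and d are illustrative objects only.

```r
x <- c(a = 1, b = 2, c = 3)
is.vector(x)              # TRUE: names are the one attribute allowed

attr(x, "units") <- "cm"
is.vector(x)              # FALSE: any other attribute disqualifies it

d <- data.frame(col1 = 1:3, col2 = letters[1:3])
sort(names(attributes(d)))   # "class" "names" "row.names"
is.list(d)                   # TRUE
is.vector(d)                 # FALSE, because of class and row.names

# as.list() on a data frame drops class and row.names,
# leaving a plain named list.
is.vector(as.list(d))        # TRUE
```

So is.vector() answers a narrower question than "is this stored as a vector?"; is.list() and is.atomic() are usually the more direct tests.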

Cheers,
Bert


Thanks Bert,
I'm reading some books now. But it takes me a while to get familiar with R.

Best,

"Strictly speaking", Greg is correct, Bert.

https://cran.r-project.org/doc/manuals/r-release/R-lang.html#List-objects

Lists in R are vectors. What we colloquially refer to as "vectors" are more precisely referred to as "atomic vectors". And without a doubt, this "vector" nature of lists is a key underlying concept that explains why adding a dim attribute creates a matrix that can hold data frames. It is also a stumbling block for programmers from other languages that have things like linked lists.
I would also object to v3 (below) as "extracting" a column from d. 
"d[2]" doesn't extract anything, it "subsets" the data frame, so the 
result is a data frame, not what you get when you extract something from 
a data frame.

People don't realize that "x <- 1:10; y <- x[[3]]" is perfectly legal. 
That extracts the 3rd element (the number 3).? The problem is that R has 
no way to represent a scalar number, only a vector of numbers, so x[[3]] 
gets promoted to a vector containing that number when it is returned and 
assigned to y.

Lists are vectors of R objects, so if x is a list, x[[3]] is something 
that can be returned, and it is different from x[3].

Duncan Murdoch
On July 9, 2021 2:36:19 PM PDT, Bert Gunter <bgunter.4567 at gmail.com> wrote:
"1.? a column, when extracted from a data frame, *is* a vector."
Strictly speaking, this is false; it depends on exactly what is meant
by "extracted." e.g.:

d <- data.frame(col1 = 1:3, col2 = letters[1:3])
v1 <- d[,2] ## a vector
v2 <- d[[2]] ## the same, i.e
identical(v1,v2)
[1] TRUE
v3 <- d[2] ## a data.frame
v1
[1] "a" "b" "c"? ## a character vector
v3
? col2
1? ? a
2? ? b
3? ? c
is.vector(v1)
[1] TRUE
is.vector(v3)
[1] FALSE
class(v3)? ## data.frame
[1] "data.frame"
## but
is.list(v3)
[1] TRUE

which is simply explained in ?data.frame (where else?!) by:
"A data frame is a **list** [emphasis added] of variables of the same
number of rows with unique row names, given class "data.frame". If no
variables are included, the row names determine the number of rows."

"2.? maybe your question is "is a given function for a vector, or for a
? ? data frame/matrix/array?".? if so, i think the only way is reading
? ? the help information (?foo)."

Indeed! Is this not what the Help system is for?! But note also that
the S3 class system may somewhat blur the issue: foo() may work
appropriately and differently for different (S3) classes of objects. A
detailed explanation of this behavior can be found in appropriate
resources or (more tersely) via ?UseMethod .

"you might find reading ?"[" and? ?"[.data.frame" useful"

Not just 'useful" -- **essential** if you want to work in R, unless
one gets this information via any of the numerous online tutorials,
courses, or books that are available. The Help system is accurate and
authoritative, but terse. I happen to like this mode of documentation,
but others may prefer more extended expositions. I stand by this claim
even if one chooses to use the "Tidyverse", data.table package, or
other alternative frameworks for handling data. Again, others may
disagree, but R is structured around these basics, and imo one remains
ignorant of them at their peril.

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Fri, Jul 9, 2021 at 11:57 AM Greg Minshall <minshall at umich.edu>
wrote:
Kai,

one more question, how can I know if the function is for column
manipulations or for vector?
i still stumble around R code.? but, i'd say the following (and look
forward to being corrected! :):

1.? a column, when extracted from a data frame, *is* a vector.

2.? maybe your question is "is a given function for a vector, or for
a
? ? ? data frame/matrix/array?".? if so, i think the only way is
reading
? ? ? the help information (?foo).

3.? sometimes, extracting the column as a vector from a data
frame-like
? ? ? object might be non-intuitive.? you might find reading ?"[" and
? ? ? ?"[.data.frame" useful (as well as ?"[.data.table" if you use
that
? ? ? package).? also, the str() command can be helpful in
understanding
? ? ? what is happening.? (the lobstr:: package's sxp() function, as
well
? ? ? as more verbose .Internal(inspect()) can also give you insight.)

? ? ? with the data.table:: package, for example, if "DT" is a
data.table
? ? ? object, with "x2" as a column, adding or leaving off quotation
marks
? ? ? for the column name can make all the difference between ending up
? ? ? with a vector, or with a (much reduced) data table:
----
is.vector(DT[, x2])
[1] TRUE
str(DT[, x2])
? num [1:9] 32 32 32 32 32 32 32 32 32
is.vector(DT[, "x2"])
[1] FALSE
str(DT[, "x2"])
Classes ?data.table? and 'data.frame':? 9 obs. of? 1 variable:
? $ x2: num? 32 32 32 32 32 32 32 32 32
? - attr(*, ".internal.selfref")=<externalptr>
----

? ? ? a second level of indexing may or may not help, mostly depending
on
? ? ? the use of '[' versus of '[['.? this can sometimes cause
confusion
? ? ? when you are learning the language.
----
str(DT[, "x2"][1])
Classes ?data.table? and 'data.frame':? 1 obs. of? 1 variable:
? $ x2: num 32
? - attr(*, ".internal.selfref")=<externalptr>
str(DT[, "x2"][[1]])
? num [1:9] 32 32 32 32 32 32 32 32 32
----

? ? ? the tibble:: package (used in, e.g., the dplyr:: package) also
? ? ? (always?) returns a single column as a non-vector.? again, a
? ? ? second indexing with double '[[]]' can produce a vector.
----
DP <- tibble(DT)
is.vector(DP[, "x2"])
[1] FALSE
is.vector(DP[, "x2"][[1]])
[1] TRUE
----

    but, note that a list of lists is also a vector:
is.vector(list(list(1), list(1,2,3)))
[1] TRUE
str(list(list(1), list(1,2,3)))
List of 2
 $ :List of 1
  ..$ : num 1
 $ :List of 3
  ..$ : num 1
  ..$ : num 2
  ..$ : num 3

    etc.

hth.  good luck learning!

cheers, Greg

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

"But it takes me a while to get familiar R."

Of course. That is true for all of us. Just keep on plugging away and
you'll get it. Probably far better than I before too long.

Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
Thanks Bert,

I'm reading some books now. But it takes me a while to get familiar R.

Best,

Kai
On Friday, July 9, 2021, 03:06:11 PM PDT, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:

On 09/07/2021 5:51 p.m., Jeff Newmiller wrote:
"Strictly speaking", Greg is correct, Bert.

https://cran.r-project.org/doc/manuals/r-release/R-lang.html#List-objects

Lists in R are vectors. What we colloquially refer to as "vectors" are more precisely referred to as "atomic vectors". And without a doubt, this "vector" nature of lists is a key underlying concept that explains why adding a dim attribute creates a matrix that can hold data frames. It is also a stumbling block for programmers from other languages that have things like linked lists.
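
The remark about the dim attribute can be tried out with a small sketch. The objects `d1`, `d2` and `cells` are made up for this illustration; the only assumption is base R behaviour: a list is a vector, so it can carry a dim attribute, and the resulting matrix can hold a data frame in a cell.

```r
# Hypothetical data: two small data frames plus two other objects.
d1 <- data.frame(a = 1:2)
d2 <- data.frame(b = 3:5)
cells <- list(d1, d2, "text", 4)

# A list is a vector, so it accepts a dim attribute like any other vector.
dim(cells) <- c(2, 2)

is.list(cells)        # still a list
is.matrix(cells)      # and now also a matrix
cells[[2, 1]]         # the data frame d2 (cells are filled column by column)
nrow(cells[[2, 1]])   # number of rows of the data frame stored in that cell
```

Note that `[[` with two indices is used here to pull the stored object out of its cell; `cells[2, 1]` would return a list of length one that wraps it.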
I would also object to v3 (below) as "extracting" a column from d.
"d[2]" doesn't extract anything, it "subsets" the data frame, so the
result is a data frame, not what you get when you extract something from
a data frame.

People don't realize that "x <- 1:10; y <- x[[3]]" is perfectly legal.
That extracts the 3rd element (the number 3).  The problem is that R has
no way to represent a scalar number, only a vector of numbers, so x[[3]]
gets promoted to a vector containing that number when it is returned and
assigned to y.

Lists are vectors of R objects, so if x is a list, x[[3]] is something
that can be returned, and it is different from x[3].

Duncan Murdoch
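
A short sketch of the two cases just described, using base R only. The names `x`, `y` and `lst` are illustrative; the comments describe what each call is expected to return.

```r
# Atomic vector: [[ is legal, and the "scalar" it returns is
# itself a vector of length one.
x <- 1:10
y <- x[[3]]
length(y)             # 1
is.vector(y)          # TRUE
identical(y, x[3])    # TRUE: for an atomic vector both forms give the same result

# List: [ and [[ give different things.
lst <- list(1:3, "b", 3.5)
sub <- lst[3]         # subsetting: a list of length one
elt <- lst[[3]]       # extraction: the element itself
is.list(sub)          # TRUE
is.list(elt)          # FALSE
```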

On July 9, 2021 2:36:19 PM PDT, Bert Gunter <bgunter.4567 at gmail.com> wrote:
"1.  a column, when extracted from a data frame, *is* a vector."
Strictly speaking, this is false; it depends on exactly what is meant
by "extracted." e.g.:

d <- data.frame(col1 = 1:3, col2 = letters[1:3])
v1 <- d[,2] ## a vector
v2 <- d[[2]] ## the same, i.e
identical(v1,v2)
[1] TRUE
v3 <- d[2] ## a data.frame
v1
[1] "a" "b" "c"  ## a character vector
v3
 col2
1    a
2    b
3    c
is.vector(v1)
[1] TRUE
is.vector(v3)
[1] FALSE
class(v3)  ## data.frame
[1] "data.frame"
## but
is.list(v3)
[1] TRUE

which is simply explained in ?data.frame (where else?!) by:
"A data frame is a **list** [emphasis added] of variables of the same
number of rows with unique row names, given class "data.frame". If no
variables are included, the row names determine the number of rows."
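
As a supplement to the example above, here is a sketch of the common ways to pick out one column and what class each returns. It rebuilds the same `d`; `stringsAsFactors = FALSE` is passed explicitly only so the result does not depend on the R version. The `drop` argument is documented in ?"[.data.frame".

```r
d <- data.frame(col1 = 1:3, col2 = letters[1:3], stringsAsFactors = FALSE)

class(d[, 2])                  # "character": the column itself
class(d[, 2, drop = FALSE])    # "data.frame": one-column data frame
class(d[2])                    # "data.frame": list-style subsetting
class(d[[2]])                  # "character"
class(d$col2)                  # "character"
```

So whether one ends up with a vector or a data frame depends on the form of the call, not on the column.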

"2.  maybe your question is "is a given function for a vector, or for a
   data frame/matrix/array?".  if so, i think the only way is reading
   the help information (?foo)."

Indeed! Is this not what the Help system is for?! But note also that
the S3 class system may somewhat blur the issue: foo() may work
appropriately and differently for different (S3) classes of objects. A
detailed explanation of this behavior can be found in appropriate
resources or (more tersely) via ?UseMethod .
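
To make the S3 point concrete, a minimal sketch follows. The generic `describe` and its two methods are invented for this illustration and are not part of R or any package; only UseMethod() itself is the real mechanism.

```r
# 'describe' is a made-up generic, used only to show dispatch.
describe <- function(x, ...) UseMethod("describe")

# Fallback method for anything without a more specific method.
describe.default <- function(x, ...) {
  paste("class", class(x)[1], "length", length(x))
}

# Method chosen when the object has class "data.frame".
describe.data.frame <- function(x, ...) {
  paste("data frame with", nrow(x), "rows and", ncol(x), "columns")
}

d <- data.frame(col1 = 1:3, col2 = letters[1:3])
describe(d)         # uses describe.data.frame
describe(d[[1]])    # a plain integer vector, so describe.default
describe(d[1])      # still a data frame, so describe.data.frame
```

The same call, `describe(...)`, therefore does different things for a column extracted with `[[` and for a one-column data frame obtained with `[`.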

"you might find reading ?"[" and  ?"[.data.frame" useful"

Not just "useful" -- **essential** if you want to work in R, unless
one gets this information via any of the numerous online tutorials,
courses, or books that are available. The Help system is accurate and
authoritative, but terse. I happen to like this mode of documentation,
but others may prefer more extended expositions. I stand by this claim
even if one chooses to use the "Tidyverse", data.table package, or
other alternative frameworks for handling data. Again, others may
disagree, but R is structured around these basics, and imo one remains
ignorant of them at their peril.

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Fri, Jul 9, 2021 at 11:57 AM Greg Minshall <minshall at umich.edu>
wrote:
[...]

OK, I stand somewhat chastised.

But my point still is that what you get when you "extract" depends on
how you define "extract." Do note that ?"[" yields a help file titled
"Extract or Replace Parts of an object"; and afaics, the term "subset"
is not explicitly used as Duncan prefers.
?"[[" gives you the same page, but I agree:  this part of the 
documentation isn't written very clearly. The "Introduction to R" manual 
uses the terms I used (see section 2.7, "Index vectors; selecting and 
modifying subsets of a data set"), as does the source code (and the R 
Language Definition manual, though it's not as clear as the Intro).

But the point isn't to chastise you, it's to educate you (and the OP). 
Thinking of [] as subsetting is more helpful than thinking of it as 
extraction.  That way the result of x[c(1,2)] makes sense.  It's a 
little bit more of a stretch, but the result of x[[c(1,2)]] also makes 
sense when you think of it as extraction.

Duncan Murdoch
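
A sketch of how the two forms with a length-two index behave in base R. The nested list `x` and the vector `v` are made up for the example; the recursive use of `[[` on lists is described in ?"[[".

```r
x <- list(list("a", "b"), "c")

# Subsetting: a list holding the first and second elements of x.
s <- x[c(1, 2)]
length(s)             # 2

# Recursive extraction on a list: the same as x[[1]][[2]].
e <- x[[c(1, 2)]]
e                     # "b"

# On an atomic vector the single-bracket form works ...
v <- 1:10
v[c(1, 2)]            # 1 2

# ... while the double-bracket form with two indices is an error.
res <- tryCatch(v[[c(1, 2)]], error = function(err) "error")
res                   # "error"
```

So the double-bracket form with a vector index is meaningful for lists (it walks down the nesting) but fails for atomic vectors.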

The relevant part of the Help file for "[" says, for recursive
objects: "Indexing by [ is similar to atomic vectors and selects a
list of the specified element(s)."  That a data.frame is a list is
explicitly stated, as I noted; that lists are in fact vectors is also
explicitly stated (?list says: "Almost all lists in R internally are
Generic Vectors") but then one is stuck with: a data.frame is a list
and therefore a vector, but is.vector(v3) is FALSE. The explanation is
explicit again in ?is.vector ("is.vector returns TRUE if x is a vector
of the specified mode having no attributes other than names. It returns
FALSE otherwise."). But I would say these issues are sufficiently murky
that my warning to be precise is not entirely inappropriate;
unfortunately, I may have made them more so. Sigh....
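
The is.vector() rule quoted above can be checked with a short sketch. The attribute name "units" is arbitrary and chosen only for the illustration.

```r
x <- c(a = 1, b = 2)
before <- is.vector(x)      # TRUE: names are the one attribute that is allowed

attr(x, "units") <- "cm"    # add any other attribute
after <- is.vector(x)       # FALSE, although x is still an atomic vector
is.atomic(x)                # TRUE

d <- data.frame(col1 = 1:3)
is.vector(d)                # FALSE: a data frame carries class and row.names
is.list(d)                  # TRUE: it is nevertheless a list
```

If the question is really "is this an atomic vector or a list?", a test such as `is.atomic(x) || is.list(x)` does not depend on which attributes happen to be attached.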

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

My mental model for the `[` vs `[[` behavior is that `[` indexes multiple results while `[[` indexes only one item. If returning multiple items from a list the result must be a list. For consistency, `[` always returns a list when applied to a list. The double bracket drops the containing list.

The is.vector() behavior is not intuitive to me... I avoid that function, as I think it is more useful to think of lists as vectors than as something "other".
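
That mental model can be sketched as follows, with a made-up list `lst` and a small data frame `d` (a data frame being a list of columns):

```r
lst <- list(a = 1:3, b = "x", c = TRUE)

many <- lst[c("a", "b")]   # several items: the result has to be a list
one  <- lst["a"]           # one item, still wrapped in a list of length one
item <- lst[["a"]]         # the containing list is dropped: 1 2 3

is.list(many)              # TRUE
is.list(one)               # TRUE
is.list(item)              # FALSE

# The same rule applied to a data frame.
d <- data.frame(col1 = 1:3, col2 = letters[1:3])
is.list(d["col1"])         # TRUE: a one-column data frame
is.integer(d[["col1"]])    # TRUE: the column itself
```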

Sent from my phone. Please excuse my brevity.
A bit too fast there, Duncan... x[[c(1,2)]] is illegal.
On 09/07/2021 6:44 p.m., Bert Gunter wrote:
OK, I stand somewhat chastised.

But my point still is that what you get when you "extract" depends on
how you define "extract." Do note that ?"[" yields a help file titled
"Extract or Replace Parts of an object"; and afaics, the term
"subset"
is not explicitly used as Duncan prefers.
?"[[" gives you the same page, but I agree:  this part of the 
documentation isn't written very clearly. The "Introduction to R"
manual 
uses the terms I used (see section 2.7, "Index vectors; selecting and 
modifying subsets of a data set"), as does the source code (and the R 
Language Definition manual, though it's not as clear as the Intro).

But the point isn't to chastise you, it's to educate you (and the OP). 
Thinking of [] as subsetting is more helpful than thinking of it as 
extraction.  That way the result of x[c(1,2)] makes sense.  It's a 
little bit more of a stretch, but the result of x[[c(1,2)]] also makes 
sense when you think of it as extraction.

Duncan Murdoch

 The relevant part of the
Help file says for "[" for recursive objects says: "Indexing by [ is
similar to atomic vectors and selects a list of the specified
element(s)."  That a data.frame is a list is explicitly stated, as I
noted; that lists are in fact vectors is also explicitly stated
(?list
says: "Almost all lists in R internally are Generic Vectors") but
then
one is stuck with: a data.frame is a list and therefore a vector, but
is.vector(d3) is FALSE. The explanation is explicit again in
?is.vector ("is.vector returns TRUE if x is a vector of the specified
mode having no attributes other than names. It returns FALSE
otherwise."). But I would say these issues are sufficiently murky
that
my warning to be precise is not entirely inappropriate;
unfortunately,
I may have made them more so. Sigh....

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming
along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Fri, Jul 9, 2021 at 3:05 PM Duncan Murdoch
<murdoch.duncan at gmail.com> wrote:
On 09/07/2021 5:51 p.m., Jeff Newmiller wrote:
"Strictly speaking", Greg is correct, Bert.

https://cran.r-project.org/doc/manuals/r-release/R-lang.html#List-objects
Lists in R are vectors. What we colloquially refer to as "vectors"
are more precisely referred to as "atomic vectors". And without a
doubt, this "vector" nature of lists is a key underlying concept that
explains why adding a dim attribute creates a matrix that can hold data
frames. It is also a stumbling block for programmers from other
languages that have things like linked lists.
I would also object to v3 (below) as "extracting" a column from d.
"d[2]" doesn't extract anything, it "subsets" the data frame, so the
result is a data frame, not what you get when you extract something
from
a data frame.

People don't realize that "x <- 1:10; y <- x[[3]]" is perfectly
legal.
That extracts the 3rd element (the number 3).  The problem is that R
has
no way to represent a scalar number, only a vector of numbers, so
x[[3]]
gets promoted to a vector containing that number when it is returned
and
assigned to y.

Lists are vectors of R objects, so if x is a list, x[[3]] is
something
that can be returned, and it is different from x[3].

Duncan Murdoch

On July 9, 2021 2:36:19 PM PDT, Bert Gunter
<bgunter.4567 at gmail.com> wrote:
"1.  a column, when extracted from a data frame, *is* a vector."
Strictly speaking, this is false; it depends on exactly what is
meant
by "extracted." e.g.:

d <- data.frame(col1 = 1:3, col2 = letters[1:3])
v1 <- d[,2] ## a vector
v2 <- d[[2]] ## the same, i.e
identical(v1,v2)
[1] TRUE
v3 <- d[2] ## a data.frame
v1
[1] "a" "b" "c"  ## a character vector
v3
   col2
1    a
2    b
3    c
is.vector(v1)
[1] TRUE
is.vector(v3)
[1] FALSE
class(v3)  ## data.frame
[1] "data.frame"
## but
is.list(v3)
[1] TRUE

which is simply explained in ?data.frame (where else?!) by:
"A data frame is a **list** [emphasis added] of variables of the
same
number of rows with unique row names, given class "data.frame". If
no
variables are included, the row names determine the number of
rows."
"2.  maybe your question is "is a given function for a vector, or
for a
     data frame/matrix/array?".  if so, i think the only way is
reading
     the help information (?foo)."

Indeed! Is this not what the Help system is for?! But note also
that
the S3 class system may somewhat blur the issue: foo() may work
appropriately and differently for different (S3) classes of
objects. A
detailed explanation of this behavior can be found in appropriate
resources or (more tersely) via ?UseMethod .

"you might find reading ?"[" and  ?"[.data.frame" useful"

Not just 'useful" -- **essential** if you want to work in R,
unless
one gets this information via any of the numerous online
tutorials,
courses, or books that are available. The Help system is accurate
and
authoritative, but terse. I happen to like this mode of
documentation,
but others may prefer more extended expositions. I stand by this
claim
even if one chooses to use the "Tidyverse", data.table package, or
other alternative frameworks for handling data. Again, others may
disagree, but R is structured around these basics, and imo one
remains
ignorant of them at their peril.

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming
along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Fri, Jul 9, 2021 at 11:57 AM Greg Minshall <minshall at umich.edu>
wrote:
Kai,

one more question, how can I know if the function is for column
manipulations or for vector?
i still stumble around R code.  but, i'd say the following (and
look
forward to being corrected! :):

1.  a column, when extracted from a data frame, *is* a vector.

2.  maybe your question is "is a given function for a vector, or
for
a
      data frame/matrix/array?".  if so, i think the only way is
reading
      the help information (?foo).

3.  sometimes, extracting the column as a vector from a data
frame-like
      object might be non-intuitive.  you might find reading ?"["
and
      ?"[.data.frame" useful (as well as ?"[.data.table" if you
use
that
      package).  also, the str() command can be helpful in
understanding
      what is happening.  (the lobstr:: package's sxp() function,
as
well
      as more verbose .Internal(inspect()) can also give you
insight.)
      with the data.table:: package, for example, if "DT" is a
data.table
      object, with "x2" as a column, adding or leaving off
quotation
marks
      for the column name can make all the difference between
ending up
      with a vector, or with a (much reduced) data table:
----
is.vector(DT[, x2])
[1] TRUE
str(DT[, x2])
   num [1:9] 32 32 32 32 32 32 32 32 32
is.vector(DT[, "x2"])
[1] FALSE
str(DT[, "x2"])
Classes ?data.table? and 'data.frame':  9 obs. of  1 variable:
   $ x2: num  32 32 32 32 32 32 32 32 32
   - attr(*, ".internal.selfref")=<externalptr>
----

      a second level of indexing may or may not help, mostly
depending
on
      the use of '[' versus of '[['.  this can sometimes cause
confusion
      when you are learning the language.
----
str(DT[, "x2"][1])
Classes ?data.table? and 'data.frame':  1 obs. of  1 variable:
   $ x2: num 32
   - attr(*, ".internal.selfref")=<externalptr>
str(DT[, "x2"][[1]])
   num [1:9] 32 32 32 32 32 32 32 32 32
----

      the tibble:: package (used in, e.g., the dplyr:: package) also
      (always?) returns a single column as a non-vector.  again, a
      second indexing with double '[[]]' can produce a vector.
----
DP <- tibble(DT)
is.vector(DP[, "x2"])
[1] FALSE
is.vector(DP[, "x2"][[1]])
[1] TRUE
----

      but, note that a list of lists is also a vector:
----
is.vector(list(list(1), list(1,2,3)))
[1] TRUE
str(list(list(1), list(1,2,3)))
List of 2
   $ :List of 1
    ..$ : num 1
   $ :List of 3
    ..$ : num 1
    ..$ : num 2
    ..$ : num 3
----

      etc.
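
      for a plain base-R data frame, the same distinction can be sketched
      roughly as below.  (DF and its columns are made up purely for
      illustration; this is one reading of ?"[.data.frame", not a complete
      account of it.)

```r
# DF is a hypothetical data frame, invented for this illustration
DF <- data.frame(x1 = 1:3, x2 = c(32, 32, 32))

# matrix-style indexing of a single column drops to a plain vector by default
is.vector(DF[, "x2"])                    # TRUE
# '[[' and '$' also extract the column itself
is.vector(DF[["x2"]])                    # TRUE
is.vector(DF$x2)                         # TRUE

# list-style '[' (no comma) keeps a one-column data frame
is.data.frame(DF["x2"])                  # TRUE
# drop = FALSE keeps the data frame even with matrix-style indexing
is.data.frame(DF[, "x2", drop = FALSE])  # TRUE
```

      one caveat: is.vector() is a strict test.  it returns FALSE for any
      object carrying attributes other than names, so a factor column
      fails it even though it is extracted in exactly the same way.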

hth.  good luck learning!

cheers, Greg

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

This discussion has developed in such a way that it seems a better
subject line would be "problem for the hairsplit function". :-)

cheers,

Rolf Turner
Honorary Research Fellow
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276
My method would be to use parse, deparse, and substitute.  It would iterate over each file name and build two new lists: one with the last four characters removed, leaving only the left side, and one with only the last four characters, leaving only the right side.  A new data frame would then be created from the partial file names.

Deparse and substitute get the file names into a string; then remove characters from either side, put the results into a new vector, and create the relevant data frame if desired.

This allows one to rely on their software development metaphor.  It might lack a certain finesse, but the metaphor is either a loom or a boxing match against a CSV, so it's fun. :)
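
As a rough sketch of the left/right split: file names are usually already character strings, so this version skips deparse() and substitute() and uses substr() and nchar() instead.  That is a swapped-in approach, not the parse/deparse method itself, and the file names are invented for illustration.

```r
# hypothetical file names, invented for this illustration
files <- c("sample_A.csv", "sample_B.txt", "run42.dat")

n <- nchar(files)

# left side: everything except the last four characters
left <- substr(files, 1, n - 4)
# right side: only the last four characters
right <- substr(files, n - 3, n)

# collect the partial file names into a data frame
parts <- data.frame(left = left, right = right, stringsAsFactors = FALSE)
print(parts)
```

substr() is vectorised over its start and stop arguments, so no explicit loop over the file names is needed here.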

Sent from my iPhone
On Jul 9, 2021, at 10:33 PM, Rolf Turner <r.turner at auckland.ac.nz> wrote:

This discussion has developed in such a way that it seems a better
subject line would be "problem for the hairsplit function". :-)

cheers,

Rolf Turner

-- 
Honorary Research Fellow
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276
