Message-ID: <683060686.1903725.1625870735640@mail.yahoo.com>
Date: 2021-07-09T22:45:35Z
From: Kai Yang
Subject: problem for strsplit function
In-Reply-To: <5f6bcd58-23c1-f6ce-7e36-9abd41fd72b8@gmail.com>

Thanks Bert,
I'm reading some books now, but it is taking me a while to get familiar with R.

Best,
Kai

On Friday, July 9, 2021, 03:06:11 PM PDT, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
 
 On 09/07/2021 5:51 p.m., Jeff Newmiller wrote:
> "Strictly speaking", Greg is correct, Bert.
> 
> https://cran.r-project.org/doc/manuals/r-release/R-lang.html#List-objects
> 
> Lists in R are vectors. What we colloquially refer to as "vectors" are more precisely referred to as "atomic vectors". And without a doubt, this "vector" nature of lists is a key underlying concept that explains why adding a dim attribute creates a matrix that can hold data frames. It is also a stumbling block for programmers from other languages that have things like linked lists.
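
The point about the dim attribute can be illustrated with a small sketch in base R. The object name "cells" and its contents are made up for illustration; they do not come from the thread.

```r
# A list is a (generic) vector, so it can carry a dim attribute
# just like an atomic vector can.
cells <- list(data.frame(a = 1:2),
              data.frame(b = letters[1:3]),
              "plain string",
              1:5)

is.vector(cells)        # TRUE: a list is a vector
is.atomic(cells)        # FALSE: but not an atomic vector

dim(cells) <- c(2, 2)   # turn the list into a 2 x 2 matrix of list cells

is.matrix(cells)        # TRUE
is.list(cells)          # TRUE: still a list underneath
class(cells[[1, 1]])    # "data.frame": the cell holds a whole data frame
```

Each cell of such a matrix is one list element, filled in column-major order, which is why a cell can hold an arbitrary R object.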

I would also object to v3 (below) as "extracting" a column from d. 
"d[2]" doesn't extract anything, it "subsets" the data frame, so the 
result is a data frame, not what you get when you extract something from 
a data frame.

People don't realize that "x <- 1:10; y <- x[[3]]" is perfectly legal. 
That extracts the 3rd element (the number 3).  The problem is that R has 
no way to represent a scalar number, only a vector of numbers, so x[[3]] 
gets promoted to a vector containing that number when it is returned and 
assigned to y.
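
A minimal sketch of that point, using only base R. The named vector "n" is an added illustration, not part of the original message.

```r
x <- 1:10
y <- x[[3]]             # extract the 3rd element
z <- x[3]               # subset: a vector of length 1

identical(y, z)         # TRUE: both are the integer vector 3L
length(y)               # 1
is.vector(y)            # TRUE: the "scalar" is really a length-1 vector

# One visible difference for atomic vectors: [[ drops names, [ keeps them.
n <- c(a = 1, b = 2)
names(n["b"])           # "b"
names(n[["b"]])         # NULL
```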

Lists are vectors of R objects, so if x is a list, x[[3]] is something 
that can be returned, and it is different from x[3].
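
For a list the two operators really do give different results. A short sketch, again with made-up contents:

```r
x <- list(1:3, "two", c(a = 1.5))

sub <- x[3]             # subset: a list of length 1
elt <- x[[3]]           # extract: the numeric vector stored in slot 3

is.list(sub)            # TRUE
is.list(elt)            # FALSE
is.numeric(elt)         # TRUE
identical(sub[[1]], elt)  # TRUE: extracting from the subset gives the element
```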

Duncan Murdoch

> 
> On July 9, 2021 2:36:19 PM PDT, Bert Gunter <bgunter.4567 at gmail.com> wrote:
>> "1.  a column, when extracted from a data frame, *is* a vector."
>> Strictly speaking, this is false; it depends on exactly what is meant
>> by "extracted." e.g.:
>>
>>> d <- data.frame(col1 = 1:3, col2 = letters[1:3])
>>> v1 <- d[,2] ## a vector
>>> v2 <- d[[2]] ## the same, i.e
>>> identical(v1,v2)
>> [1] TRUE
>>> v3 <- d[2] ## a data.frame
>>> v1
>> [1] "a" "b" "c"  ## a character vector
>>> v3
>>   col2
>> 1    a
>> 2    b
>> 3    c
>>> is.vector(v1)
>> [1] TRUE
>>> is.vector(v3)
>> [1] FALSE
>>> class(v3)  ## data.frame
>> [1] "data.frame"
>> ## but
>>> is.list(v3)
>> [1] TRUE
>>
>> which is simply explained in ?data.frame (where else?!) by:
>> "A data frame is a **list** [emphasis added] of variables of the same
>> number of rows with unique row names, given class "data.frame". If no
>> variables are included, the row names determine the number of rows."
>>
>> "2.  maybe your question is "is a given function for a vector, or for a
>>     data frame/matrix/array?".  if so, i think the only way is reading
>>     the help information (?foo)."
>>
>> Indeed! Is this not what the Help system is for?! But note also that
>> the S3 class system may somewhat blur the issue: foo() may work
>> appropriately and differently for different (S3) classes of objects. A
>> detailed explanation of this behavior can be found in appropriate
>> resources or (more tersely) via ?UseMethod .
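
As a rough sketch of the S3 behavior described above: the generic "describe" and its methods are hypothetical names invented for this illustration, and the data frame is the one from the example earlier in this message.

```r
# A hypothetical S3 generic: UseMethod() picks a method from class(x).
describe <- function(x, ...) UseMethod("describe")

# Fallback used when no class-specific method exists.
describe.default <- function(x, ...) {
  paste("a", class(x)[1], "of length", length(x))
}

# Method chosen for objects whose class includes "data.frame".
describe.data.frame <- function(x, ...) {
  paste("a data frame with", nrow(x), "rows and", ncol(x), "columns")
}

d <- data.frame(col1 = 1:3, col2 = letters[1:3])

describe(d)        # dispatches to describe.data.frame
describe(d[[2]])   # a bare column has no such class, so describe.default runs
```

The same call therefore behaves differently depending on whether it is given the data frame or a column extracted from it.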
>>
>> "you might find reading ?"[" and ?"[.data.frame" useful"
>>
>> Not just "useful" -- **essential** if you want to work in R, unless
>> one gets this information via any of the numerous online tutorials,
>> courses, or books that are available. The Help system is accurate and
>> authoritative, but terse. I happen to like this mode of documentation,
>> but others may prefer more extended expositions. I stand by this claim
>> even if one chooses to use the "Tidyverse", data.table package, or
>> other alternative frameworks for handling data. Again, others may
>> disagree, but R is structured around these basics, and imo one remains
>> ignorant of them at their peril.
>>
>> Cheers,
>> Bert
>>
>>
>> Bert Gunter
>>
>> "The trouble with having an open mind is that people keep coming along
>> and sticking things into it."
>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>
>> On Fri, Jul 9, 2021 at 11:57 AM Greg Minshall <minshall at umich.edu> wrote:
>>>
>>> Kai,
>>>
>>>> one more question, how can I know if the function is for column
>>>> manipulations or for vector?
>>>
>>> i still stumble around R code.  but, i'd say the following (and look
>>> forward to being corrected! :):
>>>
>>> 1.  a column, when extracted from a data frame, *is* a vector.
>>>
>>> 2.  maybe your question is "is a given function for a vector, or for a
>>>     data frame/matrix/array?".  if so, i think the only way is reading
>>>     the help information (?foo).
>>>
>>> 3.  sometimes, extracting the column as a vector from a data frame-like
>>>     object might be non-intuitive.  you might find reading ?"[" and
>>>     ?"[.data.frame" useful (as well as ?"[.data.table" if you use that
>>>     package).  also, the str() command can be helpful in understanding
>>>     what is happening.  (the lobstr:: package's sxp() function, as well
>>>     as more verbose .Internal(inspect()) can also give you insight.)
>>>
>>>     with the data.table:: package, for example, if "DT" is a data.table
>>>     object, with "x2" as a column, adding or leaving off quotation marks
>>>     for the column name can make all the difference between ending up
>>>     with a vector, or with a (much reduced) data table:
>>> ----
>>>> is.vector(DT[, x2])
>>> [1] TRUE
>>>> str(DT[, x2])
>>>  num [1:9] 32 32 32 32 32 32 32 32 32
>>>>
>>>> is.vector(DT[, "x2"])
>>> [1] FALSE
>>>> str(DT[, "x2"])
>>> Classes 'data.table' and 'data.frame':  9 obs. of  1 variable:
>>>  $ x2: num  32 32 32 32 32 32 32 32 32
>>>  - attr(*, ".internal.selfref")=<externalptr>
>>> ----
>>>
>>>     a second level of indexing may or may not help, mostly depending on
>>>     the use of '[' versus '[['.  this can sometimes cause confusion
>>>     when you are learning the language.
>>> ----
>>>> str(DT[, "x2"][1])
>>> Classes 'data.table' and 'data.frame':  1 obs. of  1 variable:
>>>  $ x2: num 32
>>>  - attr(*, ".internal.selfref")=<externalptr>
>>>> str(DT[, "x2"][[1]])
>>>  num [1:9] 32 32 32 32 32 32 32 32 32
>>> ----
>>>
>>>     the tibble:: package (used in, e.g., the dplyr:: package) also
>>>     (always?) returns a single column as a non-vector.  again, a
>>>     second indexing with double '[[]]' can produce a vector.
>>> ----
>>>> DP <- tibble(DT)
>>>> is.vector(DP[, "x2"])
>>> [1] FALSE
>>>> is.vector(DP[, "x2"][[1]])
>>> [1] TRUE
>>> ----
>>>
>>>     but, note that a list of lists is also a vector:
>>>> is.vector(list(list(1), list(1,2,3)))
>>> [1] TRUE
>>>> str(list(list(1), list(1,2,3)))
>>> List of 2
>>>  $ :List of 1
>>>   ..$ : num 1
>>>  $ :List of 3
>>>   ..$ : num 1
>>>   ..$ : num 2
>>>   ..$ : num 3
>>>
>>>     etc.
>>>
>>> hth.  good luck learning!
>>>
>>> cheers, Greg
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>
