Hi R,

I am a little confused by the data.table package.

library(data.table)

df <- data.frame(w=rnorm(20, -10, 1), x= rnorm(20, 0, 1), y=rnorm(20, 10, 1), z=rnorm(20, 20, 1))
df <- data.table(df)

#drop column w
df_1 <- df[, w := NULL] # I thought you are supposed to do: df_1 <- df[, -w]

df_2 <- df[x<y] # aren't you supposed to do df_2 <- df[x<y]?

df_3 <- df[, a := x-y] # created new column a using x minus y, why are we using colon equals?

I am a bit confused by this syntax.

Thanks!
Confused about using data.table package
7 messages · Hadley Wickham, David Winsemius, C W +2 more
On Feb 19, 2017, at 11:37 AM, C W <tmrsg11 at gmail.com> wrote:
df <- data.table(df)
df <- setDT(df) is preferred.
#drop column w
df_1 <- df[, w := NULL]
# I thought you are supposed to do:
df_1 <- df[, -w]
Nope. The "[.data.table" function is very different from the "[.data.frame" function. An expression in the `j` position of "[.data.table" is evaluated in the environment of the data.table object, so unquoted column names stand for the column vectors themselves, and any function in the expression is applied to those values. Here it's just a unary minus, so df[, -w] returns the negated values of w rather than dropping the column.

Actually, "nope" on two counts. You cannot use a unary minus with column names in "[.data.frame" either. That would have needed to be:

df[ , !colnames(df) %in% "w"]   # logical indexing
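A minimal sketch of the behaviours described above, assuming a reasonably recent data.table release. The names `dt`, `df_no_w`, `neg_w` and `dt_no_w` are illustrative only, and `set.seed(1)` is added just for reproducibility.

```r
library(data.table)

set.seed(1)  # reproducible random data (not in the original post)
df <- data.frame(w = rnorm(20, -10, 1), x = rnorm(20, 0, 1),
                 y = rnorm(20, 10, 1), z = rnorm(20, 20, 1))

# Base R: drop w by logical indexing on the column names
df_no_w <- df[, !colnames(df) %in% "w"]

dt <- as.data.table(df)  # a copy, so df stays a plain data.frame

# data.table: j is evaluated inside dt, so -w is simply the negated
# values of column w (a numeric vector); no column is dropped
neg_w <- dt[, -w]

# A new table without w: pass the name as a character vector;
# with = FALSE means "treat j as column names, not an expression"
dt_no_w <- dt[, !"w", with = FALSE]

# Delete w from dt itself, by reference
dt[, w := NULL]
```

The last two lines differ in an important way: `dt_no_w` is a new object and `dt` is untouched by that call, whereas `:= NULL` changes `dt` in place.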
df_2 <- df[x<y] # aren't you supposed to do df_2 <- df[x<y]?
I don't see a difference.
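For comparison with base R row subsetting (an assumption about what the question was getting at, since both expressions as posted are identical), here is a sketch of the same filter written both ways. Variable names are illustrative.

```r
library(data.table)

set.seed(1)  # reproducible random data
df <- data.frame(w = rnorm(20, -10, 1), x = rnorm(20, 0, 1),
                 y = rnorm(20, 10, 1), z = rnorm(20, 20, 1))
dt <- as.data.table(df)

# Base R: the condition must name the data frame, and the trailing
# comma tells [.data.frame that the condition picks rows
rows_df <- df[df$x < df$y, ]

# data.table: i is evaluated inside dt, so bare column names work
# and a single argument is treated as a row filter
rows_dt <- dt[x < y]
```

Note that with a plain data.frame, a single argument such as df[cond] is list-style indexing and selects columns, not rows.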
df_3 <- df[, a := x-y] # created new column a using x minus y, why are we using colon equals?
You need to do more study of the extensive documentation. The behavior of the ":=" function is discussed in detail there.
I am a bit confused by this syntax.
It's non-standard for R but many people find the efficiencies of the package worth the extra effort to learn what is essentially a different evaluation strategy.
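A small sketch of what the `:=` strategy buys you, assuming data.table is installed. `address()` is data.table's helper that reports an object's memory address; it is used here only to show that no copy is made. Variable names are illustrative.

```r
library(data.table)

set.seed(1)  # reproducible random data
dt <- data.table(x = rnorm(20, 0, 1), y = rnorm(20, 10, 1))
before <- address(dt)

# := adds column a to dt in place; nothing is copied and the
# result does not need to be assigned back
dt[, a := x - y]

# Assigning the result just creates a second name for the same
# table, which is why df_3 and df end up identical in the question
dt_3 <- dt[, b := x + y]

# Base R equivalent, which semantically builds a modified copy
# and rebinds the name df to it
df <- data.frame(x = dt$x, y = dt$y)
df$a <- df$x - df$y
```

If an independent table is wanted, data.table's copy() makes one explicitly, e.g. dt_3 <- copy(dt).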
Thanks! [[alternative HTML version deleted]]
R-help is a plain-text mailing list.
David

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

David Winsemius
Alameda, CA, USA
On Sun, Feb 19, 2017 at 3:01 PM, David Winsemius <dwinsemius at comcast.net> wrote:
df <- setDT(df) is preferred.
Don't you mean just setDT(df)? setDT() modifies by reference.
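A minimal illustration of the difference, using small made-up data frames: setDT() converts its argument in place, while as.data.table() returns a new object and leaves its input alone.

```r
library(data.table)

df <- data.frame(x = 1:3, y = c(2, 4, 6))

# setDT() converts df in place; reassigning its result is unnecessary
setDT(df)

# as.data.table(), by contrast, returns a new object and leaves
# its input as a plain data.frame
df2 <- data.frame(x = 1:3)
dt2 <- as.data.table(df2)
```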
df_3 <- df[, a := x-y] # created new column a using x minus y, why are we using colon equals?
You need to do more study of the extensive documentation. The behavior of the ":=" function is discussed in detail there.
You can get to that documentation with ?":="

Hadley
On Feb 20, 2017, at 8:12 AM, Hadley Wickham <h.wickham at gmail.com> wrote:
On Sun, Feb 19, 2017 at 3:01 PM, David Winsemius <dwinsemius at comcast.net> wrote:
df <- setDT(df) is preferred.
Don't you mean just setDT(df)? setDT() modifies by reference.
Thanks for the correction.
df_3 <- df[, a := x-y] # created new column a using x minus y, why are we using colon equals?
You need to do more study of the extensive documentation. The behavior of the ":=" function is discussed in detail there.
You can get to that documentation with ?":="
That's a good place to start reading, but I was thinking of the datatable-faq and datatable-intro vignettes, which are listed on the Vignettes page reached from help(package = "data.table").
Hadley -- http://hadley.nz
David Winsemius Alameda, CA, USA
Thanks Hadley!

While I've got your attention, what is a good way to get started on ggplot2? ;)

My impression is that I first need to learn plyr, dplyr, AND THEN ggplot2. That's A LOT!

Suppose I have this:

iris
iris2 <- cbind(iris, grade = sample(1:5, 150, replace = TRUE))
iris2

I want to have some kind of graph conditioned on species, by grade. What's a good lead to learn about plotting this?

Thank you!
I suspect Hadley would recommend reading his new book, R for Data Science (r4ds.had.co.nz), in particular Chapter 3. You don't need plyr, but it won't take long before you will want to be using dplyr and tidyr, which are covered in later chapters.
Sent from my phone. Please excuse my brevity.
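As a starting point alongside the book recommendation, here is one possible reading of "conditioned on species, by grade", sketched with ggplot2 facets. Which measurements go on the axes is an assumption (the question doesn't say), and `set.seed(1)` is added only so `grade` is reproducible.

```r
library(ggplot2)

set.seed(1)  # makes the random grades reproducible
iris2 <- cbind(iris, grade = sample(1:5, 150, replace = TRUE))

# One panel per grade/Species combination: rows of panels are grade,
# columns are Species. Sepal.Length vs Sepal.Width is illustrative only.
p <- ggplot(iris2, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point() +
  facet_grid(grade ~ Species)
```

Typing p at the console (or print(p) in a script) draws the plot. Using facet_wrap(~ Species) with colour = factor(grade) inside aes() would instead put all grades in one panel per species.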
Just. Don't. Do. This. (Hint: Threading mail readers.)
On 21 Feb 2017, at 03:53, C W <tmrsg11 at gmail.com> wrote:
Thanks Hadley! While I got your attention, what is a good way to get started on ggplot2? ;)
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com