
Confused about using data.table package,

7 messages · Hadley Wickham, David Winsemius, C W +2 more

C W
#
Hi R,

I am a little confused by the data.table package.

library(data.table)

df <- data.frame(w=rnorm(20, -10, 1), x= rnorm(20, 0, 1), y=rnorm(20, 10, 1),
z=rnorm(20, 20, 1))

df <- data.table(df)

#drop column w

df_1 <- df[, w := NULL] # I thought you are supposed to do: df_1 <- df[, -w]

df_2 <- df[x<y] # aren't you supposed to do df_2 <- df[x<y]?

df_3 <- df[, a := x-y] # created new column a using x minus y, why are we using colon equals?

I am a bit confused by this syntax.

Thanks!
#
df <- setDT(df) is preferred.
Nope. The `[.data.table` function is very different from the `[.data.frame` function. As you should be able to see, an expression in the `j` position for `[.data.table` gets evaluated in the environment of the data.table object, so unquoted column names are treated as variables and you get back the result of whatever function is applied to them. Here it's just a unary minus, so df[, -w] returns the negated values of w rather than dropping the column.

Actually "nope" on two counts. You cannot use a unary minus for column names in `[.data.frame` either. It would have needed to be df[ , !colnames(df) %in% "w"]  # logical indexing
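
A minimal sketch of the difference described above, using a small made-up table. The names `dfr`, `dt` and the values are illustrative and not from the original post.

```r
library(data.table)

# Small made-up example data; `dfr` and `dt` are illustrative names.
dfr <- data.frame(w = c(-10, -11), x = c(0, 1), y = c(10, 9))
dt  <- as.data.table(dfr)

# In a data.table, j is evaluated with the columns as variables,
# so -w is the negated column, returned as a plain vector.
neg_w <- dt[, -w]

# Dropping a column from a data.table without modifying dt:
dt_no_w <- dt[, !"w"]

# Dropping a column from a data.frame by logical indexing on the names:
dfr_no_w <- dfr[, !colnames(dfr) %in% "w"]

# Row filtering: i also sees the columns, and no trailing comma is needed.
dt_rows <- dt[x < y]
```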
I don't see a difference.
You need to do more study of the extensive documentation. The behavior of the ":=" function is discussed in detail there.
It's non-standard for R but many people find the efficiencies of the package worth the extra effort to learn what is essentially a different evaluation strategy.
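
As a rough illustration of why `:=` behaves differently from ordinary assignment, the sketch below shows that it changes the table in place, so the name on the left of `<-` is just a second name for the same table. The tables `dt` and `dt2` are made up for this example.

```r
library(data.table)

dt <- data.table(x = c(1, 5), y = c(3, 2))

# := adds the column to dt itself; the assignment on the left only
# creates a second name for the same table.
dt_a <- dt[, a := x - y]
"a" %in% names(dt)      # TRUE: dt was modified in place

# To leave the original untouched, work on an explicit copy.
dt2  <- data.table(x = c(1, 5), y = c(3, 2))
dt_b <- copy(dt2)[, a := x - y]
"a" %in% names(dt2)     # FALSE: only the copy gained the column
```

The same applies to `df_1 <- df[, w := NULL]` in the original post: after that line, `df` itself no longer has column w.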
R-help is a plain text mailing list.
#
On Sun, Feb 19, 2017 at 3:01 PM, David Winsemius <dwinsemius at comcast.net> wrote:
Don't you mean just

setDT(df)

?

setDT() modifies by reference.
You can get to that documentation with ?":="
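
A short sketch of what "modifies by reference" means here, with made-up data (`dfr` and `dfr2` are illustrative names):

```r
library(data.table)

dfr <- data.frame(x = 1:3, y = c("a", "b", "c"))  # illustrative data

setDT(dfr)            # converts dfr in place; no assignment needed
is.data.table(dfr)    # TRUE

# By contrast, as.data.table() returns a new object and leaves
# its input unchanged.
dfr2 <- data.frame(x = 1:3)
dt2  <- as.data.table(dfr2)
is.data.table(dfr2)   # FALSE
```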

Hadley
#
Thanks for the correction.
That's a good place to start reading, but I was thinking of data.table::datatable-faq and data.table::datatable-intro, which are on the Vignettes page reached from help(package = "data.table").
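
The vignettes can also be reached from the console. This is a small sketch that assumes the vignettes were installed along with the package; the calls that open a viewer are commented out.

```r
# List the vignettes installed with data.table.
v <- vignette(package = "data.table")
v$results[, "Item"]

# Open one by name (the two vignettes mentioned above).
# vignette("datatable-intro", package = "data.table")
# vignette("datatable-faq", package = "data.table")
```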
David Winsemius
Alameda, CA, USA
C W
#
Thanks Hadley!

While I got your attention, what is a good way to get started on ggplot2? ;)

My impression is that I first need to learn plyr, dplyr, AND THEN ggplot2.
That's A LOT!

Suppose I have this:
iris
iris2 <- cbind(iris, grade = sample(1:5, 150, replace = TRUE))
iris2

I want to have some kind of graph conditioned on species, by grade. What's
a good lead to learn about plotting this?

Thank you!



On Mon, Feb 20, 2017 at 11:12 AM, Hadley Wickham <h.wickham at gmail.com> wrote:
#
I suspect Hadley would recommend reading his new book, R for Data Science (r4ds.had.co.nz), in particular Chapter 3. You don't need plyr, but it won't take long before you will want to be using dplyr and tidyr, which are covered in later chapters.
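
For the iris2 question, one possible starting point (a sketch only, not necessarily what the book or Hadley would suggest) is a bar chart of grade counts with one panel per species. It needs only ggplot2; the seed is an arbitrary addition so the random grades are reproducible.

```r
library(ggplot2)

set.seed(1)  # arbitrary seed so the random grades are reproducible
iris2 <- cbind(iris, grade = sample(1:5, 150, replace = TRUE))

# One panel per species, bars counting how many flowers got each grade.
p <- ggplot(iris2, aes(x = factor(grade))) +
  geom_bar() +
  facet_wrap(~ Species) +
  labs(x = "grade", y = "count")

print(p)
```

Swapping `geom_bar()` for another geom, or mapping a measurement such as Sepal.Length to `y`, gives other views of the same conditioning.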
#
Just. Don't. Do. This. (Hint: Threading mail readers.)
On 21 Feb 2017, at 03:53 , C W <tmrsg11 at gmail.com> wrote: