
Removing variables from data frame with a wild card

Valentin,

You are correct that R does many things largely behind the scenes that make some operations fairly efficient.
So if someone makes a copy of the original data with fewer columns, they might be tempted to think that everything was completely copied and that the original either stays around or, if the identifier was re-used, gets garbage collected. As you note, the only things collected are the columns you chose not to include.

For some it seems cleaner to set a list item to NULL, which appears to remove it immediately.

The real point I hoped to make is that, using base R, you can approach removing (multiple) columns in two logical ways. One is to seemingly remove them in the original object, even if, as you point out, a modified copy is made behind the scenes. The other is to make a copy of just what you want and ignore the rest, which may or may not be kept around.
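
As a minimal sketch of those two approaches, assuming a made-up data frame mydata whose unwanted columns happen to share the prefix "tmp_" (both the data and the naming pattern are hypothetical, chosen only for illustration):

```r
# Hypothetical data; the "tmp_" prefix is an assumed naming pattern.
mydata <- data.frame(id = 1:3, tmp_a = 4:6, tmp_b = 7:9,
                     score = c(0.1, 0.2, 0.3))

# Names of the columns matching the wild card
unwanted <- grep("^tmp_", names(mydata), value = TRUE)

# Approach 1: seemingly remove the columns in the object itself
trimmed <- mydata
trimmed[unwanted] <- NULL

# Approach 2: build a new object from only the columns to keep
kept <- mydata[!(names(mydata) %in% unwanted)]

names(trimmed)
names(kept)
```

Both routes should leave only the id and score columns; which one reads better is largely a matter of taste.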

If someone really wanted to get down to the basics, they could get a reference to each column they want to keep, as in col1 <- mydata[["col1"]], and use those to make a new data.frame, or use many other variants on these methods.

Many programming languages (or rather their designers, their programmers, and plain purists) have qualms about when "pointers" of sorts are used, whether things should be mutable, and so on, so I prefer to avoid religious wars.

-----Original Message-----
From: Valentin Petzel <valentin at petzel.at> 
Sent: Saturday, January 14, 2023 1:21 PM
To: avi.e.gross at gmail.com
Cc: 'R-help Mailing List' <r-help at r-project.org>
Subject: Re: [R] Removing variables from data frame with a wile card

Hello Avi,

While something like d$something <- ... may seem to modify the data directly, it does not actually do so. Most R objects try to be immutable; that is, an object may not change after creation. This guarantees that if you have another binding to the same object, the object won't change sneakily.
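
A small illustration of that guarantee, using a throwaway data frame (the names d and other are hypothetical):

```r
d <- data.frame(x = 1:3, y = 4:6)
other <- d       # a second binding to the same object

d$y <- NULL      # rebinds d to a modified copy

names(d)         # "x"
names(other)     # "x" "y" -- the other binding is untouched
```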

There is one data structure that is in fact mutable: the environment. For example, compare

L <- list()
local({L$a <- 3})
L$a

with

E <- new.env()
local({E$a <- 3})
E$a

The latter will in fact work (E$a is 3), as the same environment is modified, while in the first case only a modified copy of the list is made inside local(), so L$a is still NULL.

Under the hood we have a parser trick: If R sees something like

f(a) <- ...

it will look for a function named `f<-` and call

a <- `f<-`(a, value = ...)

(this also happens for example when you do names(x) <- ...)
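
To make the rewriting visible, here is a sketch with a made-up replacement function (`second<-` is hypothetical, defined here only for illustration):

```r
# Hypothetical replacement function: sets the second element of a vector.
`second<-` <- function(x, value) {
  x[2] <- value
  x
}

a <- c(10, 20, 30)
b <- a

second(a) <- 99                   # the sugared form
b <- `second<-`(b, value = 99)    # what it is rewritten to

identical(a, b)                   # TRUE
```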

So in our case this is in fact equivalent to creating a copy with the columns removed and rebinding the symbol in the current environment to the result.

The data.table package breaks with this convention and uses C-based routines that allow changing data without copying the object. Doing

d[, (cols_to_remove) := NULL]

will actually change the data.
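
A short sketch of the difference, assuming the data.table package is installed (the table, the "tmp_" prefix and the name d_alias are all made up for illustration):

```r
if (requireNamespace("data.table", quietly = TRUE)) {
  library(data.table)

  # Hypothetical table; "tmp_" is an assumed naming pattern.
  d <- data.table(id = 1:3, tmp_a = 4:6, tmp_b = 7:9)
  d_alias <- d   # second binding to the same object, no copy made

  cols_to_remove <- grep("^tmp_", names(d), value = TRUE)
  d[, (cols_to_remove) := NULL]   # removes the columns by reference

  names(d)        # "id"
  names(d_alias)  # "id" as well: the alias sees the change
}
```

Unlike the plain data.frame case, the second binding is affected here; data.table provides copy() for when an independent object is wanted.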

Regards,
Valentin

14.01.2023 18:28:33 avi.e.gross at gmail.com: