dynamic variable creation in lists and data frames

2 messages · Dan Rabosky, Marc Schwartz

#
Hi

I have a question about the creation of variables within lists in R.  I am
running simulations and am interested in two parameters, ESM and ESMM (the
similarity of these names is important for my question).  I do simulations
to generate ESMM, then plug these values into a second simulation function
to get ESM:

x <- list()

for (i in 1:nsimulations)
{
	x$ESMM[i] <- do_simulation1()
	x$ESM[i] <- do_simulation2(x$ESMM[i])
}

and I return everything as a dataframe, x <- as.data.frame(x)

When I do this, I find that x$ESMM is overwritten by x$ESM for the first
simulation.  However, x$ESM is nonetheless correctly generated using
x$ESMM.

Thus, x$ESM[1] =  x$ESMM[1], but for the other n-thousand simulations,
ESMM is not overwritten; the error only occurs on the first instance of
ESM.

I think I know why this is occurring: I am creating a new variable in a
list and assigning it a value, but when R can't find the variable, it
overwrites the next most similar variable (ESMM).  But it still proceeds
to create the new variable ESM, having overwritten x$ESMM[1].  And it
doesn't happen for subsequent simulations, because both variables then
exist in the list.

My questions are:
1) how different do variable names have to be to avoid this problem?  What
exactly is R using to decide that ESMM is the same as ESM?

or

2) is there something fundamentally flawed with the manner in which I
dynamically create variables in lists, without initializing them in some
fashion?  This approach worked fine until I noticed this issue with
variables having similar names.

Thanks very much in advance for your help.

Dan Rabosky


Dan Rabosky
Department of Ecology and Evolutionary Biology
Corson Hall
Cornell University
Ithaca, NY 14853
#
On Tue, 2006-12-05 at 14:41 -0500, Daniel Lee Rabosky wrote:
This has to do with partial matching of names when indexing data frame
columns and list elements with $. It is the default behavior in R, and
if you search the archives using:

  RSiteSearch("partial matching")

you will note prior discussions on this.

A simple example:
> x <- list()
> x
list()

> x$ESMM[1] <- 1
> x
$ESMM
[1] 1

> x$ESM[1] <- 2
> x
$ESMM
[1] 2

$ESM
[1] 2


Both values are changed, since x$ESM does not yet exist and the
assignment partially matches x$ESMM. Then x$ESM is created.
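
To see the partial matching directly, here is a minimal sketch. It assumes
a current R release, where [[ matches names exactly by default; the
assignment behavior in the 2006-era R discussed in this thread may differ.
The second half shows one way to sidestep the issue by creating both
elements before the loop. The names n and sims and the rnorm() calls are
stand-ins for illustration, not the poster's simulation functions.

```r
x <- list(ESMM = 1)

x$ESM                      # `$` partially matches the name "ESMM"
x[["ESM"]]                 # exact matching: no element named "ESM", so NULL
x[["ESM", exact = FALSE]]  # partial matching requested explicitly

# Creating both elements up front means every later sims$ESM[i] and
# sims$ESMM[i] finds an exact match, so partial matching never applies.
n <- 5                                   # stand-in for nsimulations
sims <- list(ESMM = numeric(n), ESM = numeric(n))
for (i in seq_len(n)) {
  sims$ESMM[i] <- rnorm(1)               # stand-in for do_simulation1()
  sims$ESM[i]  <- sims$ESMM[i] + 1       # stand-in for do_simulation2()
}
sims <- as.data.frame(sims)
```

Preallocating also avoids growing the vectors one element at a time inside
the loop.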

I think that in this particular situation, you might want to try:

# Create a simple function that returns pairs of random samples from 
# 'letters', which is a:z
Sim <- function()
{
   list(ESMM = letters[sample(26, 1)], 
        ESM = letters[sample(26, 1)])
}

# Run it once
> Sim()
$ESMM
[1] "l"

$ESM
[1] "z"


Now use replicate() to do this 10 times. Note the default behavior is to
simplify the returned values into a matrix.

> replicate(10, Sim())
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
ESMM "x"  "q"  "c"  "f"  "e"  "f"  "y"  "d"  "z"  "h"  
ESM  "u"  "c"  "j"  "v"  "u"  "j"  "o"  "p"  "g"  "g"  
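
As a small sketch of what that simplification means in practice, the
following compares the default result with simplify = FALSE. Sim() is the
same toy function as above, repeated so the block runs on its own; the
names m and l are only for illustration.

```r
Sim <- function()
{
   list(ESMM = letters[sample(26, 1)],
        ESM = letters[sample(26, 1)])
}

# Default: the 10 two-element lists are combined into a 2 x 10 matrix,
# with one row per list element name
m <- replicate(10, Sim())
dim(m)
rownames(m)

# simplify = FALSE keeps the results as a plain list of 10 lists
l <- replicate(10, Sim(), simplify = FALSE)
length(l)
names(l[[1]])
```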


So, in your case create a function Sim() like this:

Sim <- function()
{
  ESMM <- do_simulation1()
  ESM <- do_simulation2(ESMM)
  
  list(ESMM = ESMM, ESM = ESM)
}


and then use replicate() as above.  See ?replicate for more information.
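
For completeness, a hedged end-to-end sketch that ends with the data frame
you asked for. The two do_simulation functions below are hypothetical
stand-ins (yours will differ), and this variant of Sim() returns a named
numeric vector instead of a list, which is an assumption made here so that
replicate() yields a plain numeric matrix that converts cleanly.

```r
# Hypothetical stand-ins for the real simulation functions
do_simulation1 <- function() rnorm(1)
do_simulation2 <- function(esmm) esmm + rnorm(1, sd = 0.1)

Sim <- function()
{
  ESMM <- do_simulation1()
  ESM <- do_simulation2(ESMM)

  c(ESMM = ESMM, ESM = ESM)   # named numeric vector, one value each
}

set.seed(1)                   # only to make the sketch reproducible
res <- replicate(1000, Sim()) # 2 x 1000 numeric matrix
x <- as.data.frame(t(res))    # one row per simulation, columns ESMM and ESM
```

Because both columns are created in a single step, there is never a moment
at which ESM is missing and ESMM can be partially matched in its place.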

HTH,

Marc Schwartz