Dear list,
I still have no idea how to specify or give start values for vectors in a formula for nlmer()
The technique works fine for an approximate method I use in lmer()
(That technique depends heavily on some scaling of variables.)
Here is a self-contained example using some mock data whose origin is described.
I always get one of these two errors when trying to specify the nlmer() model
Error in nlmer(ylong ~ xinter(a, xlong, x0.est) ~ (a | species) + (x0.est | :
not all parameter names are used in the nonlinear model expression
or
Error: s > 0 is not TRUE
Can I get any help on formula specification or start values?
I have coded in a function with derivatives using deriv() .
# Example of emissions estimation
#
# Define a "true" background concentration ... 100 instnaces
x0 = rnorm(100, 20, 5)
# Define a perturbation of the key tracer -- log normal
xd = rlnorm( 100, log(8), 0.3)
# Define the measurable, x
x = x0 + xd
# Define some "emission factors" linearly relating several dependent variables
# as they are related to the key tracer (above background)
# 10 different dependent variables or "species"
a = 0.1*rexp(10,rate=0.6)
# Define the pure response variable (yun ... "y unscaled")
yun = t( outer( a,xd ) )
# Add some noise in both the response, both w.r.t. the dependent species
# and also the measurement instance
for (i in 1:10 ) yun[,i] = yun[,i]*(1+rnorm(100,0.05,0.1))
for (j in 1:100 ) yun[j,] = yun[j,]*(1+rnorm(10,0.05,0.1))
#
y = yun # dimension a scaled response and fill it with scaled values (still all positive)
ysc = vector(mode="numeric", length=10)
asc = vector(mode="numeric", length=10)
for ( i in 1:10 ) {
+ ytemp = scale(yun[,i], center=F) # get ytemp to define scaling factor
+ y[,i] = ytemp # scale
+ ysc[i] = attributes(ytemp)$`scaled:scale`
+ asc[i] = 1/ysc[i] # useful version of a to relate scaled quantities
+ }
#
# Make a "longi" list of the tracer-species, repeating once for every response
# Note: normally I use reshape() to do this. See: ?reshape
xlong = rep(x,times=10)
ylong = as.vector(y) # collapse the matrix into a similar "long" vector
# Make a factor variable describing each measurement instance and each dependent variable
id = as.factor(rep(paste("sample",as.character(1:100),sep=""),times=10))
species = as.factor(rep(paste("species",as.character(1:10),sep=""),times=100))
#
# Create the data-frame for nlmer
trial.dat = data.frame(xlong=xlong,ylong=ylong,id=id,species=species)
#
# Define a formula ... trying to force a vector definition
## xinterform <- ~a*(xlong+ x0.est) # previous try
#
xinterform <- ~( a * xlong+ x0.est)
# Explicitly declaring these as vectors gives an error
# xinterform <- ~( a=vector(mode="numeric",length=1000) * xlong=vector(mode="numeric",length=1000) + x0.est=vector(mode="numeric",length=1000) )
# Doesn't work:
# Error in deriv.formula(xinterform, namevec = c("a", "xlong", "x0.est"), :
# invalid expression in 'FindSubexprs'
xinter <- deriv(xinterform, namevec=c("a","xlong","x0.est"),
+ start=start.vec, data=trial.dat )
Error in nlmer(ylong ~ xinter(a, xlong, x0.est) ~ (a | species) + (x0.est | :
not all parameter names are used in the nonlinear model expression
Dear list, I still have no idea how to specify or give start values
for vectors in a formula for nlmer() The technique works fine for an
approximate method I use in lmer() (That technique depends heavily
on some scaling of variables.) Here is a self-contained example
using some mock data whose origin is described.
I don't know if I got sensible answers or not, but I seem to have
succeeded in fitting a model. The key changes were: (1) starting
parameters are *single* values (these are starting values for the
mean/central values of the parameters, not of the random effects);
(2) leaving "xlong" out of namevec in the deriv() call, because it's
a predictor variable, not a parameter (you don't want its gradient)
Slight modifications:
## Example of emissions estimation
set.seed(101)
## Define a "true" background concentration ... 100 instnaces
x0 = rnorm(100, 20, 5)
## Define a perturbation of the key tracer -- log normal
xd = rlnorm( 100, log(8), 0.3)
## Define the measurable, x
x = x0 + xd
## Define some "emission factors" linearly relating several dependent variables
## as they are related to the key tracer (above background)
## 10 different dependent variables or "species"
a = 0.1*rexp(10,rate=0.6)
## Define the pure response variable (yun ... "y unscaled")
yun = t( outer( a,xd ) )
## Add some noise in both the response, both w.r.t. the dependent species
## and also the measurement instance
## for (i in 1:10 ) yun[,i] = yun[,i]*(1+rnorm(100,0.05,0.1))
## for (j in 1:100 ) yun[j,] = yun[j,]*(1+rnorm(10,0.05,0.1))
yun <- sweep(yun,1,rnorm(nrow(yun),0.05,0.1),"*")
yun <- sweep(yun,2,rnorm(ncol(yun),0.05,0.1),"*")
##
y = yun
## dimension a scaled response and fill it with scaled values
## (still all positive)
ysc = vector(mode="numeric", length=10)
asc = vector(mode="numeric", length=10)
for ( i in 1:10 ) {
ytemp = scale(yun[,i], center=FALSE) # get ytemp to define scaling factor
y[,i] = ytemp # scale
ysc[i] = attributes(ytemp)$`scaled:scale`
asc[i] = 1/ysc[i] # useful version of a to relate scaled quantities
}
##
## Make a "longi" list of the tracer-species, repeating once for every response
## Note: normally I use reshape() to do this. See: ?reshape
xlong = rep(x,times=10)
ylong = as.vector(y) # collapse the matrix into a similar "long" vector
## Make a factor variable describing each measurement
## instance and each dependent variable
id = as.factor(rep(paste("sample",as.character(1:100),sep=""),times=10))
species = as.factor(rep(paste("species",as.character(1:10),sep=""),times=100))
##
## Create the data-frame for nlmer
trial.dat = data.frame(xlong=xlong,ylong=ylong,id=id,species=species)
##
xinterform <- ~( a * xlong+ x0.est)
xinter <- deriv(xinterform, namevec=c("a","x0.est"),
function.arg=c("a","xlong","x0.est"))
##
library(lme4)
start.vec <- c(a=0.16,x0.est=20)
##
test.nlmer = nlmer( ylong ~
xinter(a,xlong,x0.est) ~ (a|species) + (x0.est |id),
start=start.vec, data=trial.dat )
Partial conclusions:
Ben's two observations have been a tremendous help to me. Use of nlmer
following (1) and (2) allow me to get a solution in the case of my original problem,
although some questions of interpretation still remain.
(1) Use single values as starting values, i.e., the mean/central values of the parameters
(2) including sensitivities to parameters only (not to additional independent variables)
in the deriv() function.
In summary for this problem, ...
=> Getting (apparently) good x0.est values from ylong ~ ( a*(xlong - x0.est) )
is the major accomplishment of nlmer()... one can work out the other questions.
=> Such success is likely to be data-set dependent since the problem can easily
be ill conditioned (I surmise).
==============================================================
Now,
If anyone has been following the discussion of the idealized examples, there are
some flaws in what's in the correspondence that remain
and that confuse my discussion and his for those who might follow.
Ask me for
(a) Importantly, the non-linear formula is best expressed
xinterform <- ~( a*(xlong - x0.est) )
(note sign of x0.est is changed and the parentheses
are all-important. a*xlong - x0.est is linear.)
(b) Analysis of the idealized problem suggests that it is
poorly conditioned, with the effects of x0.est and a being
too much correlated.
=> I surmise that the success of my use of nlmer with
the original data derives from the probability distributions
involved. More work would be needed to quantify this
understanding.
(c) Details -- I used "outer" incorrectly to express
for (i in 1:10 ) yun[,i] = yun[,i]*(1+rnorm(100,0.15,0.1))
for (j in 1:100 ) yun[j,] = yun[j,]*(1+rnorm(10,0.15,0.1))
while Ben left out the important (1+rnorm( ) ) in his version using sweep.
(d) Details: I still have difficulty in getting meaningfully non-0 random effects for the
slopes "a" ... Again, this could be my bad coding or it could reflect the
distributions of the y variables as they are normalized.
Thanks much from this newbie, ... fixed effects models have
solved that "there ought to be a way" quandary for me.
Bob Chatfield
On Jan 25, 2014, at 11:45 AM, Ben Bolker <bbolker at gmail.com> wrote:
Robert Chatfield <chatfield at ...> writes:
Dear list, I still have no idea how to specify or give start values
for vectors in a formula for nlmer() The technique works fine for an
approximate method I use in lmer() (That technique depends heavily
on some scaling of variables.) Here is a self-contained example
using some mock data whose origin is described.
I don't know if I got sensible answers or not, but I seem to have
succeeded in fitting a model. The key changes were: (1) starting
parameters are *single* values (these are starting values for the
mean/central values of the parameters, not of the random effects);
(2) leaving "xlong" out of namevec in the deriv() call, because it's
a predictor variable, not a parameter (you don't want its gradient)