Using $ accessor in GAM formula
On Fri, 2011-05-06 at 11:20 -0500, Gene Leynes wrote:
Hmmm.... After reading that email four times, I think I see what you mean. Checking for variables within particular scopes is probably one of the most challenging things in R, and I would guess in other languages too. In R it's compounded by situations when you're writing a function to accept variables as either a stand alone global variable, or as an element of a data.frame or list.
Dear Gene,
I think this is a new problem, and I'll switch to the lengthier data.frame[,'x'] syntax in gam for now.
No, No, No, No, No!!!!! If there is one thing you **should** take from this thread is that there is no need to perform subsetting like that in a model formula. Why would you want (or prefer) to do: gam(dat$y ~ s(dat$x)) or gam(dat[, "y"] ~ s(dat[, "x"])) when gam(y ~ s(x), dat) will suffice?
By the way, about the $ accessors. I can see why some people don't like them, but they are a part of the language. And, I think they're a good part. They make the code much more readable, and I have yet to make a mistake using them (which makes me think that they can be used responsibly). Making code harder to read is its own source of error, and is not something that I take lightly!
If you use R's formula notation properly, you'll get cleaner code than either of your suggestions.
Thanks for the replies. And, thank you Rolf for the detailed analysis. Do you think that your or I should submit a bug report to the package maintainer? I'm not sure how that works. Very few of my challenges turn out to be actual bugs, but I think this one is.
I would consider this a bug - but in the sense that Simon didn't foresee the strange formulas that users of his software might concoct. By the way, I think you perhaps meant Berwin (re the detailed analysis)? HTH G
On Fri, May 6, 2011 at 4:53 AM, Berwin A Turlach <Berwin.Turlach at gmail.com>wrote:
G'day Rolf, On Fri, 06 May 2011 09:58:50 +1200 Rolf Turner <rolf.turner at xtra.co.nz> wrote:
but it's strange that the dodgey code throws an error with gam(dat1$y ~ s(dat1$x)) but not with gam(dat2$cf ~ s(dat2$s))
Something a bit subtle is going on; it would be nice to be able to understand it.
Well,
R> traceback()
3: eval(expr, envir, enclos)
2: eval(inp, data, parent.frame())
1: gam(dat$y ~ s(dat$x))
So the lines leading up to the problem seem to be the following from
the gam() function:
vars <- all.vars(gp$fake.formula[-2])
inp <- parse(text = paste("list(", paste(vars, collapse = ","),
")"))
if (!is.list(data) && !is.data.frame(data))
data <- as.data.frame(data)
Setting
R> options(error=recover)
running the code until the error occurs, and then examining the frame
number for the gam() call shows that "inp" is
"expression(list( dat1,x ))" in your first example and
"expression(list( dat2,s ))" in your second example. In both
examples, "data" is "list()" (not unsurprisingly). When,
dl <- eval(inp, data, parent.frame())
is executed, it tries to eval "inp", in both cases "dat1" and "dat2"
are found, obviously, in the parent frame. In your first example "x" is
(typically) not found and an error is thrown, in your second example an
object with name "s" is found in "package:mgcv" and the call to eval
succeeds. "dl" becomes a list with two components, the first being,
respectively, "dat1" or "dat2", and the second the body of the function
"s". (To verify that, you should probably issue the command
"debug(gam)" and step through those first few lines of the function
until you reach the above command.)
The corollary is that you can use the name of any object that R will
find in the parent frame, if it is another data set, then that data
set will become the second component of "inp". E.g.:
R> dat=data.frame(min=1:100,cf=sin(1:100/50)+rnorm(100,0,.05))
R> gam(dat$cf ~ s(dat$min))
Family: gaussian
Link function: identity
Formula:
dat$cf ~ s(dat$min)
Estimated degrees of freedom:
3.8925 total = 4.892488
GCV score: 0.002704789
Or
R> dat=data.frame(BOD=1:100,cf=sin(1:100/50)+rnorm(100,0,.05))
R> gam(dat$cf ~ s(dat$BOD))
Family: gaussian
Link function: identity
Formula:
dat$cf ~ s(dat$BOD)
Estimated degrees of freedom:
3.9393 total = 4.939297
GCV score: 0.002666985
Just out of pure academic interest. :-)
Hope your academic curiosity is now satisfied. :)
HTH.
Cheers,
Berwin
========================== Full address ============================
A/Prof Berwin A Turlach Tel.: +61 (8) 6488 3338 (secr)
School of Maths and Stats (M019) +61 (8) 6488 3383 (self)
The University of Western Australia FAX : +61 (8) 6488 1028
35 Stirling Highway
Crawley WA 6009 e-mail: Berwin.Turlach at gmail.com
Australia http://www.maths.uwa.edu.au/~berwin
[[alternative HTML version deleted]]
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% Dr. Gavin Simpson [t] +44 (0)20 7679 0522 ECRC, UCL Geography, [f] +44 (0)20 7679 0565 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk Gower Street, London [w] http://www.ucl.ac.uk/~ucfagls/ UK. WC1E 6BT. [w] http://www.freshwaters.org.uk %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%