Predict polynomial problem

11 messages · Barry Rowlingson, Bert Gunter, Gavin Simpson +3 more

#
I have a function that fits polynomial models for the orders in n:

lmn <- function(d,n){
  models=list()
  for(i in n){
    models[[i]]=lm(y~poly(x,i),data=d)
  }
  return(models)
}

 My data is:

 > d=data.frame(x=1:10,y=runif(10))

 So first just do it for a cubic:

 > mmn = lmn(d,3)
 > predict(mmn[[3]])
        1         2         3         4         5         6         7         8
0.6228353 0.5752811 0.5319524 0.4957381 0.4695269 0.4562077 0.4586691 0.4798001
        9        10
0.5224893 0.5896255

and let's extrapolate a bit:

 > predict(mmn[[3]],newdata=data.frame(x=c(9,10,11)))
        1         2         3
 0.5224893 0.5896255 0.6840976

 now let's do it for cubic to quintic:

 > mmn = lmn(d,3:5)

 check the cubic:

 > predict(mmn[[3]])
        1         2         3         4         5         6         7         8
0.6228353 0.5752811 0.5319524 0.4957381 0.4695269 0.4562077 0.4586691 0.4798001
        9        10
0.5224893 0.5896255

 - that's the same as last time. Extrapolate?

 > predict(mmn[[3]],newdata=data.frame(x=c(9,10,11)))
Error: variable 'poly(x, i)' was fitted with type "nmatrix.3" but type
"nmatrix.5" was supplied
In addition: Warning message:
In Z/rep(sqrt(norm2[-1L]), each = length(x)) :
  longer object length is not a multiple of shorter object length

it falls over. I can't see the difference between the objects,
summary() looks the same. Is something wrapped up in an environment
somewhere, or some lazy evaluation thing, or have I just done
something stupid?

Here's a complete example you can paste in - R --vanilla < this.R
gives the error above - R 2.10.1 on Ubuntu, and also on R 2.8.1 I had
lying around on a Windows box:

d = data.frame(x=1:10,y=runif(10))

lmn <- function(d,n){
  models=list()
  for(i in n){
    models[[i]]=lm(y~poly(x,i),data=d)
  }
  return(models)
}

mmn = lmn(d,3)
predict(mmn[[3]])
predict(mmn[[3]],newdata=data.frame(x=c(9,10,11)))

mmn2 = lmn(d,3:5)
predict(mmn2[[3]])
predict(mmn2[[3]],newdata=data.frame(x=c(9,10,11)))


Barry
#
Barry:

I reproduced your error on Windows.

However, as you know, in your code, the first two components of your
list are NULL. This is a bit clumsy, so I modified your lmn function by
changing it from

... for(i in n) ...

to

... for( i in 1:n) ...

With that change, there are no errors.

Other than that, no clue, but maybe it helps.

-- Bert

Bert Gunter
Genentech Nonclinical Statistics

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
Behalf Of Barry Rowlingson
Sent: Monday, January 18, 2010 3:31 PM
To: r-help at r-project.org
Subject: [R] Predict polynomial problem
#
On Mon, 18 Jan 2010, Barry Rowlingson wrote:

            
It's the environment thing.

I think you want something like this:

 	models[[i]]=lm( bquote( y ~ poly(x,.(i)) ), data=d)

Use
 	terms( mmn[[3]] )

both with and without this change and


 	ls( envir = environment( formula( mmn[[3]] ) ) )
 	get("i", envir = environment( formula( mmn[[3]] ) ) )
 	sapply(mmn, function(x) environment( formula( x ) ) )


to see what gives.

HTH,

Chuck
Charles C. Berry                            (858) 534-2098
                                             Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu	            UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
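
For reference, a minimal sketch of lmn with the bquote() change suggested above dropped in. The function and data are the ones from the original post and nothing else is assumed; y is random, so the printed numbers will differ from run to run.

```r
# Same toy data as in the original post (y is random, so numbers vary)
d <- data.frame(x = 1:10, y = runif(10))

lmn <- function(d, n) {
  models <- list()
  for (i in n) {
    # bquote() replaces .(i) with the current value of i, so each model
    # stores poly(x, 3), poly(x, 4), ... rather than poly(x, i)
    models[[i]] <- lm(bquote(y ~ poly(x, .(i))), data = d)
  }
  models
}

mmn <- lmn(d, 3:5)

# Extrapolating from the cubic no longer depends on a leftover 'i'
p <- predict(mmn[[3]], newdata = data.frame(x = c(9, 10, 11)))
print(p)
```

The first two predictions should agree with the fitted values at x = 9 and x = 10, which is a quick sanity check that the cubic (and not the quintic) basis is being used.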
#
On Tue, Jan 19, 2010 at 1:36 AM, Charles C. Berry <cberry at tajo.ucsd.edu> wrote:

            
Think I see it now. predict involves evaluating poly, and poly here
needs 'i' for the order. If the right 'i' isn't gotten when predict is
called then I get the error. Your fix sticks the right 'i' into the
environment when predict is called.

 I haven't quite got my head round _how_ it does it, and I have no
idea how I could have figured this out for myself. Oh well...

 The following lines are also illustrative:

d = data.frame(x=1:10,y=runif(10))

i=3
#1 naive model:
m1 = lm(y~poly(x,i),data=d)
#2,3 bquote, without or with i-wrapping:
m2 = lm(bquote(y~poly(x,i)),data=d)
m3 = lm(bquote(y~poly(x,.(i))),data=d)

#1 works, gets 'i' from global i=3 above:
predict(m1,newdata=data.frame(x=9:11))
#2 fails - why?
predict(m2,newdata=data.frame(x=9:11))
#3 works, gets 'i' from within:
predict(m3,newdata=data.frame(x=9:11))

rm(i)

#1 now fails because we removed 'i' from top level:
predict(m1,newdata=data.frame(x=9:11))
#2 still fails:
predict(m2,newdata=data.frame(x=9:11))
#3 still works:
predict(m3,newdata=data.frame(x=9:11))

Thanks
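
One way to see where the stale 'i' comes from is to look at what the fitted model actually stores. The sketch below uses the unmodified lmn from the first post; the names env, leftover_i and pv are made up here for illustration. It shows that the formula's environment is the frame of the lmn call, and that the call predict() re-evaluates still contains the symbol i rather than a number.

```r
d <- data.frame(x = 1:10, y = runif(10))

# The original lmn from the first post, unchanged
lmn <- function(d, n) {
  models <- list()
  for (i in n) {
    models[[i]] <- lm(y ~ poly(x, i), data = d)
  }
  models
}

mmn <- lmn(d, 3:5)

# The formula remembers the function's frame, where the loop left i behind
env <- environment(formula(mmn[[3]]))
leftover_i <- get("i", envir = env)
print(leftover_i)  # the last value the loop assigned, i.e. 5

# The call that predict() re-evaluates still refers to the symbol i
pv <- attr(terms(mmn[[3]]), "predvars")
print(all.vars(pv))
```

So when predict() is given newdata, poly() is called with degree 5 against coefficients stored for a cubic, which is consistent with the "nmatrix.3" versus "nmatrix.5" error in the first post.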
#
On Tue, 2010-01-19 at 08:27 +0000, Barry Rowlingson wrote:
Perhaps this Programmer's Niche article by Bill Venables might also be
useful, as it discusses how to manipulate formulas to automate model
fitting...?

Bill Venables. Programmer's Niche. R News, 2(2):24-26, June 2002.

http://cran.r-project.org/doc/Rnews/Rnews_2002-2.pdf

HTH

G

  
    
#
On Tue, 19 Jan 2010, Barry Rowlingson wrote:

            
Per ?bquote, "bquote quotes its argument except that terms wrapped in
'.()' are evaluated in the specified 'where' environment" (which by
default is the parent.frame).

Note:

    y ~ poly(x, 20)

So, now 'i' is irrelevant as the expression returned by bquote has '20' as
the 'degree' arg.

Well, the terms() objects are the same:

    [1] TRUE

but they will look in different places for 'i':

    <environment: 0x01b7c178>
    <environment: R_GlobalEnv>

and the values in those places are different:

    [1] 2
    [1] 3

It doesn't need 'i', because the i was evaluated and substituted by 
bquote. That is, it doesn't get("i") as the expression returned by bquote 
has no 'i' in it.

HTH,

Chuck
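
The bquote() behaviour described above can be checked directly. This is only a sketch: the value 20 matches the output shown in the message, but the names f_quoted and f_subst are made up here for illustration.

```r
i <- 20

# Without .(), bquote() leaves i as a symbol to be looked up later
f_quoted <- bquote(y ~ poly(x, i))

# With .(), the current value of i is substituted into the expression
f_subst <- bquote(y ~ poly(x, .(i)))

print(f_quoted)
print(f_subst)

# Only the unsubstituted version still mentions a variable called i
print(all.vars(f_quoted))
print(all.vars(f_subst))
```

Because the substituted expression contains no 'i' at all, it makes no difference what 'i' happens to be, or whether it exists, when predict() is called later.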
#
On Tue, 19 Jan 2010, Charles C. Berry wrote:

            
And I should have mentioned that environment(terms(m2)) happens to have an 
object 'i' in it regardless of whether poly() is used.

Chuck
#
On Tue, 19 Jan 2010, Barry Rowlingson wrote:

            
Yes, bquote() was written to mimic the backquote macro in Lisp, hence its name.

      -thomas

Thomas Lumley			Assoc. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle
#
Barry Rowlingson wrote:
And that might be why ?bquote says:
"An analogue of the LISP backquote macro."
:)

Cheers,
Peter
#
Charles C. Berry wrote:
Right. It might be worth pointing out that the 'i' in
environment(terms(m2)) or in environment(terms(m3)),
which also has an 'i', is there even if you use poly(x,j).
It is (if I'm not mistaken) the number of 'variables' in
the formula: response plus predictor terms. Thus

j <- 5
m4 <- lm(bquote(y ~ sqrt(x) + poly(x, .(j))), data=d)
environment(terms(m4))$i
[1] 3

  -Peter Ehlers