Message-ID: <4D9CE8F1.2060602@ohsu.edu>
Date: 2011-04-06T22:28:01Z
From: Brian Diggs
Subject: glm predict on new data
In-Reply-To: <1302124625814-3431855.post@n4.nabble.com>
On 4/6/2011 2:17 PM, dirknbr wrote:
> I am aware this has been asked before but I could not find a resolution.
>
> I am doing a logit
>
> lg<- glm(y[1:200] ~ x[1:200,1],family=binomial)
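glm (and most modeling functions) are designed to work with data frames,
not raw vectors. predict() looks up the variables named in the model
formula inside newdata, so those names have to match exactly.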
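One way to stay with data frames throughout (not part of the original reply, just a minimal sketch) is to keep all 250 rows in a single data frame and let glm's subset argument pick the training rows, so no indexing appears inside the formula. The name all_data and the NA outcomes for rows 201:250 are assumptions made for illustration, and set.seed() is added only for reproducibility.

```r
set.seed(7)  # added for reproducibility only
# Hypothetical single data frame: outcome unknown (NA) for rows 201:250
all_data <- data.frame(y = c(rbinom(200, 1, prob = .8), rep(NA, 50)),
                       x = rnorm(250))

# Fit on rows 1:200 via subset instead of indexing in the formula,
# so the predictor is simply named "x"
lg <- glm(y ~ x, data = all_data, subset = 1:200, family = binomial)

# The held-out rows already carry a column named x, so predict() can match it
pred <- predict(lg, newdata = all_data[201:250, ], type = "response")
length(pred)
```

Because the formula is just y ~ x, any data frame with a numeric column x works as newdata.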
> Then I want to predict a new set
>
> pred<- predict(lg,x[201:250,1],type="response")
>
> But I get varying error messages or warnings about the different number of
> rows. I have tried data/newdata and also to wrap in data.frame() but cannot
> get it to work.
I'll make up some data, show the way you approached it, show where it
went wrong, and then show how it works more easily.
# data like what I think you had:
y <- rbinom(200, 1, prob=.8)
x <- data.frame(x=rnorm(250))
# your glm call:
lg <- glm(y[1:200]~x[1:200,1],family=binomial)
# take a look at print(lg). Notice that your independent variable
# name is "x[1:200, 1]", which is what you would need to match in
# a call to predict.
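To see where it goes wrong, here is a minimal self-contained sketch (the set.seed() call and the column name z are additions for illustration). Since newdata has no variable called "x[1:200, 1]", model.frame() re-evaluates x[1:200, 1] in the formula's environment (here the global workspace), finds the original 250-row x, and predict() returns the 200 training fits, with a warning about the row mismatch, instead of 50 new predictions.

```r
set.seed(1)  # added for reproducibility; not in the original post
y <- rbinom(200, 1, prob = .8)
x <- data.frame(x = rnorm(250))
lg <- glm(y[1:200] ~ x[1:200, 1], family = binomial)

# The predictor's name is the whole indexing expression
names(coef(lg))

# newdata has no variable named "x[1:200, 1]", so the global x is used
# instead and R warns that the row counts do not match
new_bad <- data.frame(z = x[201:250, 1])
pred_bad <- predict(lg, newdata = new_bad, type = "response")
length(pred_bad)  # the 200 training fits, not 50 new predictions
```

If the newdata column is instead named x, the lookup finds that plain vector first and x[1:200, 1] fails with a dimension error; either way this model cannot be fed new values.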
# Make data.frames of the given and testing data.
DF <- data.frame(y=y, x=x[1:200,1])
DF.new <- data.frame(x=x[201:250,1])
# Notice DF.new uses the same column name (x) as DF.
lg <- glm(y~x, data=DF, family=binomial)
pred <- predict(lg, newdata=DF.new, type="response")
summary(pred)
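A self-contained version of the working approach, as a sketch: set.seed() is added for reproducibility, and the new rows are 201:250 so they do not overlap the 200 training rows. It also checks that type = "response" is the inverse logit, plogis(), of the linear predictor.

```r
set.seed(42)  # added for reproducibility; not in the original post
y <- rbinom(200, 1, prob = .8)
x <- data.frame(x = rnorm(250))

DF     <- data.frame(y = y, x = x[1:200, 1])   # 200 training rows
DF.new <- data.frame(x = x[201:250, 1])        # 50 new rows, same column name x

lg   <- glm(y ~ x, data = DF, family = binomial)
pred <- predict(lg, newdata = DF.new, type = "response")

# type = "response" applies the inverse logit to the linear predictor
eta    <- coef(lg)[["(Intercept)"]] + coef(lg)[["x"]] * DF.new$x
manual <- plogis(eta)
all.equal(unname(pred), manual)
```

Leaving type at its default ("link") would return eta itself rather than probabilities.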
> Help would be appreciated.
>
> Dirk.
--
Brian S. Diggs, PhD
Senior Research Associate, Department of Surgery
Oregon Health & Science University