Need help to locate my mistake
On 3/03/2008, at 9:18 AM, Louise Hoffman wrote:
Dear readers I would like to make General Linear Model (GLM) for the following data set http://louise.hoffman.googlepages.com/fuel.csv The code I have written is fuelData<-read.table('fuel.csv',header=TRUE, sep=',') n<-dim(fuelData)[1] xOnes<- matrix(1,nrow=n,ncol=1) x<-cbind(xOnes,fuelData[,3]) y<-fuelData[,4] theta<-((t(x)%*%x)^(-1))%*%t(x)%*%y which gives
theta
[,1] [1,] 215.8374077 [2,] 0.1083491 When I do it in Matlab I get theta to be 79.69 0.18 which is correct. ~79 is the crossing of the y-axis.
This is certainly ***NOT*** correct. (If you really got those numbers from Matlab, then Matlab is up to Puttee.) Have you plotted your data? (1) Fitting a straight line is ridiculous. (2) If you are so foolish as to fit a straight line, you get theta to have entries -4197.96 (intercept) and 2.16 (slope). The line y = 79.69 + 0.18*x is off the edge of the graph and does not even appear.
Have I perhaps written theta wrong?
Yes. The expression (t(x)%*%x)^(-1) is the matrix of entry
by entry reciprocals of the entries of t(x)%*%x.
You want:
theta <- solve(t(x)%*%x))%*%t(x)%*%y
Anyhow, if you're going to use R, why not ***use R***?
fit <- lm(fpi ~ rtime,data=fuelData)
theta <- coef(fit)
This gives an answer identical to that from the corrected version of
your ``from scratch'' expression. (That expression, while
theoretically
correct, is numerically ill-advised. The cognoscenti use either the
Choleski or the ``qr'' decomposition of t(x)%*%x to effect the
calculations.
One of these is what is going on in the bowels of lm(). But here I
speak
of that of which I know little.)
cheers,
Rolf Turner
######################################################################
Attention:\ This e-mail message is privileged and confid...{{dropped:9}}