Need help to locate my mistake - R-help

Sun, Mar 2, 2008 12:18 PM #

Dear readers

I would like to make General Linear Model (GLM) for the following data set
http://louise.hoffman.googlepages.com/fuel.csv

The code I have written is

fuelData<-read.table('fuel.csv',header=TRUE, sep=',')
n<-dim(fuelData)[1]
xOnes<- matrix(1,nrow=n,ncol=1)
x<-cbind(xOnes,fuelData[,3])
y<-fuelData[,4]
theta<-((t(x)%*%x)^(-1))%*%t(x)%*%y

which gives

[,1]
[1,] 215.8374077
[2,]   0.1083491

When I do it in Matlab I get theta to be
79.69
0.18

which is correct. ~79 is the crossing of the y-axis.

Have I perhaps written theta wrong? The formula for theta is
(alpha,beta)^T = (x^T * x)^(-1) * x^T * Y

where ^T means transposed.

Can someone see where the mistake is?

Hugs,
Louise

Rolf Turner

Sun, Mar 2, 2008 12:55 PM #

On 3/03/2008, at 9:18 AM, Louise Hoffman wrote:

This is certainly ***NOT*** correct. (If you really got those numbers
from Matlab, then Matlab is up to Puttee.)

Have you plotted your data?

	(1) Fitting a straight line is ridiculous.

	(2) If you are so foolish as to fit a straight line, you get
	theta to have entries -4197.96 (intercept) and 2.16 (slope).
	The line y = 79.69 + 0.18*x is off the edge of the graph and
	does not even appear.

Yes.  The expression (t(x)%*%x)^(-1) is the matrix of entry
	by entry reciprocals of the entries of t(x)%*%x.

	You want:

		theta <- solve(t(x)%*%x))%*%t(x)%*%y


	Anyhow, if you're going to use R, why not ***use R***?

	fit <- lm(fpi ~ rtime,data=fuelData)
	theta <- coef(fit)

	This gives an answer identical to that from the corrected version of
	your ``from scratch'' expression.  (That expression, while  
theoretically
	correct, is numerically ill-advised.  The cognoscenti use either the
	Choleski or the ``qr'' decomposition of t(x)%*%x to effect the  
calculations.
	One of these is what is going on in the bowels of lm().  But here I  
speak
	of that of which I know little.)

		cheers,

			Rolf Turner



######################################################################
Attention:\ This e-mail message is privileged and confid...{{dropped:9}}

(Ted Harding)

Sun, Mar 2, 2008 1:05 PM #

On 02-Mar-08 20:18:29, Louise Hoffman wrote:

Unfortunately, x^(-1) is not the inverse of x:

x<-matrix(c(2,4,4,5),nrow=2)
x
#      [,1] [,2]
# [1,]    2    4
# [2,]    4    5

x^(-1)
#      [,1] [,2]
# [1,] 0.50 0.25
# [2,] 0.25 0.20

i.e. it is the matrix which you get by applying the operation
(...)^(-1) to each element of x.

In R, the inverse of a non-singular matrix is (somewhat obscurely)
denoted by solve(x):

solve(x)
#            [,1]       [,2]
# [1,] -0.8333333  0.6666667
# [2,]  0.6666667 -0.3333333

solve(x)%*%x
#      [,1]         [,2]
# [1,]    1 1.110223e-16
# [2,]    0 1.000000e+00
(Note the slight rounding error); whereas

(x^(-1))%*%x
#      [,1] [,2]
# [1,]  2.0 3.25
# [2,]  1.3 2.00

Best wishes,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 02-Mar-08                                       Time: 21:05:50
------------------------------ XFMail ------------------------------

Louise Hoffman

Sun, Mar 2, 2008 3:46 PM #

It was my mistake =) I had calculated the straight line so the edge of
the plot was the y-axis =)

Yes, I guess we have to realise that in the project, and explain why
it is not good enough =) Next step is to use exponential smoothing =)

It was clearly a bad mistake of mine! =(

Thanks =)

This is great stuff!

Louise Hoffman

Sun, Mar 2, 2008 3:49 PM #

Thanks for the very detailed explanation! I will never make that
mistake again =)

Louise Hoffman

Sun, Mar 2, 2008 4:58 PM #

What if I need to calculate the variance for the fuel data?

Are there a 'R' way to do that?

I have derived
variance = (Y-x*theta)^T * (Y - x*theta) / (n-p)

Rolf Turner

Sun, Mar 2, 2008 6:15 PM #

On 3/03/2008, at 1:58 PM, Louise Hoffman wrote:

This should be (something like)

	tr((Y - X%*%theta))%*%(Y-X%*%theta)/(n-p)

where X is your design matrix.

To get the information in R:

	fit <- lm(fpi ~ rtime,data=fuelData)
	summary(fit)

This tells you the residual standard error (standard deviation) is
7.681.  Typing

	summary(fit)$sigma

gives the answer to more decimal places --- 7.681288.

The residual variance can of course be obtained by

	summary(fit)$sigma^2

Note that the name ``sigma'' is bad --- this terminology should
be reserved for population quantities.  What we have is ``sigma-hat'',
an *estimate* of sigma.

Note also that for your fuel data the residual variance is pretty
much meaningless since your model is highly inappropriate for these  
data.

		cheers,

			Rolf Turner

######################################################################
Attention:\ This e-mail message is privileged and confid...{{dropped:9}}