Skip to content

newbie problem using Design.rcs

7 messages · David Winsemius, Frank E Harrell Jr, sp

sp
#
Hi,

I read data from a file. I'm trying to understand how to use Design.rcs by using simple test data first. I use 1000 integer values (1,...,1000) for x (the predictor) with some noise (x+.02*x) and I set the response variable y=x. Then, I try rcs and ols as follows:

m = ( sqrt(y1) ~ ( rcs(x1,3) ) ); #I tried without sqrt also
f = ols(m, data=data_train.df);
print(f);

[I plot original x1,y1 vectors and the regression as in 
y <- coef2[1] + coef2[2]*x1 + coef2[3]*x1*x1]

But this gives me a VERY bad fit:
"
Linear Regression Model

ols(formula = m, data = data_train.df)

         n Model L.R.       d.f.         R2      Sigma 
      1000       4573          2     0.9897       0.76 

Residuals:
      Min        1Q    Median        3Q       Max 
-4.850930 -0.414008 -0.009648  0.418537  3.212079 

Coefficients:
             Value Std. Error      t Pr(>|t|)
Intercept  5.90958  0.0672612  87.86        0
x1         0.03679  0.0002259 162.88        0
x1'       -0.01529  0.0002800 -54.60        0

Residual standard error: 0.76 on 997 degrees of freedom
Adjusted R-Squared: 0.9897 
"

I appreciate any and all help!

Sincerely,
sp
#
On Dec 22, 2008, at 11:38 PM, sp wrote:

            
Not sure what sort of noise that is.
That does not look as though it would capture the structure of a  
restricted **cubic** spline. The usual method in Design for plotting a  
model prediction would be:

plot(f, x1 = NA)
Can you give some hint why you consider this to be a "VERY bad fit"?  
It appears a rather good fit to me, despite the test case apparently  
not being construct with any curvature which is what the rcs modeling  
strategy should be detecting.
#
In addition to David's excellent response, I'll add that your problems 
seem to be statistical and not programming ones.  I recommend that you 
spend a significant amount of time with a good regression text or course 
before using the software.  Also, with Design you can find out the 
algebraic form of the fit:

f <- ols(y ~ rcs(x,3), data=mydata)
Function(f)

Frank
David Winsemius wrote:

  
    
sp
#
Sincere thanks for both the replies.

0. I agree, I'm waiting for my copy of a regression book to arrive. Meanwhile, I'm trying to read on google.

1. My bad, I'm using Gaussian noise.

2. I didn't have x^3 b/c that co-efficient happens to be zero in this fitting.

3. I used lines() b/c I wanted to superimpose the curve from regression atop my first plot of the original data points (x,y). 
I'm not sure how to use plot(f, x1 = NA) after my first plot(). The examples I managed to find on google all use plot() followed by lines(). [In Matlab, I'd just say "hold" in between these calls.]

Also, I'm forced to call win.graph() before my first plot() to see the first plot. Is that normal?

4. I really could use some guidance on this part. I need to use rcs() to fit points in a high-dimensional space and I'm trying to understand and use it correctly. 

I started with testing it on just x,y dimensions so that I can visually evaluate the fitting. I tried y=x, y=x^2 etc, adding Gaussian noise each time (to the y). 

I plot original x,y and x,y' where y' is calculated using the co-efficients returned by rcs. I find that the regression curve differs from the actual points by as high as 10^5 with 3 knots and roughly -10^5 with 4 knots as I make y=x^2, y=x^3....

If this is NOT a good way to test fitting, could you pls tell me a better way?

Respectfully,
sp
--- On Tue, 12/23/08, Frank E Harrell Jr <f.harrell at vanderbilt.edu> wrote:

            
#
sp wrote:
That's strange.
plot(f, x1=NA)
plot(f, x2=NA, add=TRUE)
no
keep reading
wait until you have studied regression

Frank

  
    
sp
#
Dr. Harrell,

Thanks for your patient replies. I've two more questions:

1) Is your book appropriate for beginners or is it more for advanced users?

2) f <- ols(y ~ rcs(x,3), data=mydata)
Function(f)
does not produce anything for me (i.e.) it's empty.

Sincerely,
sp
--- On Tue, 12/23/08, Frank E Harrell Jr <f.harrell at vanderbilt.edu> wrote:

            
#
sp wrote:
It is in the middle of those two.
Here's a test I just ran

 > y <- rnorm(100)
 > x <- runif(100)
 > f <- ols(y ~ rcs(x,3))
 > Function(f)
function(x = NA) {-0.18307779+0.91343138* 
x-2.1908543*pmax(x-0.10568075,0)^3+3.8312836*pmax(x-0.45620427,0)^3-1.6404293*pmax(x-0.92434146,0)^3 
}
<environment: 0x2ca30a8>

If you are doing this inside { } you will need to do print(Function(f))

Note this is a simplified version of the rcs with 2 redundant terms to 
avoid writing in terms of differences in cubes.

Frank