Hi,
I read data from a file. I'm trying to understand how to use Design.rcs, starting with simple test data. I use 1000 integer values (1, ..., 1000) for the predictor x, add some noise (x + .02*x), and set the response variable y = x. Then I try rcs and ols as follows:
m = sqrt(y1) ~ rcs(x1, 3)    # I tried without sqrt also
f = ols(m, data = data_train.df)
print(f)
[I plot the original x1, y1 vectors and the regression as in
y <- coef2[1] + coef2[2]*x1 + coef2[3]*x1*x1]
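For reference, here is a self-contained sketch of the test setup described above. This is a reconstruction, not the poster's actual script: it assumes the rms package (the modern successor to Design) and the Gaussian noise that sp clarifies later in the thread.

```r
## Hypothetical reconstruction of the test setup (assumes the rms package,
## successor to Design, which provides ols() and rcs()).
library(rms)

set.seed(1)
x1 <- 1:1000
y1 <- x1 + rnorm(1000, sd = 0.02 * x1)   # y = x plus noise on the order of 2% of x
data_train.df <- data.frame(x1 = x1, y1 = y1)

m <- sqrt(y1) ~ rcs(x1, 3)               # 3-knot restricted cubic spline in x1
f <- ols(m, data = data_train.df)
print(f)
```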
But this gives me a VERY bad fit:
"
Linear Regression Model

ols(formula = m, data = data_train.df)

   n  Model L.R.  d.f.      R2  Sigma
1000        4573     2  0.9897   0.76

Residuals:
      Min        1Q    Median       3Q       Max
-4.850930 -0.414008 -0.009648  0.418537  3.212079

Coefficients:
             Value  Std. Error       t  Pr(>|t|)
Intercept  5.90958   0.0672612   87.86         0
x1         0.03679   0.0002259  162.88         0
x1'       -0.01529   0.0002800  -54.60         0

Residual standard error: 0.76 on 997 degrees of freedom
Adjusted R-Squared: 0.9897
"
I appreciate any and all help!
Sincerely,
sp
newbie problem using Design.rcs
7 messages · David Winsemius, Frank E Harrell Jr, sp
On Dec 22, 2008, at 11:38 PM, sp wrote:
Hi, I read data from a file. I'm trying to understand how to use Design.rcs by using simple test data first. I use 1000 integer values (1,...,1000) for x (the predictor) with some noise (x+.02*x) and I set the response variable y=x. Then, I try rcs and ols as follows:
Not sure what sort of noise that is.
m = sqrt(y1) ~ rcs(x1, 3)    # I tried without sqrt also
f = ols(m, data = data_train.df)
print(f)
[I plot original x1, y1 vectors and the regression as in
y <- coef2[1] + coef2[2]*x1 + coef2[3]*x1*x1]
That does not look as though it would capture the structure of a restricted **cubic** spline. The usual method in Design for plotting a model prediction would be: plot(f, x1 = NA)
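Indeed, the coefficient printed as x1' multiplies a spline basis term, not x1^2, so reconstructing predictions as intercept + b1*x + b2*x^2 cannot reproduce the fit. A base-R sketch of the nonlinear basis term follows (using one common normalization; Design's exact scaling may differ, so treat this as illustrative):

```r
## Nonlinear basis term of a 3-knot restricted cubic spline with knots t1 < t2 < t3.
## It is zero below the first knot and exactly linear beyond the last knot --
## the "restricted" (linear-tail) property.
rcs_term <- function(x, t1, t2, t3) {
  p <- function(u) pmax(u, 0)^3            # truncated cube
  (p(x - t1) -
     p(x - t2) * (t3 - t1) / (t3 - t2) +
     p(x - t3) * (t2 - t1) / (t3 - t2)) / (t3 - t1)^2
}

rcs_term(c(0, 5, 10), 2, 5, 8)   # zero below the first knot, linear above the last
```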
But this gives me a VERY bad fit: "
Can you give some hint why you consider this to be a "VERY bad fit"? It appears a rather good fit to me, despite the test case apparently not being constructed with any curvature, which is what the rcs modeling strategy should be detecting.
David Winsemius
In addition to David's excellent response, I'll add that your problems seem to be statistical and not programming ones. I recommend that you spend a significant amount of time with a good regression text or course before using the software. Also, with Design you can find out the algebraic form of the fit:

f <- ols(y ~ rcs(x,3), data=mydata)
Function(f)

Frank
Frank E Harrell Jr Professor and Chair School of Medicine
Department of Biostatistics Vanderbilt University
Sincere thanks for both the replies.

0. I agree; I'm waiting for my copy of a regression book to arrive. Meanwhile, I'm trying to read on Google.
1. My bad, I'm using Gaussian noise.
2. I didn't have x^3 because that coefficient happens to be zero in this fitting.
3. I used lines() because I wanted to superimpose the curve from the regression atop my first plot of the original data points (x, y). I'm not sure how to use plot(f, x1 = NA) after my first plot(). The examples I managed to find on Google all use plot() followed by lines(). [In Matlab, I'd just say "hold" between these calls.] Also, I'm forced to call win.graph() before my first plot() to see the first plot. Is that normal?
4. I really could use some guidance on this part. I need to use rcs() to fit points in a high-dimensional space, and I'm trying to understand how to use it correctly. I started by testing it on just the x, y dimensions so that I can visually evaluate the fitting. I tried y = x, y = x^2, etc., adding Gaussian noise each time (to the y). I plot the original x, y and x, y', where y' is calculated using the coefficients returned by rcs. I find that the regression curve differs from the actual points by as much as 10^5 with 3 knots and roughly -10^5 with 4 knots as I make y = x^2, y = x^3, ....

If this is NOT a good way to test the fitting, could you please tell me a better way?

Respectfully,
sp
sp wrote:
Sincere thanks for both the replies. 0. I agree, I'm waiting for my copy of a regression book to arrive. Meanwhile, I'm trying to read on google. 1. My bad, I'm using Gaussian noise. 2. I didn't have x^3 b/c that co-efficient happens to be zero in this fitting.
That's strange.
3. I used lines() b/c I wanted to superimpose the curve from regression atop my first plot of the original data points (x,y). I'm not sure how to use plot(f, x1 = NA) after my first plot(). The examples I managed to find on google all use plot() followed by lines(). [In Matlab, I'd just say "hold" in between these calls.]
plot(f, x1=NA)
plot(f, x2=NA, add=TRUE)
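Alternatively, if you want to keep your own base-graphics scatterplot, you can plot the raw data first and then overlay predictions with lines(). This is a sketch assuming f is the ols fit from earlier in the thread:

```r
## Overlay the spline fit on a scatterplot of the raw data
## (assumes f <- ols(sqrt(y1) ~ rcs(x1, 3), ...) as above).
plot(x1, sqrt(y1), pch = ".", main = "Data with rcs(x1, 3) fit")
xs <- seq(min(x1), max(x1), length.out = 200)
lines(xs, predict(f, data.frame(x1 = xs)), col = "red", lwd = 2)
```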
Also, I'm forced to call win.graph() before my first plot() to see the first plot. Is that normal?
no
4. I really could use some guidance on this part. I need to use rcs() to fit points in a high-dimensional space and I'm trying to understand and use it correctly.
keep reading
I started with testing it on just x,y dimensions so that I can visually evaluate the fitting. I tried y=x, y=x^2 etc, adding Gaussian noise each time (to the y). I plot original x,y and x,y' where y' is calculated using the co-efficients returned by rcs. I find that the regression curve differs from the actual points by as high as 10^5 with 3 knots and roughly -10^5 with 4 knots as I make y=x^2, y=x^3....
Wait until you have studied regression.

Frank
Frank E Harrell Jr Professor and Chair School of Medicine
Department of Biostatistics Vanderbilt University
Dr. Harrell,

Thanks for your patient replies. I've two more questions:
1) Is your book appropriate for beginners or is it more for advanced users?
2) f <- ols(y ~ rcs(x,3), data=mydata); Function(f) does not produce anything for me (i.e., it's empty).

Sincerely,
sp
sp wrote:
Dr. Harrell, Thanks for your patient replies. I've two more questions: 1) Is your book appropriate for beginners or is it more for advanced users?
It is in the middle of those two.
2) f <- ols(y ~ rcs(x,3), data=mydata) Function(f) does not produce anything for me (i.e.) it's empty.
Here's a test I just ran
> y <- rnorm(100)
> x <- runif(100)
> f <- ols(y ~ rcs(x,3))
> Function(f)
function(x = NA) {
  -0.18307779 + 0.91343138*x -
    2.1908543*pmax(x - 0.10568075, 0)^3 +
    3.8312836*pmax(x - 0.45620427, 0)^3 -
    1.6404293*pmax(x - 0.92434146, 0)^3
}
<environment: 0x2ca30a8>
If you are doing this inside { } you will need to do print(Function(f))
Note this is a simplified version of the rcs with 2 redundant terms, to avoid writing in terms of differences in cubes.
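The value returned by Function(f) is an ordinary R function, so (continuing the example above) it can be called or plotted directly:

```r
g <- Function(f)                 # extract the fitted equation as an R function
g(0.5)                           # prediction at x = 0.5
curve(g(x), from = 0, to = 1)    # plot the fitted spline over the range of x
```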
Frank
Frank E Harrell Jr Professor and Chair School of Medicine
Department of Biostatistics Vanderbilt University