Message of 20/10/14 at 16:50
From: "stephen sefick"
To: "Martin Weiser"
Cc: "V. Coudrain", "r-sig-ecology"
Subject: Re: [R-sig-eco] Regression with few observations per factor level
You are more or less performing an ANOVA/ANCOVA on your data? As pointed out earlier, all of the normal-theory regression assumptions apply. Assuming all of those are satisfied, then even if you have large confidence intervals, as long as there are significant differences between groups I don't see why you couldn't correctly infer something about the treatments. Maybe I am missing something.
Stephen
On Mon, Oct 20, 2014 at 8:43 AM, Martin Weiser wrote:
Hi,
Coefficients and their p-values are reliable if your data are OK and you
know enough about the process that generated them to choose an
appropriate model. With 4 points per line, it may be really difficult to
identify a bad fit or outliers.
For example: simple linear regression needs constant variance of the
normal distribution from which the residuals are drawn - along the
regression line - to work properly. With 4 points, you can hardly
estimate this, but if you know enough about the process that generated
the data, you are safe. If you do not know, it is not easy to say
anything about the nature of the process that generated the data.
If you know (or can assume) that there is a simple linear relationship,
you can say: "the slope of this relationship is such and such", but if you
want to estimate both the nature of the relationship ("A *linearly*
depends on B") and its magnitude ("the slope of this relationship
is ..."), p-values will not help you much.
Of course, I may be wrong - I am not a statistician, just a user.
Best,
Martin W.
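The point about constant variance being hard to check with 4 points can be illustrated with a small simulation. This is only a sketch under stated assumptions: the group means, the unequal standard deviations (1, 1, 3, 3), the use of Bartlett's test on the residuals, and the helper name detect_rate are all illustrative choices, not something proposed in the thread.

```r
set.seed(1)  # arbitrary seed, for reproducibility only

# Fraction of simulated data sets in which Bartlett's test flags unequal
# residual variance across treatments at alpha = 0.05.
# Assumption: the true SD is 1 in two groups and 3 in the other two.
detect_rate <- function(pg, sds = c(1, 1, 3, 3), n_sim = 500) {
  hits <- replicate(n_sim, {
    d <- data.frame(
      var = rnorm(pg * 4, mean = rep(c(3, 1, 11, 30), each = pg),
                  sd = rep(sds, each = pg)),
      trt = factor(rep(paste0("trt", 1:4), each = pg)),
      cov = runif(pg * 4)
    )
    res <- residuals(lm(var ~ trt + cov, data = d))
    bartlett.test(res, d$trt)$p.value < 0.05
  })
  mean(hits)
}

rate_4  <- detect_rate(4)   # 4 observations per group
rate_30 <- detect_rate(30)  # 30 observations per group
print(c(pg4 = rate_4, pg30 = rate_30))
```

Under these assumptions the violation should be flagged far less often with 4 observations per group than with 30, which is the sense in which the assumption can "hardly" be checked from such data alone.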
V. Coudrain wrote on Mon, 20 Oct 2014 at 13:37 +0200:
Thank you very much. If I get it right, the CIs get wider, my test has less power, and the probability of getting a significant relation decreases. What about the significant coefficients, are they reliable?
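Both parts of this question can be explored by simulation. The sketch below assumes the model is correctly specified (normal errors, constant variance); the group means, the helper name reject_rates, and the number of replicates are illustrative assumptions. It records how often the trt2 contrast (a true difference of 2) and the covariate (no true effect) come out significant at the 5% level.

```r
set.seed(42)  # arbitrary seed, for reproducibility only

# Rejection rates at level alpha for one real effect (trt2 vs trt1)
# and one null effect (cov), with pg observations per group.
reject_rates <- function(pg, n_sim = 1000, alpha = 0.05) {
  p <- replicate(n_sim, {
    d <- data.frame(
      var = rnorm(pg * 4, mean = rep(c(3, 1, 11, 30), each = pg)),
      trt = factor(rep(paste0("trt", 1:4), each = pg)),
      cov = runif(pg * 4)  # generated independently of var
    )
    coef(summary(lm(var ~ trt + cov, data = d)))[c("trttrt2", "cov"), "Pr(>|t|)"]
  })
  rowMeans(p < alpha)  # proportion of significant results per coefficient
}

r4  <- reject_rates(4)
r30 <- reject_rates(30)
print(rbind(pg4 = r4, pg30 = r30))
```

When the assumptions hold, the false-positive rate for cov should stay near 5% at both sample sizes, while the power to detect the trt2 difference drops with 4 observations per group. The small sample costs power; it does not by itself make a significant coefficient less trustworthy, provided the model assumptions are met.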
Message of 20/10/14 at 11:30
From: "Roman Luštrik"
To: "V. Coudrain"
Cc: "r-sig-ecology at r-project.org"
Subject: Re: [R-sig-eco] Regression with few observations per factor level
I think you can, but the confidence intervals will be rather large due to the small number of samples.
Notice how the standard errors change when the sample size (per group) goes from 4 to 30.
> pg <- 4 # pg = per group
> my.df <- data.frame(var = c(rnorm(pg, mean = 3), rnorm(pg, mean = 1), rnorm(pg, mean = 11), rnorm(pg, mean = 30)),
+                     trt = rep(c("trt1", "trt2", "trt3", "trt4"), each = pg),
+                     cov = runif(pg*4)) # 4 groups
> summary(lm(var ~ trt + cov, data = my.df))

Call:
lm(formula = var ~ trt + cov, data = my.df)

Residuals:
     Min       1Q   Median       3Q      Max
-1.63861 -0.46080  0.03332  0.66380  1.27974

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   1.2345     1.0218   1.208    0.252
trttrt2      -0.7759     0.8667  -0.895    0.390
trttrt3       7.8503     0.8308   9.449  1.3e-06 ***
trttrt4      28.2685     0.9050  31.236  4.3e-12 ***
cov           1.4027     1.1639   1.205    0.253
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.154 on 11 degrees of freedom
Multiple R-squared:  0.9932,  Adjusted R-squared:  0.9908
F-statistic: 404.4 on 4 and 11 DF,  p-value: 7.467e-12

> pg <- 30 # pg = per group
> my.df <- data.frame(var = c(rnorm(pg, mean = 3), rnorm(pg, mean = 1), rnorm(pg, mean = 11), rnorm(pg, mean = 30)),
+                     trt = rep(c("trt1", "trt2", "trt3", "trt4"), each = pg),
+                     cov = runif(pg*4)) # 4 groups
> summary(lm(var ~ trt + cov, data = my.df))

Call:
lm(formula = var ~ trt + cov, data = my.df)

Residuals:
    Min      1Q  Median      3Q     Max
-2.5778 -0.6584 -0.0185  0.6423  3.2077

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  2.76961    0.25232  10.977  < 2e-16 ***
trttrt2     -1.75490    0.28546  -6.148 1.17e-08 ***
trttrt3      8.40521    0.28251  29.752  < 2e-16 ***
trttrt4     27.04095    0.28286  95.599  < 2e-16 ***
cov          0.05129    0.32523   0.158    0.875
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.094 on 115 degrees of freedom
Multiple R-squared:  0.9913,  Adjusted R-squared:  0.991
F-statistic:  3269 on 4 and 115 DF,  p-value: < 2.2e-16

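The two runs above are single random draws, so the exact numbers will differ on every execution. As a hedged sketch, the same comparison can be made reproducible by fixing a seed and averaging the standard errors over repeated simulations; the seed, the number of replicates, and the helper name mean_se are assumptions added for illustration.

```r
set.seed(2014)  # arbitrary seed, for reproducibility only

# Average standard error of each coefficient over n_sim simulated data sets,
# generated in the same way as in the example above.
mean_se <- function(pg, n_sim = 200) {
  se <- replicate(n_sim, {
    my.df <- data.frame(
      var = c(rnorm(pg, mean = 3), rnorm(pg, mean = 1),
              rnorm(pg, mean = 11), rnorm(pg, mean = 30)),
      trt = rep(c("trt1", "trt2", "trt3", "trt4"), each = pg),
      cov = runif(pg * 4)
    )
    coef(summary(lm(var ~ trt + cov, data = my.df)))[, "Std. Error"]
  })
  rowMeans(se)
}

se_4  <- mean_se(4)
se_30 <- mean_se(30)
print(round(rbind(pg4 = se_4, pg30 = se_30, ratio = se_4 / se_30), 3))
```

For the treatment contrasts the ratio should come out in the neighbourhood of sqrt(30/4), about 2.7, since the standard error of a difference between group means scales roughly with 1/sqrt(pg).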
On Mon, Oct 20, 2014 at 10:53 AM, V. Coudrain wrote:
Hi, I would like to test the impact of a treatment on some variable using regression (e.g. lm(var ~ trt + cov)). However, I only have four observations per factor level. Is it still possible to apply a regression with such a small sample size? I think that it should be difficult to correctly estimate the variance. Do you think that I should rather compute a non-parametric test such as Kruskal-Wallis? However, I need to include covariables in my models and I am not sure if basic non-parametric tests are suitable for this. Thanks for any suggestion.