lm and R-squared (newbie)
3 messages · PtitBleu, Gabor Grothendieck, David Winsemius

Hello,

I have two data.frames (data1 and data4), dec="." and sep=";".

http://r.789695.n4.nabble.com/file/n4199964/data1.txt data1.txt
http://r.789695.n4.nabble.com/file/n4199964/data4.txt data4.txt

When I do

plot(data1$nx, data1$ny, col="red")
points(data4$nx, data4$ny, col="blue")

the results seem very similar (at least to me), but the R-squared values of

summary(lm(data1$ny ~ data1$nx))
summary(lm(data4$ny ~ data4$nx))

are very different (0.48 against 0.89). Could someone explain the reason to me?

To be complete, I am looking for a simple indicator telling me whether it is worthwhile to keep the values provided by lm. I thought that R-squared could do the job: for me, if R-squared is far from 1, the data are not good enough to perform a linear fit. It seems that I am wrong.

Thanks for your explanations.
Ptit Bleu.

--
View this message in context: http://r.789695.n4.nabble.com/lm-and-R-squared-newbie-tp4199964p4199964.html
Sent from the R help mailing list archive at Nabble.com.
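[Editor's note: the attachments above are external, so here is a minimal, self-contained sketch of reading such semicolon-separated files with read.table. It writes a tiny sample to a temp file only so it runs on its own, and it assumes the real files have a header row; for the actual data, point read.table at data1.txt / data4.txt instead.]

```r
# Write a tiny semicolon-separated sample so the example is self-contained;
# the real data live in data1.txt / data4.txt (assumed to have a header row).
tmp <- tempfile(fileext = ".txt")
writeLines(c("nx;ny", "1.0;2.5", "2.0;4.9", "3.0;7.4"), tmp)

# sep=";" and dec="." match the format the poster describes
data1 <- read.table(tmp, header = TRUE, sep = ";", dec = ".")
str(data1)  # two numeric columns: nx and ny
```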
On Thu, Dec 15, 2011 at 8:35 AM, PtitBleu <ptit_bleu at yahoo.fr> wrote:
> [...] the R-squared of summary(lm(data1$ny ~ data1$nx)) and summary(lm(data4$ny ~ data4$nx)) are very different (0.48 against 0.89). Could someone explain the reason?
The problem is the outliers. Try using a robust measure instead. If we replace Pearson correlations with Spearman (rank) correlations, the two values are much closer:
# R^2 based on Pearson correlations
cor(fitted(lm(ny ~ nx, data4)), data4$ny)^2
[1] 0.8916924
cor(fitted(lm(ny ~ nx, data1)), data1$ny)^2
[1] 0.4868575
# R^2 based on Spearman (rank) correlations
cor(fitted(lm(ny ~ nx, data4)), data4$ny, method = "spearman")^2
[1] 0.8104026
cor(fitted(lm(ny ~ nx, data1)), data1$ny, method = "spearman")^2
[1] 0.7266705
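[Editor's note: a small simulation (synthetic data, not the poster's files) illustrates why a single extreme point drags the Pearson-based R^2 down while the Spearman version barely moves:]

```r
set.seed(1)
nx <- 1:50
ny <- 2 * nx + rnorm(50, sd = 2)   # clean, strongly linear data
ny_out <- ny
ny_out[50] <- ny_out[50] + 200     # inject one large outlier

r2 <- function(x, y) summary(lm(y ~ x))$r.squared
r2(nx, ny)      # near 1 for the clean data
r2(nx, ny_out)  # noticeably lower once the outlier is present

# The rank-based (Spearman) correlation is barely affected,
# because the outlying point keeps its rank:
cor(nx, ny_out, method = "spearman")^2
```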
Statistics & Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com
On Dec 15, 2011, at 8:35 AM, PtitBleu wrote:
> When I do plot(data1$nx,data1$ny, col="red") points(data4$nx,data4$ny, col="blue"), results seem very similar (at least to me) but the R-squared [...] are very different (0.48 against 0.89). Could someone explain the reason?
Because you did not assess your data adequately. Try this plotting exercise and I think you will see the reason for the differences:

plot(data1$nx, data1$ny, col="red",
     xlim=range(c(data1$nx, data4$nx)),
     ylim=range(c(data1$ny, data4$ny)))
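[Editor's note: a runnable sketch of that comparison, with both data sets on common axes and the two fitted lines overlaid. It uses simulated stand-ins for data1 and data4, since the attachments are external, and a pdf(tempfile()) device only so the example runs without a display:]

```r
# Simulated stand-ins for data1 and data4 (the real files are on Nabble):
# same underlying slope, but different x-ranges and noise levels.
set.seed(42)
data1 <- data.frame(nx = runif(100, 0, 10))
data1$ny <- 3 * data1$nx + rnorm(100, sd = 5)
data4 <- data.frame(nx = runif(100, 0, 5))
data4$ny <- 3 * data4$nx + rnorm(100, sd = 1)

# Common axis limits so both clouds are drawn on the same scale
xr <- range(c(data1$nx, data4$nx))
yr <- range(c(data1$ny, data4$ny))

f <- tempfile(fileext = ".pdf")
pdf(f)  # headless-safe device; drop this to plot on screen
plot(data1$nx, data1$ny, col = "red", xlim = xr, ylim = yr)
points(data4$nx, data4$ny, col = "blue")
abline(lm(ny ~ nx, data1), col = "red")   # overlaid fits make the
abline(lm(ny ~ nx, data4), col = "blue")  # difference in scatter visible
dev.off()
```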
David.

David Winsemius, MD
West Hartford, CT