lm and R-squared (newbie)

3 messages · PtitBleu, Gabor Grothendieck, David Winsemius

#
Hello,

I've two data.frames (data1 and data4), dec="." and sep=";".
http://r.789695.n4.nabble.com/file/n4199964/data1.txt data1.txt 
http://r.789695.n4.nabble.com/file/n4199964/data4.txt data4.txt 

When I do
plot(data1$nx,data1$ny, col="red")
points(data4$nx,data4$ny, col="blue")
the results seem very similar (at least to me), but the R-squared values of
summary(lm(data1$ny ~ data1$nx))
and
summary(lm(data4$ny ~ data4$nx))
are very different (0.48 against 0.89).

Could someone explain the reason to me?

To be complete, I am looking for a simple indicator telling me if it is
worthwhile to keep the values provided by lm. I thought that R-squared could
do the job. For me, if R-squared is far from 1, the data are not good enough
to perform a linear fit.
It seems that I'm wrong.

Thanks for your explanations.
Ptit Bleu.


 


--
View this message in context: http://r.789695.n4.nabble.com/lm-and-R-squared-newbie-tp4199964p4199964.html
Sent from the R help mailing list archive at Nabble.com.
#
On Thu, Dec 15, 2011 at 8:35 AM, PtitBleu <ptit_bleu at yahoo.fr> wrote:
The problem is the outliers. Try using a robust measure instead. If
we replace Pearson correlations with Spearman (rank) correlations, the
values for the two data sets are much closer:
[1] 0.8916924
[1] 0.4868575
[1] 0.8104026
[1] 0.7266705
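
The effect described above can be sketched on made-up data. This is only an illustration under stated assumptions: the data frames `clean` and `dirty` and the three outlier values are hypothetical, not the contents of data1.txt or data4.txt, and the commands that produced the numbers quoted above are not shown in the thread. The sketch relies on the fact that, for a simple regression with one predictor, R-squared equals the squared Pearson correlation.

```r
# Synthetic data only: NOT the values from data1.txt / data4.txt.
set.seed(1)
nx <- seq(1, 100)
ny <- 2 * nx + rnorm(100, sd = 5)
clean <- data.frame(nx = nx, ny = ny)

# Same points, but with three hypothetical outliers at the high end of nx.
dirty <- clean
dirty$ny[c(98, 99, 100)] <- c(-300, -350, -400)

# R-squared as reported by summary(lm(...)).
r2_clean <- summary(lm(ny ~ nx, data = clean))$r.squared
r2_dirty <- summary(lm(ny ~ nx, data = dirty))$r.squared

# Squared Pearson correlation: identical to R-squared for a one-predictor fit.
pearson_clean <- cor(clean$nx, clean$ny)^2
pearson_dirty <- cor(dirty$nx, dirty$ny)^2

# Squared Spearman (rank) correlation: much less sensitive to a few extreme points.
spearman_clean <- cor(clean$nx, clean$ny, method = "spearman")^2
spearman_dirty <- cor(dirty$nx, dirty$ny, method = "spearman")^2

print(c(pearson_clean, pearson_dirty))
print(c(spearman_clean, spearman_dirty))
```

In this sketch the gap between the two Spearman values should be noticeably smaller than the gap between the two Pearson values, because ranks cap how far a single extreme point can pull the result.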
#
On Dec 15, 2011, at 8:35 AM, PtitBleu wrote:

            
Because you failed to do an adequate assessment of your data. Try this
plotting exercise and I think you will see the reason for the
differences:

plot(data1$nx,data1$ny, col="red", xlim=range(c(data1$nx,data4$nx)),  
ylim=range(c(data1$ny,data4$ny)) )
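
A fuller sketch of the same plotting idea, using synthetic stand-ins. The names `d1` and `d4` and the extreme value 200 are assumptions for illustration, not the real data1 and data4. The point it shows is that plot() fixes the axis limits from its own arguments and a later points() call does not rescale them, so anything in the second set that lies outside the first set's range is silently clipped unless the limits cover both sets.

```r
# Synthetic stand-ins for data1 and data4 (the real files are not reproduced here).
set.seed(2)
d1 <- data.frame(nx = runif(50, 0, 10))
d1$ny <- 3 * d1$nx + rnorm(50)
d4 <- data.frame(nx = runif(50, 0, 10))
d4$ny <- 3 * d4$nx + rnorm(50)
d4$ny[1] <- 200   # one hypothetical extreme point in the second set

# Axis limits that cover both data sets, so no point falls outside the plot.
xl <- range(c(d1$nx, d4$nx))
yl <- range(c(d1$ny, d4$ny))

# First set defines the plot region; second set is overlaid with points().
plot(d1$nx, d1$ny, col = "red", xlim = xl, ylim = yl, xlab = "nx", ylab = "ny")
points(d4$nx, d4$ny, col = "blue")
```

Without the xlim and ylim arguments, the blue point at 200 would not be drawn at all, and the two clouds would look nearly identical.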