
Alternate statistical test to linear regression?

Hi Greg and others,

Thank you for these explanations and clarifications, as they are much appreciated!
Indeed, I do have some datasets that exhibit some distinct skewness. Simple scatter plots do show at least some linearity between my x and y variables (albeit weak, given the scattered nature of data points), but could this be sufficient to try simple linear regression? Also, if the data is overly skewed, could transforming it (such as logarithmically) justify the use of simple linear regression and/or correlation, if it causes the data to become mildly skewed in distribution? I have large sample sizes for all of my datasets, and the variables are continuous.
That would pretty much cover all of my questions concerning this!
Thank you, once again, for your time!
-----Original Message-----
From: Greg Snow <538280 at gmail.com>
To: rain1290 <rain1290 at aim.com>
Cc: r-sig-geo <r-sig-geo at r-project.org>
Sent: Wed, Oct 23, 2019 3:49 pm
Subject: Re: [R-sig-Geo] Alternate statistical test to linear regression?

First, please expunge the "(N>30)" concept from your mind. This is an
oversimplified rule of thumb used in introductory statistics courses
(I am guilty of doing this in intro stat as well, but I try to
emphasize to my students that it is only a rule of thumb for that
class and the truth is more complex once you are in the real world, so
consult with a statistician). There is nothing magical about a sample
size of 30; I have seen cases where n=6 is large enough for the CLT
and cases where n=10,000 was not big enough.
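
As an illustration of why no single sample size is "large enough", here is a minimal Python sketch (standard library only) that simulates the sampling distribution of the mean and reports its skewness. The two populations (uniform and lognormal with sigma = 2), the sample sizes, the number of replications, and the seed are all assumptions chosen for the example, not values from this thread.

```python
import random
import statistics


def skewness(values):
    """Population skewness: the mean of the cubed z-scores."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def skewness_of_sample_means(draw, n, reps, rng):
    """Simulate `reps` sample means of size `n` and return their skewness."""
    means = []
    for _ in range(reps):
        sample = [draw(rng) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return skewness(means)


rng = random.Random(2019)  # arbitrary seed, for reproducibility only

# Symmetric population (uniform on [0, 1)), very small samples.
skew_uniform = skewness_of_sample_means(
    lambda r: r.random(), n=6, reps=2000, rng=rng
)

# Heavily right-skewed population (lognormal, sigma = 2), much larger samples.
skew_lognormal = skewness_of_sample_means(
    lambda r: r.lognormvariate(0.0, 2.0), n=500, reps=2000, rng=rng
)

print(f"skewness of sample means, uniform,   n=6:   {skew_uniform:.2f}")
print(f"skewness of sample means, lognormal, n=500: {skew_lognormal:.2f}")
```

A skewness near 0 is consistent with a roughly symmetric, normal-looking sampling distribution; a value far from 0 suggests the normal approximation is still poor at that sample size.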

If the data is not overly skewed and your sample size is large then
you can just use regression as is and the inference will be
approximately correct (with a really good approximation). But with
skewness we often prefer the median over the mean, and least squares
regression is equivalent to fitting a mean, while some of the robust
regression options are equivalent to fitting a median, so they may be
preferable on that count.
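
The mean-versus-median point can be seen in the simplest possible "regression", an intercept-only fit. The sketch below is an illustration under assumed data (a simulated lognormal sample with an arbitrary seed): the constant that minimizes squared error is the mean, while the constant that minimizes absolute error, the criterion behind least-absolute-deviations regression, is the median.

```python
import random
import statistics

rng = random.Random(2019)  # arbitrary seed, for reproducibility only

# Assumed example data: a right-skewed (lognormal) response, no predictors.
y = [rng.lognormvariate(0.0, 1.0) for _ in range(2001)]


def sum_squared_error(c):
    """Least-squares criterion for a constant fit c."""
    return sum((v - c) ** 2 for v in y)


def sum_absolute_error(c):
    """Least-absolute-deviations criterion for a constant fit c."""
    return sum(abs(v - c) for v in y)


mean_y = statistics.fmean(y)
median_y = statistics.median(y)

print(f"mean   = {mean_y:.3f}")
print(f"median = {median_y:.3f}")
print("squared error:  mean beats median:",
      sum_squared_error(mean_y) <= sum_squared_error(median_y))
print("absolute error: median beats mean:",
      sum_absolute_error(median_y) <= sum_absolute_error(mean_y))
```

With right-skewed data the mean is pulled toward the long tail, so the two criteria give noticeably different "typical" values.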

Note that Pearson's correlation does not test linearity, it assumes
linearity (and bivariate normality). Most issues with regression will
be the same for the correlation.
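
To illustrate that Pearson's correlation measures linear association and does not test for it, here is a small sketch with made-up data: x takes the integers from -10 to 10, and y is either an exact linear function or an exact quadratic function of x. The helper `pearson_r` is written out by hand for the example.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient computed from its definition."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x = list(range(-10, 11))  # symmetric around zero

# y is an exact linear function of x.
r_linear = pearson_r(x, [2 * v + 1 for v in x])

# y is an exact, but nonlinear, function of x.
r_quadratic = pearson_r(x, [v ** 2 for v in x])

print(f"r for y = 2x + 1: {r_linear:.3f}")
print(f"r for y = x^2:    {r_quadratic:.3f}")
```

In the quadratic case y is completely determined by x, yet r is essentially zero, so a weak correlation by itself says little about whether a relationship exists or what shape it has.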
On Wed, Oct 23, 2019 at 11:25 AM <rain1290 at aim.com> wrote: