Queries on regression analysis

4 messages · Chitra, Roman Luštrik, Guillaume Adeux +1 more

#
We have data on "gall_diameter" and "elevations". Our objective is to
see how gall diameter varies along the elevation gradient. In our case
the elevation gradient refers to the range from 1500 to 2500 m asl, with
data collected at 250 m intervals.
Our data did not follow a normal distribution. Gall_diameter is a continuous
dependent variable. Can we apply "glm" to see the relationship between
gall diameter and elevation in our case?
How can we decide on the distribution of our data and the matching family?
Thank you all in advance for your help.
#
One assumption of linear regression is that the residuals follow a normal
distribution. Fit a GLM and check the diagnostic plots.
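
A minimal sketch of that check in R, under stated assumptions: the data
below are simulated, not the poster's measurements. Only the sampling
design (1500 to 2500 m in 250 m steps) comes from the question. The 20
galls per elevation, the data frame name "galls" and the shape of the
simulated response are all hypothetical.

```r
# Simulated stand-in data (hypothetical values, not real measurements).
set.seed(1)
elevation <- rep(seq(1500, 2500, by = 250), each = 20)
# Positive, right-skewed diameters (mm) whose mean declines with elevation.
mean_diam <- exp(3 - 0.0005 * elevation)
gall_diameter <- rgamma(length(elevation), shape = 5, rate = 5 / mean_diam)
galls <- data.frame(elevation, gall_diameter)

# Gaussian GLM with identity link; this is the same model as lm().
fit_gauss <- glm(gall_diameter ~ elevation, family = gaussian, data = galls)

# Standard diagnostic plots: residuals vs fitted, normal Q-Q,
# scale-location, residuals vs leverage.
op <- par(mfrow = c(2, 2))
plot(fit_gauss)
par(op)
```

A funnel shape in the residuals-vs-fitted panel or a curved Q-Q plot
would suggest that the Gaussian family is a poor fit for the response.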

Cheers,
Roman
On Thu, Aug 8, 2019 at 1:16 PM Chitra Baniya <cbbaniya at gmail.com> wrote:
#
As mentioned by Roman, linear regression does not require the response
to be normally distributed, but rather that the residuals of the model are
normally distributed.
The choice of the distribution family should mainly be based on knowledge
of the response variable, that is to say Poisson/negative binomial for
counts (non-negative integers), beta for rates/proportions (bounded between
0 and 1), binomial for successes/failures, Gaussian for continuous data
which can be negative or positive, gamma for continuous strictly positive
data (usually doesn't work so well in my case).
The choice of the link is less flexible; all families have their default
link (Poisson = log, binomial = logit, ...). The main purpose of the link
is to linearize the relationship between X and Y.
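
As an illustration of choosing a family and link, the sketch below fits
the same positive, continuous response with a Gaussian and a gamma
family and compares them. It is only a sketch: the data are simulated
(hypothetical values on the 1500 to 2500 m design from the question),
and the log link for the gamma family is a choice made here, not the
default (R's default link for Gamma is the inverse).

```r
# Simulated stand-in data (hypothetical values, not real measurements).
set.seed(1)
elevation <- rep(seq(1500, 2500, by = 250), each = 20)
mean_diam <- exp(3 - 0.0005 * elevation)
gall_diameter <- rgamma(length(elevation), shape = 5, rate = 5 / mean_diam)
galls <- data.frame(elevation, gall_diameter)

# Gaussian family, identity link: the ordinary linear model.
fit_gauss <- glm(gall_diameter ~ elevation,
                 family = gaussian(link = "identity"), data = galls)

# Gamma family for a strictly positive continuous response; the log link
# is set explicitly so that effects are multiplicative on the mean.
fit_gamma <- glm(gall_diameter ~ elevation,
                 family = Gamma(link = "log"), data = galls)

# Both models use the same untransformed response, so AIC is comparable.
print(AIC(fit_gauss, fit_gamma))
print(summary(fit_gamma)$coefficients)
```

AIC is only one guide here; the diagnostic plots of each fit remain the
more direct check on whether the chosen family is reasonable.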
Hope this helps,
GA2

On Thu, Aug 8, 2019 at 13:16, Chitra Baniya <cbbaniya at gmail.com> wrote:
#
While the previous responders have provided some useful advice, it was a
bit misleading. The linear model for continuous responses does not
automatically assume a normal distribution (of the errors, of which the
residuals are an estimate). A specific way of estimating the conditional
mean in the linear model assumes a normal distribution of errors. More
generally, you can estimate the quantiles of the empirical distribution of
the continuous responses with linear quantile regression, which makes no
assumption about a parametric form of the error distribution and naturally
accommodates heterogeneity. You can use the median estimate (0.50
quantile regression) as a measure of central tendency rather than the
mean. But it is almost always more informative to estimate some interval
of quantiles (say 0.10 to 0.90) to adequately characterize how the response
changes with covariates. More advanced transformation approaches with
quantile regression will allow you to handle proportions (responses bounded
on the [0, 1] interval) and discrete counts.
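
A minimal sketch of linear quantile regression in R, assuming the
quantreg package is installed (the message above does not name a
package, so that choice is an assumption). The data are simulated,
hypothetical values on the 1500 to 2500 m design from the original
question, not real measurements.

```r
# Simulated stand-in data (hypothetical values, not real measurements).
set.seed(1)
elevation <- rep(seq(1500, 2500, by = 250), each = 20)
mean_diam <- exp(3 - 0.0005 * elevation)
gall_diameter <- rgamma(length(elevation), shape = 5, rate = 5 / mean_diam)
galls <- data.frame(elevation, gall_diameter)

# quantreg is not part of base R, so the fit is skipped if it is missing.
if (requireNamespace("quantreg", quietly = TRUE)) {
  # Fit the 0.10, 0.50 (median) and 0.90 quantiles of gall diameter
  # as linear functions of elevation.
  fit_rq <- quantreg::rq(gall_diameter ~ elevation,
                         tau = c(0.10, 0.50, 0.90), data = galls)
  # One column of intercept and slope per quantile.
  print(coef(fit_rq))
}
```

Slopes that differ across the three quantiles would indicate that the
spread of gall diameter, not just its center, changes with elevation.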

Brian

Brian S. Cade, PhD

U. S. Geological Survey
Fort Collins Science Center
2150 Centre Ave., Bldg. C
Fort Collins, CO  80526-8818

email:  cadeb at usgs.gov <brian_cade at usgs.gov>
tel:  970 226-9326
On Thu, Aug 8, 2019 at 5:16 AM Chitra Baniya <cbbaniya at gmail.com> wrote: