Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111
> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of Jason Rupert
> Sent: Saturday, February 14, 2009 4:48 PM
> To: David Winsemius
> Cc: R-help at r-project.org
> Subject: Re: [R] Website, book, paper, etc. that shows example plots of
> distributions?
>
> Many thanks to Greg L. Snow and David Winsemius for their responses.
>
> First off I can safely say I don't know enough statistics to be
> dangerous, but hopefully I will get to that point:)
>
> Regarding the goal - ultimately I would like to use linear regression
> (constrained to using linear regression at this point) for my data. I
> thought the requirements for using linear regression were the following
> (I pulled this list from
> www.utexas.edu/courses/schwab/sw318_spring_2004/SolvingProblems/Class27_RegressionNCorrHypoTest.ppt):
>
> The assumptions required for utilizing a regression equation are the
> same as the assumptions for the test of significance of a correlation
> coefficient.
> Both variables are interval level.
> Both variables are normally distributed.
> The relationship between the two variables is linear.
> The variance of the values of the dependent variable is uniform for all
> values of the independent variable (equality of variance).
>
> Thus, I was going to attempt to (1) identify which distribution my data
> most closely represents, (2) translate my data so that it is normal,
> and (3) then use linear regression on the data.
>
> However, if
> "The assumptions of most regression methods is that the *errors* need
> to have the desired relationship between means and variance, and not
> that the dependent variable be "normal". Many times the apparent
> non-normality will be "explained" or "captured" by the regression model."
>
> Does this mean I can just "do" linear regression without translating my
> data and it will be okay?
>
> Note that I was using "lm" from R to access the errors; however, I have
> not had an opportunity to do much analysis of those results to determine
> if they are Gaussian or not.
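
A minimal sketch of what checking the errors (rather than the raw response) might look like. The data here are simulated and the names x, y and fit are made up for illustration; nothing in it comes from the original poster's data.

```r
# Simulated data (an assumption for illustration): a skewed predictor
# with Gaussian errors, so the response y inherits the skew of x.
set.seed(1)
x <- rexp(200, rate = 0.5)
y <- 2 + 3 * x + rnorm(200, sd = 1)

fit <- lm(y ~ x)
res <- residuals(fit)

# Draw to a null device so the sketch also runs non-interactively;
# drop the pdf()/dev.off() lines to see the plots on screen.
pdf(NULL, width = 10, height = 4)
op <- par(mfrow = c(1, 3))
qqnorm(y, main = "Response y")        # expected to bend away from the line
qqline(y)
qqnorm(res, main = "Residuals")       # expected to follow the line closely
qqline(res)
plot(fitted(fit), res, main = "Residuals vs fitted")  # look for even spread
abline(h = 0, lty = 2)
par(op)
dev.off()

# Shapiro-Wilk p-values as a rough numeric companion to the plots
p_y   <- shapiro.test(y)$p.value
p_res <- shapiro.test(res)$p.value
```

In this setup the response itself looks clearly non-normal while the residuals are the quantity the normality and equal-variance assumptions refer to, which is the distinction quoted above.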
>
> I guess I am going to try to track down the following documents:
> (1) Statistical Distributions (Paperback)
> by Merran Evans (Author), Nicholas Hastings (Author), Brian Peacock
> (Author)
> # ISBN-10: 0471371246
> # ISBN-13: 978-0471371243
>
> (2) Regression Modeling Strategies (Hardcover)
> by Frank E. Harrell Jr. (Author)
> # ISBN-10: 0387952322
> # ISBN-13: 978-0387952321
>
> Maybe electronic versions of those documents are available. My wife is
> already giving me a hard time about the volume of books around.
>
> Thank you again for all your feedback and insights.
>
>
> --- On Fri, 2/13/09, David Winsemius <dwinsemius at comcast.net> wrote:
> From: David Winsemius <dwinsemius at comcast.net>
> Subject: Re: [R] Website, book, paper, etc. that shows example plots of
> distributions?
> To: jasonkrupert at yahoo.com
> Cc: "Gabor Grothendieck" <ggrothendieck at gmail.com>,
> R-help at r-project.org
> Date: Friday, February 13, 2009, 9:10 AM
>
> This is probably the right time to issue a warning about the error of
> making transformations on the dependent variable before doing your
> analysis. The classic error that newcomers to statistics commit is to
> decide that they want to "make their data normal". The assumptions of
> most regression methods is that the *errors* need to have the desired
> relationship between means and variance, and not that the dependent
> variable be "normal". Many times the apparent non-normality will be
> "explained" or "captured" by the regression model. Other methods of
> modeling non-linear dependence are also available.
>
> I found Harrell's book "Regression Modeling Strategies" to be an
> excellent source for alternatives. My copy of V&R's MASS is only the
> second edition, but chapters 5 & 6 in that edition on linear models
> also had examples of using QQ plots on residuals. Checking that text's
> website, I see that chapter 6 at least is probably similar. They
> include the scripts from their chapters along with the MASS package
> (installed as part of the VR bundle). My copy is entitled "ch06.r" and
> resides in the scripts subdirectory:
> /Library/Frameworks/R.framework/Versions/2.8/Resources/library/MASS/scripts/ch06.R
>
> --David Winsemius
>
>
> On Feb 13, 2009, at 8:11 AM, Jason Rupert wrote:
>
> > Thank you very much. Thank you again regarding the suggestion below.
> > I will give that a shot and I guess I've got my work counted out for
> > me. I counted 45 different distributions.
> >
> > Is the best way to get a QQPlot of each, to run through producing a
> > data set for each distribution and then using the qqplot function to
> > get a QQplot of the distribution and then compare it with my data
> > distribution?
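
One possible way to do that comparison in a single pass, sketched under assumptions: my_data is a simulated stand-in for the real data, and the four candidate distributions and their moment-matched parameters are illustrative choices, not a recommendation.

```r
# Stand-in for "my data" (an assumption): replace with your own vector.
set.seed(42)
my_data <- rgamma(150, shape = 2, rate = 1)
n <- length(my_data)

# Simulate each candidate with parameters roughly matched to the data.
# Gamma: mean = shape / rate and var = shape / rate^2, solved for both.
m <- mean(my_data)
s <- sd(my_data)
candidates <- list(
  normal      = rnorm(n, mean = m, sd = s),
  exponential = rexp(n, rate = 1 / m),
  gamma       = rgamma(n, shape = (m / s)^2, rate = m / s^2),
  lognormal   = rlnorm(n, meanlog = mean(log(my_data)),
                       sdlog = sd(log(my_data)))
)

# One two-sample Q-Q plot per candidate; points near the dashed
# y = x line suggest a similar distributional shape.
pdf(NULL, width = 8, height = 8)   # remove pdf()/dev.off() to view on screen
op <- par(mfrow = c(2, 2))
for (nm in names(candidates)) {
  qqplot(candidates[[nm]], my_data, main = nm,
         xlab = paste(nm, "sample"), ylab = "my data")
  abline(0, 1, lty = 2)
}
par(op)
dev.off()
```

Because each candidate is a random sample, the plots change from run to run; increasing the simulated sample size, or using the theoretical quantile functions (qnorm, qgamma, and so on) with ppoints(n), would reduce that noise.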
> >
> > As you can tell I am not a trained statistician, so any guidance or
> > suggested further reading is greatly appreciated.
> >
> > I guess I am pretty sure my data is not a normal distribution due to
> > doing some of the empirical "Goodness of Fit" tests and comparing the
> > QQplot of my data against the QQPlot of a normal distribution with
> > the same number of points. I guess the next step is to figure out
> > which distribution my data most closely matches.
> >
> > Also, I guess I could also fool around and take the log, sqrt, etc.
> > of my data and see if it will then more closely resemble a normal
> > distribution.
> >
> > Thank you again for assisting this novice data analyst who is trying
> > to gain a better understanding of the techniques using this powerful
> > software package.
> >
> >
> >
> >
> > --- On Fri, 2/13/09, Gabor Grothendieck <ggrothendieck at gmail.com>
> > wrote:
> > From: Gabor Grothendieck <ggrothendieck at gmail.com>
> > Subject: Re: [R] Website, book, paper, etc. that shows example plots
> > of distributions?
> > To: jasonkrupert at yahoo.com
> > Cc: R-help at r-project.org
> > Date: Friday, February 13, 2009, 5:43 AM
> >
> > You can readily create a dynamic display for using qqplot and similar
> > functions in conjunction with either the playwith or TeachingDemos
> > packages.
> >
> > For example, to investigate the effect of the shape parameter in the
> > skew normal distribution on its qqplot relative to the normal
> > distribution:
> >
> > library(playwith)
> > library(sn)
> > playwith(qqnorm(rsn(100, shape = shape)),
> > parameters = list(shape = seq(-3, 3, .1)))
> >
> > Now move the slider located at the bottom of the window that
> > appears and watch the plot change in response to changing
> > the shape value.
> >
> > You can find more distributions here:
> > http://cran.r-project.org/web/views/Distributions.html
> >
> > On Thu, Feb 12, 2009 at 1:04 PM, Jason Rupert
> > <jasonkrupert at yahoo.com> wrote:
> >> By any chance is anyone aware of a website, book, paper, etc., or
> >> combinations of those sources that show plots of different
> >> distributions?
> >>
> >> After reading a pretty good whitepaper I became aware of the benefit
> >> of doing Q-Q plots and histograms to help assess a distribution.
> >> The whitepaper is called:
> >> "Univariate Analysis and Normality Test Using SAS, Stata, and SPSS",
> >> Hun Myoung Park, (c) 2002-2008 The Trustees of Indiana University
> >>
> >> Unfortunately the white paper does not provide an extensive number
> >> of example distributions plotted using Q-Q plots and histograms, so
> >> I am curious if there is a "portfolio"-type website or other
> >> whitepaper that shows examples of various types of distributions.
> >>
> >> It would be helpful to see a bunch of Q-Q plots and their associated
> >> histograms to get an idea of how the distribution looks in comparison
> >> against the Gaussian.
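
A small gallery of this kind can also be generated directly in R. The sketch below is only one way to lay it out; the four distributions, the sample size and the panel arrangement are arbitrary choices made for illustration.

```r
# Four example distributions (illustrative choices), 500 draws each.
set.seed(7)
n <- 500
samples <- list(
  "Normal"      = rnorm(n),
  "Uniform"     = runif(n),
  "Exponential" = rexp(n),
  "t, 3 df"     = rt(n, df = 3)
)

# Top row: histograms. Bottom row: normal Q-Q plots of the same samples.
pdf(NULL, width = 12, height = 6)   # remove pdf()/dev.off() to view on screen
op <- par(mfrow = c(2, length(samples)))
for (nm in names(samples)) {
  hist(samples[[nm]], main = nm, xlab = "")
}
for (nm in names(samples)) {
  qqnorm(samples[[nm]], main = nm)
  qqline(samples[[nm]])
}
par(op)
dev.off()
```

Reading each column top to bottom pairs a histogram with its Q-Q plot against the Gaussian: skewed samples tend to trace a curve, and heavy-tailed samples tend to peel away from the line at both ends.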
> >>
> >> I think seeing the plot really helps.
> >>
> >> Thank you for any insights.
> >>
> >>
> >>
> >> [[alternative HTML version deleted]]
> >>
> >>
> >> ______________________________________________
> >> R-help at r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >>
> >>