
sciplot question

17 messages · Jarle Bjørgeengen, Manuel Morales, Spencer Graves +1 more

#
Hi,

I would like to have lineplot.CI and barplot.CI actually plot
confidence intervals instead of the standard error.

I understand I have to use the ci.fun option, but I'm not quite sure
how.

Like this:

 >  qt(0.975, df = n - 1) * s / sqrt(n)

but how can I apply it to visualize the length of the Student's t
confidence intervals rather than the standard error of the plotted
means?
#
You define your own function for the confidence intervals. The function
needs to return the two values representing the lower and upper CI
limits. So:

qt.fun <- function(x) qt(p=.975,df=length(x)-1)*sd(x)/sqrt(length(x))
my.ci <- function(x) c(mean(x)-qt.fun(x), mean(x)+qt.fun(x))

lineplot.CI(x.factor = dose, response = len, data = ToothGrowth,
    ci.fun=my.ci)
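
To see the numbers behind the error bars, the same function can be
checked on a single group first (a quick sketch using the built-in
ToothGrowth data):

## 95% t-based confidence limits for mean tooth length at dose 0.5
with(subset(ToothGrowth, dose == 0.5), my.ci(len))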

Manuel
On Fri, 2009-05-22 at 18:38 +0200, Jarle Bjørgeengen wrote:
#
Great,

thanks Manuel.

Just out of curiosity, is there any particular reason you chose standard
error rather than the confidence interval as the default error
indication? (The naming of the plotting functions associates more
closely with the confidence interval.)

- Jarle Bjørgeengen
On May 24, 2009, at 3:02 , Manuel Morales wrote:

#
Jarle Bjørgeengen wrote:
Minor improvement: mean(x) + qt.fun(x)*c(-1,1) but in general confidence 
limits should be asymmetric (a la bootstrap).

I'm not sure how NAs are handled.
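
If NAs are a worry, one variant (a sketch, not sciplot's own handling)
drops them before computing the interval:

## NA-tolerant t-based interval: drop NAs, then return c(lower, upper)
my.ci <- function(x) {
  x <- x[!is.na(x)]
  mean(x) + qt(0.975, df = length(x) - 1) * sd(x) / sqrt(length(x)) * c(-1, 1)
}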

Frank

#
On May 24, 2009, at 3:34 , Frank E Harrell Jr wrote:

Thanks,

if the data are normally distributed, a symmetric confidence interval
should be OK, right?

When plotting the individual sample, it looks normally distributed.

Best regards.
Jarle Bjørgeengen
#
Jarle Bjørgeengen wrote:
Yes; I do see a normal distribution about once every 10 years.
An appropriate qqnorm plot is a better way to check, but often the data
cannot tell you much about their own normality.  It's usually better to
use methods (e.g., the bootstrap) that do not assume normality and that
provide skewed confidence intervals if the data are skewed.
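
The check itself is one line in base R (with x the sample in question):

qqnorm(x); qqline(x)   ## points should roughly follow the line if x is near-normal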

Frank

#
Dear Frank, et al.:
Frank E Harrell Jr wrote:
To what do you attribute the nonnormality you see in most cases?  


(1) Unmodeled components of variance that can generate errors in
interpretation if ignored, even with bootstrapping?

(2) Honest outliers that do not relate to the phenomena of interest and
would better be removed through improved checks on data quality, but
where bootstrapping is appropriate (provided the data are not also
contaminated with (1))?

(3) Situations where the physical application dictates a different
distribution such as binomial, lognormal, gamma, etc., possibly also
contaminated with (1) and (2)?

I've fit mixtures of normals to data before, but one needs to be careful
about not carrying that to extremes, as the mixture may be a result of
(1) and therefore not replicable.

George Box once remarked that he thought most designed experiments
included split plotting that had been ignored in the analysis.  That is
only a special case of (1).

Thanks,
Spencer Graves
#
spencerg wrote:
Spencer,

Those are all important reasons for non-normality of marginal
distributions.  But the biggest reason of all is that the underlying
process did not know about the normal distribution.  Normality in raw
data is usually an accident.

Frank
#
On May 24, 2009, at 4:42 , Frank E Harrell Jr wrote:

Is it not true that the Student's t (qt(...) and so on) confidence
intervals are quite robust against non-normality too?

A teacher told me that the Student's t symmetric confidence intervals
will give an adequate picture of the variability of the data in this
particular case.

Best rgds
Jarle Bjørgeengen
#
Jarle Bjørgeengen wrote:
Incorrect.  Try running some simulations on highly skewed data.  You
will find situations where the confidence coverage is not very close to
the stated level (e.g., 0.95) and more situations where the overall
coverage is 0.95 only because one tail area is near 0 and the other is
near 0.05.

The larger the sample size, the more skewness has to be present to cause
this problem.
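
For example, a small simulation (a minimal sketch with lognormal data)
shows the two tail areas missing at very different rates:

## Tail-specific miss rates of the symmetric 95% t interval for a lognormal mean
set.seed(1)
n <- 20; B <- 10000
mu.true <- exp(0.5)                    ## true mean of rlnorm(meanlog = 0, sdlog = 1)
miss.low <- miss.high <- 0
for (i in seq_len(B)) {
  x  <- rlnorm(n)
  ci <- mean(x) + qt(0.975, n - 1) * sd(x) / sqrt(n) * c(-1, 1)
  miss.low  <- miss.low  + (mu.true < ci[1])   ## interval entirely above the true mean
  miss.high <- miss.high + (mu.true > ci[2])   ## interval entirely below the true mean
}
c(miss.low = miss.low / B, miss.high = miss.high / B)  ## compare to the nominal 0.025 in each tail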

Frank

#
Frank E Harrell Jr wrote:
Frank:

Might there be a difference between the physical and social sciences on
this issue?

The central limit effect works pretty well with many kinds of
manufacturing data, except that it is often masked by between-lot
components of variance.  The first differences in log(prices) are often
long-tailed and negatively skewed.  Standard GARCH and similar models
handle the long tails well but miss the skewness, at least in what I've
seen.  I think that can be fixed, but I have not yet seen it done.

Social science data, however, often involve discrete scales where the
raters' interpretations of the scales rarely match any standard
distribution.  Transforming to latent variables, e.g., via factor
analysis, may help but does not eliminate the problem.

Thanks for your comments.

Spencer
#
spencerg wrote:
Hi Spencer,

I doubt that the difference is large, but biological measurements seem
to be more of a problem.

The central limit theorem in and of itself doesn't help because it
doesn't tell you how large N must be before normality holds well enough.

Good example.  Many of the scales I've seen are non-normal or even
multi-modal.

Thanks for yours
Frank

#
On Mon, 2009-05-25 at 06:22 -0500, Frank E Harrell Jr wrote:
OK - I'm convinced. It turns out that the first change I made to sciplot
was to allow for asymmetric error bars. Is there an easy way (i.e., an
existing package) to bootstrap confidence intervals in R? If so, I'll
try to incorporate this as an option in sciplot.

BTW Jarle - to answer an earlier question, standard error is "the
standard" in my field, ecology, and that's why it's the current default
in sciplot.

Manuel
#
Manuel Morales wrote:
library(Hmisc)
?smean.cl.boot
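
For reference, smean.cl.boot() returns a named vector of the mean and
the nonparametric bootstrap percentile confidence limits, so it can be
wrapped directly for ci.fun (a minimal sketch; the example data here are
made up):

library(Hmisc)       ## provides smean.cl.boot()
set.seed(1)
x <- rexp(30)        ## skewed example data
smean.cl.boot(x)     ## named vector: Mean, Lower, Upper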
Too bad.
Frank

#
On May 26, 2009, at 4:37 , Frank E Harrell Jr wrote:

H(arrel)misc :-)

Thanks for the valuable input, Frank.

This seems to work fine (slightly more time consuming, but what do
we have CPU power for?).

library(Hmisc)
library(sciplot)
my.ci <- function(x) c(smean.cl.boot(x)[2],smean.cl.boot(x)[3])

lineplot.CI(V1, V2, data = d, col = c(4), err.col = c(1), err.width = 0.02,
    legend = FALSE, xlab = "Timeofday", ylab = "IOPS", ci.fun = my.ci,
    cex = 0.5, lwd = 0.7)

Have I understood you correctly in that this is a more accurate way of
visualizing variability in any dataset than the Student's t confidence
intervals, because it does not assume normality?

Can you explain the meaning of B, and how to find a sensible value (if
the default is not sufficient)?

Best regards
Jarle Bjørgeengen
#
Jarle Bjørgeengen wrote:
Don't double the execution time by running it twice!  And that way you
might possibly get an upper confidence limit that is lower than the
lower one.  Do function(x) smean.cl.boot(x)[-1]
Yes, but instead of variability (which quantiles are good at), we are
talking about the precision of the mean.

For most purposes the default is sufficient.  There are great books and
papers on the bootstrap for more info, including improved variations on
the simple bootstrap percentile confidence interval used here.

Frank

#
On May 26, 2009, at 3:02 , Frank E Harrell Jr wrote:

D'oh
Right.
Once again, thanks.

Best regards
- Jarle Bjørgeengen