
sciplot question

17 messages · Jarle Bjørgeengen, Manuel Morales, Spencer Graves +1 more

#
Hi,

I would like to have lineplot.CI and barplot.CI actually plot
confidence intervals instead of the standard error.

I understand I have to use the ci.fun option, but I'm not quite sure
how.

Like this:

 >  qt(0.975, df = n - 1) * s / sqrt(n)

but how can I apply it to visualize the length of the Student's t
confidence intervals rather than the standard error of the plotted
means?
#
You define your own function for the confidence intervals. The function
needs to return the two values representing the lower and upper CI
limits. So:

qt.fun <- function(x) qt(p=.975,df=length(x)-1)*sd(x)/sqrt(length(x))
my.ci <- function(x) c(mean(x)-qt.fun(x), mean(x)+qt.fun(x))

lineplot.CI(x.factor = dose, response = len, data = ToothGrowth,
    ci.fun=my.ci)
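
To see the numbers behind the error bars, the same function can be
checked on a single group first (a quick sketch using the built-in
ToothGrowth data):

## 95% t-based confidence limits for mean tooth length at dose 0.5
with(subset(ToothGrowth, dose == 0.5), my.ci(len))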

Manuel
On Fri, 2009-05-22 at 18:38 +0200, Jarle Bjørgeengen wrote:
#
Great,

thanks Manuel.

Just out of curiosity, is there any particular reason you chose standard
error rather than the confidence interval as the default error
indication? (The naming of the plotting functions associates more
closely with the confidence interval.)

- Jarle Bjørgeengen
On May 24, 2009, at 3:02 , Manuel Morales wrote:

#
Jarle Bjørgeengen wrote:
Minor improvement: mean(x) + qt.fun(x)*c(-1,1) but in general confidence 
limits should be asymmetric (a la bootstrap).

I'm not sure how NAs are handled.
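
If NAs are a worry, one variant (a sketch, not sciplot's own handling)
drops them before computing the interval:

## NA-tolerant t-based interval: drop NAs, then return c(lower, upper)
my.ci <- function(x) {
  x <- x[!is.na(x)]
  mean(x) + qt(0.975, df = length(x) - 1) * sd(x) / sqrt(length(x)) * c(-1, 1)
}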

Frank

#
On May 24, 2009, at 3:34 , Frank E Harrell Jr wrote:

Thanks,

if the data are normally distributed, a symmetric confidence interval
should be OK, right?

When plotting the individual sample, it looks normally distributed.

Best regards.
Jarle Bjørgeengen
#
Jarle Bjørgeengen wrote:
Yes; I do see a normal distribution about once every 10 years.
An appropriate qqnorm plot is a better way to check, but often the data
cannot tell you much about their own normality.  It's usually better to
use methods (e.g., the bootstrap) that do not assume normality and that
provide skewed confidence intervals if the data are skewed.
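
The check itself is one line in base R (with x the sample in question):

qqnorm(x); qqline(x)   ## points should roughly follow the line if x is near-normal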

Frank

#
Dear Frank, et al.:
Frank E Harrell Jr wrote:
To what do you attribute the nonnormality you see in most cases?  


(1) Unmodeled components of variance that can generate errors in
interpretation if ignored, even with bootstrapping?

(2) Honest outliers that do not relate to the phenomena of interest and
would better be removed through improved checks on data quality, but
where bootstrapping is appropriate (provided the data are not also
contaminated with (1))?

(3) Situations where the physical application dictates a different
distribution such as binomial, lognormal, gamma, etc., possibly also
contaminated with (1) and (2)?

I've fit mixtures of normals to data before, but one needs to be careful
about not carrying that to extremes, as the mixture may be a result of
(1) and therefore not replicable.

George Box once remarked that he thought most designed experiments
included split plotting that had been ignored in the analysis.  That is
only a special case of (1).

Thanks,
Spencer Graves
#
spencerg wrote:
Spencer,

Those are all important reasons for non-normality of marginal
distributions.  But the biggest reason of all is that the underlying
process did not know about the normal distribution.  Normality in raw
data is usually an accident.

Frank
#
On May 24, 2009, at 4:42 , Frank E Harrell Jr wrote:

Is it not true that the Student's t (qt(...) and so on) confidence
intervals are quite robust against non-normality too?

A teacher told me that the Student's t symmetric confidence intervals
will give an adequate picture of the variability of the data in this
particular case.

Best rgds
Jarle Bjørgeengen
#
Jarle Bjørgeengen wrote:
Incorrect.  Try running some simulations on highly skewed data.  You
will find situations where the confidence coverage is not very close to
the stated level (e.g., 0.95) and more situations where the overall
coverage is 0.95 only because one tail area is near 0 and the other is
near 0.05.

The larger the sample size, the more skewness has to be present to cause
this problem.
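
For example, a small simulation (a minimal sketch with lognormal data)
shows the two tail areas missing at very different rates:

## Tail-specific miss rates of the symmetric 95% t interval for a lognormal mean
set.seed(1)
n <- 20; B <- 10000
mu.true <- exp(0.5)                    ## true mean of rlnorm(meanlog = 0, sdlog = 1)
miss.low <- miss.high <- 0
for (i in seq_len(B)) {
  x  <- rlnorm(n)
  ci <- mean(x) + qt(0.975, n - 1) * sd(x) / sqrt(n) * c(-1, 1)
  miss.low  <- miss.low  + (mu.true < ci[1])   ## interval entirely above the true mean
  miss.high <- miss.high + (mu.true > ci[2])   ## interval entirely below the true mean
}
c(miss.low = miss.low / B, miss.high = miss.high / B)  ## compare to the nominal 0.025 in each tail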

Frank

#
Frank E Harrell Jr wrote:
Frank:

Might there be a difference between the physical and social sciences on
this issue?

The central limit effect works pretty well with many kinds of
manufacturing data, except that it is often masked by between-lot
components of variance.  The first differences in log(prices) are often
long-tailed and negatively skewed.  Standard GARCH and similar models
handle the long tails well but miss the skewness, at least in what I've
seen.  I think that can be fixed, but I have not yet seen it done.

Social science data, however, often involve discrete scales where the
raters' interpretations of the scales rarely match any standard
distribution.  Transforming to latent variables, e.g., via factor
analysis, may help but does not eliminate the problem.

Thanks for your comments.

Spencer
#
spencerg wrote:
Hi Spencer,

I doubt that the difference is large, but biological measurements seem
to be more of a problem.

The central limit theorem in and of itself doesn't help because it
doesn't tell you how large N must be before normality holds well enough.

Good example.  Many of the scales I've seen are non-normal or even
multi-modal.

Thanks for yours
Frank

#
On Mon, 2009-05-25 at 06:22 -0500, Frank E Harrell Jr wrote:
OK - I'm convinced. It turns out that the first change I made to sciplot
was to allow for asymmetric error bars. Is there an easy way (i.e., an
existing package) to bootstrap confidence intervals in R? If so, I'll
try to incorporate this as an option in sciplot.

BTW Jarle - to answer an earlier question, standard error is "the
standard" in my field, ecology, and that's why it's the current default
in sciplot.

Manuel
#
Manuel Morales wrote:
library(Hmisc)
?smean.cl.boot
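
For reference, smean.cl.boot() returns a named vector of the mean and
the nonparametric bootstrap percentile confidence limits, so it can be
wrapped directly for ci.fun (a minimal sketch; the example data here are
made up):

library(Hmisc)       ## provides smean.cl.boot()
set.seed(1)
x <- rexp(30)        ## skewed example data
smean.cl.boot(x)     ## named vector: Mean, Lower, Upper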
Too bad.
Frank

#
On May 26, 2009, at 4:37 , Frank E Harrell Jr wrote:

H(arrel)misc :-)

Thanks for the valuable input, Frank.

This seems to work fine (slightly more time consuming, but what do
we have CPU power for?).

library(Hmisc)
library(sciplot)
my.ci <- function(x) c(smean.cl.boot(x)[2],smean.cl.boot(x)[3])

lineplot.CI(V1, V2, data = d, col = c(4), err.col = c(1), err.width = 0.02,
    legend = FALSE, xlab = "Timeofday", ylab = "IOPS", ci.fun = my.ci,
    cex = 0.5, lwd = 0.7)

Have I understood you correctly in that this is a more accurate way of
visualizing variability in any dataset than the Student's t confidence
intervals, because it does not assume normality?

Can you explain the meaning of B, and how to find a sensible value (if
the default is not sufficient)?

Best regards
Jarle Bjørgeengen
#
Jarle Bjørgeengen wrote:
Don't double the execution time by running it twice!  And that way you
might possibly get an upper confidence limit that is lower than the
lower one.  Do function(x) smean.cl.boot(x)[-1]
Yes, but instead of variability (which quantiles are good at), we are
talking about the precision of the mean.

For most purposes the default is sufficient.  There are great books and
papers on the bootstrap for more info, including improved variations on
the simple bootstrap percentile confidence interval used here.

Frank

#
On May 26, 2009, at 3:02 , Frank E Harrell Jr wrote:

D'oh
Right.
Once again, thanks.

Best regards
- Jarle Bjørgeengen