
how to plot a distribution of mean and standard deviation

3 messages · gj, R. Michael Weylandt, Ben Bolker

gj
#
Hi,
I have data on 504 courses at a university, with two attributes
describing the proportion of resources used (#resources_used /
#resources_available): the average and the standard deviation.
Thus I have:
[1] n=504 rows
[2] 1 id column and 2 attributes

Here's a sample of the data:

courseid,average,std
12741,1,0
17161,1,0
12514,1,0
12316,0.8666666692648178,0.26090261464799325
2467,0.8623188442510107,0.24920700355307424
3047,0.85,0.2314550249431379
1747,0.8481481481481481,0.23078446747051584
2487,0.8383838455333854,0.20429589057565342
13869,0.8181818181818182,0.2522624895547565
1706,0.8158730235364702,0.19332287915878024
2041,0.8095238095238095,0.24880667576405963
1864,0.8080808141014793,0.17456052968726046
2106,0.784444437623024,0.2475808839379094
....
.....

My question is how can I sensibly visualise this data.

In this context, it does not make sense to compute the population mean
or population std. However, what would make sense is showing the cdf of
the mean, so I'm thinking of doing this using ecdf(). But what about the
standard deviation? How can I visualise the standard deviation as well
as the mean? Would that make sense on just one plot?

Any idea?

Thanks
Gawesh
#
It seems like the relevant plot would depend on what you are trying to
investigate, but usually a scatterplot works well for bivariate data
with no other assumptions needed. I usually find ecdf() plots rather
hard to interpret without playing around with the data elsewhere first,
and I'm not sure they make an enormous amount of sense for bivariate
data in your case, since they reorder the inputs.
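
For illustration, here is a minimal sketch of both ideas in base R,
using only the 13 sample rows from the original post (not the full 504
courses). The axis labels and the variable name Fn are my own choices,
not anything from the thread. Note that ecdf() sorts the averages, so
the pairing between each course's mean and its std is lost in that
plot, whereas the scatterplot keeps it.

```r
# Sample rows copied from the original post (13 of the 504 courses)
X <- read.csv(text = "courseid,average,std
12741,1,0
17161,1,0
12514,1,0
12316,0.8666666692648178,0.26090261464799325
2467,0.8623188442510107,0.24920700355307424
3047,0.85,0.2314550249431379
1747,0.8481481481481481,0.23078446747051584
2487,0.8383838455333854,0.20429589057565342
13869,0.8181818181818182,0.2522624895547565
1706,0.8158730235364702,0.19332287915878024
2041,0.8095238095238095,0.24880667576405963
1864,0.8080808141014793,0.17456052968726046
2106,0.784444437623024,0.2475808839379094")

# Scatterplot: one point per course, mean on x and std on y,
# so each course's (mean, std) pair stays together
plot(std ~ average, data = X,
     xlab = "average proportion of resources used",
     ylab = "standard deviation")

# Empirical CDF of the per-course averages; the values are sorted,
# so this shows the distribution of the means only
Fn <- ecdf(X$average)
plot(Fn, xlab = "average proportion of resources used",
     main = "ECDF of course averages")
```

With the full data set, the same calls should work after replacing the
inline sample with the result of read.csv() on the real file.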

Michael
On Sun, Oct 23, 2011 at 6:51 AM, gj <gawesh at gmail.com> wrote:
#
R. Michael Weylandt <michael.weylandt <at> gmail.com> writes:
[snip]


  You could make a "caterpillar plot" as follows:

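# Read the course data (columns: courseid, average, std)
X <- read.csv("coursetmp.dat")
library(ggplot2)
# Turn courseid into a factor ordered by average, so courses are sorted
X <- transform(X, courseid = reorder(courseid, average))
# Point = mean; line range = mean +/- 2 standard deviations
ggplot(X, aes(x = courseid, y = average,
              ymin = average - 2*std, ymax = average + 2*std)) +
  geom_point() +
  geom_linerange() +
  coord_flip()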

  (Here the x and y axes are flipped because it's easier to plot & read
the course ID labels that way)

  Of course, the answer to "how should I visualize these data?" always
depends on what you want to find out ...