plot of Bernoulli data

4 messages · Bill Simpson, Ben Bolker, Frank E Harrell Jr

#
I have some Bernoulli data something like this:
 x<-sort(runif(100,1,20))
 p<-pnorm(x,10,3)
 y<-as.numeric(runif(x)<p)
 plot(x,y)
 lines(x,p)

This plot is not very satisfactory because the ogive does not visually
fit the (0,1) points very well, and also because the points tend to fall
on top of one another. The second problem can be eliminated by adding
vertical jitter. However, I was thinking about the following plot. Instead
of plotting all the 0/1 points, divide the x axis into bins. In
each bin, find the average y value. Then plot (x = average of x values in
bin, y = average of 0/1 values in bin). So if I use 10 bins I have 10 points
in the plot, and the y-values are now proportions instead of 0/1. Is this
a plot that other people have used (refs appreciated)? If so, maybe someone
has code to do this. Otherwise, I am not sure how to do this in R.
Could someone help me? Thanks very much.
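
For reference, the vertical jitter mentioned above might look like the following. This is only a sketch: the seed, the jitter amount of 0.05, and the name yj are assumptions chosen for illustration, not part of the original post.

```r
set.seed(1)                            # assumed seed, for reproducibility only
x <- sort(runif(100, 1, 20))
p <- pnorm(x, 10, 3)
y <- as.numeric(runif(length(x)) < p)

# Add uniform noise in [-0.05, 0.05] to the 0/1 values so that
# neighbouring points no longer sit exactly on top of one another.
yj <- jitter(y, amount = 0.05)

plot(x, yj, ylim = c(-0.1, 1.1), ylab = "y (jittered)")
lines(x, p)                            # true probability curve
```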

Bill

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
#
 df<-data.frame(x,y)
 aggregate(df,list(x=(x<5),(x>5)&(x<10),(x>10) & (x<15),(x>15)), FUN=mean)

gives me what I want, but if anyone has a better way to collect the
observations into bins I'd like to hear it. It would be nice to
pass along something like

 breaks<-c(5,10,15,20)

Thanks

Bill Simpson

#
cut()?
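
A minimal sketch of how cut() could do the binning, assuming data simulated as in the first message. The lower edge 0 is added to the proposed breaks as an assumption so that every x between 1 and 20 lands in a bin; the seed and the names bin, xbar, phat, and n are likewise illustrative only.

```r
set.seed(1)                          # assumed seed, for reproducibility only
x <- sort(runif(100, 1, 20))
p <- pnorm(x, 10, 3)
y <- as.numeric(runif(length(x)) < p)

breaks <- c(0, 5, 10, 15, 20)        # assumed edges: (0,5], (5,10], (10,15], (15,20]
bin <- cut(x, breaks = breaks)       # factor giving the interval of each x

xbar <- tapply(x, bin, mean)         # mean x within each bin
phat <- tapply(y, bin, mean)         # proportion of 1s within each bin
n    <- tapply(y, bin, length)       # number of observations per bin

plot(xbar, phat, ylim = c(0, 1),
     xlab = "x (bin mean)", ylab = "proportion of y = 1")
lines(x, p)                          # true probability curve
```

Bins with few observations give noisy proportions, so it may be worth looking at n before reading much into any single point.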
On Tue, 2 Oct 2001, Bill Simpson wrote:

#
The loess smoother, with outlier detection turned off,
is an excellent way to estimate the relationship
between a continuous variable and the probability
of an event, based on a binary dependent variable.
Use lowess(x,y,iter=0).  I do wonder about the way in
which your data are generated, however.  You
might think about

 p <- whatever  # and check that if you use pnorm the
                # first arg to pnorm spans the right metric
 y <- 1*(runif(n) <= p)  # n=100 in your example
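
Put together, the suggestion might be sketched as below. The pnorm(x, 10, 3) curve is carried over from the original post in place of "whatever"; the seed, the line types, and the name fit are assumptions for illustration.

```r
set.seed(1)                      # assumed seed, for reproducibility only
n <- 100
x <- sort(runif(n, 1, 20))
p <- pnorm(x, 10, 3)             # stand-in for "whatever"
y <- 1 * (runif(n) <= p)

# iter = 0 skips the robustifying iterations, which would otherwise
# downweight the 0/1 observations as if they were outliers.
fit <- lowess(x, y, iter = 0)

plot(x, y)
lines(x, p, lty = 2)             # true probability curve (dashed)
lines(fit)                       # smoothed estimate of P(y = 1 | x)
```

The default smoother span (f = 2/3) is used here; a smaller f follows the data more closely at the cost of a rougher curve.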
Bill Simpson wrote: