plot of Bernoulli data - R-help

Tue, Oct 2, 2001 2:23 AM

I have some Bernoulli data something like this:
 x<-sort(runif(100,1,20))
 p<-pnorm(x,10,3)
 y<-as.numeric(runif(x)<p)
 plot(x,y)
 lines(x,p)

This plot is not very satisfactory because the ogive does not visually
fit the 0/1 points very well, and also because the points tend to fall
on top of one another. The second problem can be eliminated by adding
vertical jitter. However, I was thinking about the following plot. Instead
of plotting all the 0/1 points, divide the x axis into bins. In each bin,
find the average y value. Then plot (x = average of the x values in the
bin, y = average of the 0/1 values in the bin). So if I use 10 bins I have
10 points in the plot, and the y-values are now proportions instead of 0/1.
Is this a plot that other people have used (refs appreciated)? If so, maybe
someone has code to do this. Otherwise, I am not sure how to do this in R.
Could someone help me? Thanks very much.
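Here is a minimal R sketch of both ideas: vertical jitter, and a binned-proportion plot. It assumes 10 equal-width bins made with `cut()` and per-bin means computed with `tapply()`. The seed and the bin count are illustrative choices, not part of the post.

```r
# Simulated data as in the post; set.seed() is added only for reproducibility
set.seed(1)
x <- sort(runif(100, 1, 20))
p <- pnorm(x, 10, 3)
y <- as.numeric(runif(length(x)) < p)

# Vertical jitter separates the overlapping 0/1 points
plot(x, jitter(y, amount = 0.03), ylab = "y (jittered)")
lines(x, p)

# Binned version: cut() with a single number makes that many equal-width bins
bin  <- cut(x, breaks = 10)
xbar <- tapply(x, bin, mean)   # mean x within each bin (NA if a bin is empty)
ybar <- tapply(y, bin, mean)   # proportion of 1s within each bin
points(xbar, ybar, pch = 19, col = "red")
```

Equal-width bins can end up holding very different numbers of points. Checking `table(bin)` shows how much data sits behind each plotted proportion.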

Bill

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

Bill Simpson

Tue, Oct 2, 2001 4:57 AM

 df<-data.frame(x,y)
 aggregate(df,list(x=(x<5),(x>5)&(x<10),(x>10) & (x<15),(x>15)), FUN=mean)
gives me what I want, but if anyone has a better way to collect the
observations into bins I'd like to hear it. It would be nice to
pass along something like
 breaks<-c(5,10,15,20)

Thanks

Bill Simpson


Ben Bolker

Tue, Oct 2, 2001 6:05 AM

cut()?
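One way to follow this suggestion is sketched below. It assumes the data are in a data frame `df` like the one in the previous message. `cut()` turns a breaks vector into a factor that `aggregate()` can group by.

`cut()` needs both end points. With `c(5,10,15,20)` alone, every x at or below 5 would become `NA`, so the sketch adds the lower limit 1. That limit is an assumption based on `runif(100,1,20)` in the first post.

```r
# Regenerate data like the first post so this block runs on its own
set.seed(1)
x  <- sort(runif(100, 1, 20))
y  <- as.numeric(runif(length(x)) < pnorm(x, 10, 3))
df <- data.frame(x, y)

# Lower limit 1 added so that points below 5 are not dropped as NA
breaks <- c(1, 5, 10, 15, 20)
bin <- cut(df$x, breaks = breaks, include.lowest = TRUE)

# One row per non-empty bin: the bin label, mean x, and proportion of 1s
binned <- aggregate(df, by = list(bin = bin), FUN = mean)
binned

plot(binned$x, binned$y, ylim = c(0, 1), xlab = "x", ylab = "proportion of 1s")
```

By default `cut()` intervals are right-closed, e.g. (5,10]. Use `right = FALSE` to get [5,10) instead. Empty bins simply do not appear in the `aggregate()` result.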

On Tue, 2 Oct 2001, Bill Simpson wrote:

318 Carr Hall                                bolker at zoo.ufl.edu
Zoology Department, University of Florida    http://www.zoo.ufl.edu/bolker
Box 118525                                   (ph)  352-392-5697
Gainesville, FL 32611-8525                   (fax) 352-392-3704


Frank E Harrell Jr

Tue, Oct 2, 2001 6:15 AM

The loess smoother, with outlier detection turned off,
is an excellent way to estimate the relationship
between a continuous variable and the probability
of an event, based on a binary dependent variable.
Use lowess(x,y,iter=0).  I do wonder about the way in
which your data are generated, however.  You
might think about

 n <- length(x)          # n=100 in your example
 p <- pnorm(x, 10, 3)    # or whatever model you want; if you use pnorm,
                         # check that its first arg spans the right metric
 y <- 1*(runif(n) <= p)
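Below is a sketch of the suggested smoother in R, combined with the data-generation pattern above. The seed, n = 100 and the `pnorm(x, 10, 3)` curve are carried over from the first post for illustration only.

```r
set.seed(1)                     # reproducibility only
n <- 100
x <- sort(runif(n, 1, 20))
p <- pnorm(x, 10, 3)            # true probability curve
y <- 1 * (runif(n) <= p)        # Bernoulli outcomes

# iter = 0 skips lowess's robustness iterations, which would otherwise
# downweight isolated 0s or 1s as if they were outliers
sm <- lowess(x, y, iter = 0)

plot(x, y)
lines(sm, col = "red")          # smoothed estimate of P(y = 1 | x)
lines(x, p, lty = 2)            # true curve for comparison
```

The smoothed curve is a local regression estimate, not a fitted probability model. It is not guaranteed to stay strictly inside [0, 1] near the ends of the x range.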

Bill Simpson wrote:

Frank E Harrell Jr              Prof. of Biostatistics & Statistics
Div. of Biostatistics & Epidem. Dept. of Health Evaluation Sciences
U. Virginia School of Medicine  http://hesweb1.med.virginia.edu/biostat