Plot of large dataset

I'd start with scatterplots of the two subsets (pass vs. fail), but with 
280k points, those are likely to be fairly uninformative masses of black 
ink. However, there might be enough separation between them that you 
don't need anything else.
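A minimal base-R sketch of that first step. It assumes the data sit in a data frame with two numeric columns and a pass/fail label; the names `d`, `x`, `y` and `status` are hypothetical, and random numbers stand in for the real measurements. The two panels share axis limits so they can be compared directly.

```r
# Hypothetical stand-in for the real data: 280,000 rows with two
# numeric measurements (x, y) and a pass/fail label (status).
set.seed(1)
n <- 280000
d <- data.frame(
  x      = rnorm(n),
  y      = rnorm(n),
  status = factor(sample(c("pass", "fail"), n, replace = TRUE))
)

# Common axis limits so the two panels are directly comparable.
xlim <- range(d$x)
ylim <- range(d$y)

# One scatterplot per subset, side by side; pch = "." keeps the ink minimal.
op <- par(mfrow = c(1, 2))
for (s in levels(d$status)) {
  sub <- d[d$status == s, ]
  plot(sub$x, sub$y, pch = ".", xlim = xlim, ylim = ylim,
       xlab = "x", ylab = "y", main = s)
}
par(op)  # restore the previous layout
```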

If not, then a pair of hexbin plots (from the Bioconductor hexbin 
package), e.g.

library(hexbin)
plot(hexbin(rnorm(280000), rnorm(280000)))

may work.  Other possibilities are to use partially transparent points, 
and possibly to use jittering if there are a lot of ties.
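A rough sketch of the transparency and jittering suggestions, again on made-up data: the values are rounded to one decimal place purely to create many ties. The alpha level (0.02) and jitter amount (0.05) are arbitrary choices to tune by eye, and semi-transparent colours only render on devices that support them (the pdf device does).

```r
set.seed(2)
n <- 280000
# Hypothetical data rounded to one decimal place, so many points
# share exactly the same (x, y) coordinates (ties).
x <- round(rnorm(n), 1)
y <- round(x + rnorm(n), 1)

# Nearly transparent black: each point adds only a little ink, so dense
# regions come out darker than sparse ones instead of saturating.
ink <- rgb(0, 0, 0, alpha = 0.02)

# jitter() adds uniform noise in [-amount, amount], spreading tied points apart.
xj <- jitter(x, amount = 0.05)
yj <- jitter(y, amount = 0.05)

plot(xj, yj, pch = 16, cex = 0.3, col = ink, xlab = "x", ylab = "y")
```

Keeping the jitter amount below half the rounding step (0.05 here) means a jittered point never drifts into a neighbouring tie's cell.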

I would avoid 3D histograms; they aren't nearly as informative.

Duncan Murdoch
On 9/8/2008 11:40 AM, Jason Thibodeau wrote: