Plot of large dataset
I'd start with scatterplots of the two subsets (pass vs fail), but with 280k points, those are likely to be fairly uninformative masses of black ink. However, there might be enough separation between them that you don't need anything else. If not, then a pair of hexbin plots (from the Bioconductor hexbin package), e.g. plot(hexbin(rnorm(280000), rnorm(280000))), may work. Other possibilities are to use partially transparent points, and possibly jittering if there are a lot of ties. I would avoid 3D histograms; they aren't nearly as informative.
Duncan Murdoch
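The suggestions above could be sketched roughly as follows. This is only an illustration under stated assumptions: the data are simulated, the column names status, x and y are made up here, and the two subsets are labelled by their 0/1 value because the original message does not say which value means pass. The hexbin part runs only if the hexbin package is installed.

```r
# Simulated stand-in for the real 280k-line file (assumption: three columns,
# a 0/1 status followed by X and Y coordinates; the names are hypothetical).
set.seed(1)
n <- 280000
dat <- data.frame(status = rbinom(n, 1, 0.5), x = rnorm(n), y = rnorm(n))

# Split into the two subsets; which value means "pass" is not stated in the
# thread, so the plots are titled by the raw value.
s0 <- dat[dat$status == 0, ]
s1 <- dat[dat$status == 1, ]

# Option 1: a pair of hexbin plots (counts per hexagonal cell).
if (requireNamespace("hexbin", quietly = TRUE)) {
  library(hexbin)
  plot(hexbin(s0$x, s0$y, xbins = 50), main = "status = 0")
  plot(hexbin(s1$x, s1$y, xbins = 50), main = "status = 1")
}

# Option 2: base scatterplots with partially transparent points, so dense
# regions appear darker instead of saturating to solid ink.
op <- par(mfrow = c(1, 2))
plot(s0$x, s0$y, pch = 16, cex = 0.3, col = rgb(0, 0, 1, alpha = 0.05),
     xlab = "X", ylab = "Y", main = "status = 0")
plot(s1$x, s1$y, pch = 16, cex = 0.3, col = rgb(1, 0, 0, alpha = 0.05),
     xlab = "X", ylab = "Y", main = "status = 1")
par(op)

# If the coordinates contain many ties, jitter() can be applied first,
# e.g. plot(jitter(s0$x), jitter(s0$y), ...).
```

The alpha value and xbins are starting points to tune by eye; transparency also needs a graphics device that supports it (pdf and most current screen devices do).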
On 9/8/2008 11:40 AM, Jason Thibodeau wrote:
I apologize, I forgot to type the title.
On Mon, Sep 8, 2008 at 11:39 AM, Jason Thibodeau <jbloudg20 at gmail.com> wrote:
Hello all, I have a very large file (280k lines) containing three comma separated variables. The first variable is a 0 or 1 depicting a pass or fail. The other two are X and Y coordinates. Is there a good way I can represent this data in a chart/plot form other than using a 3d histogram? If I need to use the histogram, should I base my chart off the example contained in the RGL package? Thanks a lot. -- Jason Thibodeau ECE Dept., University of Connecticut 371 Fairfield Way, Storrs, CT 06269 Phone: 860-486-5274 , Fax: 860-486-2447 Email: jpt03002 at engr.uconn.edu URL: www.engr.uconn.edu/~jpt03002 <http://www.engr.uconn.edu/%7Ejpt03002>