Scatterplot Showing All Points

9 messages · Wayne Aldo Gavioli, jim holtman, Johannes Hüsing +5 more

#
Hello all,


I'm trying to draw a scatterplot of a large data set (5,000 x,y coordinates),
with the caveat that many of the data points overlap with each other (share the
same x AND y coordinates).  With the usual "plot" command, the overlap of points
is not shown in the graph.  Namely, 5,000 points should be plotted, as I
mentioned above, but because so many of them overlap exactly, only about 50-60
distinct points are actually visible on the graph.  Thus, there's no indication
that Point A shares its coordinates with 200 other pieces of data and thus is
very common, while Point B doesn't share its coordinates with any other pieces
of data and thus isn't common at all.  Is there any way to indicate the
frequency of such points on such a graph?  Should I be using a different
command than "plot"?


Thanks,


Wayne
#
Use 'hexbin' from bioconductor to show how many points are in a grid
on the graph.
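
A minimal sketch of the hexbin suggestion, assuming the 'hexbin' package is installed. The data are simulated stand-ins for the poster's 5,000 points (the real data are not shown in the thread), and xbins = 10 is an arbitrary choice.

```r
library(hexbin)

# Simulated stand-in: 5,000 points falling on a small grid of coordinates
set.seed(1)
x <- sample(1:8, 5000, replace = TRUE)
y <- sample(1:7, 5000, replace = TRUE)

# Bin the points into hexagonal cells; each cell stores how many points it holds
bin <- hexbin(x, y, xbins = 10)

# Shade each hexagon by its count, with a legend for the counts
plot(bin, main = "hexbin counts")
```

The per-cell counts are available in bin@count if the numbers themselves are needed.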
On Dec 17, 2007 8:14 PM, Wayne Aldo Gavioli <wgavioli at fas.harvard.edu> wrote:

  
    
#
On 17/12/2007 8:14 PM, Wayne Aldo Gavioli wrote:
The jitter() function can add a bit of noise to your data, so that 
repeated points show up as groupings instead of isolated points.

Duncan Murdoch
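
A minimal sketch of the jitter() suggestion, using base R only. The data are simulated stand-ins for the poster's 5,000 points, and amount = 0.2 is an arbitrary choice that would need tuning to the real coordinate spacing.

```r
# Simulated stand-in: 5,000 points sharing at most 56 distinct coordinates
set.seed(1)
x <- sample(1:8, 5000, replace = TRUE)
y <- sample(1:7, 5000, replace = TRUE)

# Plain plot: coincident points are drawn on top of each other
plot(x, y, main = "plot(x, y)")

# Jittered plot: each value is shifted by uniform noise in [-0.2, 0.2],
# so heavily repeated coordinates show up as dense clouds
xj <- jitter(x, amount = 0.2)
yj <- jitter(y, amount = 0.2)
plot(xj, yj, pch = ".", main = "jittered")
```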
#
Wayne Aldo Gavioli <wgavioli at fas.harvard.edu> [Tue, Dec 18, 2007 at 02:14:23AM CET]:
?sunflowerplot
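
A minimal sketch of what sunflowerplot does, on simulated data (200 points on a 5 x 5 grid, chosen small so the petals stay readable). xyTable() from grDevices returns the same per-coordinate counts without plotting.

```r
# Simulated stand-in: 200 points on a 5 x 5 grid of coordinates
set.seed(1)
x <- sample(1:5, 200, replace = TRUE)
y <- sample(1:5, 200, replace = TRUE)

# Each distinct (x, y) is drawn once; coordinates with more than one
# observation get one "petal" per observation
sunflowerplot(x, y, main = "sunflowerplot")

# The counts behind the petals: components x, y and number
counts <- xyTable(x, y)
```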
#
Wayne,

I am fond of the bagplot (think 2D box plot) to replace scatter plots
for large N. See
http://www.wiwi.uni-bielefeld.de/~wolf/software/aplpack/ and aplpack
in CRAN.
#
Wayne Aldo Gavioli wrote:
Hi Wayne,
While this is not a really pretty picture, you can get a viewable plot 
with count.overplot if the first two elements of "education" are named 
"x" and "y" and they are the coordinates you want to plot. Otherwise, 
pass the x and y coordinates separately.

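library(plotrix)
# tol: largest x and y distances at which points count as overlapping,
# here one tenth of the range of each coordinate
count.overplot(education,
  tol=c(diff(range(education$x))/10,
        diff(range(education$y))/10))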

Jim
#
Wayne Aldo Gavioli <wgavioli <at> fas.harvard.edu> writes:
One suggestion seems to be still missing: 'sunflowerplot' of base R. May look
taggy, though, if you have 200 "petals". 

Actually the documentation of sunflowerplot is wrong in a botanical sense.
Sunflowers have composite flowers in capitula, and the things called 'petals' in
documentation are ligulate, sterile ray-florets (each with vestigial petals
which are not easily visible in sunflower, but in some other species you may see
three (occasionally two) teeth). 

cheers, jari oksanen
#
Jari Oksanen wrote:
Could you please put together a patch that replaces "petals" with 
"ligulate, sterile ray-florets" in appropriate places?

;-)

Duncan Murdoch
#
On 12/17/07, Jim Porzak <jporzak at gmail.com> wrote:
The big drawback of the bagplot, like the boxplot, is that it's
difficult to see multimodality.

Hadley