scatterplot of 100000 points and pdf file format
On Wed, 24 Nov 2004 Ted.Harding at nessie.mcc.ac.uk wrote:
On 24-Nov-04 Witold Eryk Wolski wrote:
Hi, I want to draw a scatter plot with 1M and more points and save it as pdf. This makes the pdf file large. So I tried to save the file first as png and then convert it to pdf. This looks OK if printed, but if viewed e.g. with acrobat as a document figure the quality is bad. Does anyone know a way to reduce the size but keep the quality?
If you want the PDF file to preserve the info about all the 1M points then the problem has no solution. The png file will already have suppressed most of this (which is one reason for poor quality).
I think you should give thought to reducing what you need to plot. Think about it: suppose you plot with a resolution of 200 points per inch, i.e. a spacing of 1/200 inch (about the limit at which the eye begins to see rough edges). Then you have 200 x 200 = 40000 points per square inch. If your 1M points are separate but as closely packed as possible, this requires 1000000/40000 = 25 square inches, or a 5x5 inch (= 12.7x12.7 cm) square. And this would be solid black!
Presumably in your plot there is a very large number of points which are effectively indistinguishable from other points, so these could be eliminated without spoiling the plot.
I don't have an obviously best strategy for reducing what you actually plot, but perhaps one line to think along might be the following:
1. Multiply the data by some factor and then round the results to an integer (to avoid problems in step 2). Choose the factor so that the result of (4) below is satisfactory.
2. Eliminate duplicates in the result of (1).
3. Divide by the factor you used in (1).
4. Plot the result; save plot to PDF.
As to how to do it in R: the critical step is (2), which with so many points could be very heavy unless done by a well-chosen procedure. I'm not expert enough to advise about that, but no doubt others are.
unique will eat that for breakfast
x <- runif(1e6)
system.time(xx <- unique(round(x, 4)))
[1] 0.55 0.09 0.64 0.00 0.00
length(xx)
[1] 10001
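The example above thins a single vector. For a scatter plot the same four steps have to be applied to (x, y) pairs, which might be sketched roughly as follows. This is only an illustration under stated assumptions: the helper name thin_points(), the simulated data, the factor of 50 and the output file name are all made up here and do not come from the thread.

```r
# Sketch only: thin_points(), the simulated data and factor = 50 are
# illustrative assumptions, not code posted in this thread.
thin_points <- function(x, y, factor = 200) {
  # Step 1: scale and round so that nearby points fall on the same grid node.
  g <- cbind(x = round(x * factor), y = round(y * factor))
  # Step 2: drop duplicated rows (unique() on a matrix compares whole rows).
  g <- unique(g)
  # Step 3: undo the scaling to get back to data units.
  g / factor
}

set.seed(1)
x <- rnorm(1e5)
y <- x + rnorm(1e5, sd = 0.5)
pts <- thin_points(x, y, factor = 50)

# Step 4: plot the thinned points and save the plot as PDF.
out <- file.path(tempdir(), "scatter_thinned.pdf")
pdf(out, width = 5, height = 5)
plot(pts, pch = ".", xlab = "x", ylab = "y")
dev.off()
```

Note that the factor here is in data units, not inches. To approximate the 200-points-per-inch argument one would presumably pick something like 200 times the plot width in inches divided by the data range, then adjust until the PDF looks acceptable.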
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595