ps or pdf

3 messages · Francois Pepin, Brian Ripley

#
Hi everyone,

I have been making a fair number of figures in R recently that I've
been touching up with Illustrator. I've found a difference between
pdf and ps files, and I was wondering if someone could enlighten me
about it.

While the figures look the same, the ps version tends to have
truncated strings. The last character of short strings tends to be on
a string of its own, located right beside the rest. This makes it a bit 
awkward to manipulate, especially if scaling is involved. Is there a 
reason for this difference?

There also seems to be somewhat arbitrary grouping of the last column 
cells in heatmaps in ps files.

I used to prefer ps files because they embed more easily in LaTeX
documents (although pdf files are not difficult and conversions are trivial
anyhow), but I'm curious if there are other reasons why one format might
be preferred over the other in this context.

This is with R 2.6 on Linux, and I've seen this behavior with older R
versions as well.

Francois

sessionInfo()
R version 2.6.0 (2007-10-03)
x86_64-unknown-linux-gnu

locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] rcompgen_0.1-15
#
On Mon, 31 Mar 2008, Francois Pepin wrote:

            
Please see the footer of this message.  Neither postscript() nor pdf() 
graphics devices split up strings they are passed (by e.g. text()), so 
this is being done either by the code used to create the plot (and we have 
no idea what that is) or by the viewer.  I suspect the problem is rather 
in the viewer, but without the example we asked for it is impossible to 
know.

Again, we need an example.

The graphics devices are very similar (they share a lot of code).  One 
small difference is that PostScript has an arc primitive, and PDF does 
not.

Nothing has changed at that level for a long time -- not even in 
current versions of R (and 2.6.0 is obsolete).
#
Prof Brian Ripley wrote:
Sorry, here is an example. For some reason, I cannot reproduce it 
without using actual gene names.

set.seed(1)
##The row names were originally obtained using the hgug4112a library 
##from bioconductor. I set it manually for people who don't have it 
##installed.
##library(hgug4112a);row<-sample(na.omit(unlist(as.list(hgug4112aSYMBOL))),50)
row<-c("BDNF", "EMX2", "ZNF207", "HELLS", "PWP1", "PDXDC1",  "BTD", 
"NETO1", "SLCO4C1", "FZD7", "NICN1", "TMSB4Y", "PSMB7",  "CADM2", 
"SIRT3", "ADH6", "TM6SF1", "AARS", "TMEM88", "CP110",  "ADORA2A", 
"ATAD3A", "VAPA", "NXPH3", "IL27RA", "NEBL", "FANCF",  "PTPRG", 
"HSU79275", "CCDC34", "EPDR1", "FBLN1", "PCAF", "AP1B1",  "TXNRD2", 
"MUC20", "MBNL1", "STAU2", "STK32C", "PPIAL4", "TGFBR2",  "DPY19L2P3", 
"TMEM50B", "ENY2", "MAN2A2", "ZFYVE26", "TECTA",  "CD55", "LOC400794", 
"SLC19A3")
postscript('/tmp/heatmap.ps',paper='letter',horizontal=FALSE)
heatmap(matrix(rnorm(2500),50),labRow=row)
dev.off()

Examples of row names that are truncated in Illustrator (* denoting 
truncation):
CCDC3*4 (2nd row)
MUC2*0 (3rd row)
MBNL*1 (8th row)
...

It is likely that Illustrator (CS 3, OS X version) is at fault.  I do 
not see any truncation if I look at the ps file by hand (lines 4801 and 
4802):

540.22 545.88 (MUC20) 0 0 0 t
540.22 553.90 (CCDC34) 0 0 0 t
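
The same check can be done from R instead of by hand. The following is only a minimal sketch, not part of the original example: the three labels, the temporary file and the variable names are chosen for illustration. It also assumes that passing useKerning = FALSE, an argument of postscript() in current versions of R (it may not be available in releases as old as the 2.6.0 used in this thread), makes the device write each string in one piece.

```r
## Sketch: write a few labels with postscript() and check that each one
## appears in the file as a single PostScript string literal, e.g. (MUC20).
## Labels, file name and variable names are illustrative assumptions.
labels <- c("MUC20", "CCDC34", "MBNL1")
ps_file <- tempfile(fileext = ".ps")

## Assumption: useKerning = FALSE stops current R versions from breaking a
## string into pieces to apply kerning corrections.
postscript(ps_file, paper = "letter", horizontal = FALSE, useKerning = FALSE)
plot(seq_along(labels), type = "n", xlab = "", ylab = "")
text(seq_along(labels), seq_along(labels), labels)
dev.off()

## Look for each label wrapped in parentheses, matched as a fixed string.
ps_lines <- readLines(ps_file)
found <- vapply(labels, function(lab) {
  any(grepl(paste0("(", lab, ")"), ps_lines, fixed = TRUE))
}, logical(1))
print(found)
```

If every entry of found is TRUE, the labels are intact in the file itself, which would point to the program that reads the file rather than to R.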

The top right cell (26, TXNRD2) is grouped with the cell just below it 
(26, CCDC34). It's more of a curiosity than anything else.

This is what I thought at first, which is why I found these differences 
surprising. I think your idea of blaming the viewer is correct. I 
thought that Adobe of all people could deal with PostScript files 
properly, but I guess I was overly trusting.

Thanks for the help,

Francois