Sweave & xtable

9 messages · Hedderik van Rijn, A.J. Rossini, Peter Dalgaard +1 more

#
I'm trying to get Sweave running for automatic report generation, and 
it seems to run fine when just using verbatim output. However, I've run 
into a problem with xtable. I would like to print the following matrix 
using xtable:

 > dim(counts)
[1] 19 15

All columns are filled with real/integer numbers > 0 and < 1000. Just 
typing xtable(counts) gives correct LaTeX output, but running Sweave on:

<<results=tex>>=
library(xtable)
xtable(counts)
@

Generates truncated TeX output:

% latex table generated in R 1.6.1 by xtable 1.0-10 package
% Fri Dec 20 14:50:08 2002
\begin{table}
[...]
5 & 105.00 & 400.00 & 0.00 & 0.00 & 0.00 & 0.00 & 1000.00 & 542.00 & 
181.00 & 858.00 & 4
62.00 & 103.00 & 744.00 & 449.00 & 93.00 \\
6 & 201.00 & 400.00 & 0.00 &

xtable(counts[1:4,]) does work. Do I have to flush the output of 
xtable in some way? Or is my table just too large?

  - Hedderik.
#
    hedderik> I'm trying to get Sweave running for automatic report generation, and
    hedderik> it seems to run fine when just using verbatim output. However, I've
    hedderik> run into a problem with xtable. I would like to print the following
    hedderik> matrix using xtable:

I've had problems with large matrices, but never got around to
figuring out why (the cheap hack solution was to split matrices and
present in different tables).  

(matrices representing components of a model, not raw data, so it made
some sense, but wasn't optimal).

i.e. I think it might be a bug.

best,
-tony
1 day later
#
A quick solution (as long as the table is not too wide) is to include 
the following code after library(xtable), replacing the original 
print.string:

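print.string <- function(x, ...) {
   ## write the text one line at a time, so no single cat() call
   ## sends an overly long string to the connection
   lapply(strsplit(x$text, "\n")[[1]], cat, "\n", file=x$file, append=x$append)
   return(invisible())
}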

The problem seems to be that textConnection() (used in the Sweave code) 
is not able to process the long strings that sometimes get sink'ed to 
it. When sending a string directly to it, a warning is triggered:

## Error/truncation with warning:
con <- textConnection("output","w");
sink(file=con);
paste(rep("123456789!",1000),collapse="");
## Warning message:
## line truncated in output text connection
sink()
rm(last.warning)

However, when a long string is sent to textConnection via xtable, no 
warning is shown:

## Error/truncation without warning:
library("xtable")
con <- textConnection("output","w")
sink(file=con)
xtable(matrix(rnorm(1000),100))
sink()
## print(output) would show the truncated LaTeX table; warnings()
## doesn't show any warning

I'm not sure whether textConnection or xtable should be blamed, but 
replacing the single cat() with the lapply/strsplit/cat combination in 
print.string solved my problem.
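
The line-by-line idea can be tried outside xtable. The following is only a minimal sketch: the table text is made up for illustration, and `split_out` is a hypothetical variable name. It writes each line separately, so no single write is longer than one table row.

```r
## Build an illustrative multi-line LaTeX-like string (made-up data).
row_body  <- paste(rep("0.00", 10), collapse = " & ")
long_text <- paste(sprintf("%d & %s \\\\", 1:100, row_body), collapse = "\n")

## Split on newlines and send the pieces one line at a time.
pieces <- strsplit(long_text, "\n", fixed = TRUE)[[1]]
con <- textConnection("split_out", "w")   # "split_out" is a hypothetical name
writeLines(pieces, con)
close(con)

## Every row should have arrived as its own element.
length(split_out)
```

A very wide table could still produce a single row that exceeds a per-line limit, which is why the workaround above is qualified with "as long as the table is not too wide".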

If the truncation of long strings is official/known behavior of 
textConnection, the following text in textConnection's help page might 
need some revision, i.e., some more explicit statement that long 
strings might get truncated. (And, maybe also a definition of what a 
"completed line of output" is, i.e., ending in a "\n".)

      An output text connection is opened and creates an R character
      vector of the given name in the user's workspace.  This object
      will at all times hold the completed lines of output to the
      connection, and `isIncomplete' will indicate if there is an
      incomplete final line.  Closing the connection will output the
      final line, complete or not.
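
What "completed lines" means here can be checked directly. This is a small sketch, assuming the behaviour described in the help text above; `captured` is a hypothetical variable name, and older versions of R may behave differently.

```r
con <- textConnection("captured", "w")   # "captured" is a hypothetical name
cat("first line\n", file = con)          # ends in "\n": a completed line
cat("no newline yet", file = con)        # no final "\n": incomplete line

lines_before <- captured                 # holds only the completed line
pending      <- isIncomplete(con)        # is there an incomplete final line?

close(con)                               # closing outputs the final line
lines_after  <- captured
```

Comparing `lines_before` with `lines_after` shows that the incomplete final line only reaches the character vector once the connection is closed.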

   - Hedderik.

P.S. I've seen a lot of bus errors and segmentation faults while trying 
to find the cause of the problem. If anyone is interested, I can try to 
see if I can reproduce those. I'm using:

R : Copyright 2002, The R Development Core Team
Version 1.6.1  (2002-11-01)

% latex table generated in R 1.6.1 by xtable 1.0-11 package
#
Connections do correctly give you a warning if the internal line limit is
exceeded.  This is documented in the source code, which is there for you
to read. It is naive to assume that systems accept lines of arbitrary
length: many do not, including the Unix terminals/shells I use. (And that
is not in their man pages either.)

Looks to me as if text connections are being used where anonymous file
connections would be much more appropriate.
On Sat, 21 Dec 2002, Hedderik van Rijn wrote:
[...]
Whatever else could it mean?  I doubt if the end user would know what \n
means if (s)he is so naive as not to know what an incomplete line is!
#
I know; I had discovered that a warning is given when the string is 
sink'ed directly rather than going through xtable (as illustrated by 
the examples in my previous email). After some more exploration, the 
silent truncation seems to be caused by cat'ing instead of print'ing a 
long string to a textConnection using sink.

This code triggers a warning: (Same behavior of course if the 
paste(...) is explicitly embedded in a print(...) statement)

con <- textConnection("output","w");
sink(file=con);
paste(rep("123456789!",1000),collapse="")
## Warning message:
## line truncated in output text connection
sink()
close(con)

Whereas this code snippet "silently" truncates the string, without 
warning:

con <- textConnection("output","w");
sink(file=con);
cat(paste(rep("123456789!",1000),collapse=""))
sink()
close(con)

I'm not sure which function (if any) to blame, but I definitely think 
that either cat or textConnection should have made sure that a warning 
"came through". As you mentioned, it is naive to assume an arbitrary 
line-length, but if the above code is not incorrect, my opinion is that 
it should warn users of incorrect output, or state it in the help pages.
Indeed, changing Sweave's RweaveLatexRuncode (line 1596 of tools, R 
1.6.1) to use file/readLines instead of textConnection:

         ## HvR replaced: tmpcon <- textConnection("output", "w")
         tmpcon <- file()
         sink(file=tmpcon)
         err <- NULL
         if(options$eval) err <- RweaveEvalWithOpt(ce, options)
         ## HvR added: make sure the final line is complete (with a final
         ## EOL marker):
         cat("\n")
         sink()
         ## HvR added:
         output <- readLines(tmpcon)
         close(tmpcon)

solves the truncation problem for the Sweave/xtable/cat combination.
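
The same file()/readLines pattern can be tried on its own, outside Sweave. This is only a sketch: `capture_via_file` is a hypothetical helper, not part of Sweave or base R. Instead of appending a final "\n", it relies on readLines() accepting an incomplete final line on a blocking connection and passes warn = FALSE to silence the corresponding warning.

```r
## Hypothetical helper: capture printed output through an anonymous file
## connection instead of a text connection.
capture_via_file <- function(expr) {
  tmpcon <- file()                  # anonymous file, opened for read and write
  on.exit(close(tmpcon))
  sink(tmpcon)
  ## 'expr' is evaluated lazily, so its output is produced while the sink
  ## is active; the sink is removed again even if evaluation fails.
  tryCatch(force(expr), finally = sink())
  readLines(tmpcon, warn = FALSE)
}

long <- paste(rep("123456789!", 1000), collapse = "")
out  <- capture_via_file(cat(long))
nchar(out)   # compare with nchar(long)
```

If the two lengths agree, nothing was truncated on the way through the connection.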
While trying to figure out how to use anonymous file connections, I 
came across the following passage in the readLines help page:

      If the final line is incomplete (no final EOL marker) the
      behaviour depends on whether the connection is blocking or not.

I guess the addition of "(no final EOL marker)" would at least for me 
be a useful extension to the textConnection help page.
After having spent a couple of hours with textConnections and other 
redirections, and knowing more about how they work, I certainly see 
your point. However, it might have saved me some initial confusion if 
this had been there in the first place.

At the same time, it might be valuable to add an explicit reference to 
file() (besides the "See also: connections"), stating something along 
the lines of the file/readLines combination being preferred over 
textConnection if the purpose is to process large chunks of output. (If 
I gathered correctly from your remarks that anonymous file connections 
are more appropriate in these situations.)

Thanks for the valuable comments, again, I learned a lot.

   - Hedderik.
#
Hedderik van Rijn <hedderik at cmu.edu> writes:
The warning should be inside "output", shouldn't it? It actually isn't
there, so perhaps it is getting appended to the already overlong
string??

As I was trying to dig deeper, I ran into the following interesting
segfault on RH8.0:
[1]
"123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!!
123456789"
Program received signal SIGSEGV, Segmentation fault.
0x4207a4cb in strlen () from /lib/i686/libc.so.6

(The cat(...) call being obtained with up-arrow command recall) 

Apparently this is not quite reproducible and might depend on stuff in
the workspace, but it looks suspicious. Will have a better look.
#
Peter Dalgaard BSA <p.dalgaard at biostat.ku.dk> writes:
To be precise, it happens the *next* time I press "up".
It does seem to depend on my workspace and/or my history file, neither
of which are particularly interesting. The segfault happens deep into
readline calls, so it's not really appealing to try and track it down...

Apparently, it still happens with r-devel from Dec. 9 (which is the
most recent I have on that particular machine...).
#
This is the same behavior I encountered. The first time _always_ goes 
fine, the next time (also by pressing "up", "enter") sometimes(?) 
results in a segv and sometimes in a bus error. (Using R 1.6.1, Mac OS 
X, latest OS X update, 10.2.3? and just released fink installed.)

As I was not sure if I could replicate it always, I only referred to it 
in a "P.S." in the second email I sent on this topic.

  - Hedderik.
#
This should be fixed in R-patched and R-devel (and has been since
Saturday).  Certainly it has gone away for me.
On Mon, 23 Dec 2002, Hedderik van Rijn wrote: