I'm trying to get Sweave running for automatic report generation, and
it seems to run fine when just using verbatim output. However, I've run
into a problem with xtable. I would like to print the following matrix
using xtable:
> dim(counts)
[1] 19 15
All columns are filled with real/integer numbers between 0 and 1000. Just
typing xtable(counts) gives correct LaTeX output, but running Sweave on:
<<results=tex>>=
library(xtable)
xtable(counts)
@
Generates truncated TeX output:
% latex table generated in R 1.6.1 by xtable 1.0-10 package
% Fri Dec 20 14:50:08 2002
\begin{table}
[...]
5 & 105.00 & 400.00 & 0.00 & 0.00 & 0.00 & 0.00 & 1000.00 & 542.00 &
181.00 & 858.00 & 4
62.00 & 103.00 & 744.00 & 449.00 & 93.00 \\
6 & 201.00 & 400.00 & 0.00 &
xtable(counts[1:4,]), however, does work. Do I have to flush the output of
xtable in some way? Or is my table simply too large?
- Hedderik.
Sweave & xtable
9 messages · Hedderik van Rijn, A.J. Rossini, Peter Dalgaard +1 more
"hedderik" == Hedderik van Rijn <hedderik at cmu.edu> writes:
hedderik> I'm trying to get Sweave running for automatic report generation, and
hedderik> it seems to run fine when just using verbatim output. However, I've
hedderik> run into a problem with xtable. I would like to print the following
hedderik> matrix using xtable:
I've had problems with large matrices, but never got around to
figuring out why (the cheap hack solution was to split matrices and
present in different tables).
(matrices representing components of a model, not raw data, so it made
some sense, but wasn't optimal).
i.e. I think it might be a bug.
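A minimal sketch of the "split the matrix and present it in different tables" workaround, assuming the xtable package is installed. The helper name split_rows, the block size of 5 rows and the random stand-in for counts are hypothetical, chosen only for illustration.

```r
## Hypothetical helper: split a matrix into blocks of at most `size` rows
split_rows <- function(m, size) {
  idx <- seq_len(nrow(m))
  groups <- split(idx, ceiling(idx / size))  # rows 1-5 -> "1", 6-10 -> "2", ...
  lapply(groups, function(r) m[r, , drop = FALSE])
}

## Made-up stand-in for the 19 x 15 `counts` matrix from the first message
set.seed(1)
counts <- matrix(sample(0:999, 19 * 15, replace = TRUE), nrow = 19)
pieces <- split_rows(counts, 5)

## One LaTeX table per block (only if xtable is available)
if (requireNamespace("xtable", quietly = TRUE)) {
  for (p in pieces) print(xtable::xtable(p))
}
```

Each table then stays short, at the cost of the reader having to reassemble the matrix mentally.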
best,
-tony
A.J. Rossini Rsrch. Asst. Prof. of Biostatistics U. of Washington Biostatistics rossini at u.washington.edu FHCRC/SCHARP/HIV Vaccine Trials Net rossini at scharp.org -------------- http://software.biostat.washington.edu/ ---------------- FHCRC: M: 206-667-7025 (fax=4812)|Voicemail is pretty sketchy/use Email UW: Th: 206-543-1044 (fax=3286)|Change last 4 digits of phone to FAX (my tuesday/wednesday/friday locations are completely unpredictable.)
1 day later
A quick solution (as long as the table is not too wide) is to include
the following code after library(xtable), replacing the original
print.string:
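print.string <- function(x, ...) {
  ## cat() the LaTeX line by line so no single write to the connection is too long
  lines <- strsplit(x$text, "\n")[[1]]
  for (i in seq(along = lines))  # append after line 1 so a named file is not overwritten
    cat(lines[i], "\n", file = x$file, append = x$append || i > 1)
  invisible()
}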
The problem seems to be that textConnection() (used in the Sweave code)
is not able to process the long strings that sometimes get sink'ed to
it. When sending a string directly to it, a warning is triggered:
## Error/truncation with warning:
con <- textConnection("output", "w")
sink(file = con)
paste(rep("123456789!", 1000), collapse = "")
## Warning message:
## line truncated in output text connection
sink()
close(con)
rm(last.warning)
However, when a long string is sent to textConnection via xtable, no
warning is shown:
## Error/truncation without warning:
library("xtable")
con <- textConnection("output", "w")
sink(file = con)
xtable(matrix(rnorm(1000), 100))
sink()
## print(output) would show the truncated LaTeX table; warnings()
## doesn't show any warning
I'm not sure whether textConnection or xtable should be blamed, but
replacing the single cat() in print.string with the lapply/strsplit/cat
combination solved the/my problem.
If the truncation of long strings is official/known behavior of
textConnection, the following text in textConnection's help page might
need some revision, i.e., some more explicit statement that long
strings might get truncated. (And, maybe also a definition of what a
"completed line of output" is, i.e., ending in a "\n".)
An output text connection is opened and creates an R character
vector of the given name in the user's workspace. This object
will at all times hold the completed lines of output to the
connection, and `isIncomplete' will indicate if there is an
incomplete final line. Closing the connection will output the
final line, complete or not.
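To illustrate what the help text means by "completed lines", here is a small sketch run under a current R rather than 1.6.1 (the object name demo_out is arbitrary). Text without a trailing newline stays pending, isIncomplete() reports that there is unflushed output, and close() writes the final line out.

```r
## Output text connection: `demo_out` holds only *completed* lines
con <- textConnection("demo_out", "w")
cat("first line\nsecond line, no newline yet", file = con)
before  <- demo_out           # only the line terminated by "\n"
pending <- isIncomplete(con)  # TRUE: the unterminated line is still buffered
close(con)                    # closing writes out the final, incomplete line
demo_out
```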
- Hedderik.
P.S. I've seen a lot of bus errors and segmentation faults while trying
to find the cause of the problem. If anyone is interested, I can try to
see if I can reproduce those. I'm using:
R : Copyright 2002, The R Development Core Team
Version 1.6.1 (2002-11-01)
% latex table generated in R 1.6.1 by xtable 1.0-11 package
Connections do correctly give you a warning if the internal line limit is exceeded. This is documented in the source code, which is there for you to read. It is naive to assume that systems allow lines of arbitrary length: many do not, including the Unix terminals/shells I use. (And that is not in their man pages either.) Looks to me as if text connections are being used where anonymous file connections would be much more appropriate.
On Sat, 21 Dec 2002, Hedderik van Rijn wrote:
[...]
If the truncation of long strings is official/known behavior of textConnection, the following text in textConnection's help page might need some revision, i.e., some more explicit statement that long strings might get truncated. (And, maybe also a definition of what a "completed line of output" is, i.e., ending in a "\n".)
Whatever else could it mean? I doubt if the end user would know what \n means if (s)he is so naive as not to know what an incomplete line is!
Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
Connections do correctly give you a warning if the internal line limit is exceeded. This is documented in the source code, which is there for you to read.
I know; I discovered that it did give a warning if the string is sink'ed
directly instead of going through xtable (as illustrated by the
examples in my previous email). After some more exploration, it seems
to be caused by cat'ing instead of print'ing a long string to a
textConnection using sink.
This code triggers a warning (the same happens, of course, if the
paste(...) is explicitly wrapped in a print(...) statement):
con <- textConnection("output","w");
sink(file=con);
paste(rep("123456789!",1000),collapse="")
## Warning message:
## line truncated in output text connection
sink()
close(con)
Whereas this code snippet "silently" truncates the string, without
warning:
con <- textConnection("output","w");
sink(file=con);
cat(paste(rep("123456789!",1000),collapse=""))
sink()
close(con)
I'm not sure which function (if any) to blame, but I definitely think
that either cat or textConnection should have made sure that a warning
"came through". As you mentioned, it is naive to assume an arbitrary
line length, but if the above code is not incorrect, in my opinion it
should either warn users about the incorrect output or be documented in
the help pages.
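One visible difference between the two snippets can be shown with a short string, so no line limit is involved (a sketch under a current R; the name tc_out is arbitrary). print() always ends its output with a newline, whereas cat() of a string without "\n" leaves an incomplete line that only reaches the character vector when the connection is closed. Whether this is what hid the warning in R 1.6.1 is not established here.

```r
con <- textConnection("tc_out", "w")
sink(con)
print("abc")  # print() terminates its output with a newline
cat("abc")    # cat() does not, so this line is left incomplete
sink()
before_close <- tc_out  # only the print()ed line has been completed
close(con)              # closing flushes the cat()ed text as the last line
```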
Looks to me as if text connections are being used where anonymous file connections would be much more appropriate.
Indeed, changing Sweave's RweaveLatexRuncode (line 1596 of tools, R
1.6.1) to use file/readLines instead of textConnection:
## HvR replaced: tmpcon <- textConnection("output", "w")
tmpcon <- file()
sink(file=tmpcon)
err <- NULL
if(options$eval) err <- RweaveEvalWithOpt(ce, options)
## HvR added (make sure the final line is complete, i.e. has a final
## EOL marker):
cat("\n")
sink()
## HvR added:
output <- readLines(tmpcon)
close(tmpcon)
solves the truncation problem for the Sweave/xtable/cat combination.
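The same pattern (anonymous file(), sink(), a final cat("\n"), then readLines()) can be packaged as a small capture helper. This is a sketch assuming a current R; capture_via_file is a hypothetical name, not part of Sweave or base R.

```r
## Hypothetical helper: capture output through an anonymous file connection
capture_via_file <- function(expr) {
  tmp <- file()              # anonymous file, opened for writing and reading
  on.exit(close(tmp))
  sink(tmp)
  tryCatch(force(expr),      # evaluate `expr` while output is diverted
           finally = {
             cat("\n")       # complete the final line (adds an empty line if
             sink()          # the output already ended with a newline)
           })
  readLines(tmp)             # read back everything that was written
}

long <- paste(rep("123456789!", 1000), collapse = "")
res <- capture_via_file(cat(long))
nchar(res)
```

File connections keep separate read and write positions, so readLines() starts from the beginning of what sink() wrote.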
[...]
If the truncation of long strings is official/known behavior of textConnection, the following text in textConnection's help page might need some revision, i.e., some more explicit statement that long strings might get truncated. (And, maybe also a definition of what a "completed line of output" is, i.e., ending in a "\n".)
Whatever else could it mean? I doubt if the end user would know what \n means if (s)he is so naive as not to know what an incomplete line is!
While trying to figure out how to use anonymous file connections, I
came across the following passage in the readLines help page:
If the final line is incomplete (no final EOL marker) the
behaviour depends on whether the connection is blocking or not.
I guess the addition of "(no final EOL marker)" would at least for me
be a useful extension to the textConnection help page.
After having spent a couple of hours with textConnections and other
redirections, and knowing more about how they work, I certainly see
your point. However, it might have saved me some initial confusion if
this had been there in the first place.
At the same time, it might be valuable to add an explicit reference to
file() (besides the "See also: connections"), stating something along
the lines of the file/readLines combination being preferred over
textConnection if the purpose is to process large chunks of output. (If
I gathered correctly from your remarks that anonymous file connections
are more appropriate in these situations.)
Thanks again for the valuable comments; I learned a lot.
- Hedderik.
Hedderik van Rijn <hedderik at cmu.edu> writes:
Whereas this code snippet "silently" truncates the string, without
warning:
con <- textConnection("output","w");
sink(file=con);
cat(paste(rep("123456789!",1000),collapse=""))
sink()
close(con)
I'm not sure which function (if any) to blame, but I definitely think
that either cat or textConnection should have made sure that a warning
"came through". As you mentioned, it is naive to assume an arbitrary
line-length, but if the above code is not incorrect, my opinion is
that it should warn users of incorrect output, or state it in the help
pages.
The warning should be inside "output", shouldn't it? It actually isn't there, so perhaps it is getting appended to the already overlong string?? As I was trying to dig deeper, I ran into the following interesting segfault on RH8.0:
con <- textConnection("output","w");
sink(file=con);
cat(paste(rep("123456789!",1000),collapse=""))
sink()
close(con)
output
[1] "123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!123456789!! 123456789"
cat(paste(rep("123456789!",1000),collapse=""))
Program received signal SIGSEGV, Segmentation fault.
0x4207a4cb in strlen () from /lib/i686/libc.so.6
(The cat(...) call being obtained with up-arrow command recall.)
Apparently this is not quite reproducible and might depend on stuff in the workspace, but it looks suspicious. Will have a better look.
O__ ---- Peter Dalgaard Blegdamsvej 3 c/ /'_ --- Dept. of Biostatistics 2200 Cph. N (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907
Peter Dalgaard BSA <p.dalgaard at biostat.ku.dk> writes:
cat(paste(rep("123456789!",1000),collapse=""))
Program received signal SIGSEGV, Segmentation fault.
0x4207a4cb in strlen () from /lib/i686/libc.so.6
(The cat(...) call being obtained with up-arrow command recall)
To be precise, it happens *next* time I press "up"
Apparently this is not quite reproducible and might depend on stuff in the workspace, but it looks suspicious. Will have a better look.
It does seem to depend on my workspace and/or my history file, neither of which are particularly interesting. The segfault happens deep into readline calls, so it's not really appealing to try and track it down... Apparently, it still happens with r-devel from Dec. 9 (which is the most recent I have on that particular machine...).
cat(paste(rep("123456789!",1000),collapse=""))
Program received signal SIGSEGV, Segmentation fault.
0x4207a4cb in strlen () from /lib/i686/libc.so.6
(The cat(...) call being obtained with up-arrow command recall)
To be precise, it happens *next* time I press "up"
This is the same behavior I encountered. The first time _always_ goes fine; the next time (also by pressing "up", "enter") sometimes(?) results in a segv and sometimes in a bus error. (Using R 1.6.1 on Mac OS X with the latest OS X update, 10.2.3?, and the just-released fink installed.) As I was not sure whether I could always replicate it, I only referred to it in a "P.S." in the second email I sent on this topic.
- Hedderik.
This should be fixed in R-patched and R-devel (and has been since Saturday). Certainly it has gone away for me.
On Mon, 23 Dec 2002, Hedderik van Rijn wrote:
cat(paste(rep("123456789!",1000),collapse=""))
Program received signal SIGSEGV, Segmentation fault.
0x4207a4cb in strlen () from /lib/i686/libc.so.6
(The cat(...) call being obtained with up-arrow command recall)
To be precise, it happens *next* time I press "up"
This is the same behavior I encountered. The first time _always_ goes fine, the next time (also by pressing "up", "enter") sometimes(?) results in a segv and sometimes in a bus error. (Using R 1.6.1, Mac OS X, latest OS X update, 10.2.3? and just released fink installed.) As I was not sure if I could replicate it always, I only referred to it in a "P.S." in the second email I sent on this topic. - Hedderik.