Huge performance difference between implicit and explicit print
Minor point and probably not relevant to the speed issue, but df() is the density function for the F distribution, so I have (recently) stopped using it for referring to data.frames.

Sean
On 30 October 2013 23:32, Gabriel Becker <gmbecker at ucdavis.edu> wrote:
Hadley,

As far as I can tell from a quick look, it is because implicit printing uses a different mechanism which does a fair bit more work.

From comments in print.c in the R sources:

 *  print.default()  ->  do_printdefault (with call tree below)
 *
 *  auto-printing    ->  PrintValueEnv
 *                       -> PrintValueRec
 *                       -> call print() for objects
 *  Note that auto-printing does not call print.default.
 *  PrintValue, R_PV are similar to auto-printing.

PrintValueEnv includes, among other things, checks for functions, S4 objects, and S3 objects before constructing (in C code) an R call to print for S3 objects and show for S4 objects and evaluating it using Rf_eval. So there is an extra trip to the R evaluator.

I imagine that extra work is where the hangup is but that is a slightly-informed guess as I haven't done any detailed timings or checks.

Basically my understanding of the processes is as follows:

print(df)
  print call is evaluated, S3 dispatch happens, print.default in C is called, result printed to terminal, print call returns

df
  expression "df" evaluated, auto-print initiated, type of object returned by expression is determined, print call is constructed in C code, print call is evaluated in C code, THEN all the stuff above happens.

I dunno if that helps or not as I can't speak to how to change/fix it atm.

~G

On Wed, Oct 30, 2013 at 3:22 PM, Hadley Wickham <h.wickham at gmail.com> wrote:
Hi all,
Can anyone help me understand why an implicit print (i.e. just typing
df at the console), is so much slower than an explicit print (i.e.
print(df)) in the example below? I see the difference in both RStudio
and in a terminal.
# Construct large df as quickly as possible
dummy <- 1:18e6
df <- lapply(1:10, function(x) dummy)
names(df) <- letters[1:10]
class(df) <- c("myobj", "data.frame")
attr(df, "row.names") <- .set_row_names(18e6)
print.myobj <- function(x, ...) {
print.data.frame(head(x, 2))
}
start <- proc.time(); df; flush.console(); proc.time() - start
#    user  system elapsed
#   0.408   0.557   0.965
start <- proc.time(); print(df); flush.console(); proc.time() - start
#    user  system elapsed
#   0.019   0.002   0.020
sessionInfo()
# R version 3.0.2 (2013-09-25)
# Platform: x86_64-apple-darwin10.8.0 (64-bit)
#
# locale:
# [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#
# attached base packages:
# [1] stats graphics grDevices utils datasets methods base
Thanks!
Hadley
--
Chief Scientist, RStudio
http://had.co.nz/
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
--
Gabriel Becker
Graduate Student
Statistics Department
University of California, Davis
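For anyone who wants to repeat the comparison without allocating 18e6 rows, here is a minimal sketch of the two timings on a smaller object. The row count n and the names t_explicit and t_implicit are assumptions made for illustration and are not from the original posts. Auto-printing only happens for top-level expressions, so the implicit case is bracketed with proc.time() as in the original example instead of being wrapped in system.time(). On an object this small the gap may be negligible.

```r
# Smaller variant of the object from the original post.
# n is an assumed size chosen so that the script runs quickly.
n <- 1e5
dummy <- seq_len(n)
df <- lapply(1:10, function(i) dummy)
names(df) <- letters[1:10]
class(df) <- c("myobj", "data.frame")
attr(df, "row.names") <- .set_row_names(n)

# S3 print method: show only the first two rows.
print.myobj <- function(x, ...) {
  print.data.frame(head(x, 2))
  invisible(x)
}

# Explicit path: an ordinary call to print(), dispatched from R code.
t_explicit <- system.time(print(df))

# Implicit path: the bare expression must sit at top level to be
# auto-printed, so it is timed with proc.time() around it.
start <- proc.time()
df
t_implicit <- proc.time() - start

# Compare elapsed times for the two paths.
rbind(explicit = t_explicit[1:3], implicit = t_implicit[1:3])
```

Run this at the console or with Rscript. Under source() without echo = TRUE the bare df line is not printed at all, so the implicit timing would be meaningless there.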