Don't dput() data frames?

4 messages · R. Michael Weylandt, Simon Urbanek

#
/src/main/attrib.c contains this comment in row_names_gets():

    /* This should not happen, but if a careless user dput()s a
       data frame and sources the result, it will */

which svn blame says Prof Ripley placed there in r39830 with the
commit message "correct the work of dput() on the row names of a data
frame with compact representation."

Is there a problem with source()ing the result of a hefty dput(), or a
better way to use it? This seems to work rather robustly:

data(iris)
source(textConnection(paste0("iris2 <- ", capture.output(dput(iris)))))
identical(iris, iris2)
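
For comparison, a minimal sketch of two ways to read dput() output back without source(). Note that paste0() is vectorised, so in the one-liner above the "iris2 <- " prefix is prepended to every captured line, not only the first. The sketch below sidesteps that by using dget(), the base R counterpart to dput(), or by parsing the captured text as a single expression. The names tmp, txt and iris3 are illustrative only.

```r
# Round-trip a data frame through its dput() text without source().
data(iris)

# Option 1: write the deparsed object to a file and read it back with dget().
tmp <- tempfile(fileext = ".R")
dput(iris, file = tmp)
iris2 <- dget(tmp)
unlink(tmp)

# Option 2: collapse the captured lines into one string, then parse and
# evaluate it as a single expression.
txt <- paste(capture.output(dput(iris)), collapse = "\n")
iris3 <- eval(parse(text = txt))

# Compare the reconstructed objects with the original.
all.equal(iris, iris2)
all.equal(iris, iris3)
```

Both options still evaluate the text as R code, so they should only be used on input you trust.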

Cheers,
Michael
#
On Aug 28, 2012, at 1:51 PM, R. Michael Weylandt wrote:

It's pretty much the least efficient and most dangerous (as in insecure) way. That's why there is serialization instead ...
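
As an illustration of the serialization route, here is a minimal sketch using base R's serialize()/unserialize() for an in-memory round trip and saveRDS()/readRDS() for a file. Unlike source(), neither path parses and runs the stored text as R code. The variable names are illustrative only.

```r
data(iris)

# In-memory round trip: serialize to a raw vector and restore it.
raw_bytes <- serialize(iris, connection = NULL)
iris_mem <- unserialize(raw_bytes)

# File round trip: saveRDS() writes a single serialized object.
path <- tempfile(fileext = ".rds")
saveRDS(iris, path)
iris_file <- readRDS(path)
unlink(path)

# Check that both copies match the original.
identical(iris, iris_mem)
identical(iris, iris_file)
```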

Cheers,
Simon
#
On Tue, Aug 28, 2012 at 1:00 PM, Simon Urbanek
<simon.urbanek at r-project.org> wrote:

My most common use of dput() is for sending plain text data over
r-help; would this be an official/unofficial advisement to push folks
to use

serialize(x, NULL, ascii = TRUE)

instead? At first blush that seems to be less space efficient:

sum(nchar(capture.output(dput(iris)))) # 3767

sum(nchar(serialize(iris, NULL, ascii = TRUE))) # 5922: probably even
# more if we dump it properly to plain text in a copy+pasteable form
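
A caveat on the measurement, offered as a sketch and not a correction of the figures quoted: serialize() returns a raw vector, and nchar() on a raw vector counts the two hex digits R uses to print each byte, so sum(nchar(...)) comes out at twice the byte count. Converting the ASCII serialization to a string with rawToChar() gives the size of the text that would actually be pasted. Exact numbers depend on the R version, so none are shown here.

```r
data(iris)

# Characters in the dput() text (newlines between lines not counted).
dput_chars <- sum(nchar(capture.output(dput(iris))))

# ASCII serialization as a raw vector: one element per byte.
ser_raw <- serialize(iris, NULL, ascii = TRUE)
ser_bytes <- length(ser_raw)

# nchar() coerces each raw byte to a two-character hex string.
hex_chars <- sum(nchar(ser_raw))

# The same bytes as one copy-and-pasteable string.
ser_text <- rawToChar(ser_raw)
text_bytes <- nchar(ser_text, type = "bytes")

# The pasted text can be turned back into the object.
iris_back <- unserialize(charToRaw(ser_text))

c(dput = dput_chars, serialized = ser_bytes, hex = hex_chars)
```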

Michael
#
On Aug 28, 2012, at 2:14 PM, "R. Michael Weylandt" <michael.weylandt at gmail.com> wrote:

No, if you want small, readable snippets you can certainly use dput(), but when you say data frame I don't imagine anything that can be sent by e-mail :). Obviously, for toy examples you don't care about performance ...

As for size efficiency:
[1] 1100
so in base64 that would be about 1.5k - much less than any of the above.
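
The base64 estimate can be checked with a line of arithmetic: base64 encodes every 3 bytes as 4 characters, padding the last group. The second half of the sketch shows one way to compress a serialized object with memCompress(); the choice of "xz" is an assumption for illustration, since the command that produced the 1100 figure is not preserved in this thread.

```r
# Base64 length for a payload of n_bytes bytes: 4 characters per 3 bytes.
n_bytes <- 1100
b64_chars <- 4 * ceiling(n_bytes / 3)
b64_chars  # 1468, i.e. about 1.5k

# Assumption: xz compression of the binary serialization. This is not
# necessarily the command used in the original message.
data(iris)
compressed <- memCompress(serialize(iris, NULL), type = "xz")
length(compressed)

# The compressed bytes restore to the original object.
restored <- unserialize(memDecompress(compressed, type = "xz"))
identical(iris, restored)
```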

Cheers,
Simon