Skip to content

Saving R-objects to a database

9 messages · Christian Ruckert, Sean Davis, Brian Ripley +2 more

#
Someone solved the problem of saving R-objects to a database or can give 
me an answer like "this is not possible at the moment"?

All my attempts runned into problems with conversion between blob, raw 
and character representations. Saving the R-object to a file and then 
getting it into the db leads to similar problems. Is saving to the file 
system the only practicable solution?

In my special case I'm using a MySQL database with RMySQL package, but 
suggestions for other combinations are welcome.

Greetings,
Christian Ruckert
#
On Wed, Oct 1, 2008 at 5:53 AM, Christian Ruckert
<cruckert at uni-muenster.de> wrote:
Look at the serialize function.  You can serialize any R object which
can be stored into the database of your choice.  Give that a try and
let us know with a code example if you have problems.

Sean
#
This is what I posted on r-mailing list before:

 > ser = serialize(obj, NULL)
 > dbSendQuery(link, paste("insert into table values(1, '",ser,"')",sep=''))
 >
 > The field to save the object in the MySQL Database is of type blob.
 >
 > result = dbGetQuery(link,"select * from table where id=1")
 >
 > unser = unserialize(result[1,"object_column"], NULL)
 > Error in unserialize(result[1, "object_column"], NULL) : unknown 
input format
 > In addition: Warning message:
 > In unserialize(result[1, "object_column"], NULL) :
 >  unserialize()from a character string is deprecated and will be 
withdrawn in R 2.8.0

Brian Ripley answered:
"DBI needs to convert blobs to raw not character."

So saving to the db works fine, but function dbGetQuery seems to convert 
the result to a character string. Can I avoid this convertion or is this 
a problem with DBI package and will it get fixed?
#
On Wed, Oct 1, 2008 at 6:42 AM, Christian Ruckert
<cruckert at uni-muenster.de> wrote:
What happens if you serialize using "ascii=TRUE"?  Then, you could
probably save into a text field.  Admittedly, I haven't tried this.

Sean
#
On Wed, 1 Oct 2008, Sean Davis wrote:

            
The result is still a raw vector.  You need to encode the raw vector as 
text before saving to the DBMS, and unencode after retrieval.

AFAIK the problem referred to is specific to raw vectors and DBI, not 
general to all R-DBMS interfaces.
#
On Wed, Oct 1, 2008 at 8:54 AM, Prof Brian Ripley <ripley at stats.ox.ac.uk> wrote:
Thanks for the clarification, again.

Christian, it might be easier (and likely faster) to store the URI in
the database and the object in Rda format on disk with a UUID as a
file name.  In the database world, I think the general consensus is
that it is best not to store objects like this in the database as the
overhead of backup, storage, retrieval, etc. are higher than storing
on the file system.

Sean

Sean
#
Sean Davis schrieb:
(Un-)encode with rawToChar now worked for a small object, but for bigger 
objects the string is truncated.

Thanks for your efforts, I've received an impression whats possible and 
what not. I think I'll give the file system solution a try.

Christian
#
On 1 Oct 2008, at 11:42, Christian Ruckert wrote:

            
we have used something like this (these are snippets which will not  
work as is... but should give you a starting point)

To Save:
     s_rdata <- if(is.null(rdata)) "" else serialize(rdata,  
connection=NULL, ascii=TRUE)

     if(class(s_rdata) == "raw") s_rdata <- rawToChar(s_rdata)

     sql <- paste("INSERT INTO my_table (dataset_id, rdata) ", "VALUES  
(", dataset_id, ", E'", s_rdata, "')", sep="")
     dbSendQuery(dbh, sql)

To Load:
   sql <- paste("SELECT * FROM my_table WHERE dataset_id =  
",dataset_id, sep="")
   dataset <- dbGetResult(dbSendQuery(dbh, sql))

   for(n in 1:length(dataset$rdata_type)) {
     rdata_type <- dataset$rdata_type[n]
     s_rdata <- dataset$rdata[n]

     rdata <- if(nchar(s_rdata) > 0) unserialize(s_rdata) else NULL
   }

(this was for use with PostgreSQL by the way, hence the E'' syntax  
above)

HTH

adam
#
Hi Christian,
Christian Ruckert wrote:
[...]
You need to serialize with ascii=TRUE otherwise you get a raw vector with
lots of nul bytes that you won't be able to turn into a text without losing
part of it:
   - rawToChar() will truncate the text at the first nul byte
   - intToUtf8(as.integer()) will silently ignore those nul bytes.
This is because embedded nuls are not supported in character vectors anymore
(which is a good thing).

I'm not 100% sure but I think you won't get any nul bytes if you use
serialize( , ascii=TRUE). Try this:

   objToText <- function(object)
       rawToChar(serialize(object, NULL, ascii=TRUE))

   textToObj <- function(text)
       unserialize(charToRaw(text))

 > x0 <- list(1, 2, 3)
 > objToText(x0)
[1] "A\n2\n133120\n131840\n19\n3\n14\n1\n1\n14\n1\n2\n14\n1\n3\n"
 > textToObj(objToText(x0))
[[1]]
[1] 1

[[2]]
[1] 2

[[3]]
[1] 3

I've not tried this on big objects though.

Note that, despite serialize() man page claiming that ascii=FALSE will produce
a more compact binary representation of the object, my experience turns out
to be the opposite.

Cheers,
H.