saveRDS() and readRDS() Why?

14 messages · Eric Berger, Jan van der Laan, Robert David Burbidge +2 more

#
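From a Windows R session, I do

> object.size(rawData)
31736 bytes  # from scraping a non-reproducible web address.
> saveRDS(rawData, file = "rawData.rds")

Then copy to a Linux session

> rawData <- readRDS(file = "rawData.rds")
> rawData
[1] "rawData"
> object.size(rawData)
112 bytes
> rawData
[1] "rawData" # only the name and something to make up 112 bytes

Have I misunderstood the syntax?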

It's an old version on Windows.  I haven't used Windows R since then.

major          3                                          
minor          2.4                                        
year           2016                                       
month          03                                         
day            16                                         


I've tried R-3.5.0 and R-3.5.1 Linux versions.

In case it's material ... 

I couldn't get the scraping to work on either of the R installations
but Windows users told me it worked for them.  So I thought I'd get
the R object and use it.  I could understand accessing the web address
could have different permissions for different OSes, but should that
affect the R objects?

TIA
#
What do you see at the OS level?
i.e. on windows
DIR rawData.rds
on linux
ls -l rawData.rds
compare the file sizes on both.
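
The same comparison can also be made from inside R on each machine. This is a minimal sketch: the helper name describe_file and the throwaway demo file are made up for illustration, while file.size() and tools::md5sum() are standard R functions. If size and checksum match on both sides, the copy step did not alter the file.

```r
# Hypothetical helper: summarise a file so two machines can be compared.
describe_file <- function(path) {
  stopifnot(file.exists(path))
  list(
    bytes = file.size(path),              # size on disk
    md5   = unname(tools::md5sum(path))   # checksum of the raw bytes
  )
}

# Self-contained demonstration with a throwaway file
demo_path <- file.path(tempdir(), "demo.rds")
saveRDS(mtcars, file = demo_path)
info <- describe_file(demo_path)
print(info)
```

Running describe_file("rawData.rds") on Windows and on Linux and comparing the two results would answer the question without leaving R.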


On Wed, Nov 7, 2018 at 9:56 AM Patrick Connolly <p_connolly at slingshot.co.nz>
wrote:

  
  
#
Hi Patrick,

 From the help: "save writes a single line header (typically "RDXs\n") 
before the serialization of a single object".

If the file sizes are the same (see Eric's message), then the problem 
may be due to different line terminators. Try serialize and unserialize 
for low-level control of saving/reading objects.
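
To see what is actually at the start of an .rds file, the first bytes can be read directly. The sketch below uses a throwaway file rather than the poster's data. Note that the quoted help sentence describes save(); a file written by saveRDS() with default settings is gzip-compressed and, once decompressed, starts with the serialization format marker rather than an "RDX" line.

```r
path <- file.path(tempdir(), "header_demo.rds")
saveRDS(1:10, file = path)   # defaults: gzip compression, binary XDR format

# First two bytes on disk: the gzip magic number
on_disk <- readBin(path, what = "raw", n = 2)

# First two bytes after decompression: the serialization format marker
con <- gzfile(path, open = "rb")
decompressed <- readBin(con, what = "raw", n = 2)
close(con)

print(on_disk)
print(rawToChar(decompressed))
```

Neither of these byte sequences contains a platform-specific line terminator, so a file that is copied in binary mode should start with the same bytes on both systems.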

Rgds,

Robert
On 07/11/18 08:13, Eric Berger wrote:
#
They're both about 3kb.
On 7/11/18 9:13 PM, Eric Berger wrote:

  
  
#
On Wed, 07-Nov-2018 at 08:27AM +0000, Robert David Burbidge wrote:
|> Hi Patrick,
|> 
|> From the help: "save writes a single line header (typically
|> "RDXs\n") before the serialization of a single object".
|> 
|> If the file sizes are the same (see Eric's message), then the
|> problem may be due to different line terminators. Try serialize and
|> unserialize for low-level control of saving/reading objects.

I'll have to find out what 'serialize' means.

On Windows, it's a huge table that looks like it's all hexadecimal.

On Linux, it's just the text string 'rawData', which is a far bigger
difference than line terminators could explain.

Have I misunderstood what the idea is?  I thought I'd get an identical
object, irrespective of how differently each OS stores and zips it.



|> 
|> Rgds,
|> 
|> Robert
|> 
|>
|> On 07/11/18 08:13, Eric Berger wrote:
|> >What do you see at the OS level?
|> >i.e. on windows
|> >DIR rawData.rds
|> >on linux
|> >ls -l rawData.rds
|> >compare the file sizes on both.
|> >
|> >
|> >On Wed, Nov 7, 2018 at 9:56 AM Patrick Connolly <p_connolly at slingshot.co.nz>
|> >wrote:
|> >
|> >> From a Windows R session, I do
|> >>
|> >>>object.size(rawData)
|> >>31736 bytes  # from scraping a non-reproducible web address.
|> >>>saveRDS(rawData, file = "rawData.rds")
|> >>Then copy to a Linux session
|> >>
|> >>>rawData <- readRDS(file = "rawData.rds")
|> >>>rawData
|> >>[1] "rawData"
|> >>>object.size(rawData)
|> >>112 bytes
|> >>>rawData
|> >>[1] "rawData" # only the name and something to make up 112 bytes
|> >>Have I misunderstood the syntax?
|> >>--
|> >>~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.
|> >>    ___    Patrick Connolly
|> >>  {~._.~}                   Great minds discuss ideas
|> >>  _( Y )_                 Average minds discuss events
|> >>(:_~*~_:)                  Small minds discuss people
|> >>  (_)-(_)                              ..... Eleanor Roosevelt
|> >>
|> >>~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.
|> >>
|> >>______________________________________________
|> >>R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
|> >>https://stat.ethz.ch/mailman/listinfo/r-help
|> >>PLEASE do read the posting guide
|> >>http://www.R-project.org/posting-guide.html
|> >>and provide commented, minimal, self-contained, reproducible code.
#
Your understanding is correct. It works fine for me.

On Wed, Nov 7, 2018 at 10:48 AM Patrick Connolly <p_connolly at slingshot.co.nz>
wrote:

  
  
#
If the file sizes are the same, then presumably both contain the binary data. From the serialize function help:

"As almost all systems in current use are little-endian, xdr = FALSE can be used to avoid byte-shuffling at both ends when transferring data from one little-endian machine to another (or between processes on the same machine). Depending on the system, this can speed up serialization and unserialization by a factor of up to 3x."

So you could try:

# windows (not run)
f <- file("rawData.rds", open="w")
serialize(rawData, f, xdr = FALSE)
close(f)

# linux
rawData <- unserialize(file = "rawData.rds")

HTH
On 07/11/18 08:45, Patrick Connolly wrote:

            
#
Are you sure you didn't do saveRDS("rawData", file = "rawData.rds") 
instead of saveRDS(rawData, file = "rawData.rds") ? This would explain 
the result you have under linux.

In principle saveRDS and readRDS can be used to copy objects between 
R-sessions without losing information.

What does readRDS return on windows with the same file?

What type of object is rawData? Do str(rawData). Some objects created by 
packages cannot be serialized, e.g. objects that point to memory 
allocated by a package. The pointer is then serialized not the memory 
pointed to.

Also, if the object is generated by a package, you might need to load 
the package to get the printing etc. of the object right.
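
The quoting mistake described at the top of this message is easy to reproduce. In this sketch the data frame is only a stand-in for the scraped object, and the file names are made up; the point is what comes back from readRDS() in each case.

```r
rawData <- data.frame(id = 1:100, value = rnorm(100))  # stand-in object

quoted_path   <- file.path(tempdir(), "quoted.rds")
unquoted_path <- file.path(tempdir(), "unquoted.rds")

saveRDS("rawData", file = quoted_path)    # saves the string, not the object
saveRDS(rawData,   file = unquoted_path)  # saves the object itself

from_quoted   <- readRDS(quoted_path)
from_unquoted <- readRDS(unquoted_path)

str(from_quoted)     # a length-one character vector
str(from_unquoted)   # the full data frame
```

A result that prints as [1] "rawData" is what the quoted form produces, which is why running str() on the object on the Windows side before saving is a useful first check.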

HTH,

Jan
On 07-11-18 09:45, Patrick Connolly wrote:
#
Patrick,

I cannot reproduce this behaviour. I'm using:

Windows 8.1; R 3.5.1; RStudio 1.1.463

running in a VirtualBox on Ubuntu 18.04 with R 3.4.4; RStudio 1.1.456

The file size of rawData.rds is always 88 bytes in my example and od 
gives the same results on Windows and Linux.

I am using a VirtualBox shared folder to transfer from Windows to Linux.

Could you provide details of your machines?

Rgds,

Robert
On 07/11/18 07:56, Patrick Connolly wrote:
#
Many thanks to Berwin, Eric, Robert, and Jan for their input.

I had hoped it was as simple as my having typed

saveRDS("rawData", file = "rawData.rds") on the Windows side,
but that wasn't the case.

Robert Burbidge suggested:

# windows (not run)
f <- file("rawData.rds", open="w")
serialize(rawData, f, xdr = FALSE)
close(f)

# linux
rawData <- unserialize(file = "rawData.rds")

That didn't work: 
Error in unserialize(file = "rawData.rds") : 
  unused argument (file = "rawData.rds")
(the argument isn't 'file')

Nor did unserialize("rawData.rds"):
Error in unserialize("rawData.rds") : 
  character vectors are no longer accepted by unserialize()

However 

readRDS(file = "rawData.rds") did!

So what I needed was serialize but not unserialize.

I still don't know Why, but I know How.
#
Hmm.. and nobody has been able to reproduce your problem, right?

IIUC, currently you are suggesting that [on Windows], if you do

      saveRDS(rawdata, file="rawdata.rds")

the resulting file does not work with readRDS() on Linux.
What again are your R versions on the two platforms?

Could you  dput() -- provide a (short if possible) version of rawdata where
that problem occurs ?
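
For readers unfamiliar with dput(), a minimal sketch of what is being asked for follows. The small data frame is invented for illustration; the idea is that dput() emits R code which anyone can paste into their own session to rebuild the object.

```r
small <- data.frame(id = 1:3, label = c("a", "b", "c"),
                    stringsAsFactors = FALSE)

# dput() prints R code that recreates the object; capture it as text
code_text <- capture.output(dput(small))
cat(code_text, sep = "\n")

# Evaluating that text rebuilds the object in any session
rebuilt <- eval(parse(text = code_text))
print(all.equal(small, rebuilt))
```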

Best,
Martin
#
Apologies, unserialize takes a connection, not a file, so you would need 
something like:

# linux (not run)
f <- file("rawData.rds", open="r")
rawData <- unserialize(f)
close(f)

The help file states that readRDS will read a file created by serialize 
(saveRDS is a wrapper for serialize).

It appears that the problem was "byte-shuffling at both ends when 
transferring data from one little-endian machine to another" and was 
worked around by using xdr = FALSE. So, this wouldn't necessarily work 
when transferring between big-endian and little-endian machines.
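
Put together, a complete round trip through serialize() and unserialize() might look like the sketch below. The object and file name are stand-ins, and opening the connections in binary mode ("wb" and "rb") is an assumption made here, not something taken from the earlier messages.

```r
obj  <- list(a = 1:5, b = letters[1:3])          # stand-in object
path <- file.path(tempdir(), "roundtrip.bin")

# Write: binary-mode connection, native byte order (xdr = FALSE)
out <- file(path, open = "wb")
serialize(obj, out, xdr = FALSE)
close(out)

# Read back: unserialize() needs an open connection, not a file name
inp <- file(path, open = "rb")
restored <- unserialize(inp)
close(inp)

print(identical(obj, restored))
```

Opening both connections explicitly in binary mode keeps the written bytes independent of any text-mode translation the platform might apply.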
On 08/11/18 07:27, Patrick Connolly wrote:
1 day later
#
On Thu, 08-Nov-2018 at 11:06AM +0100, Martin Maechler wrote:
|> >>>>> Patrick Connolly 
|> >>>>>     on Thu, 8 Nov 2018 20:27:24 +1300 writes:

[...]

|> > 
|> > I still don't know Why, but I know How.
|> 
|> Hmm.. and nobody has been able to reproduce your problem, right?
|> 
|> IIUC, currently you are suggesting that [on Windows], if you do
|> 
|>       saveRDS(rawdata, file="rawdata.rds")
|> 
|> the resulting file is does not work with    readRDS()  on Linux.
|> What again are your R versions on the two platforms?


It's an old version on Windows.  I haven't used Windows R since then.

major          3                                          
minor          2.4                                        
year           2016                                       
month          03                                         
day            16                                         


I've tried R-3.5.0 and R-3.5.1 Linux versions.  The problem might be
entirely because of the ancient Windows version. 


|> 
|> Could you  dput() -- provide a (short if possible) version of rawdata where
|> that problem occurs ?

I can't make a smaller version of rawdata which comes from scraping a
non-public web address, but next week when I'm back where those
machines are, I'll try it with a small data frame which is
reproducible.



1 day later
#
The solution was very simple.  Don't use the same name for the rds
file as used for the R object, viz.:

saveRDS(x, file = "x.rds")
and
x <- readRDS(file = "x.rds")

will not work; however

saveRDS(x, file = "y.rds")
and
x <- readRDS(file = "y.rds")
will work.

An undocumented feature?
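
The two cases above can be compared side by side in a fresh session. This is only a sketch with a made-up data frame and temporary files. According to its help page, saveRDS() writes a single object without its name (unlike save()), so the file name would not be expected to matter; if the first comparison came out FALSE on some system, that result together with sessionInfo() would be worth posting.

```r
x <- data.frame(id = 1:5, value = c(2.5, 3.1, 4.7, 5.0, 6.2))
out_dir <- tempdir()

# Case 1: file base name matches the object name
saveRDS(x, file = file.path(out_dir, "x.rds"))
same_name <- readRDS(file = file.path(out_dir, "x.rds"))

# Case 2: file base name differs from the object name
saveRDS(x, file = file.path(out_dir, "y.rds"))
other_name <- readRDS(file = file.path(out_dir, "y.rds"))

print(identical(x, same_name))
print(identical(x, other_name))
```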

Thanks to all who contributed.
On Sat, 10-Nov-2018 at 08:48PM +1300, Patrick Connolly wrote:

        