parse_Rd and/or lazyload problem

10 messages · Mark Bravington, Duncan Murdoch, Henrik Bengtsson +2 more

#
Yes, still there in R-patched.

(Still haven't got to your code, this was in
I did try, but it's not completely possible, because 'makeLazyLoadDB' is internal and there is no public alternative (a pity-- it's useful). However, the problem(s) can be demonstrated without directly calling 'parse_Rd', and with 'lazyLoad' (public) instead of 'fetchRdDB' (private), as per "pointer 1" below. If you have a look at 'tools:::.install_package_Rd_objects', you'll see that my use of 'makeLazyLoadDB' is quite standard.

The problem is not easy to reproduce. It took 4-5 hours' work to get the 3-line reproducible example that I posted, plus another couple since, so I'm also reluctant to spend more time...

The examples in my previous post still apply-- the first one involves just 3 statements-- but here are some more pointers I've unearthed since:


1. Sometimes 'fetchRdDB' or 'lazyLoad' called directly from the prompt doesn't work, but public 'Rd_db' (which directly calls 'fetchRdDB') does. I've experimented with copying the installed 'tools' package into a new library "d:/temp/fakelib", then stuff like this:

test> e <- new.env()
test> lazyLoad( 'd:/temp/fakelib/tools/help/tools', e) # original files tools.rdx, tools.rdb
test> e <- as.list( e) # force evaluation
test> tools:::makeLazyLoadDB( e, 'd:/temp/fakelib/tools/help/tools') # modify tools.rd*
test> e1 <- new.env()
test> lazyLoad( 'd:/temp/fakelib/tools/help/tools', e1)
test> as.list( e1) # try to force evaluation...
Error in as.list.environment(e1) : 
  cannot allocate memory block of size 2.7 Gb
test> 
test> Rd_db( 'tools', 'd:/temp/fakelib') # no probs !?


2. Sometimes 'fetchRdDB' or 'lazyLoad' will fail in one R session, but will work in a fresh session on exactly the same files. For example, after restarting R, the previous commands involving 'e1' work fine.

Mark
#
On 31/10/2009 10:18 PM, Mark.Bravington at csiro.au wrote:
Okay, then we both agree we should drop it.

Duncan Murdoch
#
No we don't. I can't provide a functioning mvbutils, or debug, until this is resolved.

I am trying to be a good citizen and prepare reproducible bug reports-- e.g. the 3 line example. It would be quicker for me to write some ugly hack that modifies base R and gets round the problem *for me*, but that doesn't seem the best outcome for R. A culture which discourages careful bug reporting is an unhealthy culture.

Mark Bravington
#
On 01/11/2009 3:12 PM, Mark.Bravington at csiro.au wrote:
Sorry.  What I thought you said was that you had spent several hours on 
it and didn't want to spend more time on it.  I've told you I don't want 
to work on it either.

If there is no way to trigger this bug without using internals, then it 
has not been demonstrated to be a bug in R.  It might be one, or it 
might be a bug in your code.  Often I'll work on things that are 
demonstrated bugs, but I won't commit several hours to debugging your code.

Duncan Murdoch
#
If of any help, I got a related error message in my browser (Firefox),
from doing this:

1. In .Rprofile I set options(help.ports=6850)

2. I have one "R HELP" session running where I do help.start() so that
I always have one session providing the help pages.  This is to avoid
having to redo help.start() if I restart R etc, which I tend to do a lot
when I do package development.

3. While installing a package that I develop, I did reload (Ctrl-R)
multiple times on an HTML help page during the process.  This caused
the following message to be displayed in the browser:

  Error in tools:::fetchRdDB(RdDB, helpdoc) :
    cannot allocate memory block of size 2.3 Gb

This message remains the same if I do reload.

4. After restarting the "R HELP" session above, the above page is
displayed properly.
R version 2.10.0 Patched (2009-10-26 r50212)
i386-pc-mingw32

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] tools_2.10.0

/Henrik
On Sun, Nov 1, 2009 at 12:28 PM, Duncan Murdoch <murdoch at stats.uwo.ca> wrote:
2 days later
#
I sympathize with not wanting to spend hours on other people's code-- and I appreciate that you have spent a lot of time off-list trying to help me with 'parse_Rd' recently.

But in this case: 

(i) there were only 3 lines of code in the first example! If I've done something wrong in those 3 lines, it shouldn't take several hours to diagnose...

(ii) the real problem may well not be in 'parse_Rd' but in 'lazyLoad' etc, as the subject line says. Presumably you picked up the original thread because you're the 'parse_Rd' author. If you're sure it's not 'parse_Rd', or if you don't want to look at the code for other reasons, perhaps you could alert the author of the lazyloading routines (Luke Tierney?) to see if he's willing to look into it.

(iii) I deliberately haven't submitted a formal bug report, because my reproducible examples need to call 'makeLazyLoadDB'. (Though Henrik B is able to trigger the same problem without it.) As you say, by R's definition of a bug (which  certainly isn't the same as mine) I cannot demonstrate this is a "bug". So the R-bug lens may not be the correct filter for you to apply here.

Further to the problem itself: Henrik Bengtsson's report seems symptomatic of the same thing. I've generally hit the bug (damn!) only on the second or subsequent time in a session that I've lazyloaded, which is one reason it's hard to make reproducible. If you want a reproducible example to help track the bug down, then my original 3-liner would be easier to work with. However, while that one does reliably trigger an error on my laptop with 2GB R-usable memory, it doesn't on my 4GB-usable desktop. For that machine, a reproducible sequence with the only internal function being 'makeLazyLoadDB' is: 

file.copy( 'd:/temp/Rdiff.Rd', 'd:/temp/scrunge.Rd') # Rdiff.Rd from 'tools' package source

eglist <- list( scrunge=parse_Rd( 'd:/temp/scrunge.Rd'))
tools:::makeLazyLoadDB( eglist, 'd:/temp/ll')
e <- new.env()
lazyLoad( 'd:/temp/ll', e)
as.list( e) # force; OK

eglist1 <- list( scrunge=parse_Rd( 'd:/temp/Rdiff.Rd'))
tools:::makeLazyLoadDB( eglist1, 'd:/temp/ll')
e <- new.env()
lazyLoad( 'd:/temp/ll', e)
as.list( e) # Splat

It doesn't make any difference which file I process first; the error comes the second time round.


Mark
#
Hi,
On 11/3/09 6:51 PM, Mark.Bravington at csiro.au wrote:
If I adjust this example in terms of paths and run on OS X, I get the 
following error on the second run:

 > as.list(e) # Splat
Error in as.list.environment(e) : internal error -3 in R_decompress1

I haven't looked further yet.

+ seth
#
Here is a more stripped-down variant that generates an error on OS X for me:

     mkEg <- function(tm) list(scrunge = as.POSIXct(tm))

     extract <- function(db) {
         e <- new.env()
         lazyLoad(db, e)
         as.list(e)
     }

     eg <- mkEg("2009-11-04 12:49:53")
     eg1 <- mkEg("2009-11-04 12:49:28")

     tools:::makeLazyLoadDB( eg, '/tmp/ll')
     extract('/tmp/ll') # force; OK

     tools:::makeLazyLoadDB( eg1, '/tmp/ll')
     extract('/tmp/ll')

Changing the second pair of '/tmp/ll' paths to a different file name makes the symptom go away.

I believe this comes down to unintended use of the lazyload mechanism
-- in particular it is not intended that a database be rewritten after
it has been loaded.  There is a caching mechanism for improved
performance on slow file systems, and I believe what is happening is
that the new indices are being used to look in the old cached
data. There might be some merit in having lazyLoad call
R_lazyLoadDBflush.

luke
On Tue, 3 Nov 2009, Seth Falcon wrote:
#
Great-- thanks for the info.

For now, hopefully I can get the behaviour I want by sticking a .Call( 'R_lazyLoadDBflush'...) [as per 'detach'] before calling 'lazyLoad'. Seems to work on my examples, but please let me know if you don't think it'll work generally-- if not, I could presumably create the files under different names and then change them.
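
The "different names" fallback could look roughly like the sketch below. It is an illustration under assumptions, not a tested recipe: 'write_and_load' is a hypothetical helper name, 'tools:::makeLazyLoadDB' is internal and may change without notice, and the only idea is that a file base which has never been loaded in this session cannot have stale cached data behind it.

```r
# Sketch only: relies on the internal tools:::makeLazyLoadDB, which may
# change without notice. write_and_load() is a hypothetical helper.
write_and_load <- function(objs) {
    # Use a fresh file base for every write, so a database that has
    # already been loaded (and possibly cached) is never rewritten.
    filebase <- tempfile("ll")
    tools:::makeLazyLoadDB(objs, filebase)   # writes <filebase>.rdb/.rdx
    e <- new.env()
    lazyLoad(filebase, e)
    as.list(e)                               # force the lazy-load promises
}

first  <- write_and_load(list(scrunge = as.POSIXct("2009-11-04 12:49:53")))
second <- write_and_load(list(scrunge = as.POSIXct("2009-11-04 12:49:28")))
```

This sidesteps the rewrite-after-load case rather than fixing it; whether renaming the files afterwards is safe would still need checking.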

Is there merit in making 'makeLazyLoadDB' public, just as 'lazyLoad' already is? It's useful.

Mark
#
On Thu, 5 Nov 2009, Mark.Bravington at csiro.au wrote:

The internals of the lazy load mechanism are intended to be private.
This is now emphasized in the lazyLoad help page.  This is under
active development, and we need the freedom to be able to change these
internals as needed.  They may change without notice, so using them
directly in user code or packages is not a good idea and is strongly
discouraged.

luke