
Speeding up library loading

15 messages · Martin Maechler, Uwe Ligges, Ali - +3 more

#
(1) When R tries to load a library, does it load 'everything' in the library 
at once?

(2) Is there any options to 'load as you go'?
#
Ali - wrote:
>> (1) When R tries to load a library, does it load 'everything' in the
>> library at once?

No, see ?lazyLoad

>> (2) Is there any options to 'load as you go'?

Well, this is the way R does it....

Uwe Ligges
#

UweL> Ali - wrote:
>> (1) When R tries to load a library, does it load 'everything' in the 
    >> library at once?

    UweL> No, see ?lazyLoad

Are you sure Ali is talking about *package*s?
He did use the word "library", though, and most of us (including
Uwe!) know the difference...

    >> (2) Is there any options to 'load as you go'?

    UweL> Well, this is the way R does it....

for packages yes, because of lazyloading, as Uwe mentioned above.

For libraries, (you know: the things you get from compiling and
linking C code ..), it may be a bit different.

What do you really mean, packages or libraries,
Ali?
#
Well, the terminology used here is a bit confusing. ?library shows something 
like 'library(package)', and that's why I used the term 'library' for loading 
packages. The package does load some DLLs, but what I meant by 'library' was 
actually a package.

The package I am working on currently has one big R file (~4 Mb), and this 
causes at least two problems:

(1) Things are slow:

    (a) Installation with (LazyLoad = Yes) is slow. Then, when the package is 
loaded into R, the loading is slow too, so LazyLoad is not of much help.

    (b) Installation with (SaveImage = Yes) is -extremely- slow. To give you 
some idea, compiling the associated C++ code takes around 10 minutes, while 
saving the R image takes more than 40 minutes (the package is a wrapper for 
some C++ libraries; all the R functions do is call .Call). This doesn't 
improve the loading speed either.
    (c) Installation with (LazyLoad = Yes) AND (SaveImage = Yes) causes this 
error:

    preparing package <package_name> for lazy loading
    make: *** [lazyload] Error 1
    *** Installation of <package_name> failed ***

    It is likely that this happens because of some memory problems.

(2) After all this, when the package is loaded, not surprisingly, a lot of 
memory is used. It seems that the whole (huge) file is loaded into R at once, 
and turning LazyLoad on or off doesn't make a difference when the package is 
big.
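For reference, the two options under discussion are fields in the package's DESCRIPTION file; a minimal sketch (the package name and other field values are placeholders, not from this thread):

```
Package: mypackage
Version: 0.1
Title: Hypothetical wrapper around a large C++ library
LazyLoad: yes
SaveImage: no
```

The two mechanisms are alternatives for preparing the package's R code at install time, which is consistent with the failure reported above when both are set to yes.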
#
Ali - wrote:
A 4 Mb R file just containing .Call()s? I have never seen anything like that.
If these are all very small functions, lazy loading won't be of much 
advantage, because you have to load the index file anyway.

You know, R including all base and recommended packages has just ~ 6Mb 
of R code. Are you really sure about your code?

Uwe Ligges
#
Positively. The wrapped library is actually much bigger than R; it brings a 
few hundred new classes to R. The library has already been wrapped for other 
languages like Java, and the loading speed for those languages is quite 
reasonable. I cannot see any reason why this cannot be done with R too -- as 
a computational application, R is supposed to be efficient in all ways.

It seems that, so far, no package as big as this one has been created for R. 
I would appreciate any clues from the development team for improving the 
performance of big packages in R.
#
Is it possible to break the package into multiple parts, perhaps 
like a bundle?  Then you could only load the parts that you need 
at any particular time.

-roger
Ali - wrote:

#
It could be done, but the question is: what if one of the packages in the 
bundle depends on all of the rest? And the bigger question is: why is lazy 
loading not efficient when it comes to many small functions?
#
I think the reason, as Uwe already said, is that you have to load 
the lazyload index file, and in your case that file is likely to 
be as large as the R file itself.

-roger
Ali - wrote:

#
Ali - wrote:
Lazy loading just converts an object into a small instruction to load 
the object. If the object was already small, there's no advantage to 
that.  It's mainly designed to avoid memory use (some rarely used 
objects can be gigantic).

Duncan Murdoch
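[The "small instruction to load the object" is essentially a promise; the same mechanism is available directly via delayedAssign(), which makes the trade-off concrete:]

```r
# Bind 'x' to a promise: the expression is not evaluated, and the
# object not created, until 'x' is first used.
delayedAssign("x", {
  cat("materializing x now\n")
  1:1000000
})

sum(x)  # first access forces the promise and prints the message
sum(x)  # already forced: no message, no extra cost
```

For a tiny object, the promise bookkeeping is about as big as the object itself, which is why lazy loading buys little for thousands of small functions.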
#
On Mon, 25 Apr 2005, Duncan Murdoch wrote:

loading is trying to solve. We didn't have a problem with packages that 
have a huge number of small objects, but we did have a problem with packages 
that had a moderate number of moderately large objects.

In addition, trying to optimize performance is not usually a good idea 
unless you can measure the performance of different implementations on 
real applications, and we didn't have applications like that.


 	-thomas
#
Assume 100 C++ classes, each class having 100 member functions. After 
wrapping these classes into R, if the wrapping design is class-oriented we 
should have around 100 objects. At the same time, if the wrapping design is 
function-oriented we have around 10,000 objects, which are too many even for 
lazy loading.

I have tried wrapping exactly the same classes with R.oo, based on S3, and 
the resulting package was much faster in both installation and loading. The 
package became slow once I tried it with S4. I guess R.oo makes the package 
more class-oriented, while S4 object-orientation is really function-oriented, 
causing all this friction in installation and loading.

Is there any way to ask R to lazy-load each object as a 'bundle of S4 
methods with the same class'?
#
Ali - wrote:
I don't think so.  There are ways to load a bundle of objects all at 
once (put them in an environment, attach the environment), but S4 
methods aren't self-contained, they need to be registered with the 
system.   You could probably write a function to load them and register 
them all at once, but I don't think it exists now.

Duncan Murdoch
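[A minimal sketch of the "put them in an environment, attach the environment" idea; all names here are hypothetical, and as noted above S4 methods would still need separate registration:]

```r
# Collect a group of related functions in one environment.
bundle <- new.env()
bundle$area      <- function(r) pi * r^2
bundle$perimeter <- function(r) 2 * pi * r

# Attach it so its contents resolve on the search path like
# package objects; detach when no longer needed.
attach(bundle, name = "geometry-bundle")
area(1)
detach("geometry-bundle", character.only = TRUE)
```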
#
(1) What is the difference between loading and registering objects in R?

(2) You are talking about 'loading and registering at once'. Isn't this 'at 
once' the cause of slow loading?

(3) Doesn't having many environments mean a loss of efficiency again?
#
Ali - wrote:
Loading just creates the object.  Registering it is what setMethod() and 
such calls do.  They allow the system to know that it should call that 
function in response to a call to the generic with a certain signature, 
and so on.
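[Concretely, "registering" is what happens in calls like the following; the class and generic names are made up for illustration:]

```r
# Defining a class just creates an object -- this is only "loading".
setClass("Circle", representation(r = "numeric"))

# setGeneric()/setMethod() additionally *register* with the dispatch
# system, so that area(x) finds this method for Circle arguments.
setGeneric("area", function(obj) standardGeneric("area"))
setMethod("area", "Circle", function(obj) pi * obj@r^2)

area(new("Circle", r = 1))
```

Repeated on the order of 10,000 times at load time, this registration step is the suspected bottleneck.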
I haven't done any profiling, but I would guess the registering is the 
slow part.
Yes, I'd guess that looking things up in a chain of 100 environments is 
slower than looking them up in one gigantic environment.  Again, I 
haven't done any profiling, but I'd guess it would come close to being 
100 times worse, i.e. in practice order N time instead of order 1 time 
(but I'm sure these aren't the theoretical limits).

But you were asking about delayed loading, so I was assuming that in 
most cases you would only load a small subset of those 100 environments. 
I haven't tried any big problems like yours, but I would be willing to 
guess that registering is slower than O(N), so cutting down on the 
number of things you register will give a big improvement in loading speed.

But you do have to remember the two pieces of advice you've been given 
in this thread:

   - nobody else has written a package with ten thousand methods, so 
you're likely to find things out that nobody else knows about.

   - The S4 object model is quite different from that of C++, so it 
probably doesn't make sense to have a direct correspondence between C++ 
classes and methods and R classes and methods.  There are probably much 
more efficient ways to get access to the functionality of your C++ library.

Duncan Murdoch
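[One "more efficient way" in the spirit of that advice -- entirely a sketch, not proposed anywhere in this thread -- is a single R-level front door that routes calls to the C++ side by name, instead of registering one R method per C++ member function:]

```r
# Stub standing in for the real .Call() into the C++ dispatcher.
# In the hypothetical package this body would be something like:
#   .Call("dispatch_entry_point", class, method, args)
dispatch <- function(class, method, args) {
  sprintf("called %s::%s with %d argument(s)", class, method, length(args))
}

# One exported function instead of ~10,000 registered methods.
cppCall <- function(class, method, ...) dispatch(class, method, list(...))

cppCall("Circle", "area", 1)
```

This trades S4 dispatch and its registration cost for a string-based lookup on the C++ side, so nothing needs to be registered with R's method tables at load time.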