[Rcpp-devel] Rcpp:wrap - any limitations for vector size?

7 messages · Mathias Bader, Davor Cubranic, Dirk Eddelbuettel +2 more

#
Hello

 I am working with Rcpp as part of my master's thesis to speed up an MCMC 
 simulation. I have a weird bug that I have been trying to track down for 
 several days, and I think I have narrowed it down to the part of the code 
 where C++ returns a list to R.
 I can print the list in C++ before returning it, and everything looks fine. 
 When I try to print the same data after receiving it in R, RStudio 
 crashes.

 I am connecting R and C++ using the following code:

 # define the C++ compiled function for the MCMC simulation
 require('inline')
 settings <- getPlugin("Rcpp")
 settings$env$PKG_CXXFLAGS <- paste("-I", getwd(), sep="")
 src_dppc <- paste(readLines("dppClustering.cpp"), collapse="\n")
 header_includes <- paste(readLines("headerIncludes.h"), collapse="\n")
 compiledClustering <- cxxfunction(signature(
   R_data="numeric",
   R_initial_point_assignment="numeric",
   R_alpha="numeric",
   R_mcmc_steps="integer",
   R_sd_min="numeric",
   R_next_cluster_index="integer",
   R_cluster_index_mapping="numeric",
   R_cluster_ids="numeric",
   R_cluster_ids_silhouette_values="numeric",
   R_performed_steps_count="integer",
   R_likelihood_method="character"
 ), src_dppc, plugin="Rcpp", settings=settings, 
 includes=header_includes)



 The call in R looks as follows:

 # perform clustering
 results <- compiledClustering(
   R_data                     = data,
   R_initial_point_assignment = d$final.clustering,
   R_alpha                    = dppc.config.alpha,
   R_mcmc_steps               = steps.to.perform,
   R_sd_min                   = dppc.config.sd.min,
   R_next_cluster_index       = d$next.cluster.index,
   R_cluster_index_mapping    = d$cluster.index.mapping,
   R_cluster_ids              = as.vector(t(d$cluster.ids)),
   R_cluster_ids_silhouette_values = d$cluster.ids.silhouette.values,
   R_performed_steps_count    = d$performed.steps.count,
   R_likelihood_method        = d$method.to.use
 )


 If I try to output the returned list after that call using 
 "cat(results)", it prints the list up to the third-to-last element. 
 For that element it prints only the name, not the value, and then it 
 crashes before printing the last two elements. It always crashes at 
 that point.
 In my C++ code I create the return value using the following code:


 // create return object
 Rcpp::List results = Rcpp::List::create(
 	Named("clusterAssignment")          = wrap(r_cluster_assignment),
 	Named("debugOutput")                = wrap(r_debug_output),
 	Named("clusterCountIncreases")      = wrap(r_cluster_count_increases),
 	Named("clusterCountDecreases")      = wrap(r_cluster_count_decreases),
 	Named("dataMovementPointIndices")   = wrap(r_data_movement_point_indices),
 	Named("dataMovementNewClusters")    = wrap(r_data_movement_new_clusters),
 	Named("clusterIndexMappingRev")     = wrap(r_cluster_index_mapping_rev),
 	Named("nextClusterIndex")           = wrap(r_next_cluster_index_return),
 	Named("clusterCounts")              = wrap(r_cluster_counts),
 	Named("McmcClusterIds")             = wrap(r_mcmc_cluster_ids),
 	Named("clusterIds")                 = wrap(r_cluster_ids),
 	Named("clusterIdsSilhouetteValues") = wrap(r_cluster_ids_silhouette_values)
 );


 My question: Is there any restriction on the size of the vectors which 
 I hand from C++ to R? Because during the MCMC simulation the vectors 
 might become really big.

 Thank you very much
 Mathias
#
On 12-08-09 05:08 AM, Mathias Bader wrote:
How big are we talking about? R uses signed 32-bit ints for indexes, so 
the limit would be just over 2 billion elements.

What is the crash message? Segmentation fault? Have you tried running 
with valgrind to check for memory allocation problems?

Davor
#
Hi Mathias,
On 9 August 2012 at 14:08, Mathias Bader wrote:
|  My question: Is there any restriction on the size of the vectors which 
|  I hand from C++ to R? Because during the MCMC simulation the vectors 
|  might become really big.

As Rcpp objects are 'proxy objects' for the underlying R objects, we are
bound by the exact same limits that R objects are bound.  So currently
vectors are limited to 2^31 - 1 elements, and as matrices are internally
stored as vectors with a dimension argument, this holds for matrices too. 

R 2.16.0 may bring a change, but for now this is the hard limit.

Dirk
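
The 2^31 - 1 limit described above can be checked on the C++ side before a container is handed back to R. The following is only a minimal sketch using the standard library; the names kMaxRVectorLength and fits_in_r_vector are hypothetical and are not part of Rcpp or R.

```cpp
#include <cstdint>
#include <limits>

// Largest length of an ordinary R vector as stated in this thread:
// 2^31 - 1, the maximum of a signed 32-bit integer.
// (kMaxRVectorLength is a hypothetical name, not an Rcpp constant.)
constexpr std::uint64_t kMaxRVectorLength =
    static_cast<std::uint64_t>(std::numeric_limits<std::int32_t>::max());

// Hypothetical helper: returns true if a container holding n elements
// is small enough to be returned to R as a single ordinary vector.
constexpr bool fits_in_r_vector(std::uint64_t n) {
    return n <= kMaxRVectorLength;
}
```

A call such as fits_in_r_vector(my_vector.size()) just before building the return list would make an oversized result fail loudly in C++ rather than in R.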
#
Hello Davor

 Thank you for your message.

 I came up with similar numbers: 4.29 billion elements for unsigned int. 
 My simulation easily reaches one million within a few seconds, so I 
 might reach that number, but I don't think that is why my program 
 crashes.

 I think the problem has something to do with list creation in Rcpp; 
 someone reported a similar error:
 http://lists.r-forge.r-project.org/pipermail/rcpp-devel/2012-March/003657.html
 Does anyone know whether there is a workaround for this problem?

 Mathias
On Thu, 09 Aug 2012 06:16:03 -0700, Davor Cubranic wrote:
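
The two figures in this exchange differ because 4.29 billion is the unsigned 32-bit maximum, while the limit quoted for R vectors is the signed maximum, 2^31 - 1. The sketch below works through that arithmetic and, assuming 8-byte doubles for R numeric values, the memory a vector at the limit would need. All names are hypothetical and serve only as illustration.

```cpp
#include <cstdint>
#include <limits>

// Signed 32-bit maximum: 2,147,483,647 (the R vector limit quoted above).
constexpr std::uint64_t kSignedMax =
    static_cast<std::uint64_t>(std::numeric_limits<std::int32_t>::max());

// Unsigned 32-bit maximum: 4,294,967,295 (the "4.29 billion" figure).
constexpr std::uint64_t kUnsignedMax =
    static_cast<std::uint64_t>(std::numeric_limits<std::uint32_t>::max());

// Assumption: each R numeric value is stored as an 8-byte double.
constexpr std::uint64_t kBytesPerNumeric = 8;

// Payload size in bytes of a numeric vector with n elements
// (ignores the small per-vector header).
constexpr std::uint64_t numeric_vector_bytes(std::uint64_t n) {
    return n * kBytesPerNumeric;
}
```

Under that assumption, numeric_vector_bytes(kSignedMax) is 17,179,869,176 bytes, or just under 16 GiB, so available memory is likely to become a constraint well before the index limit does.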
#
If you need more than 2^31 - 1, have a look at bigmemory as a possibility
(particularly if you aren't shy about C++ and want to work directly with
the object).

Jay
On Thu, Aug 9, 2012 at 9:23 AM, Mathias Bader <mail at mathiasbader.de> wrote:
#
For more on this:
http://stat.ethz.ch/R-manual/R-devel/doc/html/NEWS.html and scroll down
to the LONG VECTORS section.
Also see: http://developer.r-project.org/216update.txt

Darren
#
Hello

 First, thank you very much for your very fast and helpful answers - I 
 appreciate it a lot!

 I found the error: the method Rcpp::List::create() does not always 
 behave correctly under Rcpp version 0.9.10. The error is hard to spot 
 because it does not occur consistently, only about every tenth time. 
 Updating to Rcpp 0.9.13 fixes the issue.

 I thought I had the newest version of the package, since I ran 
 "Update Packages" in RStudio and it told me that all packages were up 
 to date. Unfortunately, I was not aware that not all CRAN mirrors carry 
 the same package versions, as I had naively assumed. Switching between 
 mirrors at random, I couldn't find one that provided Rcpp version 
 0.9.13, so I manually downloaded it from the Rcpp website and installed 
 it from the download. Now it works.

 I am just writing this down here since someone with the same problem 
 might stumble upon this post. Sorry for the spam.

 Have a nice day and thanks again for your fast replies.
 Sincerely,
 Mathias
On Thu, 9 Aug 2012 08:17:01 -0500, Dirk Eddelbuettel wrote: