Skip to content

[Rcpp-devel] List of Lists to List of Vectors

18 messages · William Dunlap, Matt D., Tim Keitt +3 more

#
If I have data like

list(list('a', 1, T), list('b', 2, F), ...)

(list of lists with same types)

and I want to convert to

list(c('a', 'b'), c(1, 2), c(T, F))

(list of vectors)

where the types are not known in advance (its trivial if you know the
sequence of types in advance).

Anyone know how to do that in Rcpp?

THK
#
Here is one way to do it in R.  Much of the code is checking that the
format of 'x'
is what you describe and you may omit it if you know that does not need
checking.
The call to 'unlist' in the call to lapply is responsible for setting the
classes of the outputs.

Since translation from R to Rcpp is "seamless" I will leave that to you.

First an example:
  > X <- list(list(factor("a",levels=letters[1:5]), 1+1i, FALSE),
                list(factor("b",levels=letters[1:5]), 2+2i, TRUE))
  > f(X, check=TRUE)
  [[1]]
  [1] a b
  Levels: a b c d e

  [[2]]
  [1] 1+1i 2+2i

  [[3]]
  [1] FALSE  TRUE

Then the function:

f <- function(x, check = FALSE) {
   if (check) {
      stopifnot(
         is.list(x),
         all(vapply(x, is.list, FUN.VALUE=FALSE)),
         length(unique(vapply(x, length, FUN.VALUE=0)))==1)
   }
   listLength <- length(x)
   if (listLength == 0) {
      return(list())
   }
   elementLength <- length(x[[1]])
   xMatrix <- array(unlist(x, recursive=FALSE), dim=c(elementLength,
listLength))
   if (check) {
      # TODO: this rejects mixtures of numerics and integers,
      # but accepts mixtures of factors with different levels.
      elementSignature <- function(e) paste(collapse=",", unlist(lapply(e,
class)))
      stopifnot(length(unique(vapply(x, elementSignature,
FUN.VALUE="")))==1)
   }
   lapply(seq_len(nrow(xMatrix)), function(i)unlist(xMatrix[i,]))
}


Bill Dunlap
TIBCO Software
wdunlap tibco.com
On Fri, May 1, 2015 at 7:28 PM, Tim Keitt <tkeitt at utexas.edu> wrote:

            
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/rcpp-devel/attachments/20150502/ba24d088/attachment.html>
#
On 2 May 2015 at 10:49, William Dunlap wrote:
| Since translation from R to Rcpp is "seamless" I will leave that to you.

One problem is with the strongly typed nature of C++.  Rearranging dynamicly
growing objects can be done.  I think I used Boost's variant type a few
years.  I am sure there are other possibilities.  We should collect a few and
compare.   But in C++ please :)

Dirk
#
On 5/2/2015 20:18, Dirk Eddelbuettel wrote:
As for the other possibilities / alternatives to Boost.Variant -- the 
good news is that there a plenty! :-)
The, um, related news is that the abundance means a maze of twisty 
little passages, not quite all alike ;-)
There are speed / ease of use / flexibility / memory usage (and more -- 
e.g., involving floating-point accuracy) trade-offs.
It doesn't seem like there's any "generally best" winner -- what's best 
is really use-case-specific.

"Dynamic C++" article series by Alex Fabijanic and Richard Saunders 
provides a good overview with comparisons:
Part 1: http://accu.org/index.php/journals/1855
Part 2: http://accu.org/index.php/journals/1841
// Focus: "In this article series, both externals and internals of 
boost::[variant, any, type_erasure], folly::dynamic, Poco::Dynamic::Var, 
Qt QVariant and adobe::any_regular are explored and compared. Design, 
capabilities, ease of use as well as pros and cons of each solution will 
be examined. Performance benchmark comparisons results will be provided 
as well. "

There's also a related presentation by Alex:
Slides: 
https://github.com/boostcon/cppnow_presentations_2013/blob/master/thu/DynamicCpp.pdf?raw=true
// Online slides: http://www.slideshare.net/aleks-f/dynamic-caccu2013
Video: 
https://www.youtube.com/watch?v=QySTK4cSq7o&list=PLAq3rthfTjp4O2YZ7K2Y6AzW_zx5DtUth
Code: https://github.com/aleks-f/articles/tree/master/ACCU-2013

ACCU 2014 presentation, "Dynamic C++ performance", includes more on 
speed comparisons (and also floating-point conversions, including their 
accuracy).
Code & slides: https://github.com/aleks-f/articles/tree/master/ACCU2014

Perhaps "Dynamic, Recursive, Heterogeneous Types in Statically-Typed 
Languages" by Richard Saunders, Clinton Jeffery may also be of interest, 
although it's only somewhat related to the problem (issue at hand: allow 
Python and C++ to share Python dictionaries across language boundaries).
Article: http://cppnow.org/files/2013/03/saunders-jeffery.pdf
Slides: 
https://github.com/boostcon/cppnow_presentations_2013/blob/master/fri/DynRec.pdf?raw=true
Video: https://www.youtube.com/watch?v=W3TsQtnMtqg

Best,

Matt
#
Here's a really bare-bones version. Not very pretty or complete, but it
does do the job. Could the 'unlist' part be converted to Rcpp?

THK

library(Rcpp)

code = '
SEXP test(List a)
{
  int l = Rf_length(a[0]);
  typedef std::vector<SEXP> svec;
  std::vector<svec> x(l);
  for (int i = 0; i != l; ++i)
    for (int j = 0; j != a.size(); ++j)
    {
      List b = a[j];
      x[i].push_back(b[i]);
    }
  return wrap(x);
}'

f = cppFunction(code = code)

res = lapply(f(list(list(1, 'a'), list(2, 'b'))), unlist)
On Sat, May 2, 2015 at 1:18 PM, Dirk Eddelbuettel <edd at debian.org> wrote:

            

  
    
#
A slightly improved version:

library(Rcpp)

code = '
SEXP test(List a)
{
  auto l = Rf_length(a[0]);
  using svec = std::vector<SEXP>;
  std::vector<svec> x(l);
  for (List b : a)
  {
    if (b.size() != l)
      stop("Ragged input");
    for (int i = 0; i != l; ++i)
      x[i].push_back(b[i]);
  }
  return wrap(x);
}'

f = cppFunction(code = code, plugins = "cpp11")

res = lapply(f(list(list(T, 1, 'a'),
                    list(F, 2, 'b'))),
             unlist)
On Sun, May 3, 2015 at 11:57 AM, Tim Keitt <tkeitt at utexas.edu> wrote:

            

  
    
#
Check here
<https://github.com/RevolutionAnalytics/rmr2/blob/master/pkg/src/t-list.cpp>
for something similar to Tim's solution that preallocates all vectors to
avoid the costly push_back. Still needs the unlists in R, so it's expensive
in that dimension, the number of lists in the output.


Antonio
On Sun, May 3, 2015 at 4:22 PM, Tim Keitt <tkeitt at utexas.edu> wrote:

            
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/rcpp-devel/attachments/20150503/66952b76/attachment-0001.html>
#
On 3 May 2015 at 21:29, Antonio Piccolboni wrote:
| Check here? for something similar to Tim's solution that preallocates all

Please resend with an explicit URL, the embedded link got lost.

Remember, friends don't let friends send html email.

Dirk

| vectors to avoid the costly push_back. Still needs the unlists in R, so it's
| expensive in that dimension, the number of lists in the output.
#
Embedded link was <
https://github.com/RevolutionAnalytics/rmr2/blob/master/pkg/src/t-list.cpp>
On Mon, May 4, 2015 at 7:39 AM, Dirk Eddelbuettel <edd at debian.org> wrote:

            

  
    
#
On Sun, May 3, 2015 at 11:29 PM, Antonio Piccolboni <antonio at piccolboni.info
I may have a way around the unlist part; still needs testing.

push_back is amortized constant so only a little costly.

THK

  
    
#
On 4 May 2015 at 14:23, Tim Keitt wrote:
| 
| 
| On Sun, May 3, 2015 at 11:29 PM, Antonio Piccolboni <antonio at piccolboni.info>
| wrote:
| 
|     Check here? for something similar to Tim's solution that preallocates all
|     vectors to avoid the costly push_back. Still needs the unlists in R, so
|     it's expensive in that dimension, the number of lists in the output.
| 
| 
| I may have a way around the unlist part; still needs testing.

Keep us posted.
 
| push_back is amortized constant so only a little costly.

But going from std::list to SEXP is one full copy.  Not so bad in the grand
scheme of things.

Dirk
#
On Mon, May 4, 2015 at 2:37 PM, Dirk Eddelbuettel <edd at debian.org> wrote:

            
Copying is where all the costs accumulate for sure. I find it mostly
guesswork whether something copies or not with Rcpp data structures.

THK

  
    
#
You have a point about push_back being amortized constant. In my code I do
everything with Lists, which may explain why I needed to preallocate them.
Unfortunately I forgot the motivation for using Lists instead of
std:vector, which I must have considered at some point. In fact if you look
at this commit (
https://github.com/RevolutionAnalytics/rmr2/commit/f8d4b0766ea71fe056a4b3eb0a4ef5ed0b7c6eec
for the html-challenged)  I had the push_back in a previous version, but it
didn't work. I was using a vector<vector<RObject>> to hold the data, and if
I can remember that hit some performance wall. The commit message doesn't
really refresh my memory, but it says something about protecting SEXPs.
That may be why I didn't want to use a vector<SEXP>. I may have had
stability issues and, right or wrong, thought that change was helpful.
On Mon, May 4, 2015 at 12:23 PM, Tim Keitt <tkeitt at utexas.edu> wrote:

            
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/rcpp-devel/attachments/20150504/ffbdc110/attachment.html>
#
On 4 May 2015 at 14:42, Tim Keitt wrote:
| Copying is where all the costs accumulate for sure. I find it mostly guesswork
| whether something copies or not with Rcpp data structures.

By now we have said about 1e4 times that Rcpp data structures are ("no
copies") whereas other (ie Boost, STL, ...) data structures generally need
copies.

Dirk
#
On Mon, May 4, 2015 at 2:37 PM, Dirk Eddelbuettel <edd at debian.org> wrote:

            
After all that, I ended up with

  lapply(lapply(1:length(x[[1]]),
                function(i) lapply(x, function(a) a[[i]])),
         unlist)

Probably fast enough. Its just too painful to deal with the possible types
in C/C++. (Browsing the R source shows hundreds of lines of C code to
support 'unlist').

THK
#
On Tue, May 5, 2015 at 9:11 AM, Dirk Eddelbuettel <edd at debian.org> wrote:

            
I have no idea what that means. I just know its very easy for me to figure
out when things are copied using STL. Not so much with Rcpp.

THK

  
    
#
On 5 May 2015 at 15:18, Tim Keitt wrote:
| On Tue, May 5, 2015 at 9:11 AM, Dirk Eddelbuettel <edd at debian.org> wrote:
| On 4 May 2015 at 14:42, Tim Keitt wrote:
|     | Copying is where all the costs accumulate for sure. I find it mostly
|     guesswork
|     | whether something copies or not with Rcpp data structures.
| 
|     By now we have said about 1e4 times that Rcpp data structures are ("no
|     copies") whereas other (ie Boost, STL, ...) data structures generally need
|     copies.
| 
| I have no idea what that means. I just know its very easy for me to figure out
| when things are copied using STL. Not so much with Rcpp.

In the editing process I dropped a 'proxies' -- sorry. Let's try again:

   By now we have said about 1e4 times that Rcpp data structures are proxy
   objects ("no copies") whereas other (ie Boost, STL, ...) data structures
   generally need copies.

Dirk
#
On Tue, May 5, 2015 at 4:36 PM, Dirk Eddelbuettel <edd at debian.org> wrote:

            
I understand that. Its during insertion, extraction, coercion and function
boundaries (wrap/as) where it gets a bit hard to know when you've done
shallow or deep copy.

THK