
[Rcpp-devel] add new components to list without specifying list size initially

Ok, thanks for your answer, but I wasn't clear enough. So here are more
details of what I want to do.

I have one list named "probes":
probes <- list(chr1=data.frame(name=c("p1","p2"),
                 start=c(81,95),
                 end=c(85,100),
                 stringsAsFactors=FALSE))

I also have one list named "genes":
genes <- list(chr1=data.frame(name=c("g1","g2"),
                 start=c(11,111),
                 end=c(90,190)),
              chr2=data.frame(name="g3", start=11, end=90))

I need to compare those two lists in order to obtain the following list,
which contains, for each gene, the names of the probes that lie entirely
within it:
links <- list(chr1=list(g1=c("p1")))

Here is my R function (assuming that the probes are sorted based on their
start and end coordinates):

fun.l <- function(genes, probes){
  links <- lapply(names(genes), function(chr.name){
    if(! chr.name %in% names(probes))
      return(NULL)

    res <- list()

    genes.c <- genes[[chr.name]]
    probes.c <- probes[[chr.name]]

    for(gene.name in genes.c$name){
      gene <- genes.c[genes.c$name == gene.name,]
      res[[gene.name]] <- vector()
      for(probe.name in probes.c$name){
        probe <- probes.c[probes.c$name == probe.name,]
        if(probe$start >= gene$start && probe$end <= gene$end)
          res[[gene.name]] <- append(res[[gene.name]], probe.name)
        else if(probe$start > gene$end)
          break
      }
      if(length(res[[gene.name]]) == 0)
        res[[gene.name]] <- NULL
    }

    if(length(res) == 0)
      res <- NA
    return(res)
  })
  names(links) <- names(genes)
  links <- Filter(function(links.c){!is.null(links.c)}, links)
  return(links)
}
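
For reference, the same matching logic can be sketched in plain C++ with
standard containers, which grow on demand so that no size has to be given up
front. This is only an illustration and not Rcpp code: the "Interval" struct,
the typedefs and the "map_probes" function are names made up here, and the
input is assumed to be already converted from the R lists.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for one row of the data frames (name, start, end).
struct Interval {
    std::string name;
    double start;
    double end;
};

// Chromosome name -> intervals on that chromosome.
typedef std::map<std::string, std::vector<Interval> > ByChr;
// Chromosome name -> gene name -> names of the probes inside that gene.
typedef std::map<std::string,
                 std::map<std::string, std::vector<std::string> > > Links;

// For each chromosome present in both inputs, collect the probes fully
// contained in each gene. Probes are assumed to be sorted by start.
Links map_probes(const ByChr& genes, const ByChr& probes) {
    Links links;
    for (ByChr::const_iterator g = genes.begin(); g != genes.end(); ++g) {
        ByChr::const_iterator p = probes.find(g->first);
        if (p == probes.end())
            continue;  // no probes on this chromosome
        for (std::size_t i = 0; i < g->second.size(); ++i) {
            const Interval& gene = g->second[i];
            for (std::size_t j = 0; j < p->second.size(); ++j) {
                const Interval& probe = p->second[j];
                if (probe.start >= gene.start && probe.end <= gene.end) {
                    // operator[] creates the missing entries, and
                    // push_back grows the vector as needed.
                    links[g->first][gene.name].push_back(probe.name);
                } else if (probe.start > gene.end) {
                    break;  // later probes start even further right
                }
            }
        }
    }
    return links;
}
```

One difference from the R function: a chromosome that has probes but no
gene/probe match is simply absent from the result here, instead of being set
to NA.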

And here is the beginning of my attempt using Rcpp:

src <- '
using namespace Rcpp;

List genes = List(genes_in);
int genes_nb_chr = genes.length();
std::vector<std::string> genes_chr = genes.names();

List probes = List(probes_in);
int probes_nb_chr = probes.length();

std::vector< std::vector<std::string> > links;

// the main task is performed in this loop
for(int chrnum=0; chrnum<genes_nb_chr; ++chrnum){
  DataFrame genes_c = DataFrame(genes[chrnum]);
  // ... add code to map probes on genes, that is fill "links" ...
}

return wrap(links);
'

funC <- cxxfunction(signature(genes_in="list",
                                probes_in="list"),
                      body=src, plugin="Rcpp")

The problem starts quite early: when I compile this piece of code, I get
"error: call of overloaded 'DataFrame(Rcpp::internal::generic_proxy<19>)' is
ambiguous".

How should I iterate over the "probes" and "genes" lists given as input?
More generally, how can one iterate over a list of lists (of lists...) with
Rcpp?

A second (small) question: I can't get Rprintf to work when using inline.
For instance, with Rprintf("%d\n", i); it complains about the quotes. What
should I do to print a statement from within the for loop?

Thanks in advance. As my question is very long, I won't mind if you tell me
to find another way by myself. But maybe one of you can put me on the right
track.
On Thu, Aug 11, 2011 at 7:00 AM, Dirk Eddelbuettel <edd at debian.org> wrote:

            