[Rcpp-devel] add new components to list without specifying list size initially
Ok, thanks for your answer, but I wasn't clear enough. So here are more
details of what I want to do.
I have one list named "probes":
probes <- list(chr1=data.frame(name=c("p1","p2"),
start=c(81,95),
end=c(85,100),
stringsAsFactors=FALSE))
I also have one list named "genes":
genes <- list(chr1=data.frame(name=c("g1","g2"), start=c(11,111),
end=c(90,190)),
chr2=data.frame(name="g3", start=11, end=90))
I need to compare those two lists in order to obtain the following list
which contains, for each gene, the name of the probes included in it:
links <- list(chr1=list(g1=c("p1")))
Here is my R function (assuming that the probes are sorted based on their
start and end coordinates):
fun.l <- function(genes, probes){
links <- lapply(names(genes), function(chr.name){
if(! chr.name %in% names(probes))
return(NULL)
res <- list()
genes.c <- genes[[chr.name]]
probes.c <- probes[[chr.name]]
for(gene.name in genes.c$name){
gene <- genes.c[genes.c$name == gene.name,]
res[[gene.name]] <- vector()
for(probe.name in probes.c$name){
probe <- probes.c[probes.c$name == probe.name,]
if(probe$start >= gene$start && probe$end <= gene$end)
res[[gene.name]] <- append(res[[gene.name]], probe.name)
else if(probe$start > gene$end)
break
}
if(length(res[[gene.name]]) == 0)
res[[gene.name]] <- NULL
}
if(length(res) == 0)
res <- NA
return(res)
})
names(links) <- names(genes)
links <- Filter(function(links.c){!is.null(links.c)}, links)
return(links)
}
And here is the beginning of my attempt using Rcpp:
src <- '
using namespace Rcpp;
List genes = List(genes_in);
int genes_nb_chr = genes.length();
std::vector<std::string> genes_chr = genes.names();
List probes = List(probes_in);
int probes_nb_chr = probes.length();
std::vector< std::vector<std::string> > links;
// the main task is performed in this loop
for(int chrnum=0; chrnum<genes_nb_chr; ++chrnum){
DataFrame genes_c = DataFrame(genes[chrnum]);
// ... add code to map probes on genes, that is fill "links" ...
}
return wrap(links);
'
funC <- cxxfunction(signature(genes_in="list",
probes_in="list"),
body=src, plugin="Rcpp")
The problem starts quite early: when I compile this piece of code, I get
"error: call of overloaded ?DataFrame(Rcpp::internal::generic_proxy<19>)? is
ambiguous".
What should I do to go through the "probes" and "genes" lists given as
input? Maybe more generically, how can we go through a list of lists (of
lists...) with Rcpp?
2nd (small) question, I don't manage to use Rprintf when using inline, for
instance Rprintf("%d\n", i);, it complains about the quotes. What should I
do to print statement from within the for loop?
Thanks in advance. As my question is very long, I won't mind if you tell me
to find another way by myself. But maybe one of you can put me on the good
track.
On Thu, Aug 11, 2011 at 7:00 AM, Dirk Eddelbuettel <edd at debian.org> wrote:
On 11 August 2011 at 03:06, Walrus Foolhill wrote:
| Hello,
| I need to create a list and then fill it sequentially by adding
components in a
| for loop. Here is an example that works:
|
| library(inline)
| src <- '
| Rcpp::List mylist(2);
| for(int i=0; i<2; ++i)
| mylist[i] = i;
| mylist.names() = CharacterVector::create("a","b");
| return mylist;
| '
| fun <- cxxfunction(body=src, plugin="Rcpp")
| print(fun())
|
| But what I really want is to create an empty list and then fill it, that
is
| without specifying its number of components before hand... This is
because I
| don't know in advance at which step of the for loop I will need to create
a new
| component. Here is an example, that obviously doesn't work, but that
should
| show what I am looking for:
|
| Rcpp::List mylist;
| CharacterVector names = CharacterVector::create("a", "b");
If you know how long names is, you know how long mylist going to be ....
| for(int i=0; i<2; ++i){
| mylist.add(names[i], IntegerVector::create());
| mylist[names[i]].push_back(i);
I don't understand what that is trying to do.
| }
| return mylist;
|
| Do you know how I could achieve this? Thanks.
Rcpp::List is an alias for Rcpp::GenericVector, and derives from Vector.
You
can look at the public member functions -- there are things like
push_back()
push_front()
insert()
etc that behave like STL functions __but are inefficient as we (almost
always) need to copy the whole object__ so they are not recommended.
When I had to deal with 'unknown quantities of data' returning I was mostly
able to either turn it into a 'fixed or known columns, unknow rows' problem
(easy, just grow row-wise) or I 'cached' in a C++ data structure first
before
returning to R via Rcpp structures -- and then I knew the dimensions for
the
to-be-created object too.
Dirk
--
Two new Rcpp master classes for R and C++ integration scheduled for
New York (Sep 24) and San Francisco (Oct 8), more details are at
http://dirk.eddelbuettel.com/blog/2011/08/04#rcpp_classes_2011-09_and_2011-10
-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.r-forge.r-project.org/pipermail/rcpp-devel/attachments/20110811/24f6a2f4/attachment.htm>