Skip to content

append/concatenate an element to a list in C-language

6 messages · Oleg Sklyar, Tony Plate, Brian Ripley +2 more

#
dear people,

i need to code a function in C working in R and receives two R SEXP
objects as parameters, where one is a list and another is a vector of
integers:

void f(SEXP list, SEXP vector) {

  ...

  return list;
}


and it should return the given list with the integer vector concatenated
at the end (as the last element of the list). the list can be really big
so i would not like to create a new list from scratch and copy all the
elements, including the additional one. i'm also interested in knowing
how should i properly handle protections of the SEXP objects when doing
this.

i've been looking at the R source code for everything that has to do
with CAR, CDR, CONS, and even found functions with promising names like
listAppend or GrowList but i have not been able to figure this out nor i
haven't been able to find any reference on how to do this by googling
all around, so any help will be very much appreciated.


thanks a lot!!!

robert.
#
Hi.

I believe it is virtually impossible, maybe even not virtually: if you
manage to do so somehow, please report it as a bug because such things
must be impossible as they break the integrity of R and data.

Forget about changing size in place: it is C and 'realloc' would be
about the only way to do it, which would not be smart even if you can
find a pointer to what you want to change. However, consider you want to
copy (after all it is a list, which is an array of pointers and you
might think you only need to copy pointers, which is cheap even for a
huge list). Consider a simple example:
you allocVector(VECSXP,n+1); then either memcpy n elements (if you
manage to get a pointer say with DATAPTR, and R-API is smart not to
allow you to get it that simple) or you simply SET_VECTOR_ELT(new, i,
VECTOR_ELT(old, i)) and then add the last one. You know what would be
wrong about this idea? You cannot be sure that pointers to elements of
the old list are not modified/deleted elsewhere, thus affecting your new
list! Even if you overwrite the list in place, providing such software
to users is error prone as users do not know about your ideas of keeping
everything tidy.

The only right way to replicate (if you want to do it in C) is copy and
copy duplicating each element of the list, i.e.:

SEXP f(SEXP old, SEXP v) {
SEXP new; int i,nprotect=0;
PROTECT(new=allocVector(VECSXP,n+1)); nprotect++;
for(i=0;i<n;i++)
  SET_VECTOR_ELT(new, i, Rf_duplicate(VECTOR_ELT(old, i)));
SET_VECTOR_ELT(new, n, v);
/* set names here if you need */
UNPROTECT(1);
return new;

What you could do, if you really need such functionality. Provide R with
external pointer to a data structure that you maintain in your C (I
would suggest C++) code and use C++ STL for dynamic updates (either
lists or vectors, whatever suits better). But the overhead of writing
the interface will take more than copying a list of any size!

Finally: void f(SEXP list, SEXP vector); is a malformed definition which
will lead to Segmentation Fault. Please read "Writing R extensions"
first.

Best,
Oleg

-  
Dr Oleg Sklyar * EMBL-EBI, Cambridge CB10 1SD, UK * +441223494466
On Thu, 2007-10-18 at 19:41 +0200, Robert Castelo wrote:
#
This sounds like it could be dangerous, but if you're sure it's 
necessary and you know what you are doing, you could investigate whether 
the "pairlist" internal structure might enable you to do this (AFAIK, a 
"pairlist" is a traditional linked-list data structure).  In general, 
pairlists seem to be used very little, and I've never seen their use 
encouraged, so I would be very cautious.  You can read more about 
"pairlists" under ?pairlist, in the R-internals manual, and in the 
source starting at src/include/Rinternals.h (look for "LISTSXP").

-- Tony Plate
Robert Castelo wrote:
#
On Thu, 18 Oct 2007, Robert Castelo wrote:

            
You are returning an result in a function that returns void: the compiler 
will complain at you.
If you study the R Internals manual you will see that there is no space on 
the original VECSXP for another element, so you do *need* to create a new 
VECSXP.  Note that creating a new list does not mean necessarily copying 
the elements, but you do need to think about the NAMED bits.

If you are doing this repeatedly you could think about exploiting the 
TRUELENGTH field to create a list with spare space that you could exploit 
in future calls.

It is often possible to avoid copying, but considerable care is needed to 
ensure that you do not end up with an object that does not effectively 
share elements with another user-visible one.
Those are for pairlists, not (vector) lists.
#
hi,

thanks to all the replies, i think the discussion can be followed
throughout this message.
apologies, indeed it should have been

SEXP f(SEXP list, SEXP element);
in this list i need not names associated to each element, i guess that
should release me from thinking about the NAMED bits you refer to (?).

what you say about "creating a new list does not mean necessarily
copying the elements" interests me. i'd like to avoid going through the
old list and copy the elements. i'd like to add one at the end as fast
as possible.

if, in a situation like:

l <- list(1,2)
l <- c(l,3)

R, internally, concatenates 3 to the list l avoiding to copy all 3
elements again, then i'd just like to know how R does that to do it
myself as well.
i cannot anticipate what will be the maximum length of my list. i'm
working with clique lists in undirected graphs and given a list of
cliques and an edge to be removed this function will modify the clique
list according to this edge removal. since i'd like R to handle graphs
as big as possible with the available hardware, this list can be
arbitrarily large and that's why i want to avoid going through the
elements of the list.
so, would it be possible to create a new VECSXP vector with one extra
element, then somehow "memcpy" the old vector of SEXP pointers into this
new one, and add a brand new element with SET_VECTOR_ELT?
ok.


thanks!
robert.
#
Robert Castelo wrote:
No! This is very important: The same R object may be known under
multiple names and this is what NAMED records (well, it tries).  When
this is the case, it is not safe to modify it destructively.  E.g.

x <- rnorm(100)
y <- x     # at this point y and x refer to the SAME object
y[1] <- 0  # NOW we have to duplicate y or the subassignment will change x

Notice that this appears in disguised forms:

g <- function(y) {
   y[1] <- 0  # MAY require duplication
   mean(y)
}
x <- rnorm(100)
g(x) # duplication needed
g(rnorm(100)) # duplication not needed

Don't try anything involving destructive modification of objects before
you have understood this!