.Call and data frames

7 messages · Kasper Daniel Hansen, Brian Ripley, Hin-Tak Leung +1 more

#
Hello,

I'm trying to fetch a data frame through the C API,
and have no problem doing this when all columns
are numbers, but when there is a column of
strings I have a problem. On the C-side the
function looks like:
SEXP myfunc(SEXP df),
and it is called with a dataframe from
the R side with:

.Call("myfunc", somedataframe)

On the C side (actually C++ side) I use code
like this:
SEXP colnames = getAttrib(df, R_NamesSymbol);
cname = string(CHAR(STRING_ELT(colnames, i)));
SEXP colData = VECTOR_ELT(df, i);  // data for i-th column
if(isReal(colData))
    x = REAL(colData)[j];
else if(isInteger(colData))
    k = INTEGER(colData)[j];
else if(isString(colData))
    s = CHAR(STRING_ELT(colData, j));

The problem is that the last test (isString) never passes,
even when I pass in a frame for which one or more cols
contain character strings. When the column contains
strings the isVector(colData) test passes, but no matter
how I try to fetch the string data I get a seg fault. That
is, forcing CHAR(STRING_ELT(colData,j)) will
fault, and so will VECTOR_ELT(colData,0), even
though colData passes the isVector test.

Any ideas?
Thanks,
ds
#
While I do not know how to handle this on the C level, I know that
you do not have characters in data frames; everything is stored as
factors instead. Internally they are coded as integer codes, one per
level, with the levels having labels (which are the characters you
see). So, e.g. (in R):

 > test <- data.frame(tmp = letters[1:10])
 > test
    tmp
1    a
2    b
3    c
4    d
5    e
6    f
7    g
8    h
9    i
10   j
 > is.character(test$tmp)
[1] FALSE
 > as.numeric(test$tmp) # The internal code of the factor
[1]  1  2  3  4  5  6  7  8  9 10
 > levels(test$tmp) # gives you the translation from internal code to actual label
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"

You probably need to convert the factor to a character vector, which I
do not know how to do in C off the top of my head, but which is probably
not that difficult. At least now you should have some idea of where to look.

/Kasper
On Jun 21, 2006, at 10:07 PM, Dominick Samperi wrote:

            
#
On Wed, 21 Jun 2006, Kasper Daniel Hansen wrote:

            
Not so.  The default in data.frame() is to convert character vectors to 
factors, but there are many ways to have character vectors in data frames, 
and this will become more common in 2.4.0 and later.

I suspect that this may well be Dominick's problem, though.

isVector only tests whether the object is one of the several vector types. 
VECTOR_ELT is only appropriate for a VECSXP (an R-level list), and for this 
sort of thing it is much safer and cleaner to test TYPEOF.
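
A minimal sketch of dispatching on TYPEOF for a single column, assuming the
code is compiled against the R headers (for example with R CMD SHLIB). The
function name show_cell and its index arguments are hypothetical and not from
this thread; the indices are assumed to be 0-based and in range.

```c
#include <R.h>
#include <Rinternals.h>

/* Hypothetical helper: print element j of column i of a data frame. */
SEXP show_cell(SEXP df, SEXP si, SEXP sj)
{
    int i = asInteger(si), j = asInteger(sj); /* assumed 0-based and in range */
    SEXP col = VECTOR_ELT(df, i);             /* a data frame is a VECSXP of columns */

    switch (TYPEOF(col)) {
    case REALSXP:
        Rprintf("double: %f\n", REAL(col)[j]);
        break;
    case INTSXP:
        if (isFactor(col)) {
            /* factor: integer codes index (1-based) into the levels attribute */
            int code = INTEGER(col)[j];
            SEXP levels = getAttrib(col, R_LevelsSymbol);
            if (code == NA_INTEGER)
                Rprintf("factor: NA\n");
            else
                Rprintf("factor: %s\n", CHAR(STRING_ELT(levels, code - 1)));
        } else {
            Rprintf("integer: %d\n", INTEGER(col)[j]);
        }
        break;
    case STRSXP:
        Rprintf("string: %s\n", CHAR(STRING_ELT(col, j)));
        break;
    default:
        error("unsupported column type %d", TYPEOF(col));
    }
    return R_NilValue;
}
```

From R this would be called as .Call("show_cell", somedataframe, 0L, 0L)
once the shared object is loaded. Note that the factor case has to be
checked inside the INTSXP branch, since a factor is stored as an integer
vector.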

  
    
#
I think you want
        else if (TYPEOF(colData) == STRSXP)
... instead.

I don't know if this will convert from factors to strings,
but somewhere it probably involves something like this:
     PROTECT(colData = coerceVector(colData, STRSXP));
Dominick Samperi wrote:
#
Hin-Tak Leung wrote:
FWIW, a factor consists of all these things internally:
(1) TYPEOF(colData)  is INTSXP
(2) attr(colData, "levels") exists and is a STRSXP type
(string representation for the levels).
(3) class(colData) = "factor"

if coerceVector() doesn't do it, you can test for (3) in your C code,
and use the integer vector in (1) to index into the string vector in (2)
to regenerate the string vector manually.

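This is untested, just an idea:

SEXP klass = getAttrib(colData, R_ClassSymbol);  /* "class" is reserved in C++ */
...
if (.../* do some test on class */...)
{
     SEXP levels = getAttrib(colData, R_LevelsSymbol);
     SEXP back_to_str;
     PROTECT(back_to_str = allocVector(STRSXP, LENGTH(colData)));
     for(int i = 0; i < LENGTH(colData); i++)
     {
       int code = INTEGER(colData)[i];   /* factor codes are 1-based */
       /* STRING_ELT already returns a CHARSXP, so no mkChar is needed */
       SET_STRING_ELT(back_to_str, i,
           code == NA_INTEGER ? NA_STRING : STRING_ELT(levels, code - 1));
     }
     /* UNPROTECT(1) once back_to_str is no longer needed */
}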
#
Thanks for the tips,

This seems to work:
First test for isReal and isInteger.
If they fail, assume character/factor, and

PROTECT(colData = coerceVector(colData, INTSXP)); // Not STRSXP
SEXP names = getAttrib(colData, R_LevelsSymbol);
// names now contains the string names I was looking for.

ds
Hin-Tak Leung wrote:
#
On Jun 22, 2006, at 9:04 AM, Dominick Samperi wrote:

            
But of course be aware that there is a map from colData to names  
(here I am guessing the C implementation mirrors what is happening in  
R), where you have

 > test = data.frame(tmp = c("a","a","b"))
 > levels(test$tmp)
[1] "a" "b"

That is, you only have one occurrence of each level. So you need to  
take care of this remapping. Unless all you need is the different  
possible values.
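
To make the remapping concrete, here is a small standalone sketch in plain C
that does not use the R headers. The function decode_factor and its array
arguments are hypothetical stand-ins: in real code the codes would come from
INTEGER(colData) and the labels from the levels attribute. It assumes codes
are 1-based, and it maps anything out of range to NULL (R itself marks
missing values with NA_INTEGER).

```c
#include <stddef.h>

/* Hypothetical stand-in for a factor column: 1-based integer codes plus a
   table of unique level labels. Writes one label pointer per element into
   out; codes outside 1..nlevels yield NULL. */
static void decode_factor(const int *codes, size_t n,
                          const char *const *levels, size_t nlevels,
                          const char **out)
{
    for (size_t k = 0; k < n; k++) {
        int code = codes[k];
        if (code >= 1 && (size_t)code <= nlevels)
            out[k] = levels[code - 1];   /* shift from 1-based to 0-based */
        else
            out[k] = NULL;               /* missing or invalid code */
    }
}
```

For the example above, the codes are {1, 1, 2} and the levels are
{"a", "b"}, so decoding gives back "a", "a", "b", one label per row.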

And thanks to Brian Ripley who taught me something new about data  
frames: that it is indeed possible to have characters (although it is  
quite explicit in the man page on data.frame)

/Kasper