Hello,
I'm trying to fetch a data frame through the C API,
and have no problem doing this when all columns
are numbers, but when there is a column of
strings I have a problem. On the C-side the
function looks like:
SEXP myfunc(SEXP df),
and it is called with a dataframe from
the R side with:
.Call("myfunc", somedataframe)
On the C side (actually C++ side) I use code
like this:
SEXP colnames = getAttrib(df, R_NamesSymbol);
cname = string(CHAR(STRING_ELT(colnames, i)));
SEXP colData = VECTOR_ELT(df, i);  // data for i-th column
if (isReal(colData))
    x = REAL(colData)[j];
else if (isInteger(colData))
    k = INTEGER(colData)[j];  // k, not i: i is the column index
else if (isString(colData))
    s = CHAR(STRING_ELT(colData, j));
The problem is that the last test (isString) never passes,
even when I pass in a frame for which one or more cols
contain character strings. When the column contains
strings the isVector(colData) test passes, but no matter
how I try to fetch the string data I get a seg fault. That
is, forcing CHAR(STRING_ELT(colData,j)) will
fault, and so will VECTOR_ELT(colData,0), even
though colData passes the isVector test.
Any ideas?
Thanks,
ds
.Call and data frames
7 messages · Kasper Daniel Hansen, Brian Ripley, Hin-Tak Leung +1 more
While I do not know how to handle this on the C level, I know that
you do not have characters in data frames; everything is factors
instead. Internally they are coded as integers, one code per level,
with the levels having labels (which is the character you see). So e.g.
(in R):
> test <- data.frame(tmp = letters[1:10])
> test
   tmp
1    a
2    b
3    c
4    d
5    e
6    f
7    g
8    h
9    i
10   j
> is.character(test$tmp)
[1] FALSE
> as.numeric(test$tmp) # The internal code of the factor
[1] 1 2 3 4 5 6 7 8 9 10
> levels(test$tmp) # gives you the translation from internal code to actual label
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"
You probably need to convert the factor to a character, which I do
not know how to do in C off the top of my head, but which is probably not
that difficult. At least now you should have some idea of where to look.
/Kasper
On Jun 21, 2006, at 10:07 PM, Dominick Samperi wrote:
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
On Wed, 21 Jun 2006, Kasper Daniel Hansen wrote:
While I do not know how to handle this on the C level, I know that you do not have characters in data frames, everything is factors instead.
Not so. The default in data.frame() is to convert character vectors to factors, but there are many ways to have character vectors in data frames, and this will become more common in 2.4.0 and later. I suspect that this may well be Dominick's problem, though.
isVector is just a test of being one of the several types of vector. VECTOR_ELT is only appropriate for a VECSXP (an R-level list), and for this sort of thing it is much safer and cleaner to test TYPEOF.
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
I think you want
else if (TYPEOF(colData) == STRSXP)
... instead.
I don't know if this will convert from factors to strings,
but somewhere it probably involves something like this:
PROTECT(colData = coerceVector(colData, STRSXP));
Hin-Tak Leung wrote:
I think you want
else if (TYPEOF(colData) == STRSXP)
... instead.
I don't know if this will convert from factors to strings,
but somewhere it probably involves something like this:
PROTECT(colData = coerceVector(colData, STRSXP));
FWIW, a factor consists of all these things internally:
(1) TYPEOF(colData) is INTSXP
(2) attr(colData, "levels") exists and is a STRSXP type
(string representation for the levels).
(3) class(colData) = "factor"
If coerceVector() doesn't do it, you can test for (3) in your C code,
and use the integer vector in (1) to index into the string vector in (2)
to regenerate the string vector manually.
Not all of this is correct, just an idea:
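klass = getAttrib(colData, R_ClassSymbol);  /* "class" is a reserved word in C++ */
...
if (.../* do some test on klass */...)
{
    levels = getAttrib(colData, R_LevelsSymbol);
    PROTECT(back_to_str = allocVector(STRSXP, LENGTH(colData)));
    for (int i = 0; i < LENGTH(colData); i++)
    {
        int code = INTEGER(colData)[i];  /* factor codes are 1-based */
        /* STRING_ELT already yields a CHARSXP, so no mkChar is needed */
        SET_STRING_ELT(back_to_str, i,
            code == NA_INTEGER ? NA_STRING : STRING_ELT(levels, code - 1));
    }
    /* UNPROTECT(1) once back_to_str is no longer needed */
}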
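The indexing step above can also be tried outside R. What follows is only a minimal sketch in plain C, with no R headers, assuming the codes and labels have already been copied into ordinary C arrays. The names decode_factor and NA_CODE are hypothetical and not part of the R API; NA_CODE stands in for NA_INTEGER, which R defines as INT_MIN.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for R's NA_INTEGER (R defines it as INT_MIN). */
#define NA_CODE INT_MIN

/*
 * Decode factor codes into labels.
 *   codes    : n factor codes, 1-based as in R, or NA_CODE for missing
 *   levels   : n_levels distinct labels
 *   out      : receives n pointers into levels (NULL where undecodable)
 * Returns the number of elements that were missing or out of range.
 */
size_t decode_factor(const int *codes, size_t n,
                     const char *const *levels, size_t n_levels,
                     const char **out)
{
    size_t bad = 0;
    for (size_t i = 0; i < n; i++) {
        int code = codes[i];
        if (code == NA_CODE || code < 1 || (size_t)code > n_levels) {
            out[i] = NULL;              /* missing or invalid code */
            bad++;
        } else {
            out[i] = levels[code - 1];  /* shift 1-based code to 0-based index */
        }
    }
    return bad;
}
```

With levels {"a", "b"} and codes {1, 1, 2}, out ends up pointing at "a", "a", "b". Using a code directly as a 0-based index would read one element past the intended label.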
Thanks for the tips,
This seems to work:
First test for isReal and isInteger. If they fail,
assume character/factor, and
PROTECT(colData = coerceVector(colData, INTSXP)); // Not STRSXP
SEXP names = getAttrib(colData, R_LevelsSymbol);
// names now contains the string names I was looking for.
ds
On Jun 22, 2006, at 9:04 AM, Dominick Samperi wrote:
Thanks for the tips,
This seems to work:
First test for isReal and isInteger. If they fail,
assume character/factor, and
PROTECT(colData = coerceVector(colData, INTSXP)); // Not STRSXP
SEXP names = getAttrib(colData, R_LevelsSymbol);
// names now contains the string names I was looking for.
But of course be aware that there is a map from colData to names
(here I am guessing the C implementation mirrors what is happening in
R), where you have
> test = data.frame(tmp = c("a","a","b"))
> levels(test$tmp)
[1] "a" "b"
That is, you only have one occurrence of each level. So you need to
take care of this remapping. Unless all you need is the different
possible values.
And thanks to Brian Ripley who taught me something new about data
frames: that it is indeed possible to have characters (although it is
quite explicit in the man page on data.frame)
/Kasper
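To make the point about levels concrete, here is a rough plain-C sketch (no R headers) of how a column of strings splits into a table of distinct labels plus one integer code per row. The function name encode_factor is hypothetical. One deliberate simplification: it records levels in order of first appearance, whereas R's factor() sorts them by default.

```c
#include <stddef.h>
#include <string.h>

/*
 * Encode n strings as a factor-like pair (levels, codes).
 *   values : the n input strings, one per row
 *   levels : receives each distinct value once, in order of first
 *            appearance; must have room for n pointers
 *   codes  : receives n 1-based indices into levels
 * Returns the number of distinct levels found.
 */
size_t encode_factor(const char *const *values, size_t n,
                     const char **levels, int *codes)
{
    size_t n_levels = 0;
    for (size_t i = 0; i < n; i++) {
        size_t k = 0;
        /* linear search for an existing level equal to values[i] */
        while (k < n_levels && strcmp(levels[k], values[i]) != 0)
            k++;
        if (k == n_levels)
            levels[n_levels++] = values[i];  /* first time this value is seen */
        codes[i] = (int)k + 1;               /* store a 1-based code, as R does */
    }
    return n_levels;
}
```

For the input {"a", "a", "b"} this gives two levels, "a" and "b", and the codes 1, 1, 2. The levels alone do not say which row held which value; levels[codes[i] - 1] recovers the string for row i.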