Hello list,

Looking at Rcpp::DataFrame in the gallery
<http://gallery.rcpp.org/tags/dataframe/> I realized that I didn't know
how to modify a DataFrame by reference. Googling a bit, I found this post
on SO
<http://stackoverflow.com/questions/13773529/passing-a-data-table-to-c-functions-using-rcpp-and-or-rcpparmadillo>
and this post on the archive
<http://www.mail-archive.com/rcpp-devel at lists.r-forge.r-project.org/msg04919.html>.

There is nothing obvious, so I suspect I am missing something big like
"It is already the case because" or "it does not make sense because".

I tried the following, which compiled, but the data.frame object passed
to updateDFByRef in R stayed untouched:

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
void updateDFByRef(DataFrame& df) {
    int N = df.nrows();
    NumericVector newCol(N, 1.);
    df["newCol"] = newCol;
    return;
}

Could somebody explain what I am missing, or kindly point me to a
document where I can find the explanation?

Cheers
[Rcpp-devel] DataFrame and passing by reference
3 messages · stat quant, Kevin Ushey
I think the problem here is that the assignment df["newCol"] = newCol
copies the dataframe. Note that something like this would work as expected:
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
void updateDFByRef(DataFrame& df) {
    int N = df.nrows();
    NumericVector newCol(N, 1.);
    // replace the 1st vector with the numeric vector of 1s, by ref
    df[0] = newCol;
    return;
}
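To make the difference between the two assignments concrete, here is a toy model in plain C++. It is not Rcpp code and it does not use the R API: Handle, ListData, replaceFirst and appendByName are hypothetical names, and the model simply assumes the explanation above is right, i.e. that replacing an existing slot writes into the storage R owns while adding a new column makes the wrapper point at freshly allocated storage.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-ins (not Rcpp types): a column is a shared block of
// doubles, a list is a vector of pointers to columns.
using Column = std::shared_ptr<std::vector<double>>;
using ListData = std::vector<Column>;

// Models a C++ wrapper object that points at list storage owned elsewhere.
struct Handle {
    std::shared_ptr<ListData> data;
};

// Like df[0] = newCol: writes into the existing storage, so every other
// owner of that storage sees the change.
void replaceFirst(Handle& h, const Column& c) {
    (*h.data)[0] = c;
}

// Like adding a column under a new name: a longer list is allocated and
// only this handle is re-pointed at it. Other owners keep the old list.
void appendByName(Handle& h, const Column& c) {
    auto bigger = std::make_shared<ListData>(*h.data);  // copies pointers only
    bigger->push_back(c);
    h.data = bigger;
}

// Helper: a column of n ones.
Column ones(std::size_t n) {
    return std::make_shared<std::vector<double>>(n, 1.0);
}
```

In this model, a caller that keeps its own pointer to the original storage sees the effect of replaceFirst but not of appendByName, which matches what was observed from R.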
So, the reference to the original df is getting passed, the problem is
figuring out how to assign a new vector to df without forcing a copy.
I'm not sure if there's a ready-made solution, but I imagine the easiest
way to do it would be:
1) Declare a new list of df.size()+1,
2) Copy the pointers to the new list (not sure the best way to do this in
Rcpp),
3) Assign the vector you want to the new, last column,
4) Return that new list.
This should work since, internally, lists (VECSXPs) are just vectors of
SEXPs (pointers) to other R vectors (REALSXPs, INTSXPs, and so on), so
copying the pointers does not copy the column data.
(Please correct me if I'm wrong on the above.)
-Kevin
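The four steps above can be sketched as follows. This is a plain C++ model under stated assumptions, not code from the thread and not the Rcpp or R API: Frame, Column, appendColumn and makeColumn are hypothetical names, with shared pointers standing in for the SEXP pointers a VECSXP holds.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins (not Rcpp types): a column is a shared block of
// doubles, and a frame is a named vector of pointers to columns.
using Column = std::shared_ptr<std::vector<double>>;

struct Frame {
    std::vector<std::string> names;
    std::vector<Column> cols;
};

// Builds a frame with one extra column without copying any column data.
Frame appendColumn(const Frame& df, const std::string& name, Column newCol) {
    Frame out;
    out.names.reserve(df.cols.size() + 1);  // step 1: room for size()+1 slots
    out.cols.reserve(df.cols.size() + 1);
    for (std::size_t i = 0; i < df.cols.size(); ++i) {
        out.names.push_back(df.names[i]);
        out.cols.push_back(df.cols[i]);     // step 2: pointer copy only
    }
    out.names.push_back(name);              // step 3: new column goes last
    out.cols.push_back(std::move(newCol));
    return out;                             // step 4: return the new list
}

// Helper: a column holding n copies of value.
Column makeColumn(std::size_t n, double value) {
    return std::make_shared<std::vector<double>>(n, value);
}
```

The cost of the append is proportional to the number of columns, not the number of rows, because the existing columns are shared between the old and the new frame. The caller still has to use the returned frame, since the original one is left as it was.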
On Sun, Mar 31, 2013 at 6:44 AM, stat quant <statquant at outlook.com> wrote:
> [...] I tried the following which compiled but the data.frame object
> passed to updateDFByRef in R stayed untouched [...]
_______________________________________________ Rcpp-devel mailing list Rcpp-devel at lists.r-forge.r-project.org https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
Thanks Kevin,
The question came up because when you do this:

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
DataFrame updateDFByValue(DataFrame df) {
    int N = df.nrows();
    NumericVector newCol(N, 1.);
    df["newCol"] = newCol;
    return df;
}
the DataFrame is returned to R as a list, and building another
data.frame from it might:
1. cost time,
2. look wasteful if the intent was simply to update the existing
   data.frame.
data.table <http://cran.r-project.org/web/packages/data.table/index.html>
allows by-reference updates in R, but there is no C API that I know of.
Since it is an enhanced data.frame, Rcpp deals with it as a data.frame.
I thought it was a pity to be able to update by reference in R but not
in C++, so I asked this genuine question.
Your way makes sense to me, I'll try to dig deeper.
Thanks
PS: Dirk answered me here:
<http://stackoverflow.com/questions/15731106/passing-by-reference-a-data-frame-and-updating-it-with-rcpp>
2013/3/31 Kevin Ushey <kevinushey at gmail.com>
> I think the problem here is that the assignment df["newCol"] = newCol
> copies the dataframe. [...]