C API to get numrow of data frame

The safest way is to check the length of the row.names attribute, e.g.

    length(getAttrib(df, R_RowNamesSymbol))

This protects you both from data.frames with zero columns and from
corrupted data.frames whose columns have different lengths, since
the number of rows in a data.frame is by definition given by its
row.names attribute. However, R internally expands a compact
(collapsed) row.names on this getAttrib call, which is probably
undesirable for very large data.frames.
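
For reference, the compact form stores row.names as the two-element
integer vector c(NA_integer_, n), where abs(n) is the number of rows,
instead of one entry per row. The sketch below models only that
decoding step in plain C, without R's headers, so it is not R's own
code. The names MODEL_NA_INTEGER and model_nrows_from_row_names are
hypothetical, and the macro simply mirrors the fact that R represents
NA_integer_ as INT_MIN.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for R's NA_INTEGER (INT_MIN), redefined here so the sketch
 * compiles without R's headers. Hypothetical name. */
#define MODEL_NA_INTEGER INT_MIN

/* Hypothetical helper: number of rows implied by an integer row.names
 * vector `rn` holding `len` elements. */
int model_nrows_from_row_names(const int *rn, size_t len)
{
    assert(rn != NULL || len == 0);

    /* Compact form: exactly two elements, the first one NA.
     * The second element holds the row count, possibly negated. */
    if (len == 2 && rn[0] == MODEL_NA_INTEGER)
        return abs(rn[1]);

    /* Expanded form: one entry per row. */
    return (int) len;
}
```

Under these assumptions, the pair {MODEL_NA_INTEGER, -5} decodes to 5
rows, while an expanded vector such as {1, 2, 3} decodes to its length.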

One way to avoid that expansion is to call R's .row_names_info
function from C, e.g. (modulo my errors):

#include <Rinternals.h>

/* Evaluates .row_names_info(s, 2L) in the base environment. */
int df_nrows(SEXP s) {
    if (!Rf_inherits(s, "data.frame")) Rf_error("expecting a data.frame");
    /* type = 2L asks for the number of rows implied by row.names */
    SEXP two = PROTECT(Rf_ScalarInteger(2));
    SEXP call = PROTECT( Rf_lang3(
      Rf_install(".row_names_info"),
      s,
      two
    ) );
    SEXP result = PROTECT(Rf_eval(call, R_BaseEnv));
    int output = INTEGER(result)[0];
    UNPROTECT(3);
    return output;
}

More ideally (?), such a function could be added to util.c and
exported by R, e.g. (again, modulo my errors):

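int df_nrows(SEXP s) {
    if (!inherits(s, "data.frame")) error("expecting a data.frame");
    /* getAttrib0 is internal to R; it returns row.names without expanding it */
    SEXP t = getAttrib0(s, R_RowNamesSymbol);
    /* check the length first so INTEGER(t)[0] is never read from an empty vector */
    if (isInteger(t) && LENGTH(t) == 2 && INTEGER(t)[0] == NA_INTEGER)
      return abs(INTEGER(t)[1]);
    else
      return LENGTH(t);
}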

or even incorporated into the already available 'nrows' function.
Although there is probably someone out there depending on 'nrows'
returning the number of columns for their data.frame...

Cheers,
Kevin

On Mon, Mar 31, 2014 at 6:27 PM, Murray Stokely <murray at stokely.org> wrote: