
ncol() vs. length() on data.frames

2 messages · Greg Snow, Ivan Calandra

#
As others have pointed out, ncol ends up calling the length function
when it is given a data frame, so you are pretty safe: both return the
same result when applied to the output of functions like read.csv.
(There will be a big difference if you ever apply those functions to a
matrix or some other data structure.)
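To make the difference concrete, here is a small sketch using the built-in iris data. The object names m and v are just for illustration and are not from the original discussion:

```r
# On a data frame, ncol() is dim(x)[2L], and dim() of a data frame
# takes its second element from length(x), so the two agree.
length(iris)                 # 5
ncol(iris)                   # 5

# On a matrix, length() counts every cell, not the columns.
m <- as.matrix(iris[, 1:4])  # 150 rows x 4 columns
length(m)                    # 600
ncol(m)                      # 4

# On a plain vector there is no dim attribute, so ncol() gives NULL.
v <- 1:10
length(v)                    # 10
ncol(v)                      # NULL
```

So swapping one for the other is only safe when you know the object is a data frame.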

One thing that I have not seen yet is a comparison on timing, so here goes:
> library(microbenchmark)
> microbenchmark(
+ length = length(iris),
+ ncol = ncol(iris)
+ )
Unit: nanoseconds
   expr  min   lq mean median   uq   max neval
 length  700  750  869    800  800  7400   100
   ncol 2400 2500 2981   2600 2700 31900   100

So ncol takes about 3 times as long to run as length on the iris data
frame (5 columns). You can rerun the above code with data frames closer
to the size that you will be using to see if that makes any difference.
But also notice that the units are nanoseconds, so the median time for
ncol to run is less than the time it takes light to travel a kilometer
in a vacuum, or about the time it takes light to go 1/3 of a mile
through a fiber optic cable (en.wikipedia.org/wiki/Microsecond).  If
this is used as part of a simulation or other repeated procedure and
it is done one million times then you will add about 2 seconds to the
overall run.  If this is just part of code where length/ncol will be
called fewer than 10 times then nobody is going to notice.
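The arithmetic behind those numbers can be checked directly, and the comparison can be repeated on a wider data frame. The sketch below uses only base R (system.time in a loop, a cruder tool than the benchmark above); the data frame big, its size, and the repetition count are assumptions chosen for illustration:

```r
# Extra cost of ncol over length, using the medians posted above.
median_length_ns <- 800
median_ncol_ns   <- 2600
calls            <- 1e6
extra_seconds <- (median_ncol_ns - median_length_ns) * calls / 1e9
extra_seconds                # 1.8, i.e. "about 2 seconds"

# Distance light covers in a vacuum during one median ncol call.
light_m <- 299792458 * median_ncol_ns / 1e9
light_m                      # roughly 779 metres, under a kilometer

# Rough re-check on a wider data frame (hypothetical: 1000 x 200).
big <- as.data.frame(matrix(0, nrow = 1000, ncol = 200))
n <- 1e5                     # repetitions; scale by 10 for a million
t_length <- system.time(for (i in seq_len(n)) length(big))[["elapsed"]]
t_ncol   <- system.time(for (i in seq_len(n)) ncol(big))[["elapsed"]]
c(length = t_length, ncol = t_ncol) * 10  # estimated seconds per million calls
```

The loop overhead is included in both timings, so treat the result as a rough check rather than a replacement for the benchmark.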

So the trade-off of moving from length to ncol is a slight decrease in
speed for an increase in readability.  I think that I would go with
the readability myself.
On Tue, Mar 31, 2020 at 8:11 AM Ivan Calandra <calandra at rgzm.de> wrote:

  
    
2 days later
#
Thank you Greg for the insights!

I agree with you that the gain in speed from length() is not worth the
loss in readability, and I'll change my length() calls to ncol().

Best,
Ivan

--
Dr. Ivan Calandra
TraCEr, laboratory for Traceology and Controlled Experiments
MONREPOS Archaeological Research Centre and
Museum for Human Behavioural Evolution
Schloss Monrepos
56567 Neuwied, Germany
+49 (0) 2631 9772-243
https://www.researchgate.net/profile/Ivan_Calandra
On 03/04/2020 17:45, Greg Snow wrote: