
read.table performance

Here is a test that I ran where the only difference was whether the data
was in a single column or in 3700 columns. With a single column, 'scan'
and 'read.table' were comparable (about 4.1 vs 4.7 seconds elapsed); with
3700 columns, read.table took about 3X longer than scan (13.3 vs 4.2
seconds). Using 'colClasses' did not make a significant difference
(12.5 seconds):
size isdir mode               mtime
C:\\Users\\Owner\\AppData\\Local\\Temp\\RtmpOWGkEu\\file60a82064
35154500 FALSE  666 2011-12-07 06:13:56

        ctime               atime
C:\\Users\\Owner\\AppData\\Local\\Temp\\RtmpOWGkEu\\file60a82064
2011-12-07 06:13:52 2011-12-07 06:13:52
                                                                 exe
C:\\Users\\Owner\\AppData\\Local\\Temp\\RtmpOWGkEu\\file60a82064  no
Read 1850000 items
   user  system elapsed
   4.04    0.05    4.10
NULL
14800040 bytes
user  system elapsed
   4.68    0.06    4.74
14800672 bytes
size isdir mode               mtime
C:\\Users\\Owner\\AppData\\Local\\Temp\\RtmpOWGkEu\\file60a82064
33305000 FALSE  666 2011-12-07 06:14:11

        ctime               atime
C:\\Users\\Owner\\AppData\\Local\\Temp\\RtmpOWGkEu\\file60a82064
2011-12-07 06:13:52 2011-12-07 06:13:52
                                                                 exe
C:\\Users\\Owner\\AppData\\Local\\Temp\\RtmpOWGkEu\\file60a82064  no
Read 1850000 items
   user  system elapsed
   4.21    0.02    4.23
NULL
14800040 bytes
user  system elapsed
  13.24    0.06   13.33
[1]  500 3700
15185368 bytes
+     , colClasses = rep('numeric', 3700)
+     )
+ )
   user  system elapsed
  12.39    0.06   12.48
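The commands that produced the output above did not survive in the archive. Below is a minimal, hypothetical sketch of how a similar comparison could be set up. The helper names (`make_test_files`, `time_reads`, `compare_layouts`), the use of `runif()` data and `write.table()` to create the files, and the reduced row count are all assumptions, not the original code. It writes the same numbers once as a single column and once as a wide file, then times `scan()`, `read.table()`, and `read.table()` with `colClasses`.

```r
# Sketch only: the archived post lost its original commands, so the helper
# names, file layout and sizes below are assumptions, not the author's code.

# Write the same random numbers to two temporary files: one value per line
# ("tall") and n_cols whitespace-separated values per line ("wide").
make_test_files <- function(n_rows, n_cols) {
  x <- matrix(runif(n_rows * n_cols), nrow = n_rows, ncol = n_cols)
  files <- list(tall = tempfile(fileext = ".txt"),
                wide = tempfile(fileext = ".txt"))
  write.table(as.vector(x), files$tall, row.names = FALSE, col.names = FALSE)
  write.table(x, files$wide, row.names = FALSE, col.names = FALSE)
  files
}

# Elapsed seconds for scan(), plain read.table(), and read.table() with
# every column declared numeric via colClasses.
time_reads <- function(path, n_cols) {
  c(
    scan = system.time(scan(path, quiet = TRUE))[["elapsed"]],
    read.table = system.time(read.table(path))[["elapsed"]],
    read.table.colClasses = system.time(
      read.table(path, colClasses = rep("numeric", n_cols))
    )[["elapsed"]]
  )
}

# One row of timings per file layout; the temporary files are removed on exit.
compare_layouts <- function(n_rows, n_cols) {
  files <- make_test_files(n_rows, n_cols)
  on.exit(unlink(unlist(files)))
  rbind(single_column = time_reads(files$tall, 1),
        wide          = time_reads(files$wide, n_cols))
}

set.seed(42)
# Kept small so it runs quickly; the post used 500 rows x 3700 columns.
res <- compare_layouts(n_rows = 100, n_cols = 3700)
print(res)
```

Setting `n_rows = 500` gives the same 1,850,000-value shape as the test above, although absolute timings on current hardware and R versions will differ from the 2011 figures. Timing `read.table()` both with and without `colClasses` mirrors the three measurements reported in the post.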

On Tue, Dec 6, 2011 at 4:33 PM, Gene Leynes <gleynes at gmail.com> wrote: