R 3.2.2 Hangs Reading Files in El Capitan [Solved]
Simon,

Absolutely, the post was about RDS, but R is all about choices, and the underlying issue was the time it takes to read in data, which fread and feather handle quite quickly. I assume that when you say "efficient" you are referring to disk space? I put together a script to look at this further, with and without compression*.

If speed is a priority over disk space, then feather and data.table (CSV) are good options**. CSV is portable to any system, and feather files can be used from Python/Julia. RDS/RDA save a lot of space but are slower to write and read because of compression.

I hope that's helpful to those thinking about their priorities for file IO in R.

Brandon

* http://rpubs.com/bhive01/fileioinr
** Writing a CSV with data.table is freaky fast if you can get OpenMP working on your machine: https://github.com/Rdatatable/data.table/issues/1692
Reading that same CSV is comparable to RDS.

On Fri, May 6, 2016 at 6:07 AM, Simon Urbanek <simon.urbanek at r-project.org> wrote:
Brandon, note that the post was about RDS which is more efficient than all the options you list (in particular when not compressed). General advice is to avoid strings. Numeric vectors are several orders of magnitude faster than strings to load/save. Cheers, Simon
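A rough way to check these points on your own machine is a small base-R timing sketch. It is not part of the original message: the vector size, the object names, and the temporary files are illustrative assumptions, and the timings will differ from machine to machine.

```r
# Illustrative sketch (not from the thread): time saveRDS()/readRDS() for a
# numeric vector and for the same values stored as strings.
set.seed(1)
n   <- 1e5                    # assumed size; increase it for steadier timings
num <- runif(n)               # numeric vector
chr <- as.character(num)      # same values as character strings

f_num <- tempfile(fileext = ".rds")
f_chr <- tempfile(fileext = ".rds")
f_gz  <- tempfile(fileext = ".rds")

# Uncompressed writes
t_write_num <- system.time(saveRDS(num, f_num, compress = FALSE))
t_write_chr <- system.time(saveRDS(chr, f_chr, compress = FALSE))

# Default saveRDS() uses gzip compression; kept here only for a size comparison
saveRDS(num, f_gz)

# Reads of the uncompressed files
t_read_num <- system.time(num2 <- readRDS(f_num))
t_read_chr <- system.time(chr2 <- readRDS(f_chr))

# Elapsed seconds for each step
print(c(write_num = t_write_num[["elapsed"]],
        write_chr = t_write_chr[["elapsed"]],
        read_num  = t_read_num[["elapsed"]],
        read_chr  = t_read_chr[["elapsed"]]))

# File sizes in bytes: uncompressed numeric, uncompressed character, gzip numeric
print(file.size(c(f_num, f_chr, f_gz)))
```

With a vector this small the elapsed times may be close to zero, so a larger n (or repeated runs) is needed before drawing conclusions.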
On May 5, 2016, at 6:49 PM, Brandon Hurr <bhive01 at gmail.com> wrote:

You might be interested in the speed wars that are currently happening in the file reading/writing space. Matt Dowle/Arun Srinivasan's data.table and Hadley Wickham/Wes McKinney's Feather have made huge speed advances in reading/writing large datasets from disk (mostly csv).

data.table fread()/fwrite():
https://github.com/Rdatatable/data.table
https://stackoverflow.com/questions/35763574/fastest-way-to-read-in-100-000-dat-gz-files
http://blog.h2o.ai/2016/04/fast-csv-writing-for-r/

Feather read_feather()/write_feather():
https://github.com/wesm/feather

I don't often have big datasets (10s of MBs), so I don't see the benefits of these much, but you might.

HTH,
B

On Thu, May 5, 2016 at 3:16 PM, Charles DiMaggio <charles.dimaggio at gmail.com> wrote:
Been a while, but wanted to close the page on a previous post describing R hanging on readRDS() and load() for largish (say 500MB or larger) files. Tried again with recent release (3.3.0). Am able to read in large files under El Cap. While the file is reading in, I get a disconcerting spinning pinwheel of death and a check under Force Quit reports R is not responding. But if I wait it out, it eventually reads in. Odd. But I can live with it.
Cheers
Charles
Charles DiMaggio, PhD, MPH
Professor of Surgery and Population Health
Director of Injury Research
Department of Surgery
New York University School of Medicine
462 First Avenue, NBV 15
New York, NY 10016-9196
Charles.Dimaggio at nyumc.org
Office: 212.263.3202
Mobile: 516.308.6426
_______________________________________________ R-SIG-Mac mailing list R-SIG-Mac at r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-mac