Dealing With Extremely Large Files
You can always set up a "connection" and then read in only the number of lines you need for the analysis, write out the results, and then read in the next batch. I have also used the 'filehash' package to read in portions of a file initially and then write the objects into its database; these are quickly retrieved if I want to make subsequent passes through the data. Even 100,000 rows will probably tax your machine: with 1,000 numeric columns, you will need 800MB to store a single copy of the object (100,000 × 1,000 values × 8 bytes each), and you will probably need 3 to 4 times that amount (about 4GB of physical memory in total) if you are doing any processing that might make copies. Hopefully you are running on a 64-bit system with lots of memory.
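The connection-based approach described above works in any language that can read a stream incrementally. As a hedged illustration in Python rather than R, the sketch below reads a fixed number of lines per batch from an open file, writes that batch's results out, and then moves on, so only one chunk is held in memory at a time. It also reproduces the 800MB estimate for one copy of 100,000 × 1,000 doubles. The helper names process_in_chunks, row_sums and bytes_for_numeric are hypothetical, and the sketch does not reproduce what the R package 'filehash' does.

```python
import io
from itertools import islice


def bytes_for_numeric(rows, cols, bytes_per_value=8):
    """Memory for ONE copy of a rows x cols block of 8-byte doubles."""
    return rows * cols * bytes_per_value


def process_in_chunks(infile, outfile, chunk_lines, process):
    """Read chunk_lines lines at a time from an open text stream, apply
    `process` to each chunk, and append its result lines to outfile.
    Only the current chunk is kept in memory. Returns the chunk count."""
    n_chunks = 0
    while True:
        chunk = list(islice(infile, chunk_lines))  # next batch; [] at EOF
        if not chunk:
            break
        for result in process(chunk):
            outfile.write(result + "\n")  # results appended in input order
        n_chunks += 1
    return n_chunks


def row_sums(chunk):
    """Hypothetical per-chunk analysis: sum the numbers on each line."""
    return [str(sum(float(x) for x in line.split())) for line in chunk]


if __name__ == "__main__":
    src = io.StringIO("1 2 3\n4 5 6\n7 8 9\n10 11 12\n13 14 15\n")
    dst = io.StringIO()
    n = process_in_chunks(src, dst, chunk_lines=2, process=row_sums)
    print(n, dst.getvalue().split())
    print(bytes_for_numeric(100_000, 1_000) / 1e6, "MB")  # 800.0 MB
```

In practice infile and outfile would be real files opened with open(); chunk_lines is the knob that trades memory for the number of passes through the loop.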
On Fri, Sep 26, 2008 at 3:55 PM, zerfetzen <zerfetzen at yahoo.com> wrote:
Hi, I'm sure that a large fixed-width file, such as one with 300 million rows and 1,000 columns, is too large for R to handle on a PC, but are there ways to deal with it? For example, is there a way to combine some sampling method with read.fwf so that you can read in a sample of, say, 100,000 records? Something like this may make analysis possible. Once the sample is analyzed, is there a way to read in only x rows at a time, save and score each subset separately, and finally append them back together? I haven't seen any information on whether this is possible. Thank you for reading, and sorry if the information was easily available and I simply didn't find it.
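One way to draw the kind of sample asked about here, without knowing the row count in advance, is reservoir sampling. This technique is swapped in for illustration and is not mentioned in the thread. The minimal Python sketch below keeps a uniform random sample of k records in a single pass using O(k) memory, and then splits each sampled record into fields by column widths. It assumes newline-terminated records with field widths known ahead of time; reservoir_sample and parse_fwf are hypothetical helper names, and the widths parameter only mimics the role of the argument of the same name in R's read.fwf.

```python
import io
import random


def parse_fwf(line, widths):
    """Split one fixed-width record into whitespace-stripped fields."""
    fields, start = [], 0
    for w in widths:
        fields.append(line[start:start + w].strip())
        start += w
    return fields


def reservoir_sample(records, k, seed=None):
    """Uniform random sample of k items from an iterable of unknown length,
    in one pass with O(k) memory (Algorithm R)."""
    rng = random.Random(seed)
    sample = []
    for i, rec in enumerate(records):
        if i < k:
            sample.append(rec)       # fill the reservoir first
        else:
            j = rng.randint(0, i)    # uniform over 0..i inclusive
            if j < k:
                sample[j] = rec      # keep item i with probability k/(i+1)
    return sample


if __name__ == "__main__":
    # Hypothetical fixed-width data: a 5-char id followed by a 6-char value.
    data = io.StringIO("".join(f"{i:05d}{i * 3:>6d}\n" for i in range(1000)))
    for rec in reservoir_sample(data, k=5, seed=1):
        print(parse_fwf(rec.rstrip("\n"), widths=[5, 6]))
```

The whole file is still read once, so on 300 million rows this costs time but not memory; the parsed sample could then be modelled, and the full file scored chunk by chunk.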
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?