the quote problem with readLines()
The amount of data you want to read in (136M numbers) will require about 1GB of memory (8 bytes per number for double-precision floating point; truncating the precision does not reduce this number of bytes). So if you want to read it all in, find a 64-bit version of R and probably at least 4GB of memory for your process. A 32-bit version might just have enough space if you can allocate all 4GB of memory to that process. So if you want to have it all in memory, invest in a larger computer. If you want to run on the system you have, you will probably have to sample your data so that you get a portion that fits in memory to run your tests, or see whether there is a way of processing portions of the file and then combining them into a final result.
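For the histogram case, the "process portions and combine" suggestion can be sketched in base R. The idea is to read the file in fixed-size chunks through one open connection with readLines(con, n = ...), convert each chunk with as.numeric(), and add its counts to a running total over fixed breaks. This is a minimal sketch under assumptions not taken from the thread: the helper name chunked_hist, the chunk size, and the requirement that breaks covering the data's range are chosen in advance are all illustrative. Only one chunk of values and the counts vector are in memory at any time.

```r
# Hypothetical helper (not from the thread): histogram counts for a large
# file holding one number per line, read in chunks through one connection.
chunked_hist <- function(path, breaks, chunk_size = 1e6) {
  con <- file(path, open = "r")  # opened once, so each readLines() call continues
  on.exit(close(con))
  nbins <- length(breaks) - 1
  counts <- numeric(nbins)
  n_read <- 0
  repeat {
    x <- as.numeric(readLines(con, n = chunk_size))  # one chunk, as doubles
    if (length(x) == 0) break                        # end of file
    # Bin index per value: bins are [breaks[i], breaks[i + 1]), with the last
    # bin also closed on the right. tabulate() drops NAs and indices outside
    # 1..nbins, so values outside the breaks are silently not counted.
    bin <- findInterval(x, breaks, rightmost.closed = TRUE)
    counts <- counts + tabulate(bin, nbins)
    n_read <- n_read + length(x)
  }
  list(breaks = breaks, counts = counts, n = n_read)
}

# Small self-contained demo on a temporary file
path <- tempfile(fileext = ".txt")
writeLines(as.character(c(0.1, 0.3, 0.4, 0.6, 0.9)), path)
h <- chunked_hist(path, breaks = seq(0, 1, by = 0.25), chunk_size = 2)
h$counts  # per-bin counts accumulated across chunks
```

Note that the bins here are left-closed, whereas hist() uses right-closed bins by default, so values falling exactly on a break can land in a different bin. The counts are enough to draw a histogram (for example with barplot()), but a kernel density estimate or ks.test() still needs raw values, which is where sampling comes in.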
On Wed, Mar 18, 2009 at 9:58 AM, Dongyan Song <yzhskdls at hotmail.com> wrote:
Hi, Thank you for your concern! The file has 136,047,472 lines, with one value on each line, and is 1.7G in size. I am running Linux (openSUSE) with 4G of memory in total. The error message is: Error: cannot allocate vector of size 2.0 Gb. And the worst thing is that even if I read all the data into R after truncating the numbers' precision, i.e. from 1.234567e+00 to 1.2, I cannot manipulate these numbers; for example, I cannot run ks.test, a histogram, or a kernel density estimator, which is what I want to do with them. After I enter those commands, the computer also gives error messages like Error: cannot allocate vector of size 809.1 Mb. I can read half of the file, but I want to know the overall distribution of the numbers, and the values in the file are not ordered, so it is not easy to randomly pick some numbers or sort them. Is this information enough? Thank you again! Best, Dongyan jholtman wrote:
readLines is doing exactly what you are asking. Its documentation says under Value: "A character vector of length the number of lines read." You still have to convert the character strings to numeric. Exactly how large is "quite large"? What system are you running on? How much memory do you have? What is the error message that you are getting? Exactly what does your file look like? Have you tried reading in portions of the file? How big will it be if you could read it in? Will it take up more than 25% of real memory? There is still some information you need to provide so that an assessment can be made. On Tue, Mar 17, 2009 at 8:50 AM, Dongyan Song <yzhskdls at hotmail.com> wrote:
Dear all, I read a file containing only numbers with the readLines function, as below,
f <- file("data.txt")
a <- readLines(f)
but all the values in a are in the format "....", and I cannot do calculations with them since they are not numeric. I wonder how I can get rid of those quotes. Thank you for your help! I have to use the readLines function instead of scan, read.table or matrix, because the file is quite large and the other functions cannot allocate enough memory to read it. Best, Dongyan -- View this message in context: http://www.nabble.com/the-quote-problem-with-readLines%28%29-tp22558454p22558454.html Sent from the R help mailing list archive at Nabble.com.
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
-- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve?
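As the reply says, the fix for the quotes is a conversion step: the quotes are not part of the data, they are simply how R prints a character vector. A minimal sketch with made-up values follows, plus a rough look at why the character vector from readLines() is itself costly. object.size() results vary by platform and R version, so only the comparison between the two sizes is meaningful here.

```r
# What readLines() returns: a character vector (these strings are made up)
lines <- c("1.234567e+00", "2.5", "-0.75")
print(lines)            # printed with quotes because the elements are strings
x <- as.numeric(lines)  # convert to double; unparseable strings become NA with a warning
print(x)

# Storage: a double vector needs 8 bytes per value, while a character vector
# holds a pointer per element plus a separate object for each distinct string
nums <- runif(1e5)
chrs <- as.character(nums)
object.size(nums)       # about 8e5 bytes
object.size(chrs)       # several times larger

# The estimate for the full file: 136,047,472 doubles at 8 bytes each
136047472 * 8 / 1e9     # roughly 1.09 (GB)
```

So for a numeric analysis it is the double vector, not the lines as read, that should be kept; the character vector can be removed with rm() once converted.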
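Dongyan reports that scan() also ran out of memory, so it is not guaranteed to fit either. Still, when the goal is a numeric vector, scan() with what = double() parses the text straight into doubles, which should need considerably less memory than readLines() followed by as.numeric(), since no large character vector is built first. A sketch on a small temporary file; the file contents are made up.

```r
# Demo file with one number per line (made-up values)
path <- tempfile(fileext = ".txt")
writeLines(c("1.234567e+00", "2.5", "-0.75"), path)

# scan() parses straight into a double vector, so no intermediate
# character vector is built; quiet = TRUE suppresses the "Read N items" message
x <- scan(path, what = double(), quiet = TRUE)

# n limits how many values are read, e.g. to test on the first part of a file
first_two <- scan(path, what = double(), n = 2, quiet = TRUE)
```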
----- Dongyan Song, Msc Medical informatics, Uppsala University, Sweden -- View this message in context: http://www.nabble.com/the-quote-problem-with-readLines%28%29-tp22558454p22579163.html Sent from the R help mailing list archive at Nabble.com.
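Although the values are unordered, a random sample does not require sorting. Since the line count is known (136,047,472), one approach, sketched here and not proposed in the thread, is to draw k random line numbers up front with sample.int() and keep only those lines while reading the file in chunks. The helper name sample_lines, the sample size k, and the chunk size are illustrative assumptions. Only the current chunk and the k sampled values are held in memory.

```r
# Hypothetical helper (not from the thread): simple random sample, without
# replacement, of k lines from a file whose line count n_lines is known
sample_lines <- function(path, n_lines, k, chunk_size = 1e6) {
  keep <- sort(sample.int(n_lines, k))  # wanted line numbers, ascending
  con <- file(path, open = "r")
  on.exit(close(con))
  out <- numeric(k)
  filled <- 0
  offset <- 0                           # number of lines consumed so far
  repeat {
    lines <- readLines(con, n = chunk_size)
    if (length(lines) == 0) break       # end of file
    # wanted lines that fall inside this chunk, as positions within the chunk
    idx <- keep[keep > offset & keep <= offset + length(lines)] - offset
    if (length(idx) > 0) {
      out[filled + seq_along(idx)] <- as.numeric(lines[idx])
      filled <- filled + length(idx)
    }
    offset <- offset + length(lines)
    if (filled == k) break              # all wanted lines found; stop early
  }
  out[seq_len(filled)]  # shorter than k only if n_lines overstated the file
}

# Demo: a file holding 1..100, one value per line
set.seed(1)
path <- tempfile(fileext = ".txt")
writeLines(as.character(1:100), path)
s <- sample_lines(path, n_lines = 100, k = 10, chunk_size = 7)
```

The sample comes back in file order, which does not matter for distributional summaries; the resulting vector can be passed to hist(), density() or ks.test(). On Linux, wc -l gives the line count if it is not already known.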