
iterators : checkFunc with ireadLines

2 messages · William Michels, Jeff Newmiller

Hi Laurent,

Thank you for explaining your size limitations. Below is an example
using the read.fwf() function to grab the first column of your input
file (in 2000-row chunks). This column is converted to an index, and
the index is used to create an iterator useful for skipping lines when
reading input with scan(). (You could try processing your large file
in successive 2000-row chunks, or whatever number of lines fits into
memory.) Maybe not as elegant as the approach you were going for, but
read.fwf() should be pretty efficient:
V1
1 Time
2 N023
3 N053
4 N123
5 N163
6 N193
[1] 3 5
[1] "N053"      "-0.014083" "-0.004741" "0.001443"  "-0.010152"
"-0.012996" "-0.005337" "-0.008738" "-0.015094" "-0.012104"
[1] "N163"      "-0.054023" "-0.049345" "-0.037158" "-0.04112"
"-0.044612" "-0.036953" "-0.036061" "-0.044516" "-0.046436"
(Note: for this email and the previous one, I've deleted the first
"hash" character from each line of your test file for clarity.)
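The code that produced the output above did not survive in this archive; the following is a minimal sketch of the approach described (read the first column with read.fwf(), build an index, then skip to the wanted lines with scan()). The temp file, its sample values, and the chunk size of 2000 are stand-ins, not the original data:

```r
# Hypothetical sample file standing in for the real input.
infile <- tempfile()
writeLines(c("Time 0.000 0.001",
             "N023 -0.011 -0.012",
             "N053 -0.014083 -0.004741",
             "N123 0.001443 0.002",
             "N163 -0.054023 -0.049345",
             "N193 -0.036 -0.037"), infile)

# 1. Grab just the first column (4 characters wide here), up to 2000 rows.
idx_col <- read.fwf(infile, widths = 4, n = 2000, as.is = TRUE)

# 2. Convert the column into an index of the rows we want.
wanted <- which(idx_col$V1 %in% c("N053", "N163"))
print(wanted)  # [1] 3 5

# 3. Use the index to skip straight to each wanted line with scan().
for (i in wanted) {
  row <- scan(infile, what = character(), skip = i - 1, nlines = 1,
              quiet = TRUE)
  print(row)
}
```

For a file too large for memory, the same read.fwf() call can be repeated with skip= advancing by 2000 each pass.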

HTH, Bill.

W. Michels, Ph.D.
On Mon, May 18, 2020 at 3:35 AM Laurent Rhelp <LaurentRHelp at free.fr> wrote:
Laurent... Bill is suggesting building your own indexed database... but this has been done before, so re-inventing the wheel seems inefficient and risky. It is actually impossible to create such a beast without reading the entire file into memory at least temporarily anyway, so you are better off looking at ways to process the entire file efficiently.

For example, you could load the data into an SQLite database in a couple of lines of code and then use SQL directly, use the sqldf data-frame interface, or use dplyr to query the database.
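A minimal sketch of the SQLite route, assuming the DBI and RSQLite packages are installed. The in-memory database and inline sample data are stand-ins; for data that will not fit in memory you would import the file in pieces (or with sqldf::read.csv.sql) rather than through a single read.table() call:

```r
library(DBI)

# Hypothetical sample data standing in for the real file.
infile <- tempfile()
writeLines(c("N023 -0.011 -0.012",
             "N053 -0.014083 -0.004741",
             "N163 -0.054023 -0.049345"), infile)

con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "signals", read.table(infile, as.is = TRUE))

# Let SQL do the row selection instead of holding everything in R.
res <- dbGetQuery(con,
                  "SELECT * FROM signals WHERE V1 IN ('N053', 'N163')")
dbDisconnect(con)
```

With a file-backed database (a path instead of ":memory:") the import cost is paid once and later sessions can query directly.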

Or you could look at read_csv_chunked from the readr package.
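A sketch of the readr route, assuming the readr package is installed. read_csv_chunked() feeds each chunk to a callback as a data frame, so the filter runs chunk by chunk and only the kept rows accumulate; the temp file and the chunk size of 2000 are stand-ins:

```r
library(readr)

# Hypothetical comma-separated sample standing in for the real file.
infile <- tempfile()
writeLines(c("Time,0.000,0.001",
             "N023,-0.011,-0.012",
             "N053,-0.014083,-0.004741",
             "N163,-0.054023,-0.049345"), infile)

# The callback receives each chunk as a data frame; keep only wanted rows.
keep <- function(chunk, pos) chunk[chunk[[1]] %in% c("N053", "N163"), ]

res <- read_csv_chunked(infile,
                        callback = DataFrameCallback$new(keep),
                        chunk_size = 2000,
                        col_names = FALSE)
```

DataFrameCallback row-binds what the callback returns, so res ends up holding just the filtered rows regardless of the file's total size.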
On May 18, 2020 11:37:46 AM PDT, William Michels via R-help <r-help at r-project.org> wrote: