With

    if (!j--) {
        R_CheckUserInterrupt();
        j = 10000;
    }

as in current R devel (r83976), j goes negative (-1) and the interrupt is checked every 10001 iterations instead of every 10000. I prefer

    if (!--j) {
        R_CheckUserInterrupt();
        j = 10000;
    }

which decrements before testing, so j never drops below zero and the check fires exactly every 10000 iterations.
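The off-by-one can be checked with a small stand-alone sketch. The function names below are hypothetical and not part of R; each one simply counts how many times the branch is taken, where the real code would call R_CheckUserInterrupt().

```c
/* Count how often the check fires when the counter is tested with
 * the post-decrement form `if (!j--)`. */
long checks_post_decrement(long passes, long period)
{
    long j = period, checks = 0;
    for (long n = 0; n < passes; n++) {
        if (!j--) {          /* tests j first, then decrements: j is -1 here */
            checks++;        /* stands in for R_CheckUserInterrupt() */
            j = period;
        }
    }
    return checks;
}

/* Same loop with the pre-decrement form `if (!--j)`. */
long checks_pre_decrement(long passes, long period)
{
    long j = period, checks = 0;
    for (long n = 0; n < passes; n++) {
        if (!--j) {          /* decrements first, then tests: j is 0 here */
            checks++;        /* stands in for R_CheckUserInterrupt() */
            j = period;
        }
    }
    return checks;
}
```

Over 100000 passes with a period of 10000, the post-decrement form fires 9 times (at passes 10001, 20002, ...), while the pre-decrement form fires 10 times (at passes 10000, 20000, ...).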
In current R devel (r83976), if EOF is reached, the outer loop keeps going and i keeps incrementing until it reaches nskip.
The outer loop could be made to stop on EOF as well.
Alternatively, the nested loop can be avoided, as in the following.

    if (nskip) for (R_xlen_t i = 0, j = 10000; ; ) { /* MBCS-safe */
        c = scanchar(FALSE, &data);
        if (!j--) {
            R_CheckUserInterrupt();
            j = 10000;
        }
        if ((c == '\n' && ++i == nskip) || c == R_EOF)
            break;
    }

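To see how the single loop behaves at end of input, here is a minimal sketch that runs the same termination test over an in-memory string. Everything in it is an assumption made for illustration: skip_reader, skip_getc and SKIP_EOF are hypothetical stand-ins for R's scan data, scanchar() and R_EOF, and the interrupt check is left out to keep the example short.

```c
#include <stddef.h>

#define SKIP_EOF (-1)   /* stand-in for R_EOF */

/* Hypothetical reader over a NUL-terminated buffer. */
typedef struct {
    const char *buf;
    size_t pos;
} skip_reader;

/* Stand-in for scanchar(): next character, or SKIP_EOF at the end. */
static int skip_getc(skip_reader *r)
{
    unsigned char ch = (unsigned char) r->buf[r->pos];
    if (ch == '\0')
        return SKIP_EOF;
    r->pos++;
    return ch;
}

/* Skip up to nskip lines with one loop; returns the number of
 * newlines actually consumed. */
long skip_lines_single_loop(skip_reader *r, long nskip)
{
    long i = 0;
    int c;
    if (nskip > 0) for (;;) {
        c = skip_getc(r);
        /* stop after the nskip-th newline, or as soon as input ends */
        if ((c == '\n' && ++i == nskip) || c == SKIP_EOF)
            break;
    }
    return i;
}
```

With the input "a\nb" and nskip = 1000000, the loop stops after four reads and reports one skipped line, instead of counting up to nskip.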
-----------
On 2/11/23 09:33, Ivan Krylov wrote:
 On Fri, 10 Feb 2023 23:38:55 -0600, Spencer Graves <spencer.graves using prodsyse.com> wrote:
 I have a 4.54 GB file that I'm trying to read in chunks using "scan(..., skip=__)". It works as expected for small values of "skip" but goes into an infinite loop for "skip=1e11" and similar large values of skip: I cannot even interrupt it; I must kill R.
 Skipping lines is done by two nested loops. The outer loop counts the
 lines to skip; the inner loop reads characters until it encounters a
 newline or end of file. The outer loop doesn't check for EOF and keeps
 asking for more characters until the inner loop runs at least once for
 every line it wants to skip. The following patch should avoid the
 wait in such cases:
 --- src/main/scan.c (revision 83797)
 +++ src/main/scan.c (working copy)
 @@ -835,7 +835,7 @@
  attribute_hidden SEXP do_scan(SEXP call, SEXP op, SEXP args, SEXP rho)
  {
      SEXP ans, file, sep, what, stripwhite, dec, quotes, comstr;
 -    int c, flush, fill, blskip, multiline, escapes, skipNul;
 +    int c = 0, flush, fill, blskip, multiline, escapes, skipNul;
      R_xlen_t nmax, nlines, nskip;
      const char *p, *encoding;
      RCNTXT cntxt;
 @@ -952,7 +952,7 @@
              if(!data.con->canread)
                  error(_("cannot read from this connection"));
          }
 -        for (R_xlen_t i = 0; i < nskip; i++) /* MBCS-safe */
 +        for (R_xlen_t i = 0; i < nskip && c != R_EOF; i++) /* MBCS-safe */
              while ((c = scanchar(FALSE, &data)) != '\n' && c != R_EOF);
      }
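 The effect of the added `c != R_EOF` test can be illustrated outside of R. The sketch below is only a model under stated assumptions: nested_next and NESTED_EOF are hypothetical stand-ins for scanchar() and R_EOF, and the stop_on_eof flag switches between the unpatched and the patched outer-loop condition.

```c
#include <stddef.h>

#define NESTED_EOF (-1)   /* stand-in for R_EOF */

/* Stand-in for scanchar(): next character of text, or NESTED_EOF.
 * Keeps returning NESTED_EOF once the end has been reached. */
static int nested_next(const char *text, size_t *pos)
{
    unsigned char ch = (unsigned char) text[*pos];
    if (ch == '\0')
        return NESTED_EOF;
    (*pos)++;
    return ch;
}

/* Run the nested skip loops and return how many times the outer loop ran.
 * stop_on_eof = 0 models the original condition (i < nskip only);
 * stop_on_eof = 1 models the patched condition (i < nskip && c != EOF). */
long nested_outer_iterations(const char *text, long nskip, int stop_on_eof)
{
    size_t pos = 0;
    int c = 0;
    long i;
    for (i = 0; i < nskip && !(stop_on_eof && c == NESTED_EOF); i++)
        while ((c = nested_next(text, &pos)) != '\n' && c != NESTED_EOF)
            ;   /* inner loop: consume the rest of the current line */
    return i;
}
```

 For the two-line input "x\ny\n" and nskip = 1000, the unpatched condition runs the outer loop 1000 times, while the patched one stops after 3 iterations (two lines plus the pass that hits EOF). With skip=1e11 the same difference is what separates an apparent hang from an immediate return.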
 Making it interruptible is a bit more work: we need to ensure that a
 valid context is set up and check regularly for an interrupt.
 --- src/main/scan.c (revision 83797)
 +++ src/main/scan.c (working copy)
 @@ -835,7 +835,7 @@
  attribute_hidden SEXP do_scan(SEXP call, SEXP op, SEXP args, SEXP rho)
  {
      SEXP ans, file, sep, what, stripwhite, dec, quotes, comstr;
 -    int c, flush, fill, blskip, multiline, escapes, skipNul;
 +    int c = 0, flush, fill, blskip, multiline, escapes, skipNul;
      R_xlen_t nmax, nlines, nskip;
      const char *p, *encoding;
      RCNTXT cntxt;
 @@ -952,8 +952,6 @@
              if(!data.con->canread)
                  error(_("cannot read from this connection"));
          }
 -        for (R_xlen_t i = 0; i < nskip; i++) /* MBCS-safe */
 -            while ((c = scanchar(FALSE, &data)) != '\n' && c != R_EOF);
      }
      ans = R_NilValue; /* -Wall */
 @@ -966,6 +964,10 @@
      cntxt.cend = &scan_cleanup;
      cntxt.cenddata = &data;
 +    if (ii) for (R_xlen_t i = 0, j = 0; i < nskip && c != R_EOF; i++) /* MBCS-safe */
 +        while ((c = scanchar(FALSE, &data)) != '\n' && c != R_EOF)
 +            if (j++ % 10000 == 9999) R_CheckUserInterrupt();
 +
      switch (TYPEOF(what)) {
      case LGLSXP:
      case INTSXP:
 This way, even if you pour a Decanter of Endless Lines (e.g. mkfifo
 LINES; perl -E'print "A"x42 while 1;' > LINES) into scan(), it can
 still be interrupted, even if neither newline nor EOF ever arrives.

Thanks, I've updated the implementation of scan() in R-devel to be interruptible while skipping lines.

I've done it slightly differently as I found there already was a memory leak, which could be fixed by creating the context a bit earlier.

I've also avoided modulo on the fast path as I saw 13% performance overhead on my mailbox file. Decrementing and checking against zero didn't have measurable overhead.

Best
Tomas

[snip]