R Bug: write.table for matrix of more than 2,147,483,648 elements

7 messages · Duncan Murdoch, Tomas Kalibera, Serguei Sokol +2 more

#
On 18/04/2018 5:08 PM, Tousey, Colton wrote:
Yes, looks like a typo:  R_len_t is an int, and that's how nr was 
declared.  It should be R_xlen_t, which is bigger on machines that 
support big vectors.

I haven't tested the change; there may be something else in that 
function that assumes short vectors.

Duncan Murdoch
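
To make the overflow discussed above concrete, here is a minimal, self-contained C sketch. It does not use R's headers: `xlen_t` is only a stand-in for R_xlen_t (assumed here to be a 64-bit ptrdiff_t), and the helper names `cell_count` and `needs_long_vector` are hypothetical, not part of R.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Stand-in for R_xlen_t; an assumption for this sketch, not R's header.
 * The sketch assumes a platform where ptrdiff_t is 64 bits wide. */
typedef ptrdiff_t xlen_t;

/* Number of cells in an nr x nc matrix, computed in the wide type.
 * Casting one operand BEFORE the multiplication makes the whole product
 * wide; "(xlen_t)(nr * nc)" would still multiply in int and overflow. */
xlen_t cell_count(int nr, int nc)
{
    assert(nr >= 0 && nc >= 0);
    return (xlen_t)nr * nc;
}

/* Returns 1 if the cell count no longer fits in an int, 0 otherwise. */
int needs_long_vector(int nr, int nc)
{
    return cell_count(nr, nc) > (xlen_t)INT_MAX;
}
```

With nr = 65536 and nc = 32768, both dimensions fit comfortably in an int, yet the product is 2,147,483,648, one more than INT_MAX on platforms with 32-bit int.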
#
On 04/19/2018 02:06 AM, Duncan Murdoch wrote:
Indeed, I think the function won't work for long vectors because of 
EncodeElement2 and EncodeElement0. EncodeElement2/0 would have to be 
changed, including their signatures.

Tomas
#
On 19/04/2018 09:30, Tomas Kalibera wrote:
That would be a definitive fix, but before such a deep rewrite is
undertaken, maybe the following small fix (in addition to "(R_xlen_t)nr *
nc") will be sufficient for cases where nr and nc are in int range but
their product can reach the long vector limit:

replace
    tmp = EncodeElement2(x, i + j*nr, quote_col[j], qmethod,
                         &strBuf, sdec);
by
    tmp = EncodeElement2(VECTOR_ELT(x, (R_xlen_t)i + j*nr), 0,
                         quote_col[j], qmethod, &strBuf, sdec);

Serguei
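
A side note on the index expression i + j*nr used in this message: in C, if j and nr are both int, then in "(R_xlen_t)i + j*nr" the product j*nr is still evaluated in int before the addition, so the cast has to reach the multiplication. The following is a minimal sketch under stated assumptions: `xlen_t` is a stand-in for R_xlen_t (assumed to be a 64-bit ptrdiff_t) and `cm_index` is a hypothetical helper, not a function from R.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for R_xlen_t; an assumption for this sketch. */
typedef ptrdiff_t xlen_t;

/* Column-major offset of cell (i, j) in a matrix with nr rows.
 * The cast is applied to j, so j * nr is computed in the wide type. */
xlen_t cm_index(int i, int j, int nr)
{
    assert(i >= 0 && j >= 0 && i < nr);
    return (xlen_t)j * nr + i;
}
```

For a matrix with 65536 rows, the first cell of column 32768 has offset 2,147,483,648, which a plain int product could not represent on platforms with 32-bit int.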
#
On 04/19/2018 11:47 AM, Serguei Sokol wrote:
Unfortunately we can't do that, x is a matrix of an atomic vector type. 
VECTOR_ELT is taking elements of a generic vector, so it cannot be 
applied to "x". But even if we extracted a single element from "x" (e.g. 
via a type-switch etc), we would not be able to pass it to 
EncodeElement0 which expects a full atomic vector (that is, including 
its header). Instead we would have to call functions like EncodeInteger, 
EncodeReal0, etc on the individual elements. Which is then the same as 
changing EncodeElement0 or implementing a new version of it. This does
not seem that hard to fix; it is just not as trivial as changing the cast.

Tomas
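
The idea of encoding individual elements via a type switch, addressed by a wide index, can be sketched as follows. This is only an illustration under loud assumptions: `toy_vec`, `toy_type` and `toy_encode_element` are hypothetical names invented for this sketch, `xlen_t` stands in for R_xlen_t, and snprintf stands in for R's own formatting routines (EncodeInteger, EncodeReal0 and friends are not called or reproduced here).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Stand-in for R_xlen_t; an assumption for this sketch. */
typedef ptrdiff_t xlen_t;

/* Hypothetical type tag for a toy atomic vector. */
typedef enum { TOY_INT, TOY_REAL } toy_type;

/* Hypothetical atomic vector: a type tag, a wide length, raw storage. */
typedef struct {
    toy_type type;
    xlen_t length;
    const void *data;
} toy_vec;

/* Format element idx of x into buf, dispatching on the element type.
 * The index is of the wide type, so it can address long vectors.
 * Returns buf, or NULL for an unsupported type. */
const char *toy_encode_element(const toy_vec *x, xlen_t idx,
                               char *buf, size_t bufsize)
{
    assert(idx >= 0 && idx < x->length);
    switch (x->type) {
    case TOY_INT:
        snprintf(buf, bufsize, "%d", ((const int *)x->data)[idx]);
        return buf;
    case TOY_REAL:
        snprintf(buf, bufsize, "%.15g", ((const double *)x->data)[idx]);
        return buf;
    }
    return NULL;
}
```

The design point is that the caller passes the whole vector plus a wide index, so no single element has to be wrapped in a vector header before it can be formatted.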
#
On 19/04/2018 12:15, Tomas Kalibera wrote:
Thanks Tomas for this detailed explanation.

I would also like to report a problem with the list. It must be
compromised in some way, because besides Tomas' response I have received
five or six (so far) dating spam messages, all of them coming from two
addresses: Kristina Oliynik <kristinaoliynik604324 at kw.taluss.com> and
Samantha Smith <samanthasmith317260 at kw.fefty.com>.

Serguei.
#
[...............]

    > Thanks Tomas for this detailed explanation.

    > I would like also to signal a problem with the list. It must be 
    > corrupted in some way because beside the Tomas' response I've got five 
    > or six (so far) dating spam. All of them coming from two emails: 
    > Kristina Oliynik <kristinaoliynik604324 at kw.taluss.com> and 
    > Samantha Smith <samanthasmith317260 at kw.fefty.com>.


Well, those are the current ones for you.  They change over time,
and in my experience you get about 10--20 (about once per hour;
on purpose not exactly every 60 minutes) and then it stops.

I've replied to the thread  "Hacked" on R-help yesterday:
  https://stat.ethz.ch/pipermail/r-help/2018-April/452423.html

This started about 2 weeks ago on R-help already, and today
we've learned that even R-SIG-Mixed-Models is affected.

I think I don't see them anymore at all because my spam filters have adapted.

Note that

1. This is *NOT* from regular mailing list subscribers, and none
   of this spam comes via the R mailing list servers.

2. It's still a huge pain and, of course, damaging to the reputation of the R lists.

3. I had hoped we could wait and see it go away, but I may be wrong.

4. We have re-started discussing what could be done.

   One drastic measure would make mailing list usage
   *less* attractive by "munging" all posters' e-mail addresses.

-----

For now, use your mail provider's spam filters to quickly get rid
of this ... or, more interestingly and clearly less legally, use R to
write "mail bombs": write an R function sending ca. 10 e-mails per
hour randomly to that address ... ;-)  I did something like
that (with a shell script, not R) at the end of last millennium
when I was younger and the internet was a much, much smaller
space than now...

Martin
#
On 2018-04-19 09:40, Martin Maechler wrote:
What about implementing "Mailhide", described in the Wikipedia 
article on "reCAPTCHA"?


'[F]or example, "mailme at example.com" would be converted to 
"mai... at example.com". The visitor would then click on the "..." and 
solve the CAPTCHA in order to obtain the full email address. One can 
also edit the pop-up code so that none of the address is visible.' 
(https://en.wikipedia.org/wiki/ReCAPTCHA)


Of course, this is easier for me to suggest, because I'm not in a 
position to actually implement it ;-)


Spencer Graves


p.s. I wish again to express my deep appreciation to Martin and the 
other members of the R Core team who have invested so much time and 
creativity into making The R Project for Statistical Computing the 
incredible service it is today. A good portion of humanity lives better 
today, because of problems that would not otherwise have been addressed 
as well as they have been without some important analysis done with R.