Debuggers,
I wrote to r-help about this and was appropriately told off by Peter
Dalgaard. I append that mail in case you have not seen it.
Following Peter's advice I have attempted to simplify the problem.
First note that the following does *not* fail (by which I mean crash, as
in generate a memory access violation):
I tried to cut down my original data set to just the first ten rows to
make it manageable to transmit. Of course then when I ran lm() there
were NA estimates. Thus I wasn't totally surprised that summary() would
have trouble. But, unlike the above, it crashes fatally.
Thinking to reproduce this very simply, I used (sorry quick and dirty, I
know there's a way to use paste to give the model formula):
But this has no problem. So it doesn't seem to be singularity of X or
the length of the model formula at fault (my problem data has 27
variables).
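
For reference, the paste() route alluded to above might look roughly like the following. This is only a sketch: the matrix `m` and its column names are hypothetical stand-ins, not the real sasch2 data, and it says nothing about whether the crash is reproduced.

```r
# Hypothetical stand-in for the real data: 10 rows, a response and a few dummies.
set.seed(1)
m <- cbind(ddiff = rnorm(10),
           td30  = rep(c(0, 1), 5),
           td60  = rep(c(1, 0, 0, 1, 0), 2),
           db1   = c(rep(0, 7), rep(1, 3)))

# Every column except the response becomes a term of the form m[, "name"].
preds <- setdiff(colnames(m), "ddiff")
terms <- paste('m[, "', preds, '"]', sep = "")

# Join the terms with " + " and prepend the response to get the formula text.
rhs <- paste(terms, collapse = " + ")
f <- as.formula(paste('m[, "ddiff"] ~', rhs))

# Fit using the generated formula; m is looked up in the calling environment.
fit <- lm(f)
print(f)
```

The generated formula has the same matrix-indexing shape as a hand-typed call, so it can be swapped in without changing how lm() sees the terms.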
What follows now is what *does* give a fault. The data (in sasch2) is
truncated to just the first 10 rows. I made it so the modified dataset
is called sasch2 so that I could cut and paste the exact same lm() call:
          free   total
Ncells  886738 1000000
Vcells 7912909 8388608
I get a fatal error when attempting summary() on the fit of an lm() on a
large-ish set of dummy variables (stored in a matrix):
Call:
lm(formula = sasch2[, "ddiff"] ~ sasch2[, "td30"] + sasch2[, "td60"]
+ sasch2[, "td90"] + sasch2[, "td120"] + +sasch2[, "td180"] +
sasch2[, "td240"] + sasch2[, "td300"] + sasch2[, "td360"] +
sasch2[, "td420"] + sasch2[, "td480"] + sasch2[, "db1"] + sasch2[,
"db1.5"] + sasch2[, "db2"] + sasch2[, "db2.5"] + +sasch2[, "db3.5"]
+ sasch2[, "db4"] + sasch2[, "db4.5"] + sasch2[, "db5"] + sasch2[,
"db5.5"] + +sasch2[, "db6"] + sasch2[, "db6.5"] + sasch2[, "db7"] +
sasch2[, "db7.5"] + sasch2[, "db8"] + +sasch2[, "db8.5"] + sasch2[,
"db9"] + sasch2[, "db9.5"])
I get estimates OK, but summary() collapses. However, if I do the same
thing less clumsily, by writing all the relevant variables to a new data
frame, and then calling
Call:
lm(formula = ddiff ~ ., data = dtmp)
I get not only the estimates but can also summary() with no problem.
Any ideas why? Seems to be memory-linked, because I can lm() and
summary() the matrix versions using only the sasch2[,'td*'] or db*
variable sets.
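
The data-frame variant described above could be sketched as follows. Again the matrix `m` and the column names in `keep` are hypothetical stand-ins for sasch2 and its variables; only the name `dtmp` and the call `lm(ddiff ~ ., data = dtmp)` come from the session shown.

```r
# Hypothetical stand-in for sasch2: a numeric matrix with named columns.
set.seed(1)
m <- cbind(ddiff = rnorm(10),
           td30  = rep(c(0, 1), 5),
           td60  = rep(c(1, 0, 0, 1, 0), 2),
           db1   = c(rep(0, 7), rep(1, 3)))

# Keep only the response and the predictors of interest, as a data frame.
keep <- c("ddiff", "td30", "td60", "db1")
dtmp <- as.data.frame(m[, keep])

# "." expands to every column of dtmp other than the response.
fit <- lm(ddiff ~ ., data = dtmp)

# The coefficient labels are now short column names, not m[, "..."] terms.
s <- summary(fit)
print(s)
```

One visible difference between the two forms is the length of the printed call and of the coefficient labels, which are much shorter here.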
Simon Fear
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
Dear Simon,
I have just now tried your example. My tentative conclusion is that
it is due to the problem described in r-bugs in the message with id 101;
in short: when the R console is a windowed one (surely under Windows,
but I suspect also under the Mac) printing makes use of a
buffer with a fixed and hard-coded length.
Indeed: (i) I got a segmentation fault using rw0632; (ii) everything
works without problem in my pre-release rw0633, where the problem
has been 'cured' simply by enlarging the buffer (of course,
this will NOT be the definitive solution); but (iii) I also got
a segmentation fault in rw0633 when I reset the length of the
buffer to that of R-0.63.2.
We hope to make rw0633 available towards the end of the
week. Brian Ripley and I have introduced a lot of Windows-specific
changes, and this is the reason for the delay.
guido