Sorry, I omitted background information:
R version: 2.3.0
OS: Windows XP
CPU: Pentium III,
RAM: 768 MB
Also, what command-line memory settings might prevent R from crashing
while using the Matrix package to convert my 600 X 4482 dgTMatrix to the
dgCMatrix class or to an expanded Matrix, via the as() function? I can
do this with half of the matrix, 300 x 4482.
Thanks,
John T
Confidentiality Notice: This e-mail message, including any a...{{dropped}}
Warning while subsetting using Matrix package
5 messages · Martin Maechler, Thaden, John J
"JohnT" == Thaden, John J <ThadenJohnJ at uams.edu>
on Thu, 6 Jul 2006 00:02:10 -0500 writes:
JohnT> > # ...I had previously created a sparse matrix in triplet form:
JohnT> > str(x)
JohnT> Formal class 'dgTMatrix' [package "Matrix"] with 6 slots
JohnT> ..@ i : int [1:923636] 1 2 3 4 5 6 7 8 9 10 ...
JohnT> ..@ j : int [1:923636] 1 1 1 1 1 1 1 1 1 1 ...
JohnT> ..@ Dim : int [1:2] 600 4482
JohnT> ..@ Dimnames:List of 2
JohnT> .. ..$ : chr [1:601] "50" "51" "52" "53" ...
JohnT> .. ..$ : chr [1:4482] "1" "2" "3" "4" ...
JohnT> ..@ x : num [1:923636] 50.2 51.2 52.2 53.2 54.2 ...
JohnT> ..@ factors : list()
JohnT> >
JohnT> > # While subsetting x, I was surprised to get this warning:
JohnT> > y<-x[1:300,]
JohnT> Warning message:
JohnT> number of items to replace is not a multiple of replacement length
and later
JohnT> Sorry, I omitted background information:
JohnT> R version: 2.3.0
JohnT> OS: Windows XP
JohnT> CPU: Pentium III,
JohnT> RAM: 768 MB
You omitted the most pertinent information: The version of
'Matrix' you are using.
The latest released version of Matrix does *not* show the
behavior you mentioned.
{So I have now spent 20 minutes just because you did not update 'Matrix'..}
JohnT> Also, what command-line memory settings might prevent R from crashing
JohnT> while using the Matrix package to convert my 600 X 4482 dgTMatrix to the
JohnT> dgCMatrix class or to an expanded Matrix, via the as() function? I can
JohnT> do this with half of the matrix, 300 x 4482.
It's hard to believe that you get a "crash" when coercing to
'dgC' -- but of course this really depends on how much memory you
have already gobbled up by other large objects in your R
workspace, or by other applications running at the same time in
Windows. Coercing to a full matrix will of course require
8 * 601 * 4482 = 21549456 extra bytes just for the numbers.
That's only 21.5 Megabytes, so I wonder..
I have never seen R crashes from using 'Matrix', but then I
work with an operating system, not with M$ Windows.
Maybe you meant you got an error message "... memory allocation .."?
JohnT> Thanks,
JohnT> John T
Martin Maechler replied to my query "Warning while subsetting...":
MartinM> >>>>> "JohnT" == Thaden, John J <ThadenJohnJ at uams.edu>
MartinM> >>>>> on Thu, 6 Jul 2006 00:02:10 -0500 writes:
JohnT> ...
JohnT> > # While subsetting x, I was surprised to get this warning:
JohnT> > y<-x[1:300,]
JohnT> Warning message:
JohnT> number of items to replace is not a multiple of replacement length
MartinM> and later
JohnT> Sorry, I omitted background information:
JohnT> R version: 2.3.0
JohnT> OS: Windows XP
JohnT> CPU: Pentium III,
JohnT> RAM: 768 MB
MartinM> You omitted the most pertinent information: The
MartinM> version of 'Matrix' you are using.
MartinM> The latest released version of Matrix does
MartinM> *not* show the behavior you mentioned. {So I have
MartinM> now spent 20 minutes just because you did not
MartinM> update 'Matrix'..}
The Matrix package was version 0.995-10, now is 0.995-11.
The R base was version 2.3.0, now is 2.3.1.
Subsetting 'y <- x[1:300,]' now works. Please accept my apology.
JohnT> Also, what command-line memory settings might prevent
JohnT> R from crashing while using the Matrix package to
JohnT> convert my 600 X 4482 dgTMatrix to the dgCMatrix class
JohnT> or to an expanded Matrix, via the as() function? I can
JohnT> do this with half of the matrix, 300 x 4482.
MartinM> It's hard to believe that you get a "crash"
MartinM> when coercing to 'dgC' -- but of course this
MartinM> really depends on how much memory you have already
MartinM> gobbled up by other large objects in your R
MartinM> workspace, or by other applications running at
MartinM> the same time in Windows. Coercing to a full
MartinM> matrix will of course require 8 * 601 * 4482 =
MartinM> 21549456 extra bytes just for the numbers.
MartinM> That's only 21.5 Megabytes, so I wonder..
MartinM>
MartinM> I have never seen R crashes from using 'Matrix',
MartinM> but then I work with an operating system, not
MartinM> with M$ Windows.
MartinM>
MartinM> Maybe you meant you got an error message
MartinM> "... memory allocation .."?
Testing again, I closed all applications; disabled antivirus;
opened RGui; removed all R objects but 'x' (a 600x4482 dgTMatrix);
opened WinXP's 'Task Manager'; saw only "Rgui" under
'Applications'; saw processes using a total of 287 MB of memory
under 'Processes'; closed 'Task Manager'; and typed R commands:
> # Steps leading to an R crash...
> ls()
[1] "x"
> str(x)
Formal class 'dgTMatrix' [package "Matrix"] with 6 slots
  ..@ i : int [1:923636] 1 2 3 4 5 6 7 8 9 10 ...
  ..@ j : int [1:923636] 1 1 1 1 1 1 1 1 1 1 ...
  ..@ Dim : int [1:2] 600 4482
  ..@ Dimnames:List of 2
  .. ..$ : chr [1:601] "50" "51" "52" "53" ...
  .. ..$ : chr [1:4482] "1" "2" "3" "4" ...
  ..@ x : num [1:923636] 50.2 51.2 52.2 53.2 54.2 ...
  ..@ factors : list()
> gc()
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  183529  5.0     407500 10.9   350000  9.4
Vcells 1928101 14.8    2286173 17.5  1928652 14.8
> library(Matrix)
Loading required package: lattice
> gc()
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  627772 16.8    1073225 28.7  1073225 28.7
Vcells 2165773 16.6    3345184 25.6  2332013 17.8
> search()
 [1] ".GlobalEnv"        "package:Matrix"    "package:lattice"
 [4] "package:methods"   "package:stats"     "package:graphics"
 [7] "package:grDevices" "package:utils"     "package:datasets"
[10] "Autoloads"         "package:base"
> # Now the line that causes crashes...
> y <- as(x,"dgCMatrix")
After ~10 seconds, R blinks off and a WinXP dialog appears:
R for Windows GUI front-end has encountered
a problem and needs to close. We are sorry
for the inconvenience....Error signature:
AppName: rgui.exe AppVer: 2.31.38247.0
ModName: matrix.dll Offset: 0000ff31....
Report error?
Sincerely,
John Thaden
"JohnT" == Thaden, John J <ThadenJohnJ at uams.edu>
on Thu, 6 Jul 2006 12:29:42 -0500 writes:
JohnT> Martin Maechler replied to my query "Warning while subsetting...":
MartinM> >>>>> "JohnT" == Thaden, John J <ThadenJohnJ at uams.edu>
MartinM> >>>>> on Thu, 6 Jul 2006 00:02:10 -0500 writes:
JohnT> ...
JohnT> > # While subsetting x, I was surprised to get this warning:
JohnT> > y<-x[1:300,]
JohnT> Warning message:
JohnT> number of items to replace is not a multiple of
JohnT> replacement length
MartinM> and later
JohnT> Sorry, I omitted background information:
JohnT> R version: 2.3.0
JohnT> OS: Windows XP
JohnT> CPU: Pentium III,
JohnT> RAM: 768 MB
MartinM> You omitted the most pertinent information: The
MartinM> version of 'Matrix' you are using.
MartinM> The latest released version of Matrix does
MartinM> *not* show the behavior you mentioned. {So I have
MartinM> now spent 20 minutes just because you did not
MartinM> update 'Matrix'..}
JohnT> The Matrix package was version 0.995-10, now is 0.995-11.
JohnT> The R base was version 2.3.0, now is 2.3.1.
JohnT> Subsetting 'y <- x[1:300,]' now works. Please accept my apology.
JohnT> Also, what command-line memory settings might prevent
JohnT> R from crashing while using the Matrix package to
JohnT> convert my 600 X 4482 dgTMatrix to the dgCMatrix class
JohnT> or to an expanded Matrix, via the as() function? I can
JohnT> do this with half of the matrix, 300 x 4482.
MartinM> It's hard to believe that you get a "crash"
MartinM> when coercing to 'dgC' -- but of course this
MartinM> really depends on how much memory you have already
MartinM> gobbled up by other large objects in your R
MartinM> workspace, or by other applications running at
MartinM> the same time in Windows. Coercing to a full
MartinM> matrix will of course require 8 * 601 * 4482 =
MartinM> 21549456 extra bytes just for the numbers.
MartinM> That's only 21.5 Megabytes, so I wonder..
MartinM>
MartinM> I have never seen R crashes from using 'Matrix',
(actually that's not even true; at some point in time we had a
bug in 'Matrix' which led to spurious segmentation faults)
MartinM> but then I work with an operating system, not
MartinM> with M$ Windows.
MartinM>
MartinM> Maybe you meant you got an error message
MartinM> "... memory allocation .."?
JohnT> Testing again, I closed all applications; disabled antivirus;
JohnT> opened RGui; removed all R objects but 'x' (a 600x4482 dgTMatrix);
JohnT> opened WinXP's 'Task Manager'; saw only "Rgui" under
JohnT> 'Applications'; saw processes using a total of 287 MB of memory
JohnT> under 'Processes'; closed 'Task Manager'; and typed R commands:
>> # Steps leading to an R crash...
>> ls()
JohnT> [1] "x"
>> str(x)
JohnT> Formal class 'dgTMatrix' [package "Matrix"] with 6 slots
JohnT> ..@ i : int [1:923636] 1 2 3 4 5 6 7 8 9 10 ...
JohnT> ..@ j : int [1:923636] 1 1 1 1 1 1 1 1 1 1 ...
JohnT> ..@ Dim : int [1:2] 600 4482
JohnT> ..@ Dimnames:List of 2
JohnT> .. ..$ : chr [1:601] "50" "51" "52" "53" ...
JohnT> .. ..$ : chr [1:4482] "1" "2" "3" "4" ...
JohnT> ..@ x : num [1:923636] 50.2 51.2 52.2 53.2 54.2 ...
JohnT> ..@ factors : list()
>> gc()
JohnT> used (Mb) gc trigger (Mb) max used (Mb)
JohnT> Ncells 183529 5.0 407500 10.9 350000 9.4
JohnT> Vcells 1928101 14.8 2286173 17.5 1928652 14.8
>> library(Matrix)
JohnT> Loading required package: lattice
>> gc()
JohnT> used (Mb) gc trigger (Mb) max used (Mb)
JohnT> Ncells 627772 16.8 1073225 28.7 1073225 28.7
JohnT> Vcells 2165773 16.6 3345184 25.6 2332013 17.8
>> search()
JohnT> [1] ".GlobalEnv" "package:Matrix" "package:lattice"
JohnT> [4] "package:methods" "package:stats" "package:graphics"
JohnT> [7] "package:grDevices" "package:utils" "package:datasets"
JohnT> [10] "Autoloads" "package:base"
>> #Now the line that causes crashes...
>> y <- as(x,"dgCMatrix")
JohnT> After ~10 seconds, R blinks off and a WinXP dialog appears:
JohnT> R for Windows GUI front-end has encountered
JohnT> a problem and needs to close. We are sorry
JohnT> for the inconvenience....Error signature:
JohnT> AppName: rgui.exe AppVer: 2.31.38247.0
JohnT> ModName: matrix.dll Offset: 0000ff31....
JohnT> Report error?
Thanks a lot, John, for the more detailed report.
I do wonder how it happens, since the memory allocation is not
really big. E.g., I can easily solve ``your'' (well, a
simulated version of it) problem on a machine with only 512 MB
RAM:
library("Matrix")
## MM: construct a matrix *as* John's :
d <- as.integer(c(600,4482))
n0 <- 923636
set.seed(1)
M <- new("dgTMatrix", Dim = d,
i = sort(sample(0:(d[1]-1), size = n0, replace = TRUE)),
j = sample(0:(d[2]-1), size = n0, replace = TRUE),
x = round(rnorm(n0, mean = 50, sd = 10), 1))
dimnames(M) <- list(paste("r", 1:d[1], sep=''),
paste("C", 1:d[2], sep=''))
str(M)
M1.10 <- M[1:10,] # gave warning in earlier versions of 'Matrix'
## on 'nanny' which has just 512 MB (with other processes active, etc):
gc()
## used (Mb) gc trigger (Mb) max used (Mb)
## Ncells 642690 17.2 1073225 28.7 1073225 28.7
## Vcells 3136547 24.0 8305047 63.4 7988501 61.0
mC <- as(M, "dgCMatrix")
## ---------
gc()
## used (Mb) gc trigger (Mb) max used (Mb)
## Ncells 642721 17.2 1073225 28.7 1073225 28.7
## Vcells 4311327 32.9 8305047 63.4 7988501 61.0
## well, this will need a bit more memory, but should still work:
mm <- as(M, "matrix")
## -------
gc()
##- used (Mb) gc trigger (Mb) max used (Mb)
##- Ncells 642725 17.2 1073225 28.7 1073225 28.7
##- Vcells 7000528 53.5 8438708 64.4 7988501 61.0
I see in the CHANGES file for {R for Windows}
R 2.3.1 patched
===============
[.........................]
R could crash when very low on memory. (PR#8981)
So, maybe you can try to even run "R 2.3.1 patched" for Windows,
which you can get from here,
http://cran.us.r-project.org/bin/windows/base/rpatched.html
and see if your crashes go away ?
Regards,
Martin
1 day later
With thanks to Matrix package co-author Martin
Maechler, I'm happy to report satisfactory closure
of two recent threads I initiated about that package:
- Warning while subsetting with Matrix
- R Crash with 'library(Matrix);as(x,"dgCMatrix")'
In the first, I reported seeing a warning message after
selecting a subset of a matrix 'M' of class "dgTMatrix"
via the first but not the second of these commands:
M10 <- M[1:10,]
M10 <- M[1:10,1:10]
Acting on Martin's advice, I upgraded Matrix version
0.995-10 to version 0.995-11. This eliminated the
warning message.
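
For anyone reporting a similar problem, the installed versions can be gathered before posting. This is a minimal sketch that assumes a recent R in which utils::packageVersion() is available; the variable names are illustrative only.

```r
# Collect the version details a package maintainer will ask for first.
r_version      <- R.version.string           # version string of the running R
matrix_version <- packageVersion("Matrix")   # installed version of 'Matrix'

cat("R:     ", r_version, "\n")
cat("Matrix:", as.character(matrix_version), "\n")

# Compare against the release mentioned in this thread.
is_newer <- matrix_version >= "0.995-11"
is_newer
```

sessionInfo() prints the same kind of information for every attached package in one call.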
In the second, I reported that class-conversion of
a self-constructed "dgTMatrix" of dimension 600 X 4482
caused R to crash when done via either of the first two
but neither of the last two of these commands:
Mc <- as(M, "dgCMatrix")
Mm <- as(M, "matrix")
Mc <- rbind(as(M[1:300,], "dgCMatrix"), as(M[301:600,], "dgCMatrix"))
Mm <- rbind(as(M[1:300,], "matrix"), as(M[301:600,], "matrix"))
I surmised this was a memory issue but Martin pointed out
the memory demand to hold even the expanded sparse matrix
is actually rather small (8 x 601 x 4482 = 21549456 bytes
= 21.5 Megabytes).
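
The arithmetic can be reproduced in base R. This is a rough sketch, assuming 8 bytes per double and 4 bytes per integer and ignoring R's per-object overhead; it uses 601 rows, the row count that matches the byte total quoted in this thread.

```r
# Rough memory needed to hold the matrix densely: 8 bytes per double.
n_row <- 601L
n_col <- 4482L
dense_bytes <- 8 * n_row * n_col
dense_bytes            # 21549456
dense_bytes / 1e6      # about 21.5 (decimal megabytes)

# Triplet storage: two integer vectors (4 bytes per element) and one
# double vector (8 bytes per element), one element per stored entry.
n_stored <- 923636
triplet_bytes <- n_stored * (4 + 4 + 8)
triplet_bytes          # 14778176
```

Either figure is far below the 768 MB of RAM on the machine, which is why a plain shortage of memory was an unlikely explanation.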
For obvious bandwidth reasons, I did not post my large
matrix, so Martin created one of his own. Neither he
nor I could reproduce crashes with his matrix. Offlist,
I sent him my matrix. Upon class-conversion, it crashed
his machine too.
Upon inspection, Martin discovered that my matrix did not
conform to the dgTMatrix class as defined in Matrix. In
particular, my slots M@i and M@j were integer vectors
beginning with ones, i.e., they respectively indexed the
first row and column of the matrix as "1", not as "0".
However, correcting this did not stop crashes, until I also
realized that M@i had 601 (not 600) unique integer values,
and corrected my Dim slot (M@Dim) to be c(601,4482). This
stopped the crashes.
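
The two faults described above can be checked for before an object is ever built. The helper below is a hypothetical base-R sketch (check_triplet is not part of Matrix), and the small index vectors are stand-ins for the real data. For an object that already exists, validObject(M) from the methods package is the usual way to test it against its class definition.

```r
# Hypothetical helper: report why zero-based triplet indices (i, j)
# would not fit a matrix with dimensions 'dim'.
check_triplet <- function(i, j, dim) {
  problems <- character(0)
  if (length(i) != length(j))
    problems <- c(problems, "i and j differ in length")
  if (any(i < 0L) || any(j < 0L))
    problems <- c(problems, "negative index")
  if (any(i > dim[1] - 1L))
    problems <- c(problems, "row index exceeds Dim[1] - 1")
  if (any(j > dim[2] - 1L))
    problems <- c(problems, "column index exceeds Dim[2] - 1")
  problems
}

# Stand-in for the matrix in this thread: indices stored one-based and
# covering 601 rows, while Dim claimed 600.
i1 <- rep(1:601, times = 2)   # one-based row indices, 601 distinct values
j1 <- rep(1:2, each = 601)    # one-based column indices
check_triplet(i1, j1, dim = c(600L, 4482L))   # flags the row index

# Shifting to zero-based alone is not enough: row 600 still overflows.
i0 <- i1 - 1L
j0 <- j1 - 1L
check_triplet(i0, j0, dim = c(600L, 4482L))   # still flags the row index

# Setting Dim from the data resolves it.
fixed_dim <- c(max(i0) + 1L, 4482L)           # 601 x 4482
check_triplet(i0, j0, dim = fixed_dim)        # character(0)
```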
Once created, Matrix objects behave like matrix objects in
the sense that "one-based" indexing is used to subset them
or re-assign values to them. But for users who create
them, especially using the new() function, it is helpful
to remind oneself that Matrix objects are created and
stored using "zero-based" indexing. This fact is
mentioned only obliquely in the current version's PDF file
and R-accessible documentation. Hopefully that will change.
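
The difference between the two conventions can be illustrated with a tiny matrix. This sketch assumes a current Matrix installation; the 3 x 4 example data are invented for illustration.

```r
library(Matrix)

# One-based indices, as a user would naturally write them.
i1 <- c(1L, 2L, 3L, 3L)
j1 <- c(1L, 1L, 2L, 4L)
x  <- c(50.2, 51.2, 52.2, 53.2)

# Route 1: new() stores the slots verbatim, so subtract 1 to supply the
# zero-based indices that the class expects.
Mt <- new("dgTMatrix", i = i1 - 1L, j = j1 - 1L, x = x,
          Dim = c(3L, 4L))
validObject(Mt)   # raises an error if the slots are inconsistent

# Route 2: sparseMatrix() takes one-based indices by default
# (index1 = TRUE) and does the shift internally.
Ms <- sparseMatrix(i = i1, j = j1, x = x, dims = c(3L, 4L))

# Both describe the same 3 x 4 matrix, and subsetting is one-based.
Mt[3, 4]                              # 53.2
all(as.matrix(Mt) == as.matrix(Ms))   # TRUE
```

Using sparseMatrix() avoids handling the zero-based slots directly, at the cost of not choosing the storage class explicitly.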
Again, my thanks to Martin. For me this experience confirms
the vigor of open-source, forum-supported projects.
-John Thaden, Ph.D.
Research Assistant Professor of Geriatrics
University of Arkansas for Medical Sciences
Little Rock, AR 72205
USA