Warning while subsetting using Matrix package

5 messages · Martin Maechler, Thaden, John J

#
Sorry, I omitted background information:
  R version: 2.3.0
  OS: Windows XP
  CPU:  Pentium III, 
  RAM:  768 MB

Also, what command-line memory settings might prevent R from crashing
while using the Matrix package to convert my 600 X 4482 dgTMatrix to the
dgCMatrix class or to an expanded Matrix, via the as() function? I can
do this with half of the matrix, 300 x 4482.

Thanks,
John T

Confidentiality Notice: This e-mail message, including any a...{{dropped}}
#
  JohnT> > # ...I had previously created a sparse matrix in triplet form:
  JohnT> > str(x)
  JohnT> Formal class 'dgTMatrix' [package "Matrix"] with 6 slots
  JohnT>   ..@ i       : int [1:923636] 1 2 3 4 5 6 7 8 9 10 ...
  JohnT>   ..@ j       : int [1:923636] 1 1 1 1 1 1 1 1 1 1 ...
  JohnT>   ..@ Dim     : int [1:2] 600 4482
  JohnT>   ..@ Dimnames:List of 2
  JohnT>   .. ..$ : chr [1:601] "50" "51" "52" "53" ...
  JohnT>   .. ..$ : chr [1:4482] "1" "2" "3" "4" ...
  JohnT>   ..@ x       : num [1:923636] 50.2 51.2 52.2 53.2 54.2 ...
  JohnT>   ..@ factors : list()
  JohnT> >
  JohnT> > # While subsetting x, I was surprised to get this warning: 
  JohnT> > y<-x[1:300,]
  JohnT> Warning message:
  JohnT> number of items to replace is not a multiple of replacement length

and later

    JohnT> Sorry, I omitted background information:
    JohnT> R version: 2.3.0
    JohnT> OS: Windows XP
    JohnT> CPU:  Pentium III, 
    JohnT> RAM:  768 MB

You omitted the most pertinent information: The version of
'Matrix' you are using.
The latest released version of Matrix does *not* show the
behavior you mentioned.
{So I have now spent 20 minutes just because you did not update 'Matrix'..}

    JohnT> Also, what command-line memory settings might prevent R from crashing
    JohnT> while using the Matrix package to convert my 600 X 4482 dgTMatrix to the
    JohnT> dgCMatrix class or to an expanded Matrix, via the as() function? I can
    JohnT> do this with half of the matrix, 300 x 4482.

It's hard to believe that you get a "crash" when coercing to
'dgC' -- but of course this really depends how much memory you
have already gobbled up by other large objects in your R
workspace, or by other applications running at the same time in
Windows.  Coercing to a full matrix will of course require 
8 * 601 * 4482 = 21549456 extra bytes just for the numbers.
That's only 21.5 Megabytes, so I wonder..
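
The arithmetic above can be reproduced in a few lines of base R. This is only a rough sketch: the row count of 601 is taken from the Dimnames length in the str() output, the 4-byte integer and 8-byte double sizes are the usual R storage sizes, and object overhead is ignored. The variable names are illustrative.

```r
# Figures taken from the str(x) output quoted in this thread.
n_row    <- 601      # length of the row Dimnames (the Dim slot says 600)
n_col    <- 4482
n_stored <- 923636   # length of the i, j and x slots

# Dense storage: one 8-byte double per cell.
dense_bytes <- 8 * n_row * n_col

# Triplet storage: i and j as 4-byte integers, x as an 8-byte double.
triplet_bytes <- (4 + 4 + 8) * n_stored

dense_mb   <- dense_bytes / 1e6
triplet_mb <- triplet_bytes / 1e6

# Fraction of cells that are actually stored.
fill <- n_stored / (n_row * n_col)

print(c(dense_mb = dense_mb, triplet_mb = triplet_mb, fill = fill))
```

Either representation is small next to 768 MB of RAM, which is why memory pressure alone looked like an unlikely explanation.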

I have never seen R crashes from using 'Matrix', but then I
work with an operating system, not with M$ Windows.  
Maybe you meant you got an error message "... memory allocation .."?

    JohnT> Thanks,
    JohnT> John T
#
Martin Maechler replied to my query "Warning while subsetting...":

MartinM> >>>>> "JohnT" == Thaden, John J <ThadenJohnJ at uams.edu>
MartinM> >>>>>     on Thu, 6 Jul 2006 00:02:10 -0500 writes:

    JohnT> ...
    JohnT> > # While subsetting x, I was surprised to get this warning: 
    JohnT> > y<-x[1:300,]
    JohnT> Warning message:
    JohnT> number of items to replace is not a multiple of replacement length

  MartinM> and later

    JohnT> Sorry, I omitted background information:
    JohnT> R version: 2.3.0
    JohnT> OS: Windows XP
    JohnT> CPU:  Pentium III, 
    JohnT> RAM:  768 MB

  MartinM> You omitted the most pertinent information: The 
  MartinM> version of 'Matrix' you are using.
  MartinM> The latest released version of Matrix does
  MartinM> *not* show the behavior you mentioned. {So I have 
  MartinM> now spent 20 minutes just because you did not 
  MartinM> update 'Matrix'..}

The Matrix package was version 0.995-10, now is 0.995-11. 
The R base was version 2.3.0, now is 2.3.1. 
Subsetting 'y <- x[1:300,]' now works. Please accept my apology.

    JohnT> Also, what command-line memory settings might prevent
    JohnT> R from crashing while using the Matrix package to 
    JohnT> convert my 600 X 4482 dgTMatrix to the dgCMatrix class
    JohnT> or to an expanded Matrix, via the as() function? I can
    JohnT> do this with half of the matrix, 300 x 4482.

  MartinM> It's hard to believe that you get a "crash" 
  MartinM> when coercing to 'dgC' -- but of course this 
  MartinM> really depends how much memory you have already
  MartinM> gobbled up by other large objects in your R
  MartinM> workspace, or by other applications running at
  MartinM> the same time in Windows.  Coercing to a full 
  MartinM> matrix will of course require 8 * 601 * 4482 = 
  MartinM> 21549456 extra bytes just for the numbers.
  MartinM> That's only 21.5 Megabytes, so I wonder..
  MartinM>
  MartinM> I have never seen R crashes from using 'Matrix', 
  MartinM> but then I work with an operating system, not 
  MartinM> with M$ Windows. 
  MartinM>
  MartinM> Maybe you meant you got an error message 
  MartinM> "... memory allocation .."?

Testing again, I closed all applications; disabled antivirus; 
opened RGui; removed all R objects but 'x' (a 600x4482 dgTMatrix); 
opened WinXP's 'Task Manager'; saw only "Rgui" under 
'Applications'; saw processes using a total of 287 MB of memory
under 'Processes'; closed 'Task Manager'; and typed R commands:
> # Steps leading to an R crash...
> ls()
[1] "x"
> str(x)
Formal class 'dgTMatrix' [package "Matrix"] with 6 slots
  ..@ i       : int [1:923636] 1 2 3 4 5 6 7 8 9 10 ...
  ..@ j       : int [1:923636] 1 1 1 1 1 1 1 1 1 1 ...
  ..@ Dim     : int [1:2] 600 4482
  ..@ Dimnames:List of 2
  .. ..$ : chr [1:601] "50" "51" "52" "53" ...
  .. ..$ : chr [1:4482] "1" "2" "3" "4" ...
  ..@ x       : num [1:923636] 50.2 51.2 52.2 53.2 54.2 ...
  ..@ factors : list()
> gc()
         used (Mb) gc trigger (Mb) max used (Mb)
Ncells  183529  5.0     407500 10.9   350000  9.4
Vcells 1928101 14.8    2286173 17.5  1928652 14.8
> library(Matrix)
Loading required package: lattice
> gc()
         used (Mb) gc trigger (Mb) max used (Mb)
Ncells  627772 16.8    1073225 28.7  1073225 28.7
Vcells 2165773 16.6    3345184 25.6  2332013 17.8
> search()
 [1] ".GlobalEnv" "package:Matrix" "package:lattice"
 [4] "package:methods" "package:stats" "package:graphics"
 [7] "package:grDevices" "package:utils" "package:datasets"
[10] "Autoloads"  "package:base"
> # Now the line that causes crashes...
> y <- as(x,"dgCMatrix")

After ~10 seconds, R blinks off and a WinXP dialog appears: 

  R for Windows GUI front-end has encountered 
  a problem and needs to close.  We are sorry
  for the inconvenience....Error signature:
  AppName: rgui.exe  AppVer: 2.31.38247.0  
  ModName: matrix.dll Offset: 0000ff31....
  Report error?

Sincerely,
John Thaden

#
JohnT> Martin Maechler replied to my query "Warning while subsetting...":
    MartinM> >>>>> "JohnT" == Thaden, John J <ThadenJohnJ at uams.edu>
    MartinM> >>>>>     on Thu, 6 Jul 2006 00:02:10 -0500 writes:

    JohnT> ...
    JohnT> > # While subsetting x, I was surprised to get this warning: 
    JohnT> > y<-x[1:300,]
    JohnT> Warning message:
    JohnT> number of items to replace is not a multiple of
    JohnT> replacement length

    MartinM> and later

    JohnT> Sorry, I omitted background information:
    JohnT> R version: 2.3.0
    JohnT> OS: Windows XP
    JohnT> CPU:  Pentium III, 
    JohnT> RAM:  768 MB

    MartinM> You omitted the most pertinent information: The 
    MartinM> version of 'Matrix' you are using.
    MartinM> The latest released version of Matrix does
    MartinM> *not* show the behavior you mentioned. {So I have 
    MartinM> now spent 20 minutes just because you did not 
    MartinM> update 'Matrix'..}

    JohnT> The Matrix package was version 0.995-10, now is 0.995-11. 
    JohnT> The R base was version 2.3.0, now is 2.3.1. 
    JohnT> Subsetting 'y <- x[1:300,]' now works. Please accept my apology.

    JohnT> Also, what command-line memory settings might prevent
    JohnT> R from crashing while using the Matrix package to 
    JohnT> convert my 600 X 4482 dgTMatrix to the dgCMatrix class
    JohnT> or to an expanded Matrix, via the as() function? I can
    JohnT> do this with half of the matrix, 300 x 4482.

    MartinM> It's hard to believe that you get a "crash" 
    MartinM> when coercing to 'dgC' -- but of course this 
    MartinM> really depends how much memory you have already
    MartinM> gobbled up by other large objects in your R
    MartinM> workspace, or by other applications running at
    MartinM> the same time in Windows.  Coercing to a full 
    MartinM> matrix will of course require 8 * 601 * 4482 = 
    MartinM> 21549456 extra bytes just for the numbers.
    MartinM> That's only 21.5 Megabytes, so I wonder..
    MartinM> 
    MartinM> I have never seen R crashes from using 'Matrix', 

 (actually that's not even true; at some point in time we had a
  bug in 'Matrix' which led to spurious segmentation faults)

    MartinM> but then I work with an operating system, not 
    MartinM> with M$ Windows. 
    MartinM> 
    MartinM> Maybe you meant you got an error message 
    MartinM> "... memory allocation .."?

    JohnT> Testing again, I closed all applications; disabled antivirus; 
    JohnT> opened RGui; removed all R objects but 'x' (a 600x4482 dgTMatrix); 
    JohnT> opened WinXP's 'Task Manager'; saw only "Rgui" under 
    JohnT> 'Applications'; saw processes using a total of 287 MB of memory
    JohnT> under 'Processes'; closed 'Task Manager'; and typed R commands:

    >> # Steps leading to an R crash...
    >> ls()
    JohnT> [1] "x"
    >> str(x)
    JohnT> Formal class 'dgTMatrix' [package "Matrix"] with 6 slots
    JohnT> ..@ i       : int [1:923636] 1 2 3 4 5 6 7 8 9 10 ...
    JohnT> ..@ j       : int [1:923636] 1 1 1 1 1 1 1 1 1 1 ...
    JohnT> ..@ Dim     : int [1:2] 600 4482
    JohnT> ..@ Dimnames:List of 2
    JohnT> .. ..$ : chr [1:601] "50" "51" "52" "53" ...
    JohnT> .. ..$ : chr [1:4482] "1" "2" "3" "4" ...
    JohnT> ..@ x       : num [1:923636] 50.2 51.2 52.2 53.2 54.2 ...
    JohnT> ..@ factors : list()
    >> gc()
    JohnT> used (Mb) gc trigger (Mb) max used (Mb)
    JohnT> Ncells  183529  5.0     407500 10.9   350000  9.4
    JohnT> Vcells 1928101 14.8    2286173 17.5  1928652 14.8
    >> library(Matrix)
    JohnT> Loading required package: lattice
    >> gc()
    JohnT> used (Mb) gc trigger (Mb) max used (Mb)
    JohnT> Ncells  627772 16.8    1073225 28.7  1073225 28.7
    JohnT> Vcells 2165773 16.6    3345184 25.6  2332013 17.8
    >> search()
    JohnT> [1] ".GlobalEnv" "package:Matrix" "package:lattice"
    JohnT> [4] "package:methods" "package:stats" "package:graphics"  
    JohnT> [7] "package:grDevices" "package:utils" "package:datasets"
    JohnT> [10] "Autoloads"  "package:base"     
    >> #Now the line that causes crashes...
    >> y <- as(x,"dgCMatrix")

    JohnT> After ~10 seconds, R blinks off and a WinXP dialog appears: 

    JohnT> R for Windows GUI front-end has encountered 
    JohnT> a problem and needs to close.  We are sorry
    JohnT> for the inconvenience....Error signature:
    JohnT> AppName: rgui.exe  AppVer: 2.31.38247.0  
    JohnT> ModName: matrix.dll Offset: 0000ff31....
    JohnT> Report error?

Thanks a lot, John, for the more detailed report.
I do wonder how it happens, since the memory allocation is not
really big.   E.g., I can easily solve ``your'' (well, a
simulated version of it) problem on a machine with only 512 MB
RAM:

  library("Matrix")

  ## MM: construct a matrix *as* John's :
  d <- as.integer(c(600,4482))
  n0 <- 923636
  set.seed(1)
  M <- new("dgTMatrix", Dim = d,
	   i = sort(sample(0:(d[1]-1), size = n0, replace = TRUE)),
	   j = sample(0:(d[2]-1), size = n0, replace = TRUE),
	   x = round(rnorm(n0, m = 50, sd = 10), 1))
  dimnames(M) <- list(paste("r", 1:d[1], sep=''),
		      paste("C", 1:d[2], sep=''))
  str(M)

  M1.10 <- M[1:10,] # gave warning in earlier versions of 'Matrix'

  ## on 'nanny' which has just 512 MB  (with other processes active, etc):
  gc()
  ##           used (Mb) gc trigger (Mb) max used (Mb)
  ## Ncells  642690 17.2    1073225 28.7  1073225 28.7
  ## Vcells 3136547 24.0    8305047 63.4  7988501 61.0

  mC <- as(M, "dgCMatrix")
  ##           ---------
  gc()
  ##           used (Mb) gc trigger (Mb) max used (Mb)
  ## Ncells  642721 17.2    1073225 28.7  1073225 28.7
  ## Vcells 4311327 32.9    8305047 63.4  7988501 61.0

  ## well, this will need a bit more memory, but should still work:
  mm <- as(M, "matrix")
  ##           -------
  gc()
  ##-           used (Mb) gc trigger (Mb) max used (Mb)
  ##- Ncells  642725 17.2    1073225 28.7  1073225 28.7
  ##- Vcells 7000528 53.5    8438708 64.4  7988501 61.0


I see in the CHANGES file for {R for Windows}
So, maybe you can try to even run "R 2.3.1 patched" for Windows,
which you can get from here,
      http://cran.us.r-project.org/bin/windows/base/rpatched.html
and see if your crashes go away ?

Regards,
Martin
1 day later
#
With thanks to Matrix package co-author Martin 
Maechler, I'm happy to report satisfactory closure 
of two recent threads I initiated about that package: 
   -  Warning while subsetting with Matrix
   -  R Crash with 'library(Matrix);as(x,"dgCMatrix")'
   
In the first, I reported seeing a warning message after
selecting a subset of a matrix 'M' of class "dgTMatrix"
via the first but not the second of these commands:
 
  M10 <- M[1:10,]
  M10 <- M[1:10,1:10]

Acting on Martin's advice, I upgraded Matrix version 
0.995-10 to version 0.995-11.  This eliminated the
warning message.

In the second, I reported that class-conversion of 
a self-constructed "dgTMatrix" of dimension 600 X 4482
caused R to crash when done via either of the first two
but neither of the last two of these commands:

  Mc <- as(M, "dgCMatrix")
  Mm <- as(M, "matrix")
  Mc <- rbind(as(M[1:300,], "dgCMatrix"), as(M[301:600,], "dgCMatrix"))
  Mm <- rbind(as(M[1:300,], "matrix"), as(M[301:600,], "matrix"))

I surmised this was a memory issue, but Martin pointed out
that the memory demand to hold even the expanded sparse matrix
is actually rather small (8 x 601 x 4482 = 21549456 bytes
= 21.5 Megabytes).
 
For obvious bandwidth reasons, I did not post my large 
matrix, so Martin created one of his own.  Neither he 
nor I could reproduce crashes with his matrix. Offlist,
I sent him my matrix. Upon class-conversion, it crashed 
his machine too.

Upon inspection, Martin discovered that my matrix did not
conform to the dgTMatrix class as defined in Matrix. In 
particular, my slots M@i and M@j were integer vectors 
beginning with ones, i.e., they respectively indexed the
first row and column of the matrix as "1", not as "0".  

However, correcting this did not stop the crashes until I also
realized that M@i had 601 (not 600) unique integer values, 
and corrected my Dim slot (M@Dim) to be c(601,4482).  This 
stopped the crashes.
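
The two problems described above can be checked on the raw index vectors before any Matrix object is built. The sketch below uses base R only. check_triplet() is a hypothetical helper written for this illustration, not part of the Matrix package, and the toy index vectors merely imitate the shape of the ones in the thread.

```r
# Hypothetical helper: compare raw triplet indices with the zero-based
# range that the i and j slots of a dgTMatrix are expected to use.
check_triplet <- function(i, j, dim) {
  problems <- character(0)
  if (min(i) < 0 || max(i) > dim[1] - 1) {
    problems <- c(problems,
                  sprintf("row indices span %d..%d, expected 0..%d",
                          min(i), max(i), dim[1] - 1L))
  }
  if (min(j) < 0 || max(j) > dim[2] - 1) {
    problems <- c(problems,
                  sprintf("column indices span %d..%d, expected 0..%d",
                          min(j), max(j), dim[2] - 1L))
  }
  problems
}

# Toy indices: one-based, with 601 distinct row values.
i <- 1:601
j <- rep(c(1L, 4482L), length.out = 601)

as_posted   <- check_triplet(i, j, c(600L, 4482L))            # both ranges wrong
offset_only <- check_triplet(i - 1L, j - 1L, c(600L, 4482L))  # rows still overflow
both_fixed  <- check_triplet(i - 1L, j - 1L, c(601L, 4482L))  # nothing reported

print(as_posted)
print(offset_only)
print(both_fixed)
```

The middle case mirrors what happened here: shifting the indices to start at zero is not enough while the Dim slot still claims 600 rows.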

Once created, Matrix objects behave like matrix objects in
the sense that "one-based" indexing is used to subset them
or re-assign values to them.  But for users who create 
them, especially using the new() function, it is helpful
to remind oneself that Matrix objects are created and 
stored using "zero-based" indexing.  This fact is
mentioned only obliquely in the current version's PDF file
and R-accessible documentation.  Hopefully that will change.
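
A minimal sketch of the two conventions, using a hypothetical 3 x 4 example (the values and the names rows, cols, vals, m_t, m_s and bad are illustrative, not from the thread). It assumes a current Matrix release, in which new() validates the slots and reports out-of-range indices as an error; the 0.995-x versions discussed here apparently did not catch this.

```r
library(Matrix)

# Positions of three non-zero entries, written one-based as a user
# would normally think of them.
rows <- c(1L, 2L, 3L)
cols <- c(1L, 3L, 4L)
vals <- c(50.2, 51.2, 52.2)

# Low-level construction: slots i and j are zero-based, so subtract 1.
m_t <- new("dgTMatrix", i = rows - 1L, j = cols - 1L, x = vals,
           Dim = c(3L, 4L))

# sparseMatrix() accepts one-based indices by default.
m_s <- sparseMatrix(i = rows, j = cols, x = vals, dims = c(3, 4))

# Both routes describe the same matrix.
same <- all(as.matrix(m_t) == as.matrix(m_s))

# Passing the one-based indices straight to new() puts row index 3 and
# column index 4 outside the valid zero-based ranges 0..2 and 0..3.
bad <- tryCatch(
  new("dgTMatrix", i = rows, j = cols, x = vals, Dim = c(3L, 4L)),
  error = function(e) e
)

print(same)
print(class(bad))
```

Once built, the object is subset with ordinary one-based indexing, so m_t[1, 1] refers to the entry stored with i = 0 and j = 0.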

Again, my thanks to Martin. For me this experience confirms
the vigor of open-source, forum-supported projects.

-John Thaden, Ph.D.
Research Assistant Professor of Geriatrics
University of Arkansas for Medical Sciences
Little Rock, AR 72205
USA
