
"LAPACK routine DGESDD gave error code -12" with Debian (PR#2822)

7 messages · Ramon Diaz-Uriarte, Kurt Hornik, Dirk Eddelbuettel +3 more

#
Dear All,

Under Debian GNU/Linux La.svd (with method = "dgesdd") sometimes gives the 
error

"Error in La.svd(data, nu = 0, nv = min(nrow, ncol), method = "dgesdd") : 
	LAPACK routine DGESDD gave error code -12"

It seems not to depend on the data per se, but on the relationship between 
numbers of rows and columns. 

For example, if the number of columns is 100, La.svd will fail when the number 
of rows is 56, but not if it is 55 or 57. It will not fail if we use 
"dgesvd". If the number of columns is 51, La.svd fails when the number of 
rows is between 29 and 50 if we use "dgesdd".

This happens if I use the latest deb packages (and thus ATLAS, etc.). It does 
not happen if I build R on this same machine with "--without-blas" (where 
make check reports no errors). In case it matters, the bug does not show up 
on a different machine with Windows 2000 and the Rblas.dll linked against 
ATLAS (the one provided in http://cran.r-project.org/bin/windows/contrib/ATLAS/P4).

I understand this is probably related to the issues mentioned in R-admin about 
LAPACK 3.0 and some of the issues recently discussed in this list by M. 
Burger, D. Bates and D. Eddelbuettel. Are there any workarounds (besides not 
using ATLAS at all)?


Ramón

********************************
An example of failure:
nv = min(nrow, ncol), method = "dgesdd") ## error
nv = min(nrow, ncol), method = "dgesvd") ## OK
*************************
_                
platform i386-pc-linux-gnu
arch     i386             
os       linux-gnu        
system   i386, linux-gnu  
status                    
major    1                
minor    7.0              
year     2003             
month    04               
day      16               
language R        
*****************************
_  
platform i686-pc-linux-gnu
arch     i686             
os       linux-gnu        
system   i686, linux-gnu  
status                    
major    1                
minor    7.0              
year     2003             
month    04               
day      16               
language R   


-----
Ramón Díaz-Uriarte
Bioinformatics Unit
Centro Nacional de Investigaciones Oncológicas (CNIO)
(Spanish National Cancer Center)
Melchor Fernández Almagro, 3
28029 Madrid (Spain)
Fax: +-34-91-224-6972
Phone: +-34-91-224-6900

http://bioinfo.cnio.es/~rdiaz
#
Confirmed on Debian GNU/Linux testing with atlas2-base-dev for both the
current 1.7.0 debs and 1.8.0 built from scratch using --with-lapack:

hornik@mithrandir:~/tmp$ ldd /usr/local/lib/R/modules/lapack.so 
        libR.so => not found
        liblapack.so.2 => /usr/lib/atlas/liblapack.so.2 (0x40013000)

-k
#
On Tue, Apr 22, 2003 at 08:07:36PM +0200, rdiaz@cnio.es wrote:
We should probably talk to Camm, the Atlas maintainer. Note how on recent
upgrades he inserted the note (cf /var/lib/dpkg/info/atlas2-3dnow.templates
on my Athlon system) via debconf:

   Template: atlas2-3dnow/3dnow_warning
   Type: note
   Description: 3dnow arithmetic is not IEEE compliant
    Please note that 3dnow arithmetic does not furnish several results
    required by the IEEE standard, and may therefore cause errors in code
    which needs to trap NaN and Inf results, for example.  The
    atlas2-3dnow binaries make heavy use of the 3dnow extensions.
    Please see the accompanying file /usr/share/doc/atlas2-3dnow/3DNow.txt
    for details.
      
I know Camm is on a sabbatical but will CC him nonetheless.

Dirk

  
    
#
Greetings!

1) This should have nothing to do with atlas, as atlas does not tune
   this routine, meaning you are using the stock routine from lapack.

nm --dynamic /usr/lib/liblapack_atlas.so.2.3 |grep dge
00003810 T ATL_dgetrf
00003590 T ATL_dgetrfC
00003870 T ATL_dgetrfR
00003ae0 T ATL_dgetrs
000047f0 T atl_f77wrap_dgesv__
000048b0 T atl_f77wrap_dgetnb__
000048e0 T atl_f77wrap_dgetrf__
00004970 T atl_f77wrap_dgetrs__
         U cblas_dgemm
00007c68 T clapack_dgesv
00007da0 T clapack_dgetrf
00007e90 T clapack_dgetrs
00009430 T dgesv_
00009500 T dgetrf_
000095b0 T dgetrs_
intech19:/fix/t2/camm/gcl-2.5.2$ 

   You can check this by verifying that the difficulty persists if you
   set the LD_LIBRARY_PATH environment variable to /usr/lib

2) Given the error code, and the scaling behavior with matrix size,
   I'd say the lwork parameter (size of the work array) passed to
   dgesdd is not always large enough, i.e. is not scaling properly
   with n,m.  Please see 'man dgesdd' for interpretations of the error
   code.  It is the responsibility of the calling routine to allocate
   and pass the work array to dgesdd.  With most lapack routines, one
   can make a 'workspace query' call first by setting lwork to -1, or
   some such.  Check the man page for details.  This of course would
   have to be done with each change in n,m.  Alternatively, you could
   take the minimum workspace requirements from the manpage.  

   lapack is the relevant lib, so I don't know what --without-blas is
   supposed to do.  And working under windows, while nice, doesn't
   exactly inspire confidence :-).

I am in general away from email until 6/1, so correspondence will be
spotty.

Take care,

Dirk Eddelbuettel <edd@debian.org> writes:

  
    
#
Camm Maguire <camm@enhanced.com> writes:
Right, but that's actually what we do: use the workspace query. It's
all very weird, because the -12 value indicates that the lwork
parameter is wrong, but it is computed from an otherwise identical call,
except with lwork = -1:

        lwork = -1;

        F77_CALL(dgesdd)(CHAR(STRING_ELT(jobu, 0)),
                         &n, &p, xvals, &n, REAL(s),
                         REAL(u), &ldu,
                         REAL(v), &ldvt,
                         &tmp, &lwork, iwork, &info);
        lwork = (int) tmp;

        work = (double *) R_alloc(lwork, sizeof(double));
        F77_CALL(dgesdd)(CHAR(STRING_ELT(jobu, 0)),
                         &n, &p, xvals, &n, REAL(s),
                         REAL(u), &ldu,
                         REAL(v), &ldvt,
                         work, &lwork, iwork, &info);


Also, this must be happening in the early parts of DGESDD which seem
to be all integer storage size calculations and so shouldn't need the
BLAS. Nevertheless people are seeing different behaviour when linking
against different BLAS libraries.
 
There are a lot of calls similar to this one, though:

      WRKBL = M + M*ILAENV( 1, 'DGELQF', ' ', M, N, -1, -1 )

so whether or not the BLAS is being used is hard to tell precisely.
--without-blas means to use the generic BLAS routines in the R
sources rather than any (tuned) system set.
#
Peter Dalgaard BSA <p.dalgaard@biostat.ku.dk> writes:
It looks like a problem in ilaenv, the Lapack routine that returns the
tuning parameters, like the optimal temporary storage size, for
various Lapack routines.  The value returned in tmp will depend upon
the results of several calls to ilaenv.  These results can vary between
different implementations of the blas (or atlas).
#
Greetings!

Douglas Bates <bates@stat.wisc.edu> writes:
Just a note that you should check the returned info value on the
workspace call to make sure tmp has been filled in.  (the man page
says this is done only if the other parameters are valid.)

Does this problem vary with blas, and if so, how?  You can run under
whatever blas you want, including reference, via the LD_LIBRARY_PATH
variable.  Assuming you've installed all the i386 atlas versions and the blas
package:

LD_LIBRARY_PATH                    BLAS used

/usr/lib                           reference
/usr/lib/atlas                     base vanilla i386 atlas
/usr/lib/{sse,sse2,3dnow}/atlas    atlas with ISA extensions


ilaenv might possibly be an issue, but only realistically if the
problem blas is coming from the 3dnow atlas.  When I put together the
lapack package, I read in the lapack notes how many of the ilaenv
constants can be hardwired, saving a certain amount of time on the
first call.  I chose not to do this only because of the existence of
the 3dnow, non-ieee compliant blas option at runtime, as one of the
parameters pertains directly to ieee.  So ilaenv is calculating its
parameters on first call at runtime on Debian.  I'm dubious as to the
relevance of this, though, as this should only kick in for single
precision if at all, and should not affect integer values in any case.

Take care,