Hi R Developers,
Greg is helping me with debugging R on Solaris 10 x64. Please let us
know if you have any thoughts or tips that can help us debug this.
Thanks,
David
************
Using default transfer plist
in vector_io: permuting
About to write
*** caught segfault ***
address e8554000, cause 'memory not mapped'
Traceback:
1: .External("do_hdf5save", call, sys.frame(sys.parent()), fileout,
..., PACKAGE = "hdf5")
2: hdf5save(hdf5_Fstat, "Fstat", "geneNames", "genotype")
aborting ...
************
We've tried many things to debug it:
* dbx Runtime Checking (RTC) is not detecting any (meaningful) memory
access problems that I can see.
* The same on Solaris/SPARC.
* Neither does Valgrind on Linux.
* I've tried increasing the C stack size, assuming R could be running
out of stack size. Didn't help.
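When raising the C stack size, it can help to confirm the limit that the process itself actually sees. The following is a minimal sketch, assuming a POSIX system (both Solaris and Linux provide `getrlimit()`); the helper names `stack_limits` and `print_limit` are made up for illustration and are not part of R or HDF5.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Read the soft and hard C stack limits of the current process.
 * Returns 0 on success, -1 if getrlimit() fails. */
static int stack_limits(rlim_t *soft, rlim_t *hard)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return -1;
    *soft = rl.rlim_cur;
    *hard = rl.rlim_max;
    return 0;
}

/* Print one limit, spelling out "unlimited" instead of a huge number. */
static void print_limit(const char *label, rlim_t value)
{
    if (value == RLIM_INFINITY)
        printf("%s: unlimited\n", label);
    else
        printf("%s: %llu bytes\n", label, (unsigned long long)value);
}
```

Calling `stack_limits()` and then `print_limit("soft", soft)` from a small test program started in the same shell as R would show whether the raised limit took effect there.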
Running R under dbx (without RTC) until the crash shows this:
...
About to write
t@1 (l@1) signal SEGV (no mapping at the fault address) in _memcpy at 0xfe90444b
0xfe90444b: _memcpy+0x006b: movaps 0x00000000(%esi),%xmm0
Current function is H5D_select_mgath
379 HDmemcpy(tgath_buf,buf+off[curr_seq],curr_len);
(dbx) where
current thread: t@1
[1] _memcpy(0x0, 0xfdebc707, 0x9f5c4f0), at 0xfe90444b
=>[2] H5D_select_mgath(_buf = 0x9f79580, space = 0x8966770, iter =
0x8045980, nelmts = 3120U, dxpl_cache = 0xfe170078, _tgath_buf =
0x9f5c4f0), line 379 in "H5Dselect.c"
[3] H5D_contig_write(io_info = 0x804620c, nelmts = 3120ULL, mem_type =
0x97b05c8, mem_space = 0x8966770, file_space = 0x8966770, tpath =
0x8ee7078, src_id = 201326906, dst_id = 201326904, buf = 0x9f79580),
line 1418 in "H5Dio.c"
[4] H5D_write(dataset = 0x8f169c0, mem_type_id = 201326906, mem_space
= 0x8966770, file_space = 0x8966770, dxpl_id = 671088643, buf =
0x9f79580), line 952 in "H5Dio.c"
[5] H5Dwrite(dset_id = 335544330, mem_type_id = 201326906,
mem_space_id = 0, file_space_id = 0, plist_id = 671088643, buf =
0x9f79580), line 586 in "H5Dio.c"
[6] vector_io(call = 0x97234ec, writeflag = 1, dataset = 335544330,
space = 268435472, obj = 0x98386a0), line 535 in "hdf5.c"
[7] hdf5_write_vector(call = 0x97234ec, id = 67108867, symname =
0x9cf35d0 "geneNames", val = 0x98386a0), line 693 in "hdf5.c"
[8] hdf5_save_object(call = 0x97234ec, fid = 67108867, symname =
0x9cf35d0 "geneNames", val = 0x98386a0), line 957 in "hdf5.c"
[9] do_hdf5save(args = 0x9723284), line 1104 in "hdf5.c"
[10] do_External(call = 0x86d62bc, op = 0x8371cd8, args = 0x972340c,
env = 0x9723594), line 832 in "dotcode.c"
[11] Rf_eval(e = 0x86d62bc, rho = 0x9723594), line 445 in "eval.c"
[12] Rf_evalList(el = 0x86d6230, rho = 0x9723594, op = 0x837226c),
line 1463 in "eval.c"
[13] Rf_eval(e = 0x86d6214, rho = 0x9723594), line 438 in "eval.c"
[14] do_begin(call = 0x86d56bc, op = 0x836709c, args = 0x86d61dc, rho
= 0x9723594), line 1107 in "eval.c"
[15] Rf_eval(e = 0x86d56bc, rho = 0x9723594), line 431 in "eval.c"
[16] Rf_applyClosure(call = 0x9723738, op = 0x83c0328, arglist =
0x97236e4, rho = 0x8379b1c, suppliedenv = 0x8379b38), line 614 in "eval.c"
[17] Rf_eval(e = 0x9723738, rho = 0x8379b1c), line 455 in "eval.c"
[18] Rf_ReplIteration(rho = 0x8379b1c, savestack = 0, browselevel = 0,
state = 0x8047328), line 256 in "main.c"
[19] R_ReplConsole(rho = 0x8379b1c, savestack = 0, browselevel = 0),
line 305 in "main.c"
[20] run_Rmainloop(), line 944 in "main.c"
[21] Rf_mainloop(), line 951 in "main.c"
[22] main(ac = 4, av = 0x80477ac), line 33 in "Rmain.c"
(dbx) p curr_len
curr_len = 24960U
(dbx) p curr_seq
curr_seq = 0
(dbx) p of
dbx: "of" is not defined in the scope
`libhdf5.so.0.0.0`H5Dselect.c`H5D_select_mgath:347`
dbx: see `help scope' for details
(dbx) p off
off = 0x8042960
(dbx) p tgath_buf
tgath_buf = 0x9f5c4f0
"\xd87\x83^H\xa8\xf3\x82^H0^X\x82^H^X\xd4\x81^H^P\x90\x81^H\xb8m\x80^H^H'\x80^H\x88^?^?^H\x908^?^H\xb0\xf7~^H\xd8\xad~^H\xf8\xb2~^H\xb8\x8e~^H\xe8]~^H\xe8\xcb\xed^HP\xe3}^Hh\xdd\xbb^H\x98\xc4}^H\xf0\xa0}^H\xa8r}^HH}\xc3^HpO|^HH^V|^H^X\xd8|^H\xc0\xb1|^H8=}^H\x90\xcd{^H^Pm{^H\xb8#{^Hx'{^H\x90\xf8x^HpKx^H^POx^H\xa8~w^H^H>w^H\xf0\xb2w^H\xc8^Ew^HX'x^H\xf8\xdbv^H"
(dbx) p buf
buf = 0x9f79580
"\xd87\x83^H\xa8\xf3\x82^H0^X\x82^H^X\xd4\x81^H^P\x90\x81^H\xb8m\x80^H^H'\x80^H\x88^?^?^H\x908^?^H\xb0\xf7~^H\xd8\xad~^H\xf8\xb2~^H\xb8\x8e~^H\xe8]~^H\xe8\xcb\xed^HP\xe3}^Hh\xdd\xbb^H\x98\xc4}^H\xf0\xa0}^H\xa8r}^HH}\xc3^HpO|^HH^V|^H^X\xd8|^H\xc0\xb1|^H8=}^H\x90\xcd{^H^Pm{^H\xb8#{^Hx'{^H\x90\xf8x^HpKx^H^POx^H\xa8~w^H^H>w^H\xf0\xb2w^H\xc8^Ew^HX'x^H\xf8\xdbv^H"
(dbx) p nseq
nseq = 1U
(dbx) p len
len = 0x804195c
(dbx) p len[0..2]
len[0..2] =
[0] = 24960U
[1] = 140025512U
[2] = 140013048U
(dbx)
The R code in question is:
...
/* Loop, while sequences left to process */
for(curr_seq=0; curr_seq<nseq; curr_seq++) {
    /* Get the number of bytes in sequence */
    curr_len=len[curr_seq];
    HDmemcpy(tgath_buf,buf+off[curr_seq],curr_len);
    /* Advance offset in gather buffer */
    tgath_buf+=curr_len;
} /* end for */
...
where
./src/hdf5-1.6.5/src/H5private.h:
#define HDmemcpy(X,Y,Z) memcpy((char*)(X),(const char*)(Y),Z)
Maybe the "curr_len = 24960U" value is too high. I have no way of
knowing what it should be in this case.
The crash could be caused by a compiler bug, although it's not very
likely. These crashes have occurred both with and without optimization,
with and without -g.
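One way to put the curr_len value in context is to relate it to nelmts = 3120 from frame [2] of the backtrace. The sketch below only does that arithmetic; the helper name `bytes_per_element` is hypothetical and nothing in it comes from HDF5.

```c
#include <stddef.h>

/* Bytes copied per element, or 0 if the element count is 0 or the
 * total is not an exact multiple of the element count. */
static size_t bytes_per_element(size_t total_bytes, size_t nelmts)
{
    if (nelmts == 0 || total_bytes % nelmts != 0)
        return 0;
    return total_bytes / nelmts;
}
```

With the values from the dbx session, `bytes_per_element(24960, 3120)` gives 8, so the copy assumes 8 bytes per element. Whether the caller's buffer really holds 8 bytes per element depends on the memory datatype handed to H5Dwrite, which the output above does not show.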
R on Solaris 10 x64
8 messages · Tai-Wei (David) Lin, Brian Ripley, Peter Dalgaard +3 more
What did the maintainer of this unmentioned contributed package (hdf5) say when you asked him? [Hint: you *have* read the posting guide at http://www.r-project.org/posting-guide.html and done as it asks?] There is no evidence here that this is anything to do with R itself.
On Fri, 13 Apr 2007, Tai-Wei (David) Lin wrote:
[full quote of the original message snipped]
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
Prof Brian Ripley wrote:
What did the maintainer of this unmentioned contributed package (hdf5) say when you asked him? [Hint: you *have* read the posting guide at
http://www.r-project.org/posting-guide.html and done as it asks?] There is no evidence here that this is anything to do with R itself.
Er, Brian, I think you are misfiring this time. As I understand it, they (David/Greg) are well aware that they are debugging an issue in a contributed package which occurs with an uncommon compiler on a maybe not so uncommon platform. They are just looking for hints on how to proceed to pinpoint the issue.
--------------
Two things caught my eye (apart from the fact that their "R code" is clearly C):
The dbx output doesn't show off[curr_seq], which could actually be the culprit, and the _memcpy call on the stack looks odd:
_memcpy(0x0, 0xfdebc707, 0x9f5c4f0)
(What happened to the length, and how did NULL get in there?)
If memcpy() is a macro, I think I'd take a closer look at the include files and see if something is getting expanded in an unintended way.
Peter, Thanks for the hints.
2 things caught my eye (except that their "R code" is clearly C): The dbx output doesn't show off[curr_seq], which could actually be the culprit,
(dbx) p off[curr_seq]
off[curr_seq] = 0
and the _memcpy call on the stack looks odd: _memcpy(0x0, 0xfdebc707, 0x9f5c4f0)
It shows the register values, which are not necessarily the arguments to memcpy(), apparently not in this case.
The "guilty" line of code
HDmemcpy(tgath_buf,buf+off[curr_seq],curr_len);
translates into (due to macro "#define HDmemcpy(X,Y,Z) memcpy((char*)(X),(const char*)(Y),Z)")
memcpy((char*)(tgath_buf),(const char*)(buf+off[curr_seq]),curr_len);
and
(dbx) p tgath_buf
tgath_buf = 0x9f5c4f0
(definitely not NULL)
We've also asked for help at help at hdfgroup.org regarding the HDF5 package.
This is under Solaris 10 x86, using the latest Sun Studio compiler/tools.
-Greg
(What happened to the length and how did NULL get in there.) If memcpy() is a macro, I think I'd take a closer look at the include files and see if something is getting expanded in an unintended way.
[Peter Dalgaard]
Er, Brian, I think you are misfiring this time.
Firing is always misfiring, at least in that it lacks elegance. There are ways to speak without the heat, or otherwise, to merely stay silent. Yet this particular aspect of things improved a lot, lately, on the various R lists. Congratulations to all.
François Pinard http://pinard.progiciels-bpi.ca
Greg Nakhimovsky wrote:
Peter, Thanks for the hints.
2 things caught my eye (except that their "R code" is clearly C): The dbx output doesn't show off[curr_seq], which could actually be the culprit,
(dbx) p off[curr_seq]
off[curr_seq] = 0
and the _memcpy call on the stack looks odd: _memcpy(0x0, 0xfdebc707, 0x9f5c4f0)
It shows the register values, which are not necessarily the arguments to memcpy(), apparently not in this case. The "guilty" line of code HDmemcpy(tgath_buf,buf+off[curr_seq],curr_len); translates into (due to macro "#define HDmemcpy(X,Y,Z) memcpy((char*)(X),(const char*)(Y),Z)")
Hmm, and can you see "both ends" of tgath_buf and buf (in particular tgath_buf[curr_len - 1] and buf[curr_len-1])?
If only to "eliminate from our inquiries", I'd try running CPP over the code and see if the full macro expansion (including that from memcpy to _memcpy) is working as intended.
I take it that you've already tried compiling the module at maximal warning level?
memcpy((char*)(tgath_buf),(const char*)(buf+off[curr_seq]),curr_len); and (dbx) p tgath_buf tgath_buf = 0x9f5c4f0 (definitely not NULL) We've also asked for help at help at hdfgroup.org regarding the HDF5 package. This is under Solaris 10 x86, using the latest Sun Studio compiler/tools. -Greg
(What happened to the length and how did NULL get in there.) If memcpy() is a macro, I think I'd take a closer look at the include files and see if something is getting expanded in an unintended way.
Hmm, and can you see "both ends" of tgath_buf and buf (in particular tgath_buf[curr_len - 1] and buf[curr_len-1])?
(dbx) p tgath_buf[curr_len - 1]
tgath_buf[curr_len-1] = '\0'
(dbx) p buf[curr_len-1]
dbx: cannot access address 0x9f7f6ff
(dbx) p tgath_buf[curr_len - 2]
tgath_buf[curr_len-2] = '\0'
(dbx) p buf[curr_len-2]
dbx: cannot access address 0x9f7f6fe
(dbx)
Something is definitely wrong with this memcpy() operation. I suppose we'll need some help from the HDF5 folks to figure out what the buf memory buffer is supposed to contain in this case.
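The addresses that dbx refuses to read follow directly from values already printed in the thread: buf = 0x9f79580, off[curr_seq] = 0 and curr_len = 24960. As a small sketch of that arithmetic (the helper name `last_byte_addr` is made up for illustration):

```c
#include <stddef.h>
#include <stdint.h>

/* Address of the last byte touched when len bytes are read
 * starting at base + off. len must be at least 1. */
static uintptr_t last_byte_addr(uintptr_t base, size_t off, size_t len)
{
    return base + off + len - 1;
}
```

`last_byte_addr(0x9f79580, 0, 24960)` evaluates to 0x9f7f6ff, the same address dbx reports as inaccessible. So the end of the source range requested from memcpy() lies in memory that dbx cannot read, which is consistent with, though not proof of, a source buffer shorter than curr_len bytes.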
If only to "eliminate from our inquiries", I'd try running CPP over the code and see if the full macro expansion (including that from memcpy to _memcpy) is working as intended.
After the macro expansion, this code looks like this:
for(curr_seq=0; curr_seq<nseq; curr_seq++) {
    curr_len=len[curr_seq];
    memcpy ( ( char * ) ( tgath_buf ) , ( const char * ) ( buf + off [curr_seq ] ) , curr_len );
    tgath_buf+=curr_len;
}
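Besides running the preprocessor by hand, the expansion can also be inspected from inside a C program through two-level stringification. This is only a sketch: the HDmemcpy definition is copied from H5private.h as quoted in this thread, while `STR`, `XSTR` and `hdmemcpy_expansion` are illustrative names. Because <string.h> is included, any macro definition of memcpy supplied by the platform headers takes part in the expansion as well.

```c
#include <string.h>

/* Definition as quoted from hdf5-1.6.5/src/H5private.h in this thread. */
#define HDmemcpy(X,Y,Z) memcpy((char*)(X),(const char*)(Y),Z)

/* Two-level stringification: XSTR expands its argument fully,
 * then STR turns the result into a string literal. */
#define STR(x)  #x
#define XSTR(x) STR(x)

/* Returns the text the preprocessor produces for the crashing line.
 * The identifiers are never evaluated, only turned into text. */
static const char *hdmemcpy_expansion(void)
{
    return XSTR(HDmemcpy(tgath_buf,buf+off[curr_seq],curr_len));
}
```

Printing the returned string with `puts()` on the affected system shows what the compiler really sees for that line; the exact text depends on the platform headers.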
I take it that you've already tried compiling the module at maximal warning level?
No. We can try that, as well as lint. Thanks.
-Greg
Hi David,
Tai-Wei (David) Lin wrote:
[snip]
The R code in question is:
...
/* Loop, while sequences left to process */
for(curr_seq=0; curr_seq<nseq; curr_seq++) {
    /* Get the number of bytes in sequence */
    curr_len=len[curr_seq];
    HDmemcpy(tgath_buf,buf+off[curr_seq],curr_len);
    /* Advance offset in gather buffer */
    tgath_buf+=curr_len;
} /* end for */
...
What's the initial size of tgath_buf? You need to make sure that you are not stepping out of it, i.e. that sum(len[i], for 0<=i<nseq) is not greater than its initial size. That's for the writing side.
Same on the reading side: you need to make sure that buf+off[i]+len[i]-1 is a safe place to be for any 0<=i<nseq.
Otherwise, expect bad things to happen. And they are generally not reproducible in a consistent way. So even if this code never crashes on other systems, it doesn't mean that it is not broken.
Cheers,
H.
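The two conditions above can be written down as a bounds-checked variant of the gather loop. This is a hedged sketch, not HDF5 code: the real H5D_select_mgath is not passed buffer sizes, so the parameters `dst_cap` and `src_size`, like the function name `checked_gather`, are assumptions introduced here for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Gather nseq byte ranges src[off[i] .. off[i]+len[i]) into dst.
 * Returns 0 on success, or -1 if any range would read past the end
 * of src or write past the end of dst. On -1, dst is not modified. */
static int checked_gather(char *dst, size_t dst_cap,
                          const char *src, size_t src_size,
                          const size_t *off, const size_t *len, size_t nseq)
{
    size_t total = 0;
    size_t i;

    /* Validate every sequence first so a failure leaves dst untouched. */
    for (i = 0; i < nseq; i++) {
        if (off[i] > src_size || len[i] > src_size - off[i])
            return -1;          /* reading side: range leaves src */
        if (len[i] > dst_cap - total)
            return -1;          /* writing side: sum of len exceeds dst */
        total += len[i];
    }

    /* All ranges are known to be safe; do the actual copies. */
    for (i = 0; i < nseq; i++) {
        memcpy(dst, src + off[i], len[i]);
        dst += len[i];
    }
    return 0;
}
```

The comparisons are written as subtractions from the known sizes rather than as sums, so the checks themselves cannot overflow. Applied to the crashing call, a check of this kind would need the true size of buf, which is exactly the number nobody in the thread has established yet.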