NULL data not mapped to NA with RODBC on 64-bit Mac OS X
(Missed this until now! Sorry for the delay in responding!) You could be right about the solution I'm suggesting being wrong, but I don't know. Does anyone else have any thoughts on the matter? It works for me....! I don't know enough to answer your other question, I'm afraid. It seems to me as if the nature of 64-bit equality is the real problem here... I'd really like this to get resolved officially. I've done the best I can to write a patch. Prof. Ripley, would you be able to weigh in on this? -Harlan On Mon, Apr 18, 2011 at 9:20 PM, Tomoaki NISHIYAMA <
tomoakin at kenroku.kanazawa-u.ac.jp> wrote:
Hi Harlan, I don't think this is a right way to fix. Because (int64_t) 0xFFFFFFFF == 4294967295 != -1 the bug resides on the code setting ColData[i].IndPtr[row]. Changing the way to interpret it is a wrong solution. Though, it might work as a workaround for the bug in other parts. The commented-out warning() in the code was used to confirm that without the casts, You did see only the lower 32 bits? -- Tomoaki NISHIYAMA Advanced Science Research Center, Kanazawa University, 13-1 Takara-machi, Kanazawa, 920-0934, Japan On 2011/04/19, at 1:31, Harlan Harris wrote: OK, with the help of a rusty C coder at work, I've confirmed that the solution the Python crew figured out also seems to work for me. I have no idea if it breaks the 32-bit MacOS build, or any other build. Note that this seems to be a very subtle effect of the way the compiler and header files are interacting, as far as I can tell. The commented-out warning() in the code was used to confirm that without the casts, the len variable and SQL_NULL_DATA are, in fact, the same, and that despite their being the same, the equality test fails. A patch is attached. All of the len variables are cast to ints. (Note that casting to long, which should work, doesn't.) Would someone please apply to patch to the source tree and do some regression testing? -Harlan On Thu, Mar 31, 2011 at 9:35 AM, Harlan Harris <harlan at harris.name> wrote:
Prof. Ripley? Anyone? Should I cross-post this to R-Help? If someone would be willing to fix this for pay, please email me a proposal and I'll see if my employer would be willing to sponsor a fix. -Harlan On Tue, Mar 29, 2011 at 9:43 AM, Harlan Harris <harlan at harris.name>wrote:
Following up on this... Prof. Ripley? Is this something you can address? -Harlan On Thu, Mar 24, 2011 at 9:54 AM, Harlan Harris <harlan at harris.name>wrote:
Thanks for the reply, Marc! Ah, it does appear I'm booting in 32-bit mode. From the terminal:
uname -a
Darwin Harlan-Harris-MacBook-Pro.local 10.6.0 Darwin Kernel Version 10.6.0: Wed Nov 10 18:13:17 PST 2010; root:xnu-1504.9.26~3/RELEASE_I386 i386 I can usually run stuff in 32-bit mode, but sometimes I would like the ability to use the extra RAM. Also, certain interfaces to R (such as the new RStudio) will run R in 64-bit mode, so for now, I can't use RStudio when I'm pulling data via ODBC that might have NULLs. I think Rscript also wants to run in 64-bit mode. -Harlan On Thu, Mar 24, 2011 at 9:09 AM, Marc Schwartz <marc_schwartz at me.com>wrote:
Hi Harlan, I suspect that we will need to wait for Prof. Ripley to be able to look into this further. There may be subtle issues at play here and I would defer to his more intimate knowledge of the ODBC tool chain. I do know that the Actual driver is both 32 and 64 bit in a single installation. However, I run R in 32 bit mode (don't need the extra address space, even though I now have 8Gb on my MacBook Pro), so have not encountered the 64 bit issues that you report here. I have the same header file. Note that SQLLEN is a signed declaration below. It is defined as a 'long' generally and as a INT64 for 64 bit Windows. It would be 'unsigned long' or UINT64, respectively for an unsigned value. One question for you however, which is do you know if you are booting the 32 bit or 64 bit OSX kernel on your Mac? Macs prior to mid-2010 (like mine) running Snow Leopard typically default to the 32 bit kernel, though can be 64 bit 'capable'. It depends upon various factors including the presence of a 64 bit EFI. Newer Macs now boot into the 64 bit kernel by default. I have seen some problems reported on older Macs where the default kernel was changed from 32 bit to 64 bit, resulting in subtle and not so subtle problems with some drivers. Booting to the 32 bit kernel still allows you to run 64 bit apps under Snow Leopard, just that the kernel and extensions are 32 bit. Regards, Marc On Mar 23, 2011, at 2:29 PM, Harlan Harris wrote:
Any thoughts? Is there someone else or somewhere else I might ask? -Harlan On Tue, Mar 22, 2011 at 1:31 PM, Harlan Harris <harlan at harris.name>
wrote:
Hello, I seem to have hit a bug in RODBC on 64-bit versions of R on Mac OS
X. I'm
running R 2.12.1, and using R to talk to Oracle and other databases
using
RODBC 1.3-2. In 32-bit mode (R32), everything works. (The ODBC
drivers are
the ones from Actual.) In 64-bit mode (R), it seems as if NULL data
is not
being treated as missing. For example, a NULL value in an integer
column is
being returned as a 0 instead, which is wrong/bad. Diving into the code and Google a bit, it seems like an issue that's
been
seen before in other contexts. Here's the relevant C code, from
RODBCFetchRows:
for(row = thisHandle->rowsUsed;
row < thisHandle->rowsFetched && j <= maximum;
j++, row++)
{
thisHandle->rowsUsed++;
if(j > blksize) {
blksize *= 2;
for (i = 0; i < nc; i++)
SET_VECTOR_ELT(data, i,
lengthgets(VECTOR_ELT(data, i), blksize));
}
for (i = 0; i < nc; i++) {
SQLLEN len = thisHandle->ColData[i].IndPtr[row];
switch(thisHandle->ColData[i].DataType) {
case SQL_DOUBLE:
REAL(VECTOR_ELT(data, i))[j-1] =
len == SQL_NULL_DATA ? NA_REAL :
thisHandle->ColData[i].RData[row];
break;
etc...
Googling for SQL_NULL_DATA and related terms finds this page, about
the
same issue in a Python driver: http://code.google.com/p/pyodbc/issues/detail?id=51 It appears as if SQLLEN might be unsigned in 64-bit builds, while SQL_NULL_DATA is -1. Casting len to be an int rather than an
unsigned int
seems to solve the problem for them. On my machine, /usr/include/sqltypes.h starts like this: /* * sqltypes.h * * $Id: sqltypes.h,v 1.23 2007/10/07 13:27:13 source Exp $ * * ODBC typedefs * * The iODBC driver manager. and defines SQLLEN as follows: #ifdef _WIN64 typedef INT64 SQLLEN; typedef UINT64 SQLULEN; typedef UINT64 SQLSETPOSIROW; #elif defined(STRICT_ODBC_TYPES) typedef long SQLLEN; typedef unsigned long SQLULEN; typedef unsigned short SQLSETPOSIROW; #else #define SQLLEN long #define SQLULEN unsigned long #define SQLSETPOSIROW unsigned short #endif Seems to me like SQLLEN should be signed, but apparently not? Can anyone help? Thank you! -Harlan
<RODBC.c.diff>_______________________________________________ R-sig-DB mailing list -- R Special Interest Group R-sig-DB at r-project.org