NULL data not mapped to NA with RODBC on 64-bit Mac OS X
Thanks for the reply, Marc! Ah, it does appear I'm booting in 32-bit mode. From the terminal:
uname -a
Darwin Harlan-Harris-MacBook-Pro.local 10.6.0 Darwin Kernel Version 10.6.0: Wed Nov 10 18:13:17 PST 2010; root:xnu-1504.9.26~3/RELEASE_I386 i386 I can usually run stuff in 32-bit mode, but sometimes I would like the ability to use the extra RAM. Also, certain interfaces to R (such as the new RStudio) will run R in 64-bit mode, so for now, I can't use RStudio when I'm pulling data via ODBC that might have NULLs. I think Rscript also wants to run in 64-bit mode. -Harlan
On Thu, Mar 24, 2011 at 9:09 AM, Marc Schwartz <marc_schwartz at me.com> wrote:
Hi Harlan, I suspect that we will need to wait for Prof. Ripley to be able to look into this further. There may be subtle issues at play here and I would defer to his more intimate knowledge of the ODBC tool chain. I do know that the Actual driver is both 32 and 64 bit in a single installation. However, I run R in 32 bit mode (don't need the extra address space, even though I now have 8Gb on my MacBook Pro), so have not encountered the 64 bit issues that you report here. I have the same header file. Note that SQLLEN is a signed declaration below. It is defined as a 'long' generally and as a INT64 for 64 bit Windows. It would be 'unsigned long' or UINT64, respectively for an unsigned value. One question for you however, which is do you know if you are booting the 32 bit or 64 bit OSX kernel on your Mac? Macs prior to mid-2010 (like mine) running Snow Leopard typically default to the 32 bit kernel, though can be 64 bit 'capable'. It depends upon various factors including the presence of a 64 bit EFI. Newer Macs now boot into the 64 bit kernel by default. I have seen some problems reported on older Macs where the default kernel was changed from 32 bit to 64 bit, resulting in subtle and not so subtle problems with some drivers. Booting to the 32 bit kernel still allows you to run 64 bit apps under Snow Leopard, just that the kernel and extensions are 32 bit. Regards, Marc On Mar 23, 2011, at 2:29 PM, Harlan Harris wrote:
Any thoughts? Is there someone else or somewhere else I might ask? -Harlan On Tue, Mar 22, 2011 at 1:31 PM, Harlan Harris <harlan at harris.name>
wrote:
Hello, I seem to have hit a bug in RODBC on 64-bit versions of R on Mac OS X.
I'm
running R 2.12.1, and using R to talk to Oracle and other databases
using
RODBC 1.3-2. In 32-bit mode (R32), everything works. (The ODBC drivers
are
the ones from Actual.) In 64-bit mode (R), it seems as if NULL data is
not
being treated as missing. For example, a NULL value in an integer column
is
being returned as a 0 instead, which is wrong/bad. Diving into the code and Google a bit, it seems like an issue that's
been
seen before in other contexts. Here's the relevant C code, from
RODBCFetchRows:
for(row = thisHandle->rowsUsed;
row < thisHandle->rowsFetched && j <= maximum;
j++, row++)
{
thisHandle->rowsUsed++;
if(j > blksize) {
blksize *= 2;
for (i = 0; i < nc; i++)
SET_VECTOR_ELT(data, i,
lengthgets(VECTOR_ELT(data, i), blksize));
}
for (i = 0; i < nc; i++) {
SQLLEN len = thisHandle->ColData[i].IndPtr[row];
switch(thisHandle->ColData[i].DataType) {
case SQL_DOUBLE:
REAL(VECTOR_ELT(data, i))[j-1] =
len == SQL_NULL_DATA ? NA_REAL :
thisHandle->ColData[i].RData[row];
break;
etc...
Googling for SQL_NULL_DATA and related terms finds this page, about the
same issue in a Python driver:
http://code.google.com/p/pyodbc/issues/detail?id=51
It appears as if SQLLEN might be unsigned in 64-bit builds, while
SQL_NULL_DATA is -1. Casting len to be an int rather than an unsigned
int
seems to solve the problem for them. On my machine, /usr/include/sqltypes.h starts like this: /* * sqltypes.h * * $Id: sqltypes.h,v 1.23 2007/10/07 13:27:13 source Exp $ * * ODBC typedefs * * The iODBC driver manager. and defines SQLLEN as follows: #ifdef _WIN64 typedef INT64 SQLLEN; typedef UINT64 SQLULEN; typedef UINT64 SQLSETPOSIROW; #elif defined(STRICT_ODBC_TYPES) typedef long SQLLEN; typedef unsigned long SQLULEN; typedef unsigned short SQLSETPOSIROW; #else #define SQLLEN long #define SQLULEN unsigned long #define SQLSETPOSIROW unsigned short #endif Seems to me like SQLLEN should be signed, but apparently not? Can anyone help? Thank you! -Harlan