Skip to content

RODBC: probs with arg

3 messages · Gabor Grothendieck, Matthieu Stigler

#
Hi

I'm trying to raed excel files with RODBC. This works well but I have a
small problem, as my doc has:

-line 1: gen infos
-line 2: colnames
-line 3: units

So I wanted to skip the first line, use the second as colnames and skip the
third.

But by default odbcConnectExcel() takes the first line as the right one, so
the 2 others lines are displayed as kind of NAs and I have wrong col
names...

#To see it, can replicate pseudo example with xlsReadWrite:
library(xlsReadWrite)
tdat <- data.frame( This= c("Price ", "USD",6399, 3699, 2499 ), Is=
c("Amount ", "Unit",2, 3, 1 ),  How_my_file_looks_like= c("Date",
"Day",39202, 39198, 39199 ),  row.names = c("", "Type","Pro machine", "Road
racer", "Streetfire" ) )
write.xls( tdat, "bikes.xls" )

#get the data
library(RODBC)
chan<-odbcConnectExcel("bikes.xls")
sh1 <- sqlFetch(chan, "Sheet1")
head(sh1)
odbcCloseAll()

           F1   This      Is How_my_file_looks_like
1        <NA> Price  Amount                    Date
2        Type    USD    Unit                    Day
3 Pro machine   6399       2                  39202
4  Road racer   3699       3                  39198
5  Streetfire   2499       1                  39199

See the message on the help file:

It is assumed that the first row of the table in the worksheet
     contains column headings: the driver parameter 'FirstRowHasNames =
     0' is supposed to turn this off (giving column names 'F1' ...) but
     it is broken in many versions of the drivers.

I don't know exactly where this arg should be added...  It doesn't work in
odbcConnectExcel("bikes.xls",FirstRowHasNames=0)

So I tried something like:

channel3 <-odbcDriverConnect("DRIVER=Microsoft Excel Driver
(*.xls);DBQ=bikes.xls; ReadOnly=False;FirstRowHasNames=0")

But his doesn't do anything... Do you see how I should do it? And if I
manage to escape the first line, can I avoid the problem of the third line?

I tried alternatively usign xlsReadWrite, whihc can "skip" lines, but then
the first lines after colnames can't be escaped and hence the columns are
seen as factors:
sh2<-read.xls(file="bikes.xls", sheet=1, from=2)
sh2

So do you have a n idea how I can solve my problem?

Thanks a lot!
#
Try reading in the headings and data separately like this
(shown with read.xls from gdata):

library(gdata)
DF <- read.xls("bikes.xls", skip = 3, header = FALSE,
    col.names = names(read.xls("bikes.xls", skip = 1, nrows = 1)))


On Fri, Jul 17, 2009 at 8:47 AM, Matthieu
Stigler<matthieu.stigler at gmail.com> wrote:
1 day later
#
Thanks Gabor, that's a nice workaround!

But any view on how to use the parameter

"FirstRowHasNames"


in package RODBC? Thanks

Matthieu Stigler

Gabor Grothendieck a ?crit :