Building package - tab delimited example data issue

Peter Dalgaard · 2007-12-06T10:52:46Z

Johannes Graumann wrote:
> Hello,
>
> I'm trying to integrate example data in the shape of a tab delimited ASCII
> file into my package and therefore dropped it into the data subdirectory.
> The build works out just fine, but when I attempt to install I get:
>
> ** building package indices ...
> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines,
> na.strings, :
> line 1 did not have 500 elements
> Calls: ... -> switch -> assign -> read.table -> scan
> Execu


If you had looked at help(data), you would have found a list of the
file formats it supports and how each is read. Hint: TAB-delimited
files are not among them. *Whitespace*-separated files work, since they
are read with read.table(filename, header=TRUE), but that does not cover
TAB-delimited data with empty fields: consecutive TABs count as a single
separator, so lines with empty fields come up short.
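As a minimal sketch of the difference (the file contents here are made up for illustration, not taken from the original poster's data), the same TAB-delimited file can be read both ways:

```r
# Hypothetical TAB-delimited data; the last record has an empty 'b' field.
tf <- tempfile(fileext = ".txt")
writeLines(c("a\tb\tc", "1\t2\t3", "4\t\t6"), tf)

# Whitespace-separated reading, as described above for .txt files:
# the two adjacent TABs act as one separator, so the last line has
# only two fields and scan() stops with an error.
ws <- try(read.table(tf, header = TRUE), silent = TRUE)
inherits(ws, "try-error")

# Declaring TAB as the separator keeps the empty field, which becomes NA.
tab <- read.table(tf, header = TRUE, sep = "\t")
tab
```

The error in the first call is of the same "line ... did not have ... elements" kind as the one quoted in the question.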

A nice trick is to figure out how to read the data from the command line
and drop the relevant code into a mydata.R file (assuming that the
actual data file is mydata.txt). This gets executed when the data is
loaded (by data(mydata) or when building the lazyload database) because
.R files have priority over .txt.
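A rough sketch of that trick, assuming a throwaway directory stands in for the package (the names pkgdemo, mydata, and the column contents are all hypothetical). It relies on the documented behaviour that data() also searches the data subdirectory of the current working directory and runs .R files from inside that directory:

```r
# Simulate a package's data/ directory in a temporary folder.
root <- tempfile("pkgdemo")
dir.create(file.path(root, "data"), recursive = TRUE)

# mydata.txt: TAB-delimited, with an empty field in the last record.
writeLines(c("id\tgroup\tvalue", "1\ta\t3.5", "2\t\t4.1"),
           file.path(root, "data", "mydata.txt"))

# mydata.R: the one line of reading code worked out at the command line.
writeLines('mydata <- read.delim("mydata.txt")',
           file.path(root, "data", "mydata.R"))

# data() finds both files, prefers mydata.R, and runs it with the
# working directory temporarily set to data/, so the relative file
# name "mydata.txt" resolves.
old <- setwd(root)
data(mydata)
setwd(old)

str(mydata)
```

In a real package only the two files under data/ are needed; the rest of the sketch just sets up and tears down the demonstration.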

This is quite general and allows a nice way of incorporating data
management while retaining the original data source:

# Read the original data source: ";"-separated, "." marks missing values
stroke <- read.csv2("stroke.csv", na.strings = ".")
names(stroke) <- tolower(names(stroke))
stroke <- within(stroke, {
    # Recode the 0/1 indicators and the diagnosis as factors
    sex  <- factor(sex,  levels = 0:1, labels = c("Female", "Male"))
    dgn  <- factor(dgn)
    coma <- factor(coma, levels = 0:1, labels = c("No", "Yes"))
    minf <- factor(minf, levels = 0:1, labels = c("No", "Yes"))
    diab <- factor(diab, levels = 0:1, labels = c("No", "Yes"))
    han  <- factor(han,  levels = 0:1, labels = c("No", "Yes"))
    # Dates are stored as day.month.year
    died <- as.Date(died, format = "%d.%m.%Y")
    dstr <- as.Date(dstr, format = "%d.%m.%Y")
    # Only deaths before 1996-01-01 count; later ones are blanked out
    dead <- !is.na(died) & died < as.Date("1996-01-01")
    died[!dead] <- NA
})

The source file stroke.csv looks like this:

SEX;DIED;DSTR;AGE;DGN;COMA;DIAB;MINF;HAN
1;7.01.1991;2.01.1991;76;INF;0;0;1;0
1;.;3.01.1991;58;INF;0;0;0;0
1;2.06.1991;8.01.1991;74;INF;0;0;1;1
0;13.01.1991;11.01.1991;77;ICH;0;1;0;1
0;23.01.1996;13.01.1991;76;INF;0;1;0;1
1;13.01.1991;13.01.1991;48;ICH;1;0;0;1
0;1.12.1993;14.01.1991;81;INF;0;0;0;1
1;12.12.1991;14.01.1991;53;INF;0;0;1;1
0;.;15.01.1991;73;ID;0;0;0;1
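To see what the date handling does to these records, here is a small sketch that feeds the rows listed above to read.csv2() through its text argument (an assumption made so the example runs without a stroke.csv on disk) and applies only the died/dead steps:

```r
# The rows shown above, supplied inline instead of via stroke.csv.
csv <- c(
  "SEX;DIED;DSTR;AGE;DGN;COMA;DIAB;MINF;HAN",
  "1;7.01.1991;2.01.1991;76;INF;0;0;1;0",
  "1;.;3.01.1991;58;INF;0;0;0;0",
  "1;2.06.1991;8.01.1991;74;INF;0;0;1;1",
  "0;13.01.1991;11.01.1991;77;ICH;0;1;0;1",
  "0;23.01.1996;13.01.1991;76;INF;0;1;0;1",
  "1;13.01.1991;13.01.1991;48;ICH;1;0;0;1",
  "0;1.12.1993;14.01.1991;81;INF;0;0;0;1",
  "1;12.12.1991;14.01.1991;53;INF;0;0;1;1",
  "0;.;15.01.1991;73;ID;0;0;0;1"
)
stroke <- read.csv2(text = csv, na.strings = ".")
names(stroke) <- tolower(names(stroke))

stroke <- within(stroke, {
    died <- as.Date(died, format = "%d.%m.%Y")
    # dead is TRUE only for a known death date before 1996-01-01
    dead <- !is.na(died) & died < as.Date("1996-01-01")
    # any other death date is set to missing
    died[!dead] <- NA
})

stroke[, c("died", "dead")]
```

The fifth record, with a death date of 23.01.1996, ends up with dead FALSE and died NA, alongside the two records whose death date was "." in the file.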

O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907
