-----Original Message-----
From: Torsten Hothorn [mailto:Torsten.Hothorn at rzmail.uni-erlangen.de]
Sent: 12 June 2003 14:00
To: Ernst Hansen
Cc: r-help at stat.math.ethz.ch
Subject: Re: [R] Programcode and data in the same textfile
I have the following problem. It is not of earthshaking importance,
but still I have spent a considerable amount of time thinking about
it.
PROBLEM: Is there any way I can have a single textfile that contains
both
a) data
b) programcode
The program should act on the data, if the textfile is source()'ed
into R.
BOUNDARY CONDITION: I want the data written in the textfile in
the same format as I would use, if I had data in a separate file
to be read by read.table(). That is, with 'horizontal
and 'vertical homogeneity' in the type of entries. I want to write
something like
Sex Respons
Male 1
Male 2
Female 3
Female 4
something like
tmpfilename <- tempfile()
tmpfile <- file(tmpfilename, "w")
cat(
    ### here comes my data
    "Sex Respons",
    "Male 1",
    "Male 2",
    "Female 3",
    "Female 4",
    ### end of data input
    file = tmpfile, sep = "\n")
close(tmpfile)
read.table(tmpfilename, header = TRUE)
best,
Torsten
In effect, I am asking if there is some way I can convince
read.table() that the data is contained in the following n lines of
text.
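
A minimal sketch of one way to do this, assuming a reasonably recent
version of R in which read.table() accepts a 'text' argument (it may
not be available in older releases). The data sits in the script in
the same row-by-row layout as a separate data file would use, and
source()'ing the file builds the dataframe. The name 'dat' and the
choice of stringsAsFactors = TRUE are illustrative assumptions, not
part of the original question.

```r
# Data written inline, in the layout read.table() expects from a file.
# The blank first line is skipped (blank.lines.skip = TRUE by default).
dat <- read.table(header = TRUE, stringsAsFactors = TRUE, text = "
Sex    Respons
Male   1
Male   2
Female 3
Female 4
")

# The program code can follow in the same file and act on the data.
print(dat)
str(dat)
```

If the 'text' argument is not available, passing a textConnection()
built from the same string as the 'file' argument should behave
similarly.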
ILLEGAL SOLUTIONS:
I know I can simulate the behaviour by reading the columns of the
dataframe one by one, and using data.frame() to glue them together.
Like in
data.frame(Sex = c('Male', 'Male', 'Female', 'Female'),
Respons = c(1, 2, 3, 4))
I do not like this solution, because it represents the data in a
"transposed" way in the textfile, and this transposition makes the
structure of the dataframe less transparent - at least to me. It
becomes even less comprehensible if the Sex-factor above is written
with the help of rep() or gl() or the like.
I know I can make read.table() read from stdin, so I could type the
dataframe at the prompt. That is against the spirit of the problem,
as I describe below.
I know I can make read.table() do the job, if I split the
programcode into different files. But as the purpose of
is to distribute the data and the code to other people, splitting
into several files is a complication.
MOTIVATION: I frequently find myself distributing small
to my students, along with data on which the code can work.
As an example, I might want to demonstrate how model.matrix() treats
interactions, in a certain setting. For that I need a dataframe that
is complex enough to exhibit the behaviour I want, but simple enough
that the model.matrix is easily understood. So I make such a
dataframe.
I am trying to distribute this dataframe along with my
that is as simple as possible to USE for the students (hence the
one-file boundary condition) and to READ (hence the
boundary condition).
Does anybody have any ideas?
Ernst Hansen
Department of Statistics
University of Copenhagen