Programcode and data in the same textfile
Hi Ernst.
I have found myself in a similar situation where I want to send
code to someone with annotations that explain the different pieces
in richer ways than comments will permit.
If you want to contain both data and code within a single document,
you will need to have some way to identify which is which so that the
software can distinguish the different elements of the document. This
is precisely what a markup language does. And rather than inventing ad
hoc conventions, why not simply use a real markup language? XML is the most
natural one, so you could do something like
<doc>
<data>
Sex Response
Male 1
Male 2
Female 3
Female 4
</data>
<code>
......
</code>
</doc>
Using the XML package, you can read the document into R
and do what you will with it.
To read the data,
library(XML)
tr = xmlRoot(xmlTreeParse("myFile"))
read.table(textConnection(xmlValue(tr[["data"]])), header=TRUE)
and to access the code text
xmlValue(tr[["code"]])
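Putting the pieces together, here is a minimal sketch of the whole round trip. It makes a few assumptions that are not in the message above: the document is held in an inline string (doc_text) rather than in "myFile", the tapply() line is only a stand-in for the elided code, and read.table(text = ...) is used in place of textConnection() so that no connection is left open. Note that code containing "<" or "&" would need to be escaped or wrapped in a CDATA section to be valid XML, which is why the stand-in code assigns with "=".

```r
library(XML)

# Hypothetical document; in practice this text would live in a file.
doc_text <- '<doc>
<data>
Sex Response
Male 1
Male 2
Female 3
Female 4
</data>
<code>
result = tapply(dat$Response, dat$Sex, mean)
</code>
</doc>'

# Parse the string (asText = TRUE) instead of reading a file.
tr <- xmlRoot(xmlTreeParse(doc_text, asText = TRUE))

# The data element holds ordinary read.table() input.
dat <- read.table(text = xmlValue(tr[["data"]]), header = TRUE)

# The code element holds R source, evaluated here against 'dat'.
eval(parse(text = xmlValue(tr[["code"]])))
print(result)
```

The data stay in the row-by-row layout that read.table() expects, and everything is in one file.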
I have several variants of this sort of thing that I occasionally add
to the SXMLDocs package. For me at least, it is easy to write handlers
that process the different kinds of content while leaving it to XML to
identify them within the document.
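As a rough illustration of the handler idea, the sketch below keeps one function per element name and lets the XML structure decide which one runs. The names handlers and process_doc are hypothetical and made up for this example; this is not the SXMLDocs interface.

```r
library(XML)

# Hypothetical handlers, keyed by element name (not the SXMLDocs API).
handlers <- list(
  data = function(node) read.table(text = xmlValue(node), header = TRUE),
  code = function(node) parse(text = xmlValue(node))
)

# Walk the children of the root and apply the matching handler, if any.
process_doc <- function(xml_text) {
  root <- xmlRoot(xmlTreeParse(xml_text, asText = TRUE))
  out <- list()
  for (node in xmlChildren(root)) {
    name <- xmlName(node)
    handler <- handlers[[name]]
    if (!is.null(handler)) out[[name]] <- handler(node)
  }
  out
}

parts <- process_doc("<doc><data>
x y
1 2
3 4
</data><code>
colMeans(d)
</code></doc>")

# Evaluate the parsed code with the data frame bound to the name 'd'.
means <- eval(parts$code[[1]], list(d = parts$data))
print(means)
```

Supporting a new kind of content then only means adding another entry to the list; the parsing step stays the same.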
Hope this provides some ideas for thinking about the problem
in a slightly broader light.
D.
Ernst Hansen wrote:
I have the following problem. It is not of earthshaking importance,
but still I have spent a considerable amount of time thinking about
it.
PROBLEM: Is there any way I can have a single textfile that contains
both
a) data
b) programcode
The program should act on the data, if the textfile is source()'ed
into R.
BOUNDARY CONDITION: I want the data written in the textfile in exactly
the same format as I would use, if I had data in a separate textfile,
to be read by read.table(). That is, with 'horizontal inhomogeneity'
and 'vertical homogeneity' in the type of entries. I want to write
something like
Sex Respons
Male 1
Male 2
Female 3
Female 4
In effect, I am asking if there is some way I can convince
read.table(), that the data is contained in the following n lines of
text.
ILLEGAL SOLUTIONS:
I know I can simulate the behaviour by reading the columns of the
dataframe one by one, and using data.frame() to glue them together.
Like in
data.frame(Sex = c('Male', 'Male', 'Female', 'Female'),
Respons = c(1, 2, 3, 4))
I do not like this solution, because it represents the data in a
"transposed" way in the textfile, and this transposition makes the
structure of the dataframe less transparent - at least to me. It
becomes even less comprehensible if the Sex-factor above is written
with the help of rep() or gl() or the like.
I know I can make read.table() read from stdin, so I could type the
dataframe at the prompt. That is against the spirit of the problem,
as I describe below.
I know I can make read.table() do the job, if I split the data and the
programcode into different files. But as the purpose of the exercise
is to distribute the data and the code to other people, splitting
into several files is a complication.
MOTIVATION: I frequently find myself distributing small chunks of code
to my students, along with data on which the code can work.
As an example, I might want to demonstrate how model.matrix() treats
interactions, in a certain setting. For that I need a dataframe that
is complex enough to exhibit the behaviour I want, but still so small
that the model.matrix is easily understood. So I make such a
dataframe.
I am trying to distribute this dataframe along with my code, in a way
that is as simple as possible to USE for the students (hence the
one-file boundary condition) and to READ (hence the non-transposition
boundary condition).
Does anybody have any ideas?
Ernst Hansen
Department of Statistics
University of Copenhagen
______________________________________________ R-help at stat.math.ethz.ch mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
_______________________________________________________________
Duncan Temple Lang duncan at research.bell-labs.com
Bell Labs, Lucent Technologies office: (908)582-3217
700 Mountain Avenue, Room 2C-259 fax: (908)582-3340
Murray Hill, NJ 07974-2070
http://cm.bell-labs.com/stat/duncan