rpart help please

9 messages · Remy X.O. Martin, Brian Ripley, A.J. Rossini +1 more

#
Hi all,

I am trying to get to grips with rpart, and I find it hard going with only the
information that comes with the package. Unlike, e.g., the ctest package documentation, it
doesn't say when "an rpart" could be used, and/or how to interpret the results. Here
are a few of my open questions:

1) In ?rpart I read: "...method: one of.... If y is a survival object...". A similarly
fleeting reference appears just above, under 'na.action', and there is a y=TRUE argument to
rpart itself. I *suppose* that this refers to a formula of the form y~x, with y being
the dependent variable; or is this a (minor) bug in the documentation?

2) It looks like rpart and aov are in a way complementary, and should to a certain 
degree give comparable results. Is there some "user's guide" document somewhere that 
describes this in language accessible to "generic scientists" (= non-statisticians)?

3) I just applied rpart to a dataset and saw something that seems counter-intuitive,
to say the least: a node (n=84) is split into fac1=A (n=28) and fac1=B,C (n=56). The
fac1=A branch is a terminal node; the fac1=B,C branch is itself split into fac1=C
(n=28) and (!) fac1=A,B (n=28)! According to the n-counts, there should be no A left in
that latter branch (or in the fac1=B,C branch at all). Do I not understand
something essential, or is this a bug (in the split-criterion label and/or the
n-count label)?

Thanks in advance!

RXO


-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
#
On Thu, 4 Jul 2002, Remy X.O. Martin wrote:

            
rpart is a contributed package ported from S-PLUS 6.x.  There is other
documentation on how to use it, and I suggest you search it out.
y is the conventional name in statistics for the response variable.
There are some good introductions to doing statistics in R, and at least
one of them covers aov and rpart in depth.  And there is an FAQ for R.
It is correct, so it looks as if you `do not understand something
essential'.
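The reply does not spell out why the labels in question 3 are correct. One reading consistent with it is that a factor split is a rule stated over every level of the factor, so a child's label can name a level that no observation in that node actually has. The Python toy below is not rpart code: the data, the hypothetical `split` helper, and the assumption that the label lists levels absent from the node are all illustrative. It simply reproduces the counts from question 3 under that assumption:

```python
from collections import Counter

# Hypothetical toy data mirroring question 3: 28 observations per level.
LEVELS = ("A", "B", "C")
root = ["A"] * 28 + ["B"] * 28 + ["C"] * 28


def split(node, left_levels):
    """Partition a node's observations by a factor split (illustrative only).

    Assumption: the rule, and hence its label, is stated over *all* levels
    of the factor, so a label may name levels absent from the node.
    """
    left = [x for x in node if x in left_levels]
    right = [x for x in node if x not in left_levels]
    left_label = ",".join(lv for lv in LEVELS if lv in left_levels)
    right_label = ",".join(lv for lv in LEVELS if lv not in left_levels)
    return (left_label, left), (right_label, right)


# First split: fac1=A versus fac1=B,C.
(lab_a, node_a), (lab_bc, node_bc) = split(root, {"A"})
# Second split inside the B,C node: this rule happens to send A and B left, C right.
(lab_ab, node_ab), (lab_c, node_c) = split(node_bc, {"A", "B"})

for label, node in [(lab_a, node_a), (lab_bc, node_bc),
                    (lab_ab, node_ab), (lab_c, node_c)]:
    print(f"fac1={label} (n={len(node)})", dict(Counter(node)))
# fac1=A (n=28) {'A': 28}
# fac1=B,C (n=56) {'B': 28, 'C': 28}
# fac1=A,B (n=28) {'B': 28}
# fac1=C (n=28) {'C': 28}
```

In this sketch the fac1=A,B node holds only B observations; the A in its label describes where the rule would send an A, not what the node contains. How rpart itself decides the direction for levels absent from a node is not checked here.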
#
2002/07/04 19:04:30, <ripley at stats.ox.ac.uk> wrote:

            
Hmm... indeed, there is an example just under half a page long on rpart in "Using R 
for Data Analysis and Graphics". The other documents on the "Contributed Documents" 
page do not mention it, nor does the FAQ, by the way. The FAQ does contain a pointer 
to the MASS3 page. Other than the online complement to the book on that page, I 
haven't seen anything in the FAQ that promises to be a freely available "good 
introduction to doing statistics in R" (for the generic scientist...)

[...]
Well, then I will reword my question as: what are good, freely (online) available 
introductions to statistics in R for the generic scientist, and where can they be 
found if not via google and the like?

RXO
#
remy> Well, then I will reword my question as: what are good,
    remy> freely (online) available introductions to statistics in R
    remy> for the generic scientist, and where can they be found if
    remy> not via google and the like?

And why should it be freely available?  Someone has to write it;
that's hard.  I've heard of proposals to write such a book, but it
won't be free.  I don't see any commercial support for such a project,
nor is there, in general, formal academic credit for such work
(at least not compared to the effort required to make it good, not
just adequate), which might be the other

There are good (though not perfect) introductions in the contrib
section of CRAN.  There are plenty of "textbooks" on the WWW, of
varying quality.

Any scientist has to put in for research equipment.  I've never heard
of RT-PCR machines being free, nor the computers, nor most tools for
most endeavors.  Why should statistical tools (knowledge as well as
software) be treated any differently?

"Statistics, anyone can do that; you provide a service, you aren't
part of the scientific team, we don't need to provide you with tools,
since it's more important to pick up reagent X, or our own new Win XP
laptops, or an additional conference...".

Thankfully, it's been a long while since I've worked with such
folk..., though I get inquiries from them, thanks to the two
(incomplete, and of marginal/poor quality compared to contemporary
work) WWW-texts that I wrote nearly 6 years ago, and have made
freely available.

best,
-tony
#
2002/07/04 20:46:56, rossini at blindglobe.net (A.J. Rossini) wrote:

            
I added this specification because I understood I had to be concise, precise and
specific in order to get an appropriate answer :) Besides, I'd rather avoid going
through the administrative ordeal of ordering books, and instead use the time
I have until my next big project to improve my stats...
I understand what you want to say, with a few major "buts". First, R itself is
free. It has a large amount of equally free documentation that is probably
adequate for statisticians. Then, of course, science itself is for the most part
free. What we pay for is access to the results of colleagues (who usually don't get
very rich out of it :)). Have you heard of the Budapest Open Access Initiative?
Books are arguably different, but then there are "plenty of 'textbooks'"
available on various subjects, often in fact whole courses of considerable quality.
Mind you: I'm not saying we should all work for free.

There has been a lot of talk about R GUIs and their use in teaching; I think that
the sort of documentation I have in mind (or a self-explaining [g]ui) should be
included in these considerations.
The introductions on CRAN are indeed mostly good, but (necessarily) concise and in
many cases do not explain the examples they give, suggesting they are primarily
intended for (student) statisticians. As to the textbooks on the web: wouldn't it
be a good idea to put links to the better ones among them in that same CRAN section?
So then what exactly is the price of knowledge, and what its currency...?
Other than that, I agree. Indeed, a colleague pays dearly for his yearly SAS license
just because he needs an ANOVA that can handle varying numbers of samples per subject.
(Something I'm sure R can do, but I wouldn't know how.)
I hope you're not counting me among them! I can only wish there were a statistics
department in the institution I work in (a prestigious French college founded by
François Ier), where I could get the help I (sometimes) need. Instead,
each lab hires its own cleaning personnel...

Thanks anyway for your reaction :)

Best,
RXO
#
remy> I understand what you want to say, with a few major
    remy> "buts". First, R itself is free. It has a large amount of
    remy> equally free documentation, that is probably adequate for
    remy> statisticians. Then, of course, science itself is for the
    remy> most part free. What we pay for is access to results of
    remy> colleagues (who usually don't get very rich out of it
    remy> :)). Have you heard of the Budapest Open Access Initiative? 
    remy> Books are arguably different, but then there are "plenty of
    remy> 'textbooks'" available on various subjects, often in fact
    remy> whole courses of considerable quality. Mind you: I'm not
    remy> saying we should all work for free.

Science is RARELY free, unless you have access to prestigious
personnel.  Second, R is "free" only by certain definitions.  It still
costs time and effort to learn how to use; your example of SAS
licensing is perfect.  I had a wonderful conversation with a colleague
yesterday who would be happy to use R if she could find a fairly
complete book on "how to do graphics" in R, comparable in scope to
SAS's manuals.  Sure, I can answer nearly all of her questions
(and some of the ones she hasn't realized she needs to ask, yet), but
that won't get the job done; the few thousand dollars spent on
SAS and the books are cheap compared with her salary (or my time).

    remy> There has been a lot of talk about R GUIs and their use in
    remy> teaching; I think that the sort of documentation I have in
    remy> mind (or a self-explaining [g]ui) should be included in
    remy> these considerations.

I would suggest that you (or others) start planning one.  Part
of the benefit of writing books or other manuscripts is that, in
striving for clarity, you gain insight into what you do not
understand, and into what is needed.  There are plenty of people
on this list who are happy to provide occasional (free) help for such
attempts.  But they'd like to see the effort first!

    remy> The introductions on CRAN are indeed mostly good, but
    remy> (necessarily) concise and in many cases do not explain the
    remy> examples they give, suggesting they are primarily intended
    remy> for (student) statisticians. As to the textbooks on the web:
    remy> wouldn't it be a good idea to put links to the better among
    remy> them in that same CRAN section??

Then why don't you go through them and try to articulate what is missing?
I'm sure that the authors would appreciate the insight, especially in the
form of contributed sections or examples, even if not entirely
correct (they'd at least be easier to correct than writing the material
from scratch).

Yes, it does take time, but everything takes time.  And even when you
find someone with the knowledge to do it, it's difficult to free up their
time.  Help them out, support current projects (especially documentation
projects), and contribute.

Suggesting projects is pretty silly; nearly any project one could suggest
can be approximately solved, though perhaps by tools you might not care to
use (e.g., the GUI part: it is solved, though people don't care
to learn LISP or Emacs to contribute the small part remaining;
they'd rather roll their own solutions).

best,
-tony
#
My apologies for the off-topic post, but this seemed the most likely place to
find relevant expertise.

Is there anyone who would be able to assist me in configuring GNU Emacs, ESS,
and SAS on my Windows NT and Windows 2000 machines?

I have been successful running R in ESS, thanks to the excellent examples I've
found for helping with .emacs and related files.

SAS, however, seems to be a horse of a different colour.

Thus far, I can edit .sas files using ESS. However, when I call for SAS
("Alt-X SAS"), Emacs responds:

apply: Spawning child process: invalid argument

(the above copied from the *Messages* buffer).

I suspect this is due to problems with some variable associated with
inferior-SAS-program-name or inferior-SAS-args, but I'm getting in over my head
here.

Thanks for any pointers; I'd be delighted to be told to read 'document X'.  So
far, the online documents I've read and the documents in the ESS 'docs'
directory appear to assume a Unix environment.

Cheers,

Rob

P.S.  I will summarize any discoveries/imparted wisdom for the group, so feel
free to respond directly.

-- Robert Balshaw, Ph.D.
-- Senior Biostatistician, Syreon Corp.
-- Phone: (604) 822-5199; Fax: (604) 822-5911
#
robert> My apologies for the off-topic post, but this seemed the
    robert> most likely place to find relevant expertise.

Not even close (to being the most likely).  Try the ESS-help mailing
list.  Neither Rich nor Rodney read this list very much.  Note that
what you are asking might be beyond SAS for win2k.  (I think that
there is a batch-submission mode, but it isn't really the
"interactive" version that exists on Unix.)

See doc/README.SAS or the texinfo files in the doc directory (in the unpacked
ESS directory) for more information.

best,
-tony
#
Thanks to Tony Rossini, who directed me to the more appropriate ESS-help mailing
list.

Rich Heiberger was quick to confirm that Meta-x SAS will not work on Windows.
He suggested that I use the ESS editing capabilities and submit the code to an
external or batch SAS job.

<Sigh>

One more reason I should switch to GNU/Linux.

Cheers,

Rob

-- Robert Balshaw, Ph.D.
-- Senior Biostatistician, Syreon Corp.
-- Phone: (604) 822-5199; Fax: (604) 822-5911