Hi all,

I am trying to get to grips with rpart, and find it not very easy given the information that comes with the package. Contrary to e.g. the ctest package docs, it doesn't say when "an rpart" could be used, and/or how to interpret the results. Here are a few of the open questions I have:

1) Read in ?rpart: ...method: one of.... If y is a survival object... A similar fleeting reference is just above under 'na.action', and there is a y=TRUE argument to rpart itself. I *suppose* that this refers to a formula of the form y~x, with y being the dependent variable -- or is this a (minor) bug in the documentation?

2) It looks like rpart and aov are in a way complementary, and should to a certain degree give comparable results. Is there some "user's guide" document somewhere that describes this in language accessible to "generic scientists" (= non-statisticians)?

3) I just applied rpart to a dataset, and saw something that seems counter-intuitive at the least: a branch is made (n=84) between fac1=A (n=28) and fac1=B,C (n=56). The fac1=A branch is an end-node, the fac1=B,C branch is itself branched into fac1=C (n=28) and (!) fac1=A,B (n=28)! According to the n-counts, there should be no more A in that latter branch (or in the fac1=B,C branch in general). Do I not understand something essential, or is this a bug (in the branch criterion label and/or the n-count label)?

Thanks in advance!
RXO
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
rpart help please
9 messages · Remy X.O. Martin, Brian Ripley, A.J. Rossini +1 more
On Thu, 4 Jul 2002, Remy X.O. Martin wrote:
I am trying to get to grips with rpart, and find it not very easy given the information that comes with the package. Contrary to e.g. the ctest package docs, it doesn't say when "an rpart" could be used, and/or how to interpret the results. Here are a few of the open questions I have:
rpart is a contributed package ported from S-PLUS 6.x. There is other documentation on how to use it, and I suggest you search it out.
1) Read in ?rpart: ...method: one of.... If y is a survival object... A similar fleeting reference is just above under 'na.action', and there is a y=TRUE argument to rpart itself. I *suppose* that this refers to a formula of the form y~x, with y being the dependent variable -- or is this a (minor) bug in the documentation?
y is the conventional name in statistics for the response variable.
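To make the convention concrete, here is a minimal sketch, assuming a made-up data frame `d` with columns `resp` and `grp` (both names are hypothetical, not from the original post). The left-hand side of the formula is the response that ?rpart calls y, and the `y = TRUE` argument asks rpart to keep a copy of that response in the fitted object.

```r
library(rpart)

# Hypothetical toy data: 'resp' is the response, 'grp' a three-level factor.
set.seed(1)
d <- data.frame(grp = factor(rep(c("A", "B", "C"), each = 28)))
d$resp <- rnorm(84, mean = c(0, 5, 10)[as.integer(d$grp)])

# The left-hand side of the formula ('resp' here) is what ?rpart refers to as y.
fit <- rpart(resp ~ grp, data = d, method = "anova", y = TRUE)

# With y = TRUE the response values are stored in the fitted object.
head(fit$y)
```

So "y" in the help page is shorthand for whatever variable sits to the left of the `~`, whatever it happens to be called in the data.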
2) It looks like rpart and aov are in a way complementary, and should to a certain degree give comparable results. Is there some "user's guide" document somewhere that describes this in language accessible to "generic scientists" (= non-statisticians)?
There are some good introductions to doing statistics in R, and at least one of them covers aov and rpart in depth. And there is an FAQ for R.
3) I just applied rpart to a dataset, and saw something that seems counter-intuitive at the least: a branch is made (n=84) between fac1=A (n=28) and fac1=B,C (n=56). The fac1=A branch is an end-node, the fac1=B,C branch is itself branched into fac1=C (n=28) and (!) fac1=A,B (n=28)! According to the n-counts, there should be no more A in that latter branch (or in the fac1=B,C branch in general). Do I not understand something essential, or is this a bug (in the branch criterion label and/or the n-count label)?
It is correct, so it looks as if you `do not understand something essential'.
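One way to check this for oneself is to tabulate which factor levels actually end up in each terminal node. The following is only a sketch on simulated data (the data frame `d`, the response `y` and the group means are assumptions, chosen to mimic three groups of 28); it does not reproduce the original dataset. Note that if a level has already been split off higher up the tree, no observations with that level reach the lower node, so it adds nothing to either child's count regardless of how a split label lists it.

```r
library(rpart)

# Simulated stand-in for the data described above: 3 levels, 28 cases each.
set.seed(1)
d <- data.frame(fac1 = factor(rep(c("A", "B", "C"), each = 28)))
d$y <- rnorm(84, mean = c(0, 5, 10)[as.integer(d$fac1)])

fit <- rpart(y ~ fac1, data = d, method = "anova")
print(fit)  # shows the split labels and n for every node

# fit$where holds, for each observation, the row of fit$frame that is its
# terminal node; the row names of fit$frame are the node numbers.
leaf <- rownames(fit$frame)[fit$where]
tab <- table(node = leaf, fac1 = d$fac1)
print(tab)
```

Reading the table row by row shows the real composition of each end-node, independent of the wording of the split labels.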
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272860 (secr)
Oxford OX1 3TG, UK Fax: +44 1865 272595
2002/07/04 19:04:30, <ripley at stats.ox.ac.uk> wrote:
There are some good introductions to doing statistics in R, and at least one of them covers aov and rpart in depth. And there is an FAQ for R.
Hmm... indeed, there is an example just under half a page long on rpart in "Using R for Data Analysis and Graphics". The other documents on the "Contributed Documents" page do not mention it, nor does the FAQ, by the way. The FAQ does contain a pointer to the MASS3 page. Other than the online complement to the book on that page, I haven't seen anything in the FAQ that promises to be a freely available "good introduction to doing statistics in R" (for the generic scientist...) [...]
that latter branch (or in the fac1=B,C branch in general). Do I not understand something essential, or is this a bug (in the branch criterion label and/or the n-count label)?
It is correct, so it looks as if you `do not understand something essential'.
Well, then I will reword my question as: what are good, freely (online) available introductions to statistics in R for the generic scientist, and where can they be found if not via google and the like? RXO
"remy" == Remy X O Martin <Remy> writes:
remy> Well, then I will reword my question as: what are good,
remy> freely (online) available introductions to statistics in R
remy> for the generic scientist, and where can they be found if
remy> not via google and the like?
And why should it be freely available? Someone has to write it;
that's hard. I've heard of proposals to write such a book, but it
won't be free. I'm not seeing any support for such a project
commercially, nor is there formal academic credit in general for such
(at least not compared to the effort required to make it good, not
just adequate), which might be the other
There are good (though not perfect) introductions in the contrib
section of CRAN. There are plenty of "textbooks" on the WWW, in
various degrees of quality.
Any scientist has to put in for research equipment. I've never heard
of RT-PCR machines being free, nor the computers, nor most tools for
most endeavors. Why should statistical tools (knowledge as well as
software) be treated in the same way?
"Statistics, anyone can do that; you provide a service, you aren't
part of the scientific team, we don't need to provide you with tools,
since it's more important to pick up reagent X, or our own new Win XP
laptops, or an additional conference...".
Thankfully, it's been a long while since I've worked with such
folk..., though I get inquiries from them, thanks to the two
(incomplete, and of marginal/poor quality compared to contemporary
work) WWW-texts that I wrote nearly 6 years ago, and have made
freely available.
best,
-tony
A.J. Rossini Rsrch. Asst. Prof. of Biostatistics
U. of Washington Biostatistics rossini at u.washington.edu
FHCRC/SCHARP/HIV Vaccine Trials Net rossini at scharp.org
-------------- http://software.biostat.washington.edu/ ----------------
FHCRC: M: 206-667-7025 (fax=4812)|Voicemail is pretty sketchy/use Email
UW: Th: 206-543-1044 (fax=3286)|Change last 4 digits of phone to FAX
(my tuesday/wednesday/friday locations are completely unpredictable.)
2002/07/04 20:46:56, rossini at blindglobe.net (A.J. Rossini) wrote:
And why should it be freely available? Someone has to write it;
I added this specification because I understood I had to be concise, precise and specific in order to get an appropriate answer :) Besides, I'd rather prevent having to go through the administrative ordeals of ordering books, and use the time I have until my next big project to improve on my stats...
that's hard. I've heard of proposals to write such a book, but it won't be free. I'm not seeing any support for such a project commercially, nor is there formal academic credit in general for such (at least not compared to the effort required to make it good, not just adequate), which might be the other
I understand what you want to say, with a few major "buts". First, R itself is free. It has a large amount of equally free documentation, that is probably adequate for statisticians. Then, of course, science itself is for the most part free. What we pay for is access to results of colleagues (who usually don't get very rich out of it :)). Have you heard of the Budapest Open Access Initiative? Books are arguably different, but then there are "plenty of \"textbooks\"" available on various subjects, often in fact whole courses of considerable quality. Mind you: I'm not saying we should all work for free. There has been a lot of talk about R GUIs and their use in teaching; I think that the sort of documentation I have in mind (or a self-explaining [g]ui) should be included in these considerations.
There are good (though not perfect) introductions in the contrib section of CRAN. There are plenty of "textbooks" on the WWW, in various degrees of quality.
The introductions on CRAN are indeed mostly good, but (necessarily) concise and in many cases do not explain the examples they give, suggesting they are primarily intended for (student) statisticians. As to the textbooks on the web: wouldn't it be a good idea to put links to the better among them in that same CRAN section??
Any scientist has to put in for research equipment. I've never heard of RT-PCR machines being free, nor the computers, nor most tools for most endeavors. Why should statistical tools (knowledge as well as software) be treated in the same way?
So then what exactly is the price of knowledge, and what its currency...? Other than that, I agree. So a colleague pays dearly for his yearly SAS license just because he needs an anova that can handle varying samples-per-subject. (Something I'm sure R can do, but wouldn't know how.)
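For what it is worth, one common way to handle a varying number of samples per subject in R is a random-intercept mixed model rather than a classical balanced anova. The following is a minimal sketch only, assuming the nlme package and entirely made-up data (the names `d`, `y`, `cond` and `subject` are hypothetical); whether this matches what the colleague's SAS analysis does is not something this thread establishes.

```r
library(nlme)

# Made-up unbalanced data: 6 subjects contributing 3 to 8 samples each.
set.seed(2)
n_per_subject <- c(3, 5, 8, 4, 6, 7)
subject <- factor(rep(seq_along(n_per_subject), times = n_per_subject))
cond <- factor(sample(c("ctrl", "treat"), length(subject), replace = TRUE))
subj_effect <- rnorm(length(n_per_subject), sd = 1)
y <- 10 + 2 * (cond == "treat") +
  subj_effect[as.integer(subject)] + rnorm(length(subject), sd = 0.5)
d <- data.frame(y, cond, subject)

# Random intercept per subject; unequal group sizes are not a problem here.
fit <- lme(y ~ cond, random = ~ 1 | subject, data = d)
anova(fit)  # F-test for the fixed effect 'cond'
```

The random intercept absorbs the between-subject differences, so subjects with more samples do not simply dominate the comparison of conditions.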
"Statistics, anyone can do that; you provide a service, you aren't part of the scientific team, we don't need to provide you with tools, since it's more important to pick up reagent X, or our own new Win XP laptops, or an additional conference...". Thankfully, it's been a long while since I've worked with such folk..., though I get inquiries from them, thanks to the two
I hope you're not counting me among them! I can only wish there were a statistics department in the institution I work in (a Prestigious French College founded by François Ier), where I could get the help I (sometimes) need. Rather than that, each lab hires its own cleaning personnel... Thanks anyway for your reaction :) Best, RXO
"remy" == Remy X O Martin <Remy> writes:
remy> I understand what you want to say, with a few major
remy> "buts". First, R itself is free. It has a large amount of
remy> equally free documentation, that is probably adequate for
remy> statisticians. Then, of course, science itself is for the
remy> most part free. What we pay for is access to results of
remy> colleagues (who usually don't get very rich out of it
remy> :)). Have you heard of the Budapest Open Access Initiative?
remy> Books are arguably different, but then there are "plenty of
remy> \"textbooks\"" available on various subjects, often in fact
remy> whole courses of considerable quality. Mind you: I'm not
remy> saying we should all work for free.
Science is RARELY free, unless you have access to prestigious
personnel. Second, R is "free" only by certain definitions. It still
costs time and effort to learn how to use; your example of SAS
licensing is perfect. I had a wonderful conversation with a colleague
yesterday who would be happy to use R if she could find a fairly
complete book on "how to do graphics" in R, to the same extent that
SAS's manuals exist. Sure, I can answer nearly all of her questions
(and some of the ones she hasn't realized she needs to ask, yet), but
that won't get the job done -- the few thousands of dollars spent on
SAS and the books are cheap compared with her salary (or my time).
remy> There has been a lot of talk about R GUIs and their use in
remy> teaching; I think that the sort of documentation I have in
remy> mind (or a self-explaining [g]ui) should be included in
remy> these considerations.
I would suggest that you (or others) could start plans for one. Part
of the benefit of writing books or other manuscripts, is that in the
process of clarity, you can gain insight into what you do not
understand, or specifying what is needed. There are plenty of people
on this list who are happy to provide occasional (free) help for such
attempts. But they'd like to see the effort first!
remy> The introductions on CRAN are indeed mostly good, but
remy> (necessarily) concise and in many cases do not explain the
remy> examples they give, suggesting they are primarily intended
remy> for (student) statisticians. As to the textbooks on the web:
remy> wouldn't it be a good idea to put links to the better among
remy> them in that same CRAN section??
Then why don't you go through them, and try to phrase what is missing?
I'm sure that the authors would appreciate insight, especially in the
form of contributed sections or examples, even if not necessarily
correct (they'd at least be more easily correctable than by doing it
from scratch).
Yes, it does take time, but everything takes time. Now find someone
with the knowledge to do that, and it's difficult to let them find the
time. Help them out, support current projects (esp documentation
projects), and contribute.
Suggesting projects is pretty silly; nearly any project suggestable
can be approximately solved, though by tools you might not care to
use... (i.e. the GUI part -- it is solved, though people don't care
to learn LISP or Emacs to contribute the little part remaining --
they'd rather roll their own solutions).
best,
-tony
A.J. Rossini Rsrch. Asst. Prof. of Biostatistics
U. of Washington Biostatistics rossini at u.washington.edu
FHCRC/SCHARP/HIV Vaccine Trials Net rossini at scharp.org
-------------- http://software.biostat.washington.edu/ ----------------
FHCRC: M: 206-667-7025 (fax=4812)|Voicemail is pretty sketchy/use Email
UW: Th: 206-543-1044 (fax=3286)|Change last 4 digits of phone to FAX
(my tuesday/wednesday/friday locations are completely unpredictable.)
My apologies for the off-topic post, but this seemed the most likely place to find relevant expertise.

Is there anyone who would be able to assist me in configuring GNU Emacs, ESS, and SAS on my Windows NT and Windows 2000 machines? I have been successful running R in ESS, thanks to the excellent examples I've found for helping with .emacs and related files. SAS, however, seems to be a horse of a different colour.

Thus far, I can edit .sas files using ESS. However, when I call for SAS (Alt-X SAS), Emacs responds:

apply: Spawning child process: invalid argument

(the above copied from the *Messages* buffer). I suspect this is due to problems with some variable associated with inferior-SAS-program-name or inferior-SAS-args, but I'm getting in over my head, here.

Thanks for any pointers - I'd be delighted to be told to read 'document X'. So far, the online documents I've read and the documents embedded within ESS 'docs' directory appear to assume a Unix environment.

Cheers, Rob

P.S. I will summarize any discoveries/imparted wisdoms to the group, so feel free to respond directly.

-- Robert Balshaw, Ph.D. -- Senior Biostatistician, Syreon Corp. -- Phone: (604) 822-5199; Fax: (604) 822-5911
"robert" == Robert Balshaw <Rob.Balshaw at syreon.com> writes:
robert> My apologies for the off-topic post, but this seemed the
robert> most likely place to find relevant expertise.
Not even close (to being the most likely). Try the ESS-help mailing
list. Neither Rich nor Rodney read this list very much. Note that
what you are asking might be beyond SAS for win2k. (I think that
there is a batch-submission mode, but it isn't really the
"interactive" version that exists on Unix).
See doc/README.SAS or the texinfo files in the doc directory for more
information (in the ESS unpacked directory).
best,
-tony
A.J. Rossini Rsrch. Asst. Prof. of Biostatistics
U. of Washington Biostatistics rossini at u.washington.edu
FHCRC/SCHARP/HIV Vaccine Trials Net rossini at scharp.org
-------------- http://software.biostat.washington.edu/ ----------------
FHCRC: M: 206-667-7025 (fax=4812)|Voicemail is pretty sketchy/use Email
UW: Th: 206-543-1044 (fax=3286)|Change last 4 digits of phone to FAX
(my tuesday/wednesday/friday locations are completely unpredictable.)
Thanks to Tony Rossini, who directed me to the more appropriate ESS-help mailing list. Rich Heiberger was quick to confirm that Meta-x SAS will not work for Windows. He suggested that I use the editing capabilities and submit the code to an external or batch SAS job. <Sigh> One more reason I should switch to GNU/Linux. Cheers, Rob -- Robert Balshaw, Ph.D. -- Senior Biostatistician, Syreon Corp. -- Phone: (604) 822-5199; Fax: (604) 822-5911