I am struggling to generate p values for comparisons of factor levels (post-hoc tests) in a GLM with a negative binomial distribution.
I am trying to compare cell counts on different days when grown on different media (e.g. types of cryogel), so I have 2 explanatory variables (Day and Cryogel), which are both factors, and an over-dispersed count variable (number of cells) as the response. I know that both variables are significant and that there is a significant interaction between them. However, I seem unable to generate multiple comparisons between the days and cryogels.
So my model is
Model1 <- glm.nb(Cells ~ Cryogel + Day + Day:Cryogel)
The output gives me comparisons between levels of the factors relative to a
reference level (Day 0 and Cryogel 1) as follows:
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.2040 0.2743 4.389 1.14e-05 ***
Day14 3.3226 0.3440 9.658 < 2e-16 ***
Day28 3.3546 0.3440 9.752 < 2e-16 ***
Day7 3.3638 0.3440 9.779 < 2e-16 ***
Cryogel2 0.7097 0.3655 1.942 0.05215 .
Cryogel3 0.7259 0.3651 1.988 0.04677 *
Cryogel4 1.4191 0.3539 4.010 6.07e-05 ***
Day14:Cryogel2 -0.7910 0.4689 -1.687 0.09162 .
Day28:Cryogel2 -0.5272 0.4685 -1.125 0.26053
Day7:Cryogel2 -1.1794 0.4694 -2.512 0.01199 *
Day14:Cryogel3 -1.0833 0.4691 -2.309 0.02092 *
Day28:Cryogel3 0.1735 0.4733 0.367 0.71395
Day7:Cryogel3 -1.0907 0.4690 -2.326 0.02003 *
Day14:Cryogel4 -1.2834 0.4655 -2.757 0.00583 **
Day28:Cryogel4 -0.6300 0.4591 -1.372 0.16997
Day7:Cryogel4 -1.3436 0.4596 -2.923 0.00347 **
HOWEVER, I want ALL the comparisons, e.g. Cryogel 2 versus 4, 3 versus 2, etc., on each of the days. I realise that such multiple comparisons need to be approached with care to avoid Type I error; however, it is easy to do this in other programmes (e.g. SPSS, Genstat) and I'm frustrated that it appears to be difficult in R. I have tried the glht (multcomp) function but it gives me the same results. I assume that there is some way of entering the data differently so as to tell R to use a different reference level each time and re-run the analysis for each level, but I don't know what this is.
Please help!
Many thanks for your input
Bryony
--
View this message in context: http://r.789695.n4.nabble.com/Post-hoc-tests-in-MASS-using-glm-nb-tp3526934p3526934.html
Sent from the R help mailing list archive at Nabble.com.
Post-hoc tests in MASS using glm.nb
17 messages · bryony, Bryony Tolhurst, Timothy Bates +8 more
?relevel
Also, you might want to fit the models as follows
Model1 <- glm.nb(Cells ~ Cryogel*Day, data = myData)
myData2 <- within(myData, Cryogel <- relevel(Cryogel, ref = "2"))
Model2 <- update(Model1, data = myData2)
&c
You should always specify the data set when you fit a model, if at all possible. I would recommend you NEVER use attach() to put it on the search path (under all but the most exceptional circumstances).
You could fit your model as
Model0 <- glm.nb(Cells ~ interaction(Cryogel, Day) - 1, data = myData)
This will give you the subclass means as the regression coefficients. You can then use vcov(Model0) to get the variance matrix and compare any two you like using directly calculated t-statistics. This is pretty straightforward as well.
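A minimal sketch of that coef()/vcov() calculation. The data set is simulated here as a stand-in, since the thread's real data are not shown; the p value is unadjusted for multiplicity:

```r
library(MASS)  # for glm.nb

## Simulated stand-in data -- 4 cryogels x 4 days x 6 replicates
set.seed(1)
myData <- expand.grid(Cryogel = factor(1:4),
                      Day     = factor(c(0, 7, 14, 28)),
                      rep     = 1:6)
myData$Cells <- rnbinom(nrow(myData), mu = 25, size = 2)

## One coefficient per Cryogel x Day subclass mean
Model0 <- glm.nb(Cells ~ interaction(Cryogel, Day) - 1, data = myData)

## Compare any two subclass means directly from coef() and vcov()
b <- coef(Model0)
V <- vcov(Model0)
i <- 1; j <- 2                        # first two subclasses; see names(b)
se <- sqrt(V[i, i] + V[j, j] - 2 * V[i, j])
z  <- (b[i] - b[j]) / se
p  <- 2 * pnorm(-abs(z))              # unadjusted two-sided p value
```

With many such comparisons one would still want to apply a multiplicity correction (e.g. p.adjust()) to the resulting p values.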
Bill Venables.
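The glht() route mentioned in the original post can also yield all pairwise comparisons once the model is parameterised with a single combined factor; a sketch, again on simulated stand-in data:

```r
library(MASS)      # glm.nb
library(multcomp)  # glht, mcp

## Simulated stand-in data (the real data are not shown in the thread)
set.seed(2)
myData <- expand.grid(Cryogel = factor(1:4),
                      Day     = factor(c(0, 7, 14, 28)),
                      rep     = 1:6)
myData$Cells <- rnbinom(nrow(myData), mu = 25, size = 2)

## A single combined factor gives glht() one term to compare across
myData$Group <- interaction(myData$Cryogel, myData$Day)
m <- glm.nb(Cells ~ Group, data = myData)

## All pairwise (Tukey-style) comparisons among the 16 subclasses,
## with multiplicity-adjusted p values
summary(glht(m, linfct = mcp(Group = "Tukey")))
```

Note that this is choose(16, 2) = 120 comparisons; restricting the contrast matrix (linfct) to the comparisons of actual interest, e.g. cryogels within each day, makes the adjustment much less punishing.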
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of bryony
Sent: Tuesday, 17 May 2011 3:46 AM
To: r-help at r-project.org
Subject: [R] Post-hoc tests in MASS using glm.nb
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Dear Bill
Many thanks. I will try this.
One question: why is the attach() function problematic? I have always done it that way (well, in my very limited R-using capacity!) as dictated by 'Statistics, an Introduction using R' by Michael Crawley. My dataset is called Side ('Side.txt'), so how do I import this data without using attach(data)? I have tried:
side<-read.table('Side.txt',T)
attach(side)
instead of:
data<-read.table('Side.txt',T)
attach(data)
But obviously I am still using the attach function, if not with 'data'!!
Thanks again
Bryony Tolhurst
-----Original Message-----
From: Bill.Venables at csiro.au [mailto:Bill.Venables at csiro.au]
Sent: 17 May 2011 03:21
To: Bryony Tolhurst; r-help at r-project.org
Subject: RE: [R] Post-hoc tests in MASS using glm.nb
Dear Bryony: the suggestion was not to change the name of the data object, but to explicitly tell glm.nb what dataset it should look in to find the variables you mention in the formula. So the salient difference is:

m1 <- glm.nb(Cells ~ Cryogel*Day, data = side)

instead of

attach(side)
m1 <- glm.nb(Cells ~ Cryogel*Day)

This works for other functions also, but not uniformly as yet. (How I wish it did, and I could say hist(x, data = side) instead of hist(side$x); this inconsistency encourages the need for attach().)

Best,
tim
On 17 May 2011, at 9:09 AM, Bryony Tolhurst wrote:
On May 17, 2011, at 4:09 AM, Bryony Tolhurst wrote:
Dear Bill, Many thanks. I will try this. One question: why is the attach() function problematic? I have always done it that way (well, in my very limited R-using capacity!) as dictated by 'Statistics, an Introduction using R' by Michael Crawley.
You should pick your dictators with more care.
My dataset is called Side ('Side.txt') so how do I import this data
without using attach(data)?
The data object is there as soon as you execute the read.table function successfully.
I have tried:
side<-read.table('Side.txt',T)
# NOT attach(side)
The regression functions in R generally have a data argument, so you would use this (as Bill already told you)
Model1 <- glm.nb(Cells ~ Cryogel*Day, data = side)
instead of:
data<-read.table('Side.txt',T)
attach(data)
But obviously I am still using the attach function, if not with
'data'!!
Right. There were two problems and you only addressed one of them.
David.

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
On 2011-05-17 02:22, Timothy Bates wrote:
this inconsistency encourages the need for attach()
Only if the user hasn't yet been introduced to the with() function, which is linked to on the ?attach page. Note also this sentence from the ?attach page: ".... attach can lead to confusion."

I can't remember the last time I needed attach().

Peter Ehlers

[... non-germane material snipped ...]
Folks:
Only if the user hasn't yet been introduced to the with() function, which is linked to on the ?attach page. Note also this sentence from the ?attach page: ".... attach can lead to confusion." I can't remember the last time I needed attach(). Peter Ehlers
Yes. But perhaps it might be useful to flesh this out with a bit of commentary. To this end, I invite others to correct or clarify the following.

The potential "confusion" comes from requiring R to search for the data. There is a rigorous process by which this is done, of course, but it requires that the runtime environment be consistent with that process, and the programmer who wrote the code may not have control over that environment. The usual example is that one has an object named, say, "a" in the formula and in the attached data, and another "a" also in the global environment. Then the wrong "a" would be found. The same thing can happen if another data set gets attached in a position before the one of interest. (Like Peter, I haven't used attach() in so long that I don't know whether any warning messages are issued in such cases.)

Using the "data =" argument when available, or the with() function when not, avoids this potential confusion and tightly couples the data to be analyzed with the analysis.

I hope this clarifies the previous posters' comments.

Cheers,
Bert
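The masking situation Bert describes can be reproduced in a few lines; a minimal sketch (the names 'a', 'd' and 'y' are illustrative):

```r
## An 'a' in the global environment, and a different 'a' inside the data
a <- 1:3
d <- data.frame(a = 101:103, y = c(1.2, 0.7, 2.4))

attach(d)        # recent R versions do warn about the masked name here
print(a)         # the global 'a' wins (1 2 3), NOT d$a
detach(d)

## Explicit alternatives that always find the intended column:
with(d, mean(a))          # evaluates inside d, so this is mean(101:103)
fit <- lm(y ~ a, data = d)  # 'data =' ties the fit to its data set
```

Because the global environment is searched before attached databases, the attached column never shadows a same-named workspace object, which is exactly the confusion described above.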
[... non-germane material snipped ...]
"Men by nature long to get on to the ultimate truths, and will often be impatient with elementary studies or fight shy of them. If it were possible to reach the ultimate truths without the elementary studies usually prefixed to them, these would not be preparatory studies but superfluous diversions." -- Maimonides (1135-1204)

Bert Gunter
Genentech Nonclinical Biostatistics
Amen to all of that, Bert. Nicely put. The Google style guide (not perfect, but a thoughtful contribution on these kinds of issues) has avoiding attach() as its very first line. See http://google-styleguide.googlecode.com/svn/trunk/google-r-style.html

I would add, though, that not enough people seem yet to be aware of within(...), a companion of with(...) in a way, but used for modifying data frames or other kinds of list objects. It should be seen as a more flexible replacement for transform() (well, almost). The difference between with() and within() is as follows:

with(data, expr, ...) allows you to evaluate 'expr' with 'data' providing the primary source for variables, and returns *the evaluated expression* as the result.

By contrast, within(data, expr, ...) again uses 'data' as the primary source for variables when evaluating 'expr', but now 'expr' is used to modify the variables in 'data', and returns *the modified data set* as the result.

I use this a lot in the data preparation phase of a project, especially, which is usually the longest, trickiest, most important, but least discussed aspect of any data analysis project.

Here is a simple example using within() for something you cannot do in one step with transform():

polyData <- within(data.frame(x = runif(500)), {
    x2 <- x^2
    x3 <- x * x2
    b <- runif(4)
    eta <- cbind(1, x, x2, x3) %*% b
    y <- eta + rnorm(x, sd = 0.5)
    rm(b)
})

check:
> str(polyData)
'data.frame':   500 obs. of  5 variables:
 $ x  : num  0.5185 0.185 0.5566 0.2467 0.0178 ...
 $ y  : num [1:500, 1] 1.343 0.888 0.583 0.187 0.855 ...
 $ eta: num [1:500, 1] 1.258 0.788 1.331 0.856 0.63 ...
 $ x3 : num  1.39e-01 6.33e-03 1.72e-01 1.50e-02 5.60e-06 ...
 $ x2 : num  0.268811 0.034224 0.309802 0.060844 0.000315 ...
Bill Venables.

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Bert Gunter
Sent: Wednesday, 18 May 2011 12:08 AM
To: Peter Ehlers
Cc: R list
Subject: Re: [R] Post-hoc tests in MASS using glm.nb
1 day later
[Modified 'Subject' on purpose; good mail readers will still thread correctly, using the 'References' and 'In-Reply-To' headers. Unfortunately, in my limited experience, good mail readers seem to disappear more and more ...]
Peter Ehlers <ehlers at ucalgary.ca>
on Tue, 17 May 2011 06:08:30 -0700 writes:
> Only if the user hasn't yet been introduced to the with()
> function, which is linked to on the ?attach page.
> Note also this sentence from the ?attach page:
> ".... attach can lead to confusion."
> I can't remember the last time I needed attach().
> Peter Ehlers
Well, then you don't know *THE ONE* case where modern users of
R should use attach() ... as I have been teaching for a while,
but seem not to have got enough students listening ;-) ...
--- Use it instead of load() {for save()d R objects} ---
The advantage of attach() over load() there is that loaded
objects (and there may be a bunch!) are put into a separate
place in the search path and will not accidentally overwrite
objects in the global "workspace".
Of course, there are still quite a few situations {e.g. in
typical BATCH use of R for simulations, or Sweaving, etc} where
load() is good enough, and the extras of using attach() are not
worth it.
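A minimal sketch of that use case (the file name and objects are illustrative):

```r
## attach() a save()d image instead of load()ing it into the workspace
x   <- 1:10
tab <- data.frame(a = rnorm(5))
save(x, tab, file = "objs.RData")
rm(x, tab)

x <- "precious"               # something we do NOT want clobbered

# load("objs.RData")          # would silently overwrite the global 'x'
attach("objs.RData")          # saved objects go to search() position 2
print(x)                      # still "precious": the workspace is untouched
print(nrow(tab))              # 'tab' is found via the attached database
detach("file:objs.RData")
unlink("objs.RData")          # tidy up the illustration file
```

The attached database sits behind the global environment on the search path, so existing workspace objects always win; load() would have replaced them without comment.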
But the unconditional "do not use attach()" is not quite OK, at least not when you talk to non-beginners.
Martin Maechler, ETH Zurich
Hmm, load() does have an 'envir' argument. So you could simply use that and with() (which is pretty much what attach() does internally). If people really wanted a lazy approach, with() could be extended to allow file names (as attach does).
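A minimal sketch of that envir-based alternative (file name and objects are illustrative):

```r
## load() into a fresh environment, then evaluate there with with()
x <- 1:10
save(x, file = "objs.RData")
rm(x)

e <- new.env()
load("objs.RData", envir = e)   # the global workspace is never touched
print(with(e, sum(x)))          # evaluate expressions inside 'e'
unlink("objs.RData")            # tidy up the illustration file
```

This gets the same isolation as attach() without putting anything on the search path at all; the loaded objects live only in 'e'.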
On Thu, 19 May 2011, Martin Maechler wrote:
Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
The most interesting thing in this thread for me was "within()" - that is what I want 99% of the time, but I had only ever heard of with()
A real boon! Thanks Bill V!
The help for within gives this example for with(), which seems unnecessary, as glm supports data=
with(anorexia, {
anorex.1 <- glm(Postwt ~ Prewt + Treat + offset(Prewt), family = gaussian)
summary(anorex.1)
})
especially when this is easier to read:
m <- glm(Postwt ~ Prewt + Treat + offset(Prewt), family = gaussian, data=anorexia)
summary(m)
The Rd file also has (just?) one example for within? I suggest deleting the anorexia example so the within example comes first, and perhaps supplementing with
## Not run:
# More advanced example of usage (not run)
model1 <- glm.nb(Cells ~ Cryogel*Day, data = myData)
myData2 <- within(myData, Cryogel <- relevel(Cryogel, ref = "2"))
model2 <- update(model1, data = myData2)
## End(Not run)
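The relevel-and-update cycle in the example above can be repeated for every reference level, which yields all the pairwise contrasts the original poster was after; a minimal sketch ('myData' and 'model1' are the hypothetical objects from the example):

```r
## Refit the model with each cryogel in turn as the reference level,
## so summary() reports contrasts against every level in turn.
for (ref in levels(myData$Cryogel)) {
  d <- within(myData, Cryogel <- relevel(Cryogel, ref = ref))
  print(summary(update(model1, data = d)))
}
```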
Also, the help page for transform does not back link to within? That would be helpful. It also uses an example with attach, which might encourage less good habits.
Best wishes, and thanks,
tim
PS : How does one best contribute to help files?
PPS: If R was to go to google summer of code, apart from helpful help, my wishlist would prioritise standardising all common functions to (where possible)
1. Have a formula interface
2. Support 'data='
3. Use a uniform na handling, with examples
Diverse NA behavior catches most of our students out (now all three departments of our school using R exclusively at the PG level)
NA behaviour can be
na.rm= as in rowMeans(na.rm=TRUE)
use= as in cor(use="every") # with well documented effects in ?cor(), and all legal options shown (everything|all.obs|complete.obs|na.or.complete|pairwise.complete.obs)
na.action= as in t.test(na.action="na.omit") # with no examples in ?t.test, nor a list of legal values
Likewise it would help reduce confusion when
plot(~Source, data=Df) # works
# but
boxplot(~Source, data=Df)
# Error in boxplot.formula(~Source, data = Df) :
# 'formula' missing or incorrect
The formula isn't missing or ill-formed; it's that boxplot requires formulae to have something on the left-hand side.
best, t
From: Prof Brian Ripley
Hmm, load() does have an 'envir' argument. So you could simply use that and with() (which is pretty much what attach() does internally). If people really wanted a lazy approach, with() could be extended to allow file names (as attach does).
I'm not sure if laziness like this should be encouraged. If I may bring up another "black hole": IMHO the formula interface allows so much flexibility (perhaps to allow some laziness?) that beginners and even non-beginners fall into its various traps a bit too often, sometimes without even being aware of it. It would be great if there were a way to (optionally?) limit the scope of where a formula looks for variables. Just my $0.02... Andy
On 19/05/2011 9:32 AM, Timothy Bates wrote:
PS : How does one best contribute to help files?
One way is to post suggestions to the R-devel list. For the suggestion to add the See also link, that's probably best (but no need to do it now, I'll add it). The problem with posting suggestions to R-help is that they are more likely to get lost.

For more extensive suggestions, it's best if you actually do the work and post a patch. Coming up with good examples is hard, and so often the response to a suggestion like "the example should be better" is "I agree", rather than actually changing it. But if you put together an example that is obviously better, it's easy to just paste it in, and so more likely to happen.

In this particular case, I don't agree that your change is an improvement. "Not run" examples are bad, because they are rarely tested. A more compelling use of with() would be a nice change to the help page, if you want to put one together.
PPS: If R was to go to google summer of code, apart from helpful help, my wishlist would prioritise standardising all common functions to (where possible)
1. Have a formula interface
2. Support 'data='
3. Use a uniform na handling, with examples
R does have GSOC projects in the works. A call for proposals was posted to the R-devel list back in February; you can read about it here: http://rwiki.sciviews.org/doku.php?id=developers:projects:gsoc2011, with links to current discussion.
Diverse NA behavior catches most of our students out (now all three departments of our school using R exclusively at the PG level). NA behaviour can be
na.rm= as in rowMeans(na.rm=TRUE)
use= as in cor(use="every") # with well documented effects in ?cor(), and all legal options shown (everything|all.obs|complete.obs|na.or.complete|pairwise.complete.obs)
na.action= as in t.test(na.action="na.omit") # with no examples in ?t.test, nor a list of legal values
Likewise it would help reduce confusion when
plot(~Source, data=Df) # works
# but
boxplot(~Source, data=Df)
# Error in boxplot.formula(~Source, data = Df) :
# 'formula' missing or incorrect
The formula isn't missing or ill-formed; it's that boxplot requires formulae to have something on the left-hand side.
Tested fixes should be posted to R-devel. Generally patches should be based on the R-devel (trunk) source, available at https://svn.r-project.org/R/trunk. Duncan Murdoch
On May 19, 2011, at 9:32 AM, Timothy Bates wrote:
Likewise it would help reduce confusion when
plot(~Source, data=Df) # works
# but
boxplot(~Source, data=Df)
# Error in boxplot.formula(~Source, data = Df) :
# 'formula' missing or incorrect
The formula isn't missing or ill-formed; it's that boxplot requires formulae to have something on the left-hand side.
It may not be "ill-formed", but the error message says instead that it is "incorrect". Read the help page for boxplot (which would be the first thing a right-thinking person would do upon getting such a message) and it clearly implies that there needs to be a LHS component. I would rather the R Core spend its time on other areas.
David Winsemius, MD West Hartford, CT
On 19/05/2011 10:34 AM, Liaw, Andy wrote:
From: Prof Brian Ripley
Hmm, load() does have an 'envir' argument. So you could simply use that and with() (which is pretty much what attach() does internally). If people really wanted a lazy approach, with() could be extended to allow file names (as attach does).
I'm not sure if laziness like this should be encouraged. If I may bring up another "black hole": IMHO the formula interface allows too much flexibility (perhaps to allow some laziness?) that beginners and even non-beginners fall into its various traps a bit too often, and sometimes not even aware of it. It would be great if there's a way to (optionally?) limit the scope of where a formula looks for variables.
I think there is: put the variables in an environment that doesn't have parents covering everything visible, and use that as the environment of the formula. For example, you could follow

x <- 1:10
y <- rnorm(10)
z <- rnorm(10)
f <- y ~ x + z

with

e <- new.env(parent = baseenv())
e$x <- x
e$y <- y
environment(f) <- e

and you'll get a failure with lm(f) because it can't find z. Obviously this could be wrapped in a friendlier function if you really wanted it.

Duncan Murdoch
On Thu, 19 May 2011, Liaw, Andy wrote:
From: Prof Brian Ripley
Hmm, load() does have an 'envir' argument. So you could simply use that and with() (which is pretty much what attach() does internally). If people really wanted a lazy approach, with() could be extended to allow file names (as attach does).
I'm not sure if laziness like this should be encouraged.
As you can tell from my wording, I was not (and indeed consider the change to attach() to accept .rda files to have been a mistake).
If I may bring up another "black hole": IMHO the formula interface allows too much flexibility (perhaps to allow some laziness?) that beginners and even non-beginners fall into its various traps a bit too often, and sometimes not even aware of it. It would be great if there's a way to (optionally?) limit the scope of where a formula looks for variables.
Note, there is no such thing as 'the formula interface'. There are several, but I expect you mean model.frame.default, for that is what is most commonly used to collect together the variables from which to interpret a formula.

Now in my view model.frame suffers from having been designed for S's scoping rules and subverted for R by users of lexical scoping. So it has to work for both approaches (or break even more legacy code than the 1.2.0 changes did). In my experience the most common trap is to pick up variables in the workspace when 'data' (or 'newdata') is supplied, and that is unavoidable if you allow lexical scoping, as the workspace is almost always in the environment of the formula. You can change the latter of course, but I suspect that in 99.9% of cases if a user supplies 'data' he intends to get all the variables from 'data'.

This is not really the place for such a discussion, so if you want to pursue it please restart it on R-devel.
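The trap described above is easy to reproduce; a small sketch (the variable names are illustrative, not from the thread):

```r
## 'z' is deliberately left out of the data frame; the model still fits
## because the formula's environment (the workspace) supplies it.
d <- data.frame(y = rnorm(10), x = rnorm(10))
z <- rnorm(10)                  # lives only in the global workspace
fit <- lm(y ~ x + z, data = d)  # no error: z is silently found in globalenv()
```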
Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
Martin Maechler writes:
--- Use it instead of load() {for save()d R objects} ---
Curiously, this is why I wrote the SOAR package. Instead of save() you use Store() (Frank Harrell had already taken 'Save'), and instead of attach() you use Attach(). The objects are saved as separate rda files in a special subdirectory, and when Attach() is used on it they are placed on the search path, normally at position 2, as promises. The global environment is kept relatively free of extraneous objects (if you want to work like that) and the operation is fast and very low on memory use (unless you want to use every object you see all at once, of course).

The "track" package from Tony Plate achieves somewhat similar results but with a different method. Essentially it implements something very like the old S-PLUS style of keeping everything out of memory as files in a .Data subdirectory (although Tony uses the name 'rdatadir'), unless actually in use.

Bill Venables.
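A hedged sketch of the SOAR workflow described above, assuming the SOAR package is installed ('bigobj' is a hypothetical object name):

```r
## Store() writes the object to SOAR's cache directory and frees it from
## memory; Attach() puts the cache on the search path as lazy promises.
library(SOAR)
bigobj <- rnorm(1e6)
Store(bigobj)   # saved to disk, removed from the global environment
Attach()        # cached objects reappear on the search path at position 2
mean(bigobj)    # the object is fetched lazily on first use
```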