Finding a mean value of a variable holding a dummy variable fixed
6 messages · Alexander Ovodenko, Bill Venables, Simon Blomberg +1 more
The mean problem can be solved with
president=c("Johnson","Johnson","Johnson","Johnson","Johnson","Johnson",
"Nixon","Nixon","Nixon","Nixon","Nixon","Nixon")
approval=1:12
tapply(approval,president,mean)
For the first and last ratings, I will try to come back to you. But I am
sure somebody will be faster than I am.
Cheers,
Daniel
-------------------------
cuncta stricte discussurus
-------------------------
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
Behalf Of Alexander Ovodenko
Sent: Sunday, March 30, 2008 9:47 PM
To: r-help at r-project.org
Subject: [R] Finding a mean value of a variable holding a dummy
variable fixed
I have time-series data on approval ratings of British Prime Ministers. The
prime ministers dating from Macmillan onward till today are coded as dummy
variables and the approval ratings are entered for each month. I want to
know the mean value of the approval rating of each Prime Minister in the
dataset and the approval rating during his/her first month and last month as
PM. What R code should I enter for these data? In other words, I want to hold
the dummy corresponding to each Prime Minister fixed at value one and know
the first rating that PM has, the last rating s/he has, and the mean rating
s/he has. Thanks.
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
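The replies in this thread work from a single name column rather than from the dummy-variable layout described in the question. As a minimal sketch, assuming exactly one dummy equals 1 in every row, the 0/1 columns could be collapsed into one grouping factor before calling tapply(). The data frame `ratings`, its column names, and the values are all made up for illustration.

```r
# Hypothetical wide layout: one 0/1 dummy column per Prime Minister.
# Column names and ratings are invented for this sketch.
ratings <- data.frame(
  Thatcher = c(1, 1, 1, 0, 0, 0),
  Major    = c(0, 0, 0, 1, 1, 1),
  approval = c(3, 4, 5, 6, 5, 4)
)

dummy_cols <- c("Thatcher", "Major")

# max.col() returns, for each row, the position of the column holding the
# largest value, i.e. the dummy that equals 1 (assuming one per row).
pm_index <- max.col(as.matrix(ratings[dummy_cols]), ties.method = "first")
ratings$PM <- factor(dummy_cols[pm_index], levels = dummy_cols)

# Mean approval per Prime Minister, now that PM is a single factor.
pm_mean <- tapply(ratings$approval, ratings$PM, mean)
pm_mean
```

Once `PM` exists as one column, any of the approaches in the replies can be applied to it unchanged.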
I found a solution. It's probably not the easiest one, but it works. It
assumes that your data frame is ordered from earliest to latest record for
each president, but it can be easily adjusted if you want to make it
dependent on a third column. The final vector "index" gives you the line
indices for the first record for each president. If you replace "min" by
"max" you get the last instead of the first record. You can then find the
values by
## Sample data
president=c("Johnson","Johnson","Johnson","Johnson","Johnson","Johnson",
"Nixon","Nixon","Nixon","Nixon","Nixon","Nixon")
approval=c(3,4,5,6,7,8,6,5,4,3,2,1)
tapply(approval,president,mean)
## Find index for first row of each president; assumes ascending order of
## observations; change "min" to "max" to find last record
index=NULL
for(i in seq_along(unique(president)))
  index[i]=min(which(president==unique(president)[i]))
index
## Generate table with first approvals; data.frame() keeps approval numeric,
## whereas cbind() would coerce every column to character
first.approval=data.frame(Index=index,President=president[index],Approval=approval[index])
first.approval
Cheers,
Daniel
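The loop above can also be written without explicit indexing. The following is only a sketch of an alternative, not the poster's method: it uses duplicated() to find the first and last row of each president, and it assumes that each president's rows are contiguous and already in time order. The name `summary_tab` is introduced here for illustration.

```r
president <- c(rep("Johnson", 6), rep("Nixon", 6))
approval  <- c(3, 4, 5, 6, 7, 8, 6, 5, 4, 3, 2, 1)

# duplicated() flags every repeat of a value, so its negation marks the
# first row of each president; fromLast = TRUE scans from the end instead
# and therefore marks the last row.
first_idx <- which(!duplicated(president))
last_idx  <- which(!duplicated(president, fromLast = TRUE))

# Look the means up by name so they line up with the order of first_idx.
means <- tapply(approval, president, mean)

summary_tab <- data.frame(
  President = president[first_idx],
  First     = approval[first_idx],
  Last      = approval[last_idx],
  Mean      = as.vector(means[president[first_idx]])
)
summary_tab
```

If the rows of one president were interleaved with another's, `first_idx` and `last_idx` might no longer pair up by position, so the data would need sorting first.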
Here is another way, starting with similar dummy data:
____________
PMData <-
  data.frame(PM = c("Thatcher", "Thatcher", "Thatcher", "Thatcher",
                    "Thatcher", "Thatcher", "Major", "Major", "Major",
                    "Major", "Major", "Major"),
             approval = c(3, 4, 5, 6, 7, 8, 6, 5, 4, 3, 2, 1))
PMData <- transform(PMData, Month = 1:nrow(PMData)) ## add the time variable
PM_average <- with(PMData, tapply(approval, PM, mean))
PM_span <- with(PMData, sapply(tapply(Month, PM, range),
                               function(x) structure(approval[Month[x]],
                                                     names = c("First", "Last"))))
____________
rbind(mean = PM_average, PM_span)
      Major Thatcher
mean    3.5      5.5
First   6.0      3.0
Last    1.0      8.0

(I don't recall any Prime Minister called Johnson or Nixon, by the way...)

Bill Venables
CSIRO Laboratories
PO Box 120, Cleveland, 4163
AUSTRALIA
Office Phone (email preferred): +61 7 3826 7251
Fax (if absolutely necessary): +61 7 3826 7304
Mobile: +61 4 8819 4402
Home Phone: +61 7 3286 7700
mailto:Bill.Venables at csiro.au
http://www.cmis.csiro.au/bill.venables/
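In the code above, `approval[Month[x]]` works because Month is simply the row number. As a hedged sketch of how the same idea might look when the time variable is not the row index and the rows are not sorted, the ratings at the earliest and latest month can be picked with which.min() and which.max() inside each group. The data values and the helper name `summarise_pm` are invented for this example.

```r
# Hypothetical data: rows deliberately out of time order, and Month is a
# running counter that does not match the row number.
PMData <- data.frame(
  PM       = c("Major", "Thatcher", "Major", "Thatcher", "Thatcher", "Major"),
  Month    = c(106, 101, 104, 103, 102, 105),
  approval = c(1, 3, 6, 8, 5, 4)
)

# Summarise one Prime Minister's rows: the mean rating plus the ratings
# recorded at the earliest and latest Month.
summarise_pm <- function(d) {
  c(mean  = mean(d$approval),
    First = d$approval[which.min(d$Month)],
    Last  = d$approval[which.max(d$Month)])
}

# split() gives one data frame per PM; sapply() binds the three-element
# results into a matrix with one column per PM.
PM_summary <- sapply(split(PMData, PMData$PM), summarise_pm)
PM_summary
```

The result has the same layout as the rbind() output above: one column per Prime Minister and rows named mean, First, and Last.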
How about this? :-)
president <- c("Johnson", "Johnson", "Johnson", "Johnson", "Johnson",
               "Johnson", "Nixon", "Nixon", "Nixon", "Nixon", "Nixon", "Nixon")
approval <- c(3,4,5,6,7,8,6,5,4,3,2,1)
# first, last, and mean rating within one president (assumes time order)
fn <- function (x) c(first=x[1], last=x[length(x)], mean=mean(x))
lst <- tapply(approval, president, fn)
# Or if you need a data.frame:
res <- data.frame(matrix(unlist(lst), byrow=TRUE,
                         dimnames=list(names(lst), names(lst[[1]])), ncol=3))
Cheers,
Simon.
Simon Blomberg, BSc (Hons), PhD, MAppStat.
Lecturer and Consultant Statistician
Faculty of Biological and Chemical Sciences
The University of Queensland
St. Lucia Queensland 4072 Australia
Room 320 Goddard Building (8)
T: +61 7 3365 2506
http://www.uq.edu.au/~uqsblomb
email: S.Blomberg1_at_uq.edu.au

Policies:
1. I will NOT analyse your data for you.
2. Your deadline is your problem.

The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
- John Tukey.
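For the data-frame step, a shorter conversion may be possible by row-binding the per-president results directly. This is only a sketch under the same assumption as the function above (rows already in time order within each president), and it swaps tapply() for split() plus lapply() so that the intermediate object is a plain named list.

```r
president <- rep(c("Johnson", "Nixon"), each = 6)
approval  <- c(3, 4, 5, 6, 7, 8, 6, 5, 4, 3, 2, 1)

# Same summary function as in the message above.
fn <- function(x) c(first = x[1], last = x[length(x)], mean = mean(x))

# split() returns a named list with one approval vector per president;
# rbind() stacks the three-element results, using the list names as row
# names and the names returned by fn as column names.
res <- as.data.frame(do.call(rbind, lapply(split(approval, president), fn)))
res
```

Individual values can then be read off by name, for example `res["Nixon", "last"]`.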
(I don't recall any Prime Minister called Johnson or Nixon, by the way...)
That's why the variable was called "president". "prime.minister" is four
letters and a dot longer. That was too complicated ;)