Using summaryBy with weighted data
10 messages · Joshua Wiley, David Freedman, S. Messing +4 more
Dear Solomon,
On Sun, Jan 16, 2011 at 10:27 PM, Solomon Messing <solomon.messing at gmail.com> wrote:
Dear Soren and R users:
I am trying to use the summaryBy function with weights. Is this possible? An example that illustrates what I am trying to do follows:
library(doBy)
## make up some data
response = rnorm(100)
group = c(rep(1,20), rep(2,20), rep(3,20), rep(4,20), rep(5,20))
weights = runif(100, 0, 1)
mydata = data.frame(response,group,weights)
## run summaryBy without weights:
summaryBy(response~group, data = mydata, FUN = mean)
## attempt to run summaryBy with weights, throws error
summaryBy(x~group, data = mydata, FUN = weighted.mean, w=weights )
## throws the error:
# Error in tapply(lh.data[, lh.var[vv]], rh.string.factor, function(x) { :
#   arguments must have same length
My guess is that summaryBy is not giving weighted.mean() each group of weights, but instead is passing all of the weights in the data set each time it calls weighted.mean().
Yes, of course. It has no way of knowing that the weights should also be being broken down by group....they are not in the formula.
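The length mismatch described above can be reproduced in base R without doBy. This is only a sketch of the general problem, not of how summaryBy works internally; the data are made up, the names x_group1, w_group1, msg and wm1 are introduced here for illustration, and the error text differs from the tapply error quoted above.

```r
## Made-up data with the same shape as the example in this thread
set.seed(1)
mydata <- data.frame(response = rnorm(100),
                     group    = rep(1:5, each = 20),
                     weights  = runif(100, 0, 1))

## One group's responses (20 values) paired with ALL weights (100 values):
## weighted.mean() refuses because x and w differ in length.
x_group1 <- mydata$response[mydata$group == 1]
msg <- tryCatch(weighted.mean(x_group1, w = mydata$weights),
                error = function(e) conditionMessage(e))
msg  # the captured error message

## Subsetting the weights by the same group works as intended.
w_group1 <- mydata$weights[mydata$group == 1]
wm1 <- weighted.mean(x_group1, w = w_group1)
wm1
```

The point is that the weights have to be split by group together with the response, which a function receiving only one column per call cannot do.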
Do you know if there is some way to get summaryBy to pass weights to weighted.mean() only for each group?
Ideally there would be a way to pass more than one variable to a function (e.g., response and weights) or just an entire object (mydata) broken down by group. Then you would just make a wrapper function to pass the right values to the x and w arguments of weighted.mean. Instead, here is a somewhat hacked version:
library(doBy)
## make up some data (easier)
mydata <- data.frame(response = rnorm(100), group = rep(1:5, each = 20), weights = runif(100, 0, 1))
## manually compute weighted mean
tmp <- summaryBy(response*weights ~ group, data = mydata, FUN = sum)
tmp[,2] <- tmp[,2]/with(mydata, tapply(weights, group, sum))
tmp ## weighted means
## here's the 'problem', if you will, even with +, they are passed one at a time
summaryBy(response + weights ~ group, data = mydata, FUN = str)
summaryBy(mydata ~ group, data = mydata, FUN = str)
## here is an option using by():
xy <- by(mydata, mydata$group, function(z) weighted.mean(z$response, z$weights))
xy
## if you don't like the formatting....
data.frame(group = names(c(xy)), weighted.mean = c(xy))
HTH,
Josh
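The wrapper idea mentioned above (hand each group's rows to a function that picks out both x and w) might look like the following base-R sketch. The helper name weighted_mean_by and the ".wmean" column suffix are hypothetical and chosen only for illustration; this is not part of doBy.

```r
## Hypothetical helper: group-wise weighted mean returned as a data frame.
## 'response', 'group' and 'weights' are column names given as strings.
weighted_mean_by <- function(data, response, group, weights) {
  pieces <- split(data, data[[group]])          # one data frame per group
  wm <- vapply(pieces,
               function(z) weighted.mean(z[[response]], z[[weights]]),
               numeric(1))
  out <- data.frame(names(wm), unname(wm), stringsAsFactors = FALSE)
  names(out) <- c(group, paste0(response, ".wmean"))
  out
}

## Made-up data, same layout as in the thread
set.seed(42)
mydata <- data.frame(response = rnorm(100),
                     group    = rep(1:5, each = 20),
                     weights  = runif(100, 0, 1))

res <- weighted_mean_by(mydata, "response", "group", "weights")
res

## Cross-check against the manual sum(x * w) / sum(w) computation
manual <- with(mydata,
               tapply(response * weights, group, sum) / tapply(weights, group, sum))
```

Because the helper receives the whole sub-data-frame for each group, the weights are subset together with the response, which is exactly what the formula interface cannot do when it passes one column at a time.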
I suspect this functionality would be a tremendous benefit to R users who regularly work with weighted data, such as myself.
Thanks,
Solomon Messing
www.stanford.edu/~messing
PS I know this basic example can be done using the lapply(split(...)) approach referenced here: http://www.mail-archive.com/r-help at stat.math.ethz.ch/msg12349.html but for more complex tasks the lapply approach will mean writing a lot of extra code to run everything and then to get things formatted as nicely as summaryBy() was designed to do.
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://www.joshuawiley.com/
You might use the plyr package to get group-wise weighted means:
library(plyr)
ddply(mydata,~group,summarise, b=mean(weights), c=weighted.mean(response,weights))
hth,
David Freedman
It is currently not possible to pass weights in summaryBy.
Regards,
Søren
Hi everyone,
I am trying to run Sweave.bat (batchfiles_0.6-1) from the command line on Windows, but I get this error:
C:\batchfiles_0.6-1>Sweave.bat Sweave-test-1
"Error: rterm.exe not found"
I don't know how to set up the path, if that is the problem. I ran rcmd.bat and got the following, so I don't know if it is a path problem:
C:\batchfiles_0.6-1>Rcmd,bat
R_ARCH=/x64
R_ARCH0=x64
R_ARCH0=x64
cmdpath=C:\R\R-2.12.1\bin\x64\Rcmd.exe
args=,bat
'bat' is not recognized as an internal or external command, operable program or batch file.
The path of rterm.exe on my computer is: C:\R\R-2.12.1\bin\x64
Thank you in advance!
Sebastián Daza
sebastian.daza at gmail.com
You should not have to set up any paths. The whole point of these batch commands is to save you from doing that. If you contact me privately we can try to determine what has gone wrong with Sweave.bat. Regarding Rcmd.bat, the comma in your command line should be a dot.
Statistics & Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com