MASS fitdistr with plyr or data.table?
Hi:
Here's one way to do this with plyr and data.table.
# plyr
As Hadley inferred, when using ddply(), it's convenient to write a
function for a generic (sub-)data frame and have it return a data
frame. Here's my function and call:
f <- function(d) {
require(MASS)
est <- fitdistr(d$wind_speed, 'weibull')$estimate
data.frame(shape = est[1], scale = est[2])
}
Notice that f() takes a data frame d as input, uses the wind_speed
component of d to fit the Weibull, and returns a data frame with names
for the estimates.
ddply(weib.test.too, 'site', f)
site shape scale
1 1 3.063853 8.049467
2 2 2.945982 8.067252
3 3 2.879392 7.999636
4 4 3.097191 8.084453
5 5 3.091117 8.012450
6 6 2.943254 7.912792
7 7 2.957455 7.947545
8 8 2.975732 7.901587
9 9 3.045563 8.061838
10 10 2.995324 8.056820
Warning messages:
1: In dweibull(x, shape, scale, log) : NaNs produced
2: In dweibull(x, shape, scale, log) : NaNs produced
<snip the other eight>
## data.table
In writing a function for data.table, you want a variable as input and
a list as output. We therefore modify the above function slightly:
g <- function(x) {
require(MASS)
est <- fitdistr(x, 'weibull')$estimate
list(shape = est[1], scale = est[2])
}
library(data.table)
weib.test.dt <- data.table(weib.test.too, key = 'site')
weib.test.dt[, g(wind_speed), by = site]
site shape scale [1,] 1 3.063853 8.049467 [2,] 2 2.945982 8.067252 [3,] 3 2.879392 7.999636 [4,] 4 3.097191 8.084453 [5,] 5 3.091117 8.012450 [6,] 6 2.943254 7.912792 [7,] 7 2.957455 7.947545 [8,] 8 2.975732 7.901587 [9,] 9 3.045563 8.061838 [10,] 10 2.995324 8.056820 ## <warnings snipped> HTH, Dennis
On Wed, Apr 27, 2011 at 1:55 PM, Justin Haynes <jtor14 at gmail.com> wrote:
I am trying to extract the shape and scale parameters of a wind speed
distribution for different sites. ?I can do this in a clunky way, but
I was hoping to find a way using data.table or plyr. ?However, when I
try I am met with the following:
set.seed(144)
weib.dist<-rweibull(10000,shape=3,scale=8)
weib.test<-data.table(cbind(1:10,weib.dist))
names(weib.test)<-c('site','wind_speed')
fitted<-weib.test[,fitdistr(wind_speed,'weibull'),by=site]
Error in class(ans[[length(byval) + jj]]) = class(testj[[jj]]) :
?invalid to set the class to matrix unless the dimension attribute is
of length 2 (was 0)
In addition: Warning messages:
1: In dweibull(x, shape, scale, log) : NaNs produced
...
10: In dweibull(x, shape, scale, log) : NaNs produced
(the warning messages are normal from what I can tell)
or using plyr:
set.seed(144)
weib.dist<-rweibull(10000,shape=3,scale=8)
weib.test.too<-data.frame(cbind(1:10,weib.dist))
names(weib.test.too)<-c('site','wind_speed')
fitted<-ddply(weib.test.too,.(site),fitdistr,'weibull')
Error in .fun(piece, ...) : 'x' must be a non-empty numeric vector
those sound like similar errors to me, but I can't figure out how to
make them go away!
to prove I'm not crazy:
fitdistr(weib.dist,'weibull')$estimate
? shape ? ?scale
2.996815 8.009757
Warning messages:
1: In dweibull(x, shape, scale, log) : NaNs produced
2: In dweibull(x, shape, scale, log) : NaNs produced
3: In dweibull(x, shape, scale, log) : NaNs produced
4: In dweibull(x, shape, scale, log) : NaNs produced
Thanks
Justin
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.