Percentiles with R for a big data.frame
On Jan 23, 2013, at 5:45 AM, Simonas Kecorius wrote:
I found some code:

    y.ts <- ts(data, frequency=12)
    aggregate(y.ts, FUN=quantile, probs=0.10)

It seems to work fine even for a big data.frame.
Except for the fact that 'y.ts' is not a dataframe, so you are using a function that has different arguments than `aggregate.data.frame`. With the `ts` call you implicitly constructed `ts(data.matrix(data), frequency=12)` and will be getting quantile estimates on groups of 12, which is not at all what you asked for in the first place.
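To make the difference concrete, here is a minimal sketch on simulated data. It assumes a data frame with the dimensions mentioned in the thread (288 rows, 71 numeric columns) and blocks of 24 consecutive rows. The names `grp`, `q10` and `q_ts` are made up for illustration, and the random data is only a stand-in for the real `dat1`.

```r
# Simulated stand-in for the real data: 288 rows, 71 numeric columns
set.seed(1)
dat1 <- as.data.frame(matrix(rnorm(288 * 71), nrow = 288, ncol = 71))

# Grouping vector: rows 1-24 are block 1, rows 25-48 are block 2, and so on
# (assumes nrow(dat1) is an exact multiple of 24)
grp <- rep(seq_len(nrow(dat1) / 24), each = 24)

# aggregate.data.frame applies FUN to every column within each block;
# probs and type are passed through to quantile()
q10 <- aggregate(dat1, by = list(block = grp), FUN = quantile,
                 probs = 0.1, type = 4)
dim(q10)    # one row per block, one grouping column plus 71 data columns

# The ts route with frequency = 12 groups by 12 rows instead,
# so it returns twice as many rows as the 24-row blocks above
q_ts <- aggregate(ts(data.matrix(dat1), frequency = 12),
                  FUN = quantile, probs = 0.1)
nrow(q_ts)
```

Note that the `ts` call above also uses the default quantile `type`, so its values would not match a `type=4` calculation even if the group size were the same.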
David.

> Thanks for your help.
>
> 2013/1/22 David Winsemius <dwinsemius at comcast.net>
>
> > On Jan 22, 2013, at 5:58 AM, Simonas Kecorius wrote:
> >
> > > Hey Duncan,
> > >
> > > I can't imagine either what formula OpenOffice uses for quantiles. I
> > > have checked a series of 24 values, calculating the quantiles with both
> > > OpenOffice and R. The results are identical. The problem arises when I
> > > try to implement the quantile calculation in this form:
> > >
> > > dat2<-with(dat1,aggregate(cbind(dat1[,1:71]),by=list(newID),quantiles,0.1,type=4))
> > >
> > > This code does not generate an error, but I guess it does not give the
> > > right result either.
> >
> > You guess? What result and what is "right"?
> >
> > > So my question would be:
> > > How could I calculate quantiles for a big data.frame in R (71 columns
> > > and 288 rows)? I need to take 24 rows, calculate quantiles, then take
> > > another 24 rows, etc., for 71 columns.
> >
> > You have already been told that you are misspelling the name of the
> > R function.
> >
> > The other open question in my mind is whether you were hoping for
> > something other than a single quantile (in this case the 10th
> > percentile), or perhaps wanted the quantiles that would divide your
> > data into deciles?
> >
> > If you want to do the calculation within groups then the second
> > argument to `aggregate` must specify the grouping. By design
> > `aggregate` will apply the function on all columns.
> > --
> > David.
> >
> > > Thanks in advance.
> > >
> > > 2013/1/22 Duncan Murdoch <murdoch.duncan at gmail.com>
> > >
> > > > On 13-01-21 6:41 PM, Simonas Kecorius wrote:
> > > >
> > > > > Dear R users,
> > > > >
> > > > > I have run into a problem dealing with percentiles in R.
> > > > >
> > > > > From my previous questions: I have a big data.frame, with lots of
> > > > > columns and rows. The following commands enable me to calculate
> > > > > means for the whole data frame.
> > > > >
> > > > > dat1$newID<-rep(1:(nrow(dat1)/12),each=12) #if nrow(dat1)/12 is integer
> > > > > dat2<-with(dat1,aggregate(cbind(dat1[,1:71]),by=list(newID),mean))
> > > > >
> > > > > What I need is to calculate percentiles for each group (there are
> > > > > 12 values in a group). I tried the following:
> > > > >
> > > > > duomenai<-with(dat1,aggregate(cbind(dat1[,1:71]),by=list(newID),quantiles,0.1,type=4))
> > > >
> > > > You didn't define quantiles, so that won't work. Assuming that's a
> > > > typo, and you meant quantile...
> > > >
> > > > > First, is the syntax above right?
> > > > > Secondly, I tried to calculate percentiles using OpenOffice and
> > > > > there is disagreement between the values. If I do the calculation
> > > > > for a single series of numbers, then the R and OpenOffice numbers
> > > > > coincide, but for a data.frame it seems that something goes wrong.
> > > >
> > > > There are lots of different formulas for empirical quantiles. The
> > > > ones available in R are described in the ?quantile help topic. What
> > > > formula does OpenOffice use?
> > > >
> > > > Duncan Murdoch

David Winsemius, MD
Alameda, CA, USA
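
Two points raised in the quoted messages can be checked quickly in R: the `type` argument of `quantile` selects among the formulas listed in ?quantile, and a single 10th percentile is not the same request as the full set of deciles. The following is only a sketch on a made-up vector `x` of 12 values. It makes no claim about which formula OpenOffice uses.

```r
# Made-up example data: one group of 12 values
x <- c(2, 4, 4, 5, 7, 9, 10, 12, 15, 20, 21, 30)

# 10th percentile under each of the nine definitions in ?quantile
types <- 1:9
p10 <- sapply(types, function(t) quantile(x, probs = 0.1, type = t, names = FALSE))
names(p10) <- paste0("type", types)
p10

# A single 10th percentile (default type = 7) ...
quantile(x, probs = 0.1)

# ... versus the nine cut points that split the data into deciles
deciles <- quantile(x, probs = seq(0.1, 0.9, by = 0.1))
deciles
```

With small groups the definitions can differ noticeably, so any comparison against a spreadsheet should use the same group of rows and a known `type` on the R side.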