Dear all,
First, let's create some data to play around with:
set.seed(1)
(df <- data.frame(Group=rep(c("Group1","Group2","Group3"), each=10),
                  Value=c(rexp(10, 1), rexp(10, 4), rexp(10, 10)))[sample(1:30,30),])
## Now we need the empirical distribution function:
edf <- function(x) ecdf(x)(x) # empirical distribution function evaluated at x
## The big question is how one can apply the empirical distribution function to
## each subset of df determined by "Group", so how to apply it to Group1, then
## to Group2, and finally to Group3. You might suggest (?) to use tapply:
(edf. <- tapply(df$Value, df$Group, FUN=edf))
## That's correct. But typically, one would like to obtain not only the values,
## but a data.frame containing the original information and the new (edf-)values.
## What's a simple way to get this? (one would be required to first sort df
## according to Group, then paste the values computed by edf to the sorted df;
## seems a bit tedious).
## A solution I have is the following (but I would like to know if there is a
## simpler one):
(edf.. <- do.call("rbind", lapply(unique(df$Group), function(strg){
    subdata <- subset(df, Group==strg) # sub-data
    subdata <- cbind(subdata, edf=edf(subdata$Value))
})) )
Cheers,
Marius
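For reference, the "sort first, then paste" route described in the comments above can be sketched in base R as follows. This is only a minimal illustration, not code from the thread: the name df_sorted is hypothetical, and the sketch relies on tapply returning its groups in sorted order (Group1, Group2, Group3), which is the same order that order() produces for the rows.

```r
set.seed(1)
df <- data.frame(Group = rep(c("Group1", "Group2", "Group3"), each = 10),
                 Value = c(rexp(10, 1), rexp(10, 4), rexp(10, 10)))[sample(1:30, 30), ]
edf <- function(x) ecdf(x)(x)  # empirical distribution function evaluated at x

# Step 1: sort the rows by Group so they line up with tapply's group order
df_sorted <- df[order(df$Group), ]  # df_sorted is a hypothetical name

# Step 2: tapply returns one vector per group; unlist() concatenates them
# in group order, which now matches the row order of df_sorted
df_sorted$edf <- unlist(tapply(df_sorted$Value, df_sorted$Group, FUN = edf),
                        use.names = FALSE)
head(df_sorted)
```

The result carries the original columns plus edf, at the cost of reordering the rows by group.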
How to apply a function to subsets of a data frame *and* obtain a data frame again?
8 messages · Marius Hofert, Nick Sabbe, Paul Hiemstra +3 more
You might want to look at package plyr and use ddply.

HTH,

Nick Sabbe
--
ping: nick.sabbe at ugent.be
link: http://biomath.ugent.be
wink: A1.056, Coupure Links 653, 9000 Gent
ring: 09/264.59.36
-- Do Not Disapprove
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
On 08/17/2011 11:24 AM, Nick Sabbe wrote:
You might want to look at package plyr and use ddply.
The following example does what you want using ddply:

library(plyr)
edfPerGroup = ddply(df, .(Group), summarise, edf = edf(Value), Value = Value)
edfPerGroup

    Group edf       Value
1  Group1 0.5 0.539682840
2  Group1 0.2 0.145706727
3  Group1 0.7 0.956567494
4  Group1 0.3 0.147045991
5  Group1 0.9 1.229562053
6  Group1 0.4 0.436068626
7  Group1 0.8 1.181642779
8  Group1 0.1 0.139795262
9  Group1 1.0 2.894968537
10 Group1 0.6 0.755181833

cheers,
Paul
Paul Hiemstra, Ph.D.
Global Climate Division
Royal Netherlands Meteorological Institute (KNMI)
Wilhelminalaan 10 | 3732 GK | De Bilt | Kamer B 3.39
P.O. Box 201 | 3730 AE | De Bilt
tel: +31 30 2206 494
http://intamap.geo.uu.nl/~paul
http://nl.linkedin.com/pub/paul-hiemstra/20/30b/770
Paul Hiemstra wrote:
The following example does what you want using ddply:

library(plyr)
edfPerGroup = ddply(df, .(Group), summarise, edf = edf(Value), Value = Value)

Or slightly more succinctly:

ddply(df, .(Group), mutate, edf = edf(Value))

Hadley

--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/
Dear all,
Thanks a lot for the quick help.
Below is what I built following Nick's hint.
Cheers,
Marius
library(plyr)
set.seed(1)
(df <- data.frame(Group=rep(c("Group1","Group2","Group3"), each=10),
Value=c(rexp(10, 1), rexp(10, 4), rexp(10, 10)))[sample(1:30,30),])
edf <- function(x) ecdf(x)(x)
ddply(df, .(Group), function(df.) cbind(df., edf=edf(df.$Value)))
Hadley's code is much shorter; I would use that syntax.

cheers,
Paul
Have a look at function ave(), e.g.,
set.seed(1)
(df <- data.frame(Group=rep(c("Group1","Group2","Group3"), each=10),
Value=c(rexp(10, 1), rexp(10, 4), rexp(10, 10)))[sample(1:30,30),])
edf <- function(x) ecdf(x)(x)
df$edf <- with(df, ave(Value, Group, FUN = edf))
df
I hope it helps.
Best,
Dimitris
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center

Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014
Web: http://www.erasmusmc.nl/biostatistiek/
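A small base R check may help when choosing between the ave() suggestion and the do.call/rbind solution from the original post: both should give the same edf values, but ave() leaves the rows in their shuffled order while the rbind version returns them grouped. The sketch below is an illustration only; the names by_ave, by_rbind, piece and same_values are hypothetical, and rows are matched on Value under the assumption that the simulated values are all distinct.

```r
set.seed(1)
df <- data.frame(Group = rep(c("Group1", "Group2", "Group3"), each = 10),
                 Value = c(rexp(10, 1), rexp(10, 4), rexp(10, 10)))[sample(1:30, 30), ]
edf <- function(x) ecdf(x)(x)  # empirical distribution function evaluated at x

# ave() version: adds the column in place, rows stay in their shuffled order
by_ave <- df
by_ave$edf <- ave(by_ave$Value, by_ave$Group, FUN = edf)

# rbind version from the original post: rows come back grouped
by_rbind <- do.call("rbind", lapply(unique(df$Group), function(g) {
  piece <- subset(df, Group == g)
  cbind(piece, edf = edf(piece$Value))
}))

# match rows on Value (assumed distinct) and compare the edf columns
same_values <- isTRUE(all.equal(by_ave$edf,
                                by_rbind$edf[match(by_ave$Value, by_rbind$Value)]))
same_values
```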
Hi:

I would agree with Paul Hiemstra about using Hadley's code instead; see ?plyr:::mutate for details. It would also make sense to sort the data and edf by group - this does it in one line:

arrange(ddply(df, .(Group), mutate, edf = edf(Value)), Group, edf)

HTH,
Dennis
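If plyr is not at hand, the same "add edf, then sort by Group and edf" result can be approximated in base R by combining ave() with order(). This is a sketch under that assumption rather than a replacement for the one-liner above; df_arranged is a hypothetical name.

```r
set.seed(1)
df <- data.frame(Group = rep(c("Group1", "Group2", "Group3"), each = 10),
                 Value = c(rexp(10, 1), rexp(10, 4), rexp(10, 10)))[sample(1:30, 30), ]
edf <- function(x) ecdf(x)(x)  # empirical distribution function evaluated at x

# add the per-group edf values without changing the row order
df$edf <- ave(df$Value, df$Group, FUN = edf)

# order() sorts by Group first and breaks ties by edf
df_arranged <- df[order(df$Group, df$edf), ]
head(df_arranged, 12)
```

Within each group the edf column then runs from smallest to largest, so the Value column is sorted within each group as well.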