I have a large dataset, a sample of which is:
a <- c("A", "B", "A", "B", "A", "B", "A", "B", "A", "B")
b <- c(15, 35, 20, 99, 75, 64, 33, 78, 45, 20)
c <- c(111, 234, 456, 876, 246, 662, 345, 480, 512, 179)
d <- c(1.1, 3.2, 14.2, 8.7, 12.5, 5.9, 8.3, 6.0, 2.9, 9.3)
df <- data.frame(a, b, c, d)
I'm trying to construct a data frame that shows the means of b and c for the rows meeting a condition on d, grouped by a.
I want to create the data frame below, then use ggplot2 to create a line plot of the mean of b across the different conditions on d.
I can compute the grouped means (d>=2, d>=4, etc.) one at a time using dplyr, but haven't figured out how to combine them into one data frame.
I'd rather not use a loop and am relatively new to R. Is there a way I can use tapply with the conditions above so that I can create the data frame below?
a   condition   mean(b)   mean(c)
A   d>=2        ____      _____
B   d>=2        ____      _____
A   d>=4        ____      _____
B   d>=4        ____      _____
A   d>=6        ____      _____
B   d>=6        ____      _____
Ken
kmnanus at gmail.com
914-450-0816 (tel)
347-730-4813 (fax)
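Since the question asks specifically about tapply(), here is a minimal base-R sketch of that route, assuming the sample data above (rebuilt with straight quotes) and thresholds of 2, 4 and 6. tapply() only groups by a, so the overlapping subsets still have to be made once per threshold; lapply() does that, and do.call(rbind, ...) stacks the pieces. The helper one_pass() and the column names mean_b and mean_c are hypothetical, chosen for illustration.

```r
# Sample data from the question, with straight quotes so R can parse it
df <- data.frame(a = rep(c("A", "B"), 5),
                 b = c(15, 35, 20, 99, 75, 64, 33, 78, 45, 20),
                 c = c(111, 234, 456, 876, 246, 662, 345, 480, 512, 179),
                 d = c(1.1, 3.2, 14.2, 8.7, 12.5, 5.9, 8.3, 6.0, 2.9, 9.3),
                 stringsAsFactors = FALSE)

# Hypothetical helper: grouped means of b and c for the rows with d >= dmin
one_pass <- function(dmin) {
  keep <- df$d >= dmin                        # overlapping subset for this threshold
  mb <- tapply(df$b[keep], df$a[keep], mean)  # mean of b within each level of a
  mc <- tapply(df$c[keep], df$a[keep], mean)  # mean of c within each level of a
  data.frame(a = names(mb),
             condition = paste0("d>=", dmin),
             mean_b = as.vector(mb),
             mean_c = as.vector(mc),
             stringsAsFactors = FALSE)
}

# One small data frame per threshold, stacked into a single result
result <- do.call(rbind, lapply(c(2, 4, 6), one_pass))
print(result)
```

With the sample data, the d>=2 rows come out as mean_b = 43.25, mean_c = 389.75 for A and mean_b = 59.2, mean_c = 486.2 for B.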
Computing means of multiple variables based on a condition
5 messages · KMNanus, William Dunlap, Jeff Newmiller +1 more
Just to be clear, do you really want your 'condition' groups to be subsets of one another? Most (all?) of the *ply functions assume you want non-overlapping groups, so they do a split-summarize-combine sequence. You would have to replace the split part of that.
Bill Dunlap
TIBCO Software
wdunlap tibco.com
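Bill's point can be made concrete: split() would cut the data into disjoint groups, so "replacing the split part" means building the overlapping subsets by hand and keeping the summarize and combine steps as they are. A minimal base-R sketch, assuming the question's sample data and thresholds 2, 4 and 6; aggregate() is a swapped-in choice for the summarize step, not something proposed in the thread.

```r
df <- data.frame(a = rep(c("A", "B"), 5),
                 b = c(15, 35, 20, 99, 75, 64, 33, 78, 45, 20),
                 c = c(111, 234, 456, 876, 246, 662, 345, 480, 512, 179),
                 d = c(1.1, 3.2, 14.2, 8.7, 12.5, 5.9, 8.3, 6.0, 2.9, 9.3),
                 stringsAsFactors = FALSE)
dmins <- c(2, 4, 6)

# "Split": one overlapping subset per threshold, instead of split()'s disjoint pieces
pieces <- lapply(dmins, function(dmin) df[df$d >= dmin, ])
names(pieces) <- paste0("d>=", dmins)

# "Summarize": means of b and c within each level of a, for each subset
summaries <- lapply(names(pieces), function(cond) {
  res <- aggregate(cbind(b, c) ~ a, data = pieces[[cond]], FUN = mean)
  res$condition <- cond
  res
})

# "Combine": stack the summaries into one data frame
result <- do.call(rbind, summaries)
result <- result[, c("a", "condition", "b", "c")]
print(result)
```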
In reply to: KMNanus, Wed, May 25, 2016 at 3:37 PM
______________________________________________ R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
These will be overlapping subgroups from the same data frame. For example, the d>=2 subset will have 9 rows, the d>=4 subset will have 7 rows, etc.
Ken
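Ken's row counts can be checked against the sample d values. A small base-R sketch (the thresholds 2, 4 and 6 are taken from the question): nested thresholds give subsets of 9, 7 and 6 rows, while cut() puts each row into at most one bin, which is the kind of non-overlapping grouping that split()-based tools expect. Summing the disjoint bin counts from the top recovers the nested counts.

```r
d <- c(1.1, 3.2, 14.2, 8.7, 12.5, 5.9, 8.3, 6.0, 2.9, 9.3)
thresholds <- c(2, 4, 6)

# Overlapping (nested) subsets: a row counts toward every threshold it meets
nested <- sapply(thresholds, function(dmin) sum(d >= dmin))
names(nested) <- paste0("d>=", thresholds)
print(nested)

# Disjoint bins [2,4), [4,6), [6,Inf): each row lands in at most one bin
bins <- cut(d, breaks = c(thresholds, Inf), right = FALSE)
print(table(bins))

# Cumulative sums of the bin counts, taken from the highest bin down,
# reproduce the nested subset sizes
from_bins <- rev(cumsum(rev(as.vector(table(bins)))))
print(from_bins)
```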
In reply to: William Dunlap, May 25, 2016 at 9:06 PM
Thank you for including some sample data, but I have to ask that you
please invest some time in learning how to edit your code in a text editor
and to post in plain text. The quote marks in your example were "curly",
which R does not understand. There are other ways in which HTML email
leads to corruption on this mailing list as well, so you will save
everyone numerous headaches by investing this time sooner rather than
later.
The type of operation you are looking for is a "non-equi" (inequality)
join in SQL nomenclature: every minimum "d" value is paired with every row
that satisfies d >= dmin. It is intrinsically more work than an ordinary
grouped summary, because done directly it is computationally equivalent to
a for loop that applies each minimum "d" value to your whole data set in turn.
Having said that, you can accomplish this in the "dplyr" syntax instead of
using a for loop, if that makes you happy, but it is not really any
"better" than a for loop (and some people might consider it misleading
to drape a for loop in such fancy syntax):
DF <- data.frame( a = c( "A", "B", "A", "B", "A", "B", "A", "B", "A", "B" )
                , b = c( 15, 35, 20, 99, 75, 64, 33, 78, 45, 20 )
                , c = c( 111, 234, 456, 876, 246, 662, 345, 480, 512, 179 )
                , d = c( 1.1, 3.2, 14.2, 8.7, 12.5, 5.9, 8.3, 6.0, 2.9, 9.3 )
                , stringsAsFactors = FALSE
                )
passes <- data.frame( dmin = c( 2, 4, 6 ) )
library(dplyr)
DF2 <- (   passes
       %>% rowwise
       %>% do({ # run once for each row in "passes"
                dmin <- .$dmin # dot here refers to row of
                               # "passes" data frame
                (   DF
                %>% filter( d >= dmin )
                %>% group_by( a )
                %>% summarise( meanb = mean( b )
                             , meanc = mean( c )
                             )
                %>% mutate( condition = paste0( "d>=", dmin ) )
                )
              })
       %>% select( a, condition, meanb, meanc )
       %>% as.data.frame
       )
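On current dplyr, do() is marked superseded, and dplyr 1.1.0 added inequality joins through join_by(), which let the thresholds be joined to the data directly instead of being looped over with do(). A sketch assuming dplyr >= 1.1.0 and ggplot2 are installed, with DF and passes rebuilt as above; it ends with the line plot of mean b against the threshold that the original question was after. The names DF3 and p are illustrative.

```r
library(dplyr)    # assumes dplyr >= 1.1.0 for inequality conditions in join_by()
library(ggplot2)

DF <- data.frame(a = rep(c("A", "B"), 5),
                 b = c(15, 35, 20, 99, 75, 64, 33, 78, 45, 20),
                 c = c(111, 234, 456, 876, 246, 662, 345, 480, 512, 179),
                 d = c(1.1, 3.2, 14.2, 8.7, 12.5, 5.9, 8.3, 6.0, 2.9, 9.3),
                 stringsAsFactors = FALSE)
passes <- data.frame(dmin = c(2, 4, 6))

# Inequality join: pair each threshold with every row of DF where dmin <= d
DF3 <- passes %>%
  inner_join(DF, by = join_by(dmin <= d)) %>%
  group_by(dmin, a) %>%
  summarise(meanb = mean(b), meanc = mean(c), .groups = "drop") %>%
  mutate(condition = paste0("d>=", dmin)) %>%
  as.data.frame()
print(DF3)

# Line plot of mean b across thresholds, one line per level of a
p <- ggplot(DF3, aes(x = dmin, y = meanb, colour = a)) +
  geom_line() +
  geom_point() +
  labs(x = "minimum d", y = "mean of b")
print(p)
```

Keeping dmin as a numeric column (rather than only the "d>=2" label) gives ggplot2 a proper x axis for the line plot.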
In reply to: KMNanus, Wed, 25 May 2016
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
Another option would be to convert the data into a long format and add a
column for each condition that holds the value where the condition is met
and NA otherwise.
library(dplyr)
library(tidyr)
# DF is the data frame defined in Jeff Newmiller's reply
DF %>%
  gather(key = "key", value = "value", -a, -d) %>%  # long format: key is "b" or "c"
  mutate(
    "d>=2" = ifelse(d >= 2, value, NA),             # keep value only where condition holds
    "d>=4" = ifelse(d >= 4, value, NA),
    "d>=6" = ifelse(d >= 6, value, NA)
  ) %>%
  select(-d, -value) %>%
  gather(key = "condition", value = "value", -a, -key, na.rm = TRUE) %>%
  group_by(a, key, condition) %>%
  summarise(mean = mean(value)) %>%
  spread(key = key, value = mean) %>%               # one column each for b and c
  arrange(condition, a)
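gather() and spread() still work but are superseded in current tidyr by pivot_longer() and pivot_wider(). A sketch of the same long-format approach with the newer verbs, assuming tidyr >= 1.0.0 and dplyr >= 1.0.0 (for summarise(.groups =)) are installed; DF is rebuilt here so the block runs on its own, and the object name res is illustrative.

```r
library(dplyr)
library(tidyr)   # assumes tidyr >= 1.0.0 for pivot_longer()/pivot_wider()

DF <- data.frame(a = rep(c("A", "B"), 5),
                 b = c(15, 35, 20, 99, 75, 64, 33, 78, 45, 20),
                 c = c(111, 234, 456, 876, 246, 662, 345, 480, 512, 179),
                 d = c(1.1, 3.2, 14.2, 8.7, 12.5, 5.9, 8.3, 6.0, 2.9, 9.3),
                 stringsAsFactors = FALSE)

res <- DF %>%
  # one row per (original row, variable); key is "b" or "c"
  pivot_longer(cols = c("b", "c"), names_to = "key", values_to = "value") %>%
  # one column per condition: the value where it holds, NA otherwise
  mutate(`d>=2` = ifelse(d >= 2, value, NA),
         `d>=4` = ifelse(d >= 4, value, NA),
         `d>=6` = ifelse(d >= 6, value, NA)) %>%
  select(-d, -value) %>%
  # back to long format over the conditions, dropping the NA placeholders
  pivot_longer(cols = starts_with("d>="), names_to = "condition",
               values_to = "value", values_drop_na = TRUE) %>%
  group_by(a, key, condition) %>%
  summarise(mean = mean(value), .groups = "drop") %>%
  # one column per variable, holding the means
  pivot_wider(names_from = key, values_from = mean) %>%
  arrange(condition, a)
print(res)
```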
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium
To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to say
what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey
In reply to: Jeff Newmiller, 2016-05-26 8:34 GMT+02:00