How to Un-group a grouped data set?
6 messages · R. Michael Weylandt, Cheenghee AM Koh, David L Carlson
Don't use subset as a function name: it's already the name of a
rather important function, as is data (though at least you aren't using
that one as a function, so it's not quite so bad). Finally, use dput()
when sending data so we get a plain-text, reproducible version.
I'd try something like this:
dats <- structure(list(Study = c(1L, 1L, 2L, 2L, 3L, 3L),
                       TX = c(1L, 0L, 1L, 0L, 1L, 0L),
                       AEs = c(3L, 2L, 1L, 2L, 1L, 1L),
                       N = c(5L, 7L, 10L, 7L, 8L, 4L)),
                  .Names = c("Study", "TX", "AEs", "N"),
                  class = "data.frame",
                  row.names = c("1", "2", "3", "4", "5", "6"))
# See how handy dput can be :-)
dats[unlist(mapply(FUN = function(x,y) rep(x, y), 1:NROW(dats), dats$N)), -4]
which isn't super elegant, but others might have something better.
Best,
Michael
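A note on the indexing above, offered as a sketch rather than as Michael's own code: since rep() accepts a vector of counts in its times argument, the mapply() call can probably be replaced by a single rep() over the row indices. The names idx_mapply, idx_rep and expanded are made up for this illustration.

```r
# Rebuild the example data from the message above.
dats <- data.frame(Study = c(1L, 1L, 2L, 2L, 3L, 3L),
                   TX    = c(1L, 0L, 1L, 0L, 1L, 0L),
                   AEs   = c(3L, 2L, 1L, 2L, 1L, 1L),
                   N     = c(5L, 7L, 10L, 7L, 8L, 4L))

# Approach from the message: one rep() call per row, flattened with unlist().
idx_mapply <- unlist(mapply(FUN = function(x, y) rep(x, y),
                            1:NROW(dats), dats$N))

# Sketch of a shorter equivalent: rep() takes a vector of counts directly.
idx_rep <- rep(seq_len(nrow(dats)), times = dats$N)

# 'expanded' is a hypothetical name; column 4 (N) is dropped as before.
expanded <- dats[idx_rep, -4]
```

Either way, each row is repeated N times, so AEs in the result is still the per-arm count and not yet a 0/1 indicator.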
On Tue, May 15, 2012 at 1:24 AM, Cheenghee AM Koh <sigontw at gmail.com> wrote:
Hello, R-fellows,
I have a question that I really don't know how to solve. I have spent hours searching online for possible solutions, but in vain. If anyone could help me with this, it would be much appreciated!
I have a "grouped" dataset like this:
data
  Study TX AEs  N
1     1  1   3  5
2     1  0   2  7
3     2  1   1 10
4     2  0   2  7
5     3  1   1  8
6     3  0   1  4
where Study is the study id, TX is the treatment indicator, AEs is the
number of subjects in that arm who are positive, and N is the number of
subjects in that arm. So row 1 is the treatment arm of study 1, with
5 subjects, 3 of whom are positive. Row 2 is the control arm of study 1,
with 7 subjects, 2 of whom are positive.
Now I would like to "un-group" them, so that the data look like this:
Study TX AEs
    1  1   1
    1  1   1
    1  1   1
    1  1   0
    1  1   0
    1  0   1
    1  0   1
    1  0   0
    1  0   0
    1  0   0
    1  0   0
    1  0   0
    2  1   1
  .....................
But I wasn't able to do it. I wrote a small function and used
"lapply" to apply it to each row. That worked, and gave me what I want
for each row, but I wasn't able to collapse all the returned pieces into
one single data frame for subsequent analysis.
The function I wrote:
subset = function(i){
  d = c(rep(data[i,1], data[i,4]), rep(data[i,2], data[i,4]),
        rep(0:1, c(data[i,4] - data[i,3], data[i,3])))
  d = matrix(d, data[i,4], 3)
  d
}
then:
Data = lapply(1:6, subset)
Data
So I tried to write a loop, but no matter what I tried, I couldn't get
what I want.
Any ideas?
Thank you so much!
Best,
--
Cheenghee Masaki Koh, MSW, MS(c), PhD Student
School of Social Service Administration
Department of Health Studies, Division of Biological Science
University of Chicago
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Sorry -- I missed the bit about the AEs in your original post. Perhaps
you can work with my bit for the repeats, but if you want to use your
own function (under a new name), it should suffice to do something like
do.call("rbind", lapply(1:6, NewFuncName))
Best,
Michael
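To make the suggestion above concrete, here is a minimal sketch. The names expand_row, pieces and ungrouped are placeholders chosen for this example (they do not appear in the thread), and the function returns a data frame rather than a matrix so that the column names survive the rbind.

```r
dats <- data.frame(Study = c(1L, 1L, 2L, 2L, 3L, 3L),
                   TX    = c(1L, 0L, 1L, 0L, 1L, 0L),
                   AEs   = c(3L, 2L, 1L, 2L, 1L, 1L),
                   N     = c(5L, 7L, 10L, 7L, 8L, 4L))

# Hypothetical replacement for the function called 'subset' in the question.
# Expands row i into N rows: AEs of them positive (1), the rest 0.
expand_row <- function(i) {
  n   <- dats$N[i]
  pos <- dats$AEs[i]
  data.frame(Study = rep(dats$Study[i], n),
             TX    = rep(dats$TX[i], n),
             AEs   = rep(c(1L, 0L), times = c(pos, n - pos)))
}

# lapply() returns a list of six data frames; do.call() passes them all
# to a single rbind() call, which stacks them into one data frame.
pieces    <- lapply(seq_len(nrow(dats)), expand_row)
ungrouped <- do.call("rbind", pieces)
```

Listing the positives first (c(1L, 0L) instead of 0:1) is a choice made here to match the row order in the desired output.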
do.call is a nifty and surprisingly useful construct whenever you need to construct a function call programmatically or apply a function to a list of arguments. R-News 2/2 has some useful tips on this and related functions in the Programmer's Note section if you're interested.
Best,
Michael
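As a rough illustration of what constructing a call programmatically can look like (all variable names here are invented for the example): do.call() takes a function, or its name as a string, plus a list of arguments, so both can be assembled at run time.

```r
# Arguments assembled as a list; named elements become named arguments.
arg_list <- list(1:10, na.rm = TRUE)

# Equivalent to mean(1:10, na.rm = TRUE).
m <- do.call(mean, arg_list)

# The function can also be given by name, chosen at run time.
fname <- "paste"
s <- do.call(fname, list("a", "b", sep = "-"))

# A list whose elements are the arguments:
# equivalent to rbind(1:2, 3:4, 5:6).
rows <- list(1:2, 3:4, 5:6)
mat  <- do.call(rbind, rows)
```

The last form is the one used in this thread: the list returned by lapply() becomes the argument list of a single rbind() call.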
On Tue, May 15, 2012 at 2:05 AM, Cheenghee AM Koh <sigontw at gmail.com> wrote:
Thank you so much! I can't believe I spent the whole night not knowing this one command, "do.call". This is so handy!
Best,
Koh
newdats <- rbind(cbind(dats[rep(1:nrow(dats), dats$AEs), 1:2], AEs=1),
                 cbind(dats[rep(1:nrow(dats), dats$N-dats$AEs), 1:2], AEs=0))

But the data will not be in the order you specified unless you add

newdats <- newdats[order(newdats$Study, -newdats$TX, -newdats$AEs),]

and you may want to clean up the row numbers with

rownames(newdats) <- 1:nrow(newdats)

----------------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352
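One way to sanity-check the un-grouped result, sketched here with base R's aggregate() and merge(): summing the 0/1 indicator and counting the rows per Study and TX should recover the original AEs and N columns. The names aes_back, n_back and regrouped are made up for this check; dats and newdats follow the code above.

```r
dats <- data.frame(Study = c(1L, 1L, 2L, 2L, 3L, 3L),
                   TX    = c(1L, 0L, 1L, 0L, 1L, 0L),
                   AEs   = c(3L, 2L, 1L, 2L, 1L, 1L),
                   N     = c(5L, 7L, 10L, 7L, 8L, 4L))

# Un-group as in the message above.
newdats <- rbind(cbind(dats[rep(1:nrow(dats), dats$AEs), 1:2], AEs = 1),
                 cbind(dats[rep(1:nrow(dats), dats$N - dats$AEs), 1:2], AEs = 0))
newdats <- newdats[order(newdats$Study, -newdats$TX, -newdats$AEs), ]
rownames(newdats) <- 1:nrow(newdats)

# Re-group: the sum of the indicator per arm should equal the original AEs,
# and the number of rows per arm should equal the original N.
aes_back <- aggregate(AEs ~ Study + TX, data = newdats, FUN = sum)
n_back   <- aggregate(AEs ~ Study + TX, data = newdats, FUN = length)
names(n_back)[3] <- "N"

# merge() joins on the shared columns, here Study and TX.
regrouped <- merge(aes_back, n_back)
```

Comparing regrouped with dats (after matching on Study and TX) is a quick way to confirm that no subjects were lost or duplicated in the expansion.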