max & min values within dataframe
6 messages · B Laura, Sarah Goslee, Joshua Wiley +2 more
Hi Laura,

This looks suspiciously like homework. Nonetheless, you may wish to check out ?cbind.

Sarah
On Mon, Nov 14, 2011 at 11:10 AM, B Laura <gm.spam2011 at gmail.com> wrote:
Dear R-team,

I need to find the min and max values for each patient in a dataset and keep the output as a data frame with the following columns:
 - Patient nr
 - Region (remains the same per patient)
 - Min score
 - Max score

   Patient Region Score Time
1        1      X    19   28
2        1      X    20  126
3        1      X    22  100
4        1      X    25  191
5        2      Y    12    1
6        2      Y    12    2
7        2      Y    25    4
8        2      Y    26    7
9        3      X     6    1
10       3      X     6    4
11       3      X    21   31
12       3      X    22   68
13       3      X    23   31
14       3      X    24   38
15       3      X    21   15
16       3      X    22   24
17       3      X    23   15
18       3      X    24  243
19       3      X    25   77
20       4      Y     6    5
21       4      Y    22   28
22       4      Y    23   75
23       4      Y    24   19
24       5      Y    23    3
25       5      Y    24    1
26       5      Y    23   33
27       5      Y    24   13
28       5      Y    25   42
29       5      Y    26   21
30       5      Y    27    4
31       6      Y    24    4
32       6      Y    32    8

So far I could find the min and max values for each patient, but the output is not (yet) what I need:

Patient.nr = unique(Patient)
aggregate(Score, list(Patient), max)
  Group.1  x
1       1 25
2       2 26
3       3 25
4       4 24
5       5 27
6       6 32

aggregate(Score, list(Patient), min)
  Group.1  x
1       1 19
2       2 12
3       3  6
4       4  6
5       5 23
6       6 24

I would like to do the same, but write this new information (min and max values) into a data frame with the following columns:
 - Patient nr
 - Region (remains the same per patient)
 - Min score
 - Max score

Can anybody help me with this?

Thanks,
Laura
Sarah Goslee http://www.functionaldiversity.org
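Sarah's pointer to ?cbind can be sketched as follows: run Laura's two aggregate() calls, take the Region from each patient's first row, and bind the pieces column-wise. This is a minimal sketch, not Sarah's code. It rebuilds a reduced `dat` from the Patient, Region and Score columns of Laura's table (Time is omitted), the names `mn`, `mx`, `reg` and `results` are illustrative, and it assumes the rows are already sorted by Patient so that the first-row Region values line up with the sorted groups that aggregate() returns.

```r
# Patient, Region and Score columns from Laura's table (Time omitted)
dat <- data.frame(
  Patient = rep(1:6, times = c(4, 4, 11, 4, 7, 2)),
  Region  = factor(rep(c("X", "Y", "X", "Y", "Y", "Y"), times = c(4, 4, 11, 4, 7, 2))),
  Score   = c(19, 20, 22, 25, 12, 12, 25, 26, 6, 6, 21, 22, 23, 24, 21, 22, 23, 24, 25,
              6, 22, 23, 24, 23, 24, 23, 24, 25, 26, 27, 24, 32)
)

# Laura's two calls, naming the grouping column Patient instead of Group.1
mn <- aggregate(dat$Score, list(Patient = dat$Patient), min)
mx <- aggregate(dat$Score, list(Patient = dat$Patient), max)

# One Region per patient, taken from each patient's first row
# (assumes the rows are sorted by Patient, as they are here)
reg <- dat$Region[!duplicated(dat$Patient)]

# Because the first argument is a data frame, cbind() dispatches to
# cbind.data.frame() and the result is an ordinary data frame
results <- cbind(mn["Patient"], Region = reg, Min = mn$x, Max = mx$x)
print(results)
```

If the rows were not sorted by Patient, matching Region by patient ID (for example with match()) would be safer than relying on row order.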
Hi Laura,
You were close. Just use range() instead of min/max:
## your data (read in and then pasted the output of dput() to make it easy)
dat <- structure(list(Patient = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 5L,
5L, 5L, 5L, 5L, 5L, 6L, 6L), Region = structure(c(1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("X",
"Y"), class = "factor"), Score = c(19L, 20L, 22L, 25L, 12L, 12L,
25L, 26L, 6L, 6L, 21L, 22L, 23L, 24L, 21L, 22L, 23L, 24L, 25L,
6L, 22L, 23L, 24L, 23L, 24L, 23L, 24L, 25L, 26L, 27L, 24L, 32L
), Time = c(28L, 126L, 100L, 191L, 1L, 2L, 4L, 7L, 1L, 4L, 31L,
68L, 31L, 38L, 15L, 24L, 15L, 243L, 77L, 5L, 28L, 75L, 19L, 3L,
1L, 33L, 13L, 42L, 21L, 4L, 4L, 8L)), .Names = c("Patient", "Region",
"Score", "Time"), class = "data.frame", row.names = c("1", "2",
"3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14",
"15", "16", "17", "18", "19", "20", "21", "22", "23", "24", "25",
"26", "27", "28", "29", "30", "31", "32"))
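# range() returns c(min, max), so the x column becomes a two-column matrix
tmp <- with(dat, aggregate(Score, list(Patient), range))
# Region of each patient's first row (the rows are sorted by Patient)
tmpreg <- with(dat, Region[!duplicated(Patient)])
results <- data.frame(tmp$Group.1, tmpreg, tmp$x)
colnames(results) <- c("Patient", "Region", "Min", "Max")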
Note that it is a little tricky to get the results into a data frame,
because tmp is a bit of an odd data frame: due to the way aggregate()
works, the first column is a regular vector, but the second column
actually contains a two-column matrix (the min and max returned by
range()). To get it into regular form, I extracted the pieces
separately when creating 'results'.
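To see the structure Josh describes, and one other way to flatten it, here is a small sketch. It rebuilds a reduced `dat` (Patient, Region and Score only) instead of the full dput() output, and the flattening step, do.call(data.frame, tmp), is a common base-R idiom rather than something from Josh's message: passing each column to data.frame() separately lets data.frame() split the matrix column into ordinary columns, which are then renamed. The name `flat` is illustrative.

```r
# Patient, Region and Score columns from Laura's table (Time omitted)
dat <- data.frame(
  Patient = rep(1:6, times = c(4, 4, 11, 4, 7, 2)),
  Region  = factor(rep(c("X", "Y", "X", "Y", "Y", "Y"), times = c(4, 4, 11, 4, 7, 2))),
  Score   = c(19, 20, 22, 25, 12, 12, 25, 26, 6, 6, 21, 22, 23, 24, 21, 22, 23, 24, 25,
              6, 22, 23, 24, 23, 24, 23, 24, 25, 26, 27, 24, 32)
)

tmp <- with(dat, aggregate(Score, list(Patient), range))

# tmp has only two columns, and the second is a 6 x 2 matrix (min, max)
str(tmp)

# Hand each column to data.frame() separately; the matrix column is split
# into two ordinary columns, which are then renamed
flat <- do.call(data.frame, tmp)
names(flat) <- c("Patient", "Min", "Max")

# Add Region as in Josh's code and put the columns in Laura's order
flat$Region <- with(dat, Region[!duplicated(Patient)])
flat <- flat[c("Patient", "Region", "Min", "Max")]
print(flat)
```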
Cheers,
Josh
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Joshua Wiley Ph.D. Student, Health Psychology Programmer Analyst II, ATS Statistical Consulting Group University of California, Los Angeles https://joshuawiley.com/
I took a stab at this using ddply() from the plyr package. How does
this look to you?
x<- textConnection("Col Patient Region Score Time
1 1 X 19 28
2 1 X 20 126
3 1 X 22 100
4 1 X 25 191
5 2 Y 12 1
6 2 Y 12 2
7 2 Y 25 4
8 2 Y 26 7
9 3 X 6 1
10 3 X 6 4
11 3 X 21 31
12 3 X 22 68
13 3 X 23 31
14 3 X 24 38
15 3 X 21 15
16 3 X 22 24
17 3 X 23 15
18 3 X 24 243
19 3 X 25 77
20 4 Y 6 5
21 4 Y 22 28
22 4 Y 23 75
23 4 Y 24 19
24 5 Y 23 3
25 5 Y 24 1
26 5 Y 23 33
27 5 Y 24 13
28 5 Y 25 42
29 5 Y 26 21
30 5 Y 27 4
31 6 Y 24 4
32 6 Y 32 8")
V <- read.table(x, header = TRUE)[, -1]  # drop the row-number column "Col"
close(x)
rm("x")
# Everything above is just stuff to get the data in.
library(plyr)
R <- ddply(V, c("Patient", "Region"), function(d) {
  c(max = max(d$Score), min = min(d$Score))
})
Patient Region max min
1 1 X 25 19
2 2 Y 26 12
3 3 X 25 6
4 4 Y 24 6
5 5 Y 27 23
6 6 Y 32 24
Michael
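ddply() follows a split-apply-combine pattern: split the rows by Patient and Region, apply a summary function to each piece, and bind the results together. The rough base-R analogue below is only meant to show what that pattern does; it is not how plyr is implemented internally. It uses a reduced `dat` rebuilt from Laura's table (Time omitted), the names `pieces`, `rows` and `R` are illustrative, and it puts Min before Max to match Laura's requested layout.

```r
# Patient, Region and Score columns from Laura's table (Time omitted)
dat <- data.frame(
  Patient = rep(1:6, times = c(4, 4, 11, 4, 7, 2)),
  Region  = factor(rep(c("X", "Y", "X", "Y", "Y", "Y"), times = c(4, 4, 11, 4, 7, 2))),
  Score   = c(19, 20, 22, 25, 12, 12, 25, 26, 6, 6, 21, 22, 23, 24, 21, 22, 23, 24, 25,
              6, 22, 23, 24, 23, 24, 23, 24, 25, 26, 27, 24, 32)
)

# Split: one data frame per Patient/Region combination; drop = TRUE skips
# combinations that never occur (e.g. Patient 1 in Region Y)
pieces <- split(dat, list(dat$Patient, dat$Region), drop = TRUE)

# Apply: reduce each piece to a one-row summary
rows <- lapply(pieces, function(d) {
  data.frame(Patient = d$Patient[1], Region = d$Region[1],
             Min = min(d$Score), Max = max(d$Score))
})

# Combine: stack the one-row summaries, then sort by Patient
R <- do.call(rbind, rows)
R <- R[order(R$Patient), ]
rownames(R) <- NULL
print(R)
```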
Groupwise data summarization is a very common task, and it is worth learning the various ways to do it in R. Josh showed you one way to use aggregate() from the base package, and Michael showed you one way of using the plyr package to do the same. Another way (where df is your data frame) would be

ddply(df, .(Patient, Region), summarise, max = max(Score), min = min(Score))

which saves writing an explicit function. Similarly, if you have R >= 2.11.0, the aggregate() function has a nice formula interface, so Josh's code could also be written as

aggregate(Score ~ Patient + Region, data = df, FUN = range)

followed by renaming the variables as shown.

Other packages that can perform this task with ease include doBy, data.table, remix, Hmisc and, if you are comfortable with SQL, sqldf. For relative novices, the doBy package is a very nice place to start because it comes with a well-written vignette and its function names correspond well with the tasks they perform (e.g., summaryBy(), transformBy()). The plyr and data.table packages are more general and more powerful in terms of the types of tasks to which each is suited. Unlike aggregate() and doBy::summaryBy(), these packages can process multivariable functions. As noted above, if you have an SQL background, sqldf operates on R data objects as though they were SQL tables, which is advantageous in complex data extraction tasks. Package remix is useful if you want to organize results into a tabular form reminiscent of SAS.

HTH,
Dennis
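Dennis's formula version of aggregate() can be fleshed out as below. This is a sketch under a few assumptions: `dat` is a reduced data frame rebuilt from Laura's table (Time omitted), and the renaming step is one way of doing what Dennis describes as "renaming the variables as shown". As with the default method, range() makes the Score column of the result a two-column matrix, so its columns are pulled out explicitly. Because the group order returned by aggregate() need not follow Patient when grouping by two variables, the result is sorted afterwards.

```r
# Patient, Region and Score columns from Laura's table (Time omitted)
dat <- data.frame(
  Patient = rep(1:6, times = c(4, 4, 11, 4, 7, 2)),
  Region  = factor(rep(c("X", "Y", "X", "Y", "Y", "Y"), times = c(4, 4, 11, 4, 7, 2))),
  Score   = c(19, 20, 22, 25, 12, 12, 25, 26, 6, 6, 21, 22, 23, 24, 21, 22, 23, 24, 25,
              6, 22, 23, 24, 23, 24, 23, 24, 25, 26, 27, 24, 32)
)

# Formula interface: summarise Score within each Patient/Region group
tmp <- aggregate(Score ~ Patient + Region, data = dat, FUN = range)

# tmp$Score is a two-column matrix (min, max); split it into named columns
results <- data.frame(tmp[c("Patient", "Region")],
                      Min = tmp$Score[, 1], Max = tmp$Score[, 2])

# Sort by Patient to match Laura's layout and reset the row names
results <- results[order(results$Patient), ]
rownames(results) <- NULL
print(results)
```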