Finding unique terms

11 messages · Tóth Dénes, Jeff Newmiller, Jim Lemon +3 more

#
Dear r-users,

I have this data:

structure(list(STUDENT_ID = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L), .Label = c("AA15285", "AA15286"), class = "factor"),
    COURSE_CODE = structure(c(1L, 2L, 5L, 6L, 7L, 8L, 2L, 3L,
    4L, 5L, 6L), .Label = c("BAA1113", "BAA1322", "BAA2113",
    "BAA2513", "BAA2713", "BAA2921", "BAA4273", "BAA4513"), class =
"factor"),
    PO1M = c(155.7, 48.9, 83.2, NA, NA, NA, 48.05, 68.4, 41.65,
    82.35, NA), PO1T = c(180, 70, 100, NA, NA, NA, 70, 100, 60,
    100, NA), PO2M = c(NA, NA, NA, 37, NA, NA, NA, NA, NA, NA,
    41), PO2T = c(NA, NA, NA, 50, NA, NA, NA, NA, NA, NA, 50),
    X = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), X.1 = c(NA,
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA)), .Names = c("STUDENT_ID",
"COURSE_CODE", "PO1M", "PO1T", "PO2M", "PO2T", "X", "X.1"), class =
"data.frame", row.names = c(NA,
-11L))

I want to combine the rows that share the same student ID and add up the
values of PO1M, PO1T, ..., PO2T for each ID.

How do I do that?
Thank you for any help given.
#
On 10/12/2018 12:12 AM, roslinazairimah zakaria wrote:
dat <- structure(list(STUDENT_ID = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L), .Label = c("AA15285", "AA15286"), class = "factor"),
     COURSE_CODE = structure(c(1L, 2L, 5L, 6L, 7L, 8L, 2L, 3L,
     4L, 5L, 6L), .Label = c("BAA1113", "BAA1322", "BAA2113",
     "BAA2513", "BAA2713", "BAA2921", "BAA4273", "BAA4513"), class =
"factor"),
     PO1M = c(155.7, 48.9, 83.2, NA, NA, NA, 48.05, 68.4, 41.65,
     82.35, NA), PO1T = c(180, 70, 100, NA, NA, NA, 70, 100, 60,
     100, NA), PO2M = c(NA, NA, NA, 37, NA, NA, NA, NA, NA, NA,
     41), PO2T = c(NA, NA, NA, 50, NA, NA, NA, NA, NA, NA, 50),
     X = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), X.1 = c(NA,
     NA, NA, NA, NA, NA, NA, NA, NA, NA, NA)), .Names = c("STUDENT_ID",
"COURSE_CODE", "PO1M", "PO1T", "PO2M", "PO2T", "X", "X.1"), class =
"data.frame", row.names = c(NA,
-11L))

# I assume you would like to add up the values with na.rm = TRUE
meanFn <- function(x) mean(x, na.rm = TRUE)

# see ?aggregate
aggregate(dat[, c("PO1M", "PO1T", "PO2M")],
           by = dat["STUDENT_ID"],
           FUN = meanFn)

# if you have largish or large data
library(data.table)
dat2 <- as.data.table(dat)
dat2[, lapply(.SD, meanFn),
      by = STUDENT_ID,
      .SDcols = c("PO1M", "PO1T", "PO2M")]


Regards,
Denes
#
Hi Denes,

It works perfectly as I want!

Thanks a lot.
On Fri, Oct 12, 2018 at 6:29 AM Dénes Tóth <toth.denes at kogentum.hu> wrote:
#
You said "add up"... so you did not mean to say that? Denes computed the mean...
On October 11, 2018 3:56:23 PM PDT, roslinazairimah zakaria <roslinaump at gmail.com> wrote:

  
    
#
Yes, I thought that as well and had worked out this but didn't send it:

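# rzdf holds the data frame from the original post
add_Pscores <- function(x) {
  # total of every score in the student's rows, ignoring NAs
  sum(unlist(x), na.rm = TRUE)
}
by(rzdf[, c("PO1M", "PO1T", "PO2M", "PO2T")], rzdf$STUDENT_ID, FUN = add_Pscores)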
rzdf$STUDENT_ID: AA15285
[1] 724.8
------------------------------------------------------------
rzdf$STUDENT_ID: AA15286
[1] 661.45
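
Note that add_Pscores() collapses all four score columns into one grand total per student. If a separate total per column is wanted instead (an assumption about the goal), a minimal sketch could pass colSums to by(). Here `rzdf` is rebuilt as a compact copy of the posted score columns so the example runs on its own; `per_col` and `totals` are illustrative names.

```r
# Compact copy of the relevant columns from the original post
rzdf <- data.frame(
  STUDENT_ID = rep(c("AA15285", "AA15286"), times = c(6, 5)),
  PO1M = c(155.7, 48.9, 83.2, NA, NA, NA, 48.05, 68.4, 41.65, 82.35, NA),
  PO1T = c(180, 70, 100, NA, NA, NA, 70, 100, 60, 100, NA),
  PO2M = c(NA, NA, NA, 37, NA, NA, NA, NA, NA, NA, 41),
  PO2T = c(NA, NA, NA, 50, NA, NA, NA, NA, NA, NA, 50)
)
cols <- c("PO1M", "PO1T", "PO2M", "PO2T")

# colSums() is applied to each student's block of rows; na.rm is passed through
per_col <- by(rzdf[, cols], rzdf$STUDENT_ID, FUN = colSums, na.rm = TRUE)

# Bind the per-student vectors into one matrix, one row per student
totals <- do.call(rbind, per_col)
totals
```

Adding up each row of `totals` gives back the grand totals printed above.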

Jim
On Fri, Oct 12, 2018 at 1:37 PM Jeff Newmiller <jdnewmil at dcn.davis.ca.us> wrote:
#
On 10/12/2018 04:36 AM, Jeff Newmiller wrote:
Nice catch, Jeff. Of course I wanted to use 'sum' instead of 'mean'.
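
With that correction, the earlier aggregate() call might look like the sketch below. It assumes the same score columns as in the original post (rebuilt here in compact form so the example is self-contained), adds PO2T, which the question also asked for, and uses `sumFn` as an illustrative counterpart to `meanFn`.

```r
# Compact copy of the relevant columns from the original post
dat <- data.frame(
  STUDENT_ID = rep(c("AA15285", "AA15286"), times = c(6, 5)),
  PO1M = c(155.7, 48.9, 83.2, NA, NA, NA, 48.05, 68.4, 41.65, 82.35, NA),
  PO1T = c(180, 70, 100, NA, NA, NA, 70, 100, 60, 100, NA),
  PO2M = c(NA, NA, NA, 37, NA, NA, NA, NA, NA, NA, 41),
  PO2T = c(NA, NA, NA, 50, NA, NA, NA, NA, NA, NA, 50)
)

# Sum instead of mean, still ignoring missing values
sumFn <- function(x) sum(x, na.rm = TRUE)

res <- aggregate(dat[, c("PO1M", "PO1T", "PO2M", "PO2T")],
                 by = dat["STUDENT_ID"],
                 FUN = sumFn)
res
```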
#
On 10/12/2018 08:58 AM, Dénes Tóth wrote:
Oh, and one more note: If you have NAs in your columns, 'sum' is rarely 
the aggregate statistic that you are after. Probably this is why my 
subconscious statistician suggested 'mean'.
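
One possible reading of this caveat, sketched with made-up values: with na.rm = TRUE, sum() returns 0 for a column that has no observations at all, which cannot be told apart from a genuine total of zero, while mean() returns NaN. The helper `sum_or_na` is hypothetical and is only one way to keep that distinction.

```r
# A score column with no recorded values at all (illustrative data)
x <- c(NA_real_, NA_real_, NA_real_)

total_naive <- sum(x, na.rm = TRUE)   # 0, looks like a real score of zero
avg <- mean(x, na.rm = TRUE)          # NaN, flags that nothing was observed
n_obs <- sum(!is.na(x))               # number of non-missing values

# Hypothetical helper: return NA instead of 0 when every value is missing
sum_or_na <- function(x) {
  if (all(is.na(x))) NA_real_ else sum(x, na.rm = TRUE)
}
total_safe <- sum_or_na(x)
```

Reporting `n_obs` next to each total is another way to show how many values a sum is based on.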

  
    
3 days later
#
# load data

# Enter dataframe by hand
dat <- structure(list(STUDENT_ID = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L), .Label = c("AA15285", "AA15286"), class = "factor"),
    COURSE_CODE = structure(c(1L, 2L, 5L, 6L, 7L, 8L, 2L, 3L,
    4L, 5L, 6L), .Label = c("BAA1113", "BAA1322", "BAA2113",
    "BAA2513", "BAA2713", "BAA2921", "BAA4273", "BAA4513"), class =
"factor"),
    PO1M = c(155.7, 48.9, 83.2, NA, NA, NA, 48.05, 68.4, 41.65,
    82.35, NA), PO1T = c(180, 70, 100, NA, NA, NA, 70, 100, 60,
    100, NA), PO2M = c(NA, NA, NA, 37, NA, NA, NA, NA, NA, NA,
    41), PO2T = c(NA, NA, NA, 50, NA, NA, NA, NA, NA, NA, 50),
    X = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), X.1 = c(NA,
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA)), .Names = c("STUDENT_ID",
"COURSE_CODE", "PO1M", "PO1T", "PO2M", "PO2T", "X", "X.1"), class =
"data.frame", row.names = c(NA,
-11L))

# Create sums by student ID

library(dplyr)
dat %>%
  group_by(STUDENT_ID) %>%
  summarize(sum.PO1M = sum(PO1M, na.rm = TRUE),
            sum.PO1T = sum(PO1M, na.rm = TRUE),
            sum.PO2M = sum(PO1M, na.rm = TRUE),
            sum.PO2T = sum(PO1M, na.rm = TRUE))
#
On 10/11/2018 5:12 PM, roslinazairimah zakaria wrote:
Oops! Forgot to clean up after my cut and paste. The solution with dplyr
looks like this:
# Create sums by student ID
library(dplyr)
dat %>%
  group_by(STUDENT_ID) %>%
  summarize(sum.PO1M = sum(PO1M, na.rm = TRUE),
            sum.PO1T = sum(PO1T, na.rm = TRUE),
            sum.PO2M = sum(PO2M, na.rm = TRUE),
            sum.PO2T = sum(PO2T, na.rm = TRUE))
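
Writing each column out by hand is what made the cut-and-paste slip possible. As a sketch only, assuming dplyr 1.0.0 or later (where across() is available), the same summary can be written once for all four columns. The data frame is rebuilt here in compact form so the example is self-contained, and `res` is an illustrative name.

```r
library(dplyr)

# Compact copy of the relevant columns from the original post
dat <- data.frame(
  STUDENT_ID = rep(c("AA15285", "AA15286"), times = c(6, 5)),
  PO1M = c(155.7, 48.9, 83.2, NA, NA, NA, 48.05, 68.4, 41.65, 82.35, NA),
  PO1T = c(180, 70, 100, NA, NA, NA, 70, 100, 60, 100, NA),
  PO2M = c(NA, NA, NA, 37, NA, NA, NA, NA, NA, NA, 41),
  PO2T = c(NA, NA, NA, 50, NA, NA, NA, NA, NA, NA, 50)
)

# One sum() expression applied to every listed column;
# .names reproduces the sum.<column> naming used above
res <- dat %>%
  group_by(STUDENT_ID) %>%
  summarize(across(c(PO1M, PO1T, PO2M, PO2T),
                   ~ sum(.x, na.rm = TRUE),
                   .names = "sum.{.col}"))
res
```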
#
Here is a base R solution:
"dat" is the data frame as in Robert's solution.
  STUDENT_ID   PO1M PO1T PO2M PO2T
1    AA15285 287.80  350   37   50
2    AA15286 240.45  330   41   50
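
The command that produced this table is not shown in the archived message. One base R call that gives the same result, offered as a sketch and not necessarily the one Bert used, is the formula interface of aggregate(). The data frame is rebuilt in compact form so the example runs on its own. Note the na.action = na.pass argument: the default, na.omit, would discard every row here, because each row has at least one NA among the four score columns.

```r
# Compact copy of the relevant columns from the original post
dat <- data.frame(
  STUDENT_ID = rep(c("AA15285", "AA15286"), times = c(6, 5)),
  PO1M = c(155.7, 48.9, 83.2, NA, NA, NA, 48.05, 68.4, 41.65, 82.35, NA),
  PO1T = c(180, 70, 100, NA, NA, NA, 70, 100, 60, 100, NA),
  PO2M = c(NA, NA, NA, 37, NA, NA, NA, NA, NA, NA, 41),
  PO2T = c(NA, NA, NA, 50, NA, NA, NA, NA, NA, NA, 50)
)

# na.rm = TRUE is passed on to sum(); na.action = na.pass keeps rows with NAs
res <- aggregate(cbind(PO1M, PO1T, PO2M, PO2T) ~ STUDENT_ID,
                 data = dat,
                 FUN = sum, na.rm = TRUE,
                 na.action = na.pass)
res
```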

Cheers,
Bert



Bert Gunter

On Mon, Oct 15, 2018 at 6:42 PM Robert Baer <rbaer at atsu.edu> wrote:

            

  
  
#
Yes, you are right, I want the sum. I will change the formula accordingly.

On Fri, Oct 12, 2018 at 10:36 AM Jeff Newmiller <jdnewmil at dcn.davis.ca.us>
wrote: