Hello,

First, thanks for the help with an earlier question about error handling!

I have a problem filtering a dataset. I'm trying to filter the data in the y columns based on the values in the x column, e.g.:

x    y1   y2   yn
1.0  1    NA   3
2.0  1    NA   11
2.0  2    NA   NA
3.0  1    5    16
3.0  7    5    2
4.0  8    4    1

and I want to keep the highest y when x is identical, like this:

x    y1   y2   yn
1.0  1    NA   3
2.0  2    NA   11
3.0  7    5    16
4.0  8    4    1

or, just as good:

x    y1   y2   yn
1.0  1    NA   3
2.0  NA*  NA   NA
2.0  2    NA   11
3.0  NA*  5    16
3.0  7    NA*  NA*
4.0  8    4    1

If anyone has any suggestions or pointers on how to do this, I would really appreciate it.

/Anders
filter data set unique, duplicate..
3 messages · Anders Bjørgesæter, Dimitris Rizopoulos, Sundar Dorai-Raj
maybe you could consider something like this:
dat <- data.frame(x = c(1, 2, 2, 3, 3, 4),
                  y1 = c(1, 1, 2, 1, 7, 8),
                  y2 = c(NA, NA, NA, 5, 5, 4),
                  y3 = c(3, 11, NA, 16, 2, 1))
#############
out <- as.data.frame(lapply(dat[-1], function(y, x) tapply(y, x, max,
                     na.rm = TRUE), x = dat["x"]))
out[out == -Inf] <- NA
out$x <- unique(dat["x"])
out
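The `out[out == -Inf] <- NA` step is needed because `max(..., na.rm = TRUE)` on a group that holds only NAs has nothing left to compare and returns -Inf (with a warning). A small check of that behaviour on a trimmed copy of the example data (only the x and y2 columns, which is an assumption made to keep it short):

```r
dat <- data.frame(x  = c(1, 2, 2, 3, 3, 4),
                  y2 = c(NA, NA, NA, 5, 5, 4))

# Groups x = 1 and x = 2 contain only NA in y2, so max(..., na.rm = TRUE)
# sees an empty vector and returns -Inf; the warnings are suppressed here.
m <- suppressWarnings(tapply(dat$y2, dat$x, max, na.rm = TRUE))
print(m)

# Replace the -Inf placeholders with NA, as in the code above.
m[m == -Inf] <- NA
print(m)
```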
I hope it helps.
Best,
Dimitris
----
Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven
Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/16/336899
Fax: +32/16/337015
Web: http://www.med.kuleuven.be/biostat/
http://www.student.kuleuven.be/~m0390867/dimitris.htm
----- Original Message -----
From: "Anders Bjørgesæter" <anders.bjorgesater at bio.uio.no>
To: <r-help at stat.math.ethz.ch>
Sent: Wednesday, August 03, 2005 10:40 AM
Subject: [R] filter data set unique, duplicate..
______________________________________________ R-help at stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Hi, Anders/Dimitris,
Dimitris Rizopoulos wrote:
maybe you could consider something like this:
dat <- data.frame(x = c(1, 2, 2, 3, 3, 4),
                  y1 = c(1, 1, 2, 1, 7, 8),
                  y2 = c(NA, NA, NA, 5, 5, 4),
                  y3 = c(3, 11, NA, 16, 2, 1))
#############
out <- as.data.frame(lapply(dat[-1], function(y, x) tapply(y, x, max,
                     na.rm = TRUE), x = dat["x"]))
out[out == -Inf] <- NA
out$x <- unique(dat["x"])
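Beware this line. If "x" is not sorted as it is in "dat", then your rows
will be misaligned: tapply() returns the groups in sorted order of "x",
while unique() keeps the values in the order they first appear.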
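To make the warning concrete, here is a minimal sketch with a hypothetical unsorted copy of the data (`dat2`, not from the thread, one y column only). It shows the mismatch and one possible fix that takes x from the group labels tapply() attaches to its result; it also uses the plain vector `dat2$x` rather than the one-column data frame `dat2["x"]`.

```r
# Hypothetical unsorted version of the example data.
dat2 <- data.frame(x  = c(3, 1, 2, 3, 2, 4),
                   y1 = c(7, 1, 1, 1, 2, 8))

grp_max <- tapply(dat2$y1, dat2$x, max)  # groups ordered 1, 2, 3, 4
print(unique(dat2$x))                    # 3 1 2 4: first-appearance order

# Misaligned: x = 3 gets paired with the maximum of group 1, and so on.
bad <- data.frame(y1 = as.vector(grp_max), x = unique(dat2$x))

# Aligned: recover x from the names (group labels) of the tapply() result.
good <- data.frame(x = as.numeric(names(grp_max)), y1 = as.vector(grp_max))
print(good)
```

Using `sort(unique(dat2$x))` would also line up here, since the groups of a numeric x are its sorted distinct values, and it avoids converting the labels back from character.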
Here's another solution using "by", though it's no more efficient than
what Dimitris has given.
out <- by(dat[-1], dat[1], function(y) {
  max.na <- function(x)
    if(all(is.na(x))) NA else max(x, na.rm = TRUE)
  apply(y, 2, max.na)
})
out <- as.data.frame(do.call("rbind", out))
out <- cbind(x = as.numeric(row.names(out)), out)
out
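For comparison, base R's aggregate() can do the same grouping in one call and returns x as the first column, already sorted. This is not from the thread; it is a sketch that reuses the NA-safe maximum (`max.na`) from the by() solution above. The data-frame method is used rather than the formula method `aggregate(. ~ x, ...)`, whose default `na.action = na.omit` would drop every row containing an NA.

```r
dat <- data.frame(x  = c(1, 2, 2, 3, 3, 4),
                  y1 = c(1, 1, 2, 1, 7, 8),
                  y2 = c(NA, NA, NA, 5, 5, 4),
                  y3 = c(3, 11, NA, 16, 2, 1))

# Maximum that returns NA (not -Inf) when a group has only NAs.
max.na <- function(v) if (all(is.na(v))) NA else max(v, na.rm = TRUE)

# One row per distinct x, sorted by x; max.na is applied to each y column.
# NAs in the y columns are passed through to max.na; only rows with NA
# in the grouping variable would be dropped.
out <- aggregate(dat[-1], by = dat["x"], FUN = max.na)
out
```

Like the other two solutions, this gives the first of the requested layouts. The maxima are taken per column, so a result row need not match any single original row (for x = 3, y1 = 7 comes from one row and y3 = 16 from the other).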
HTH,
--sundar