Removal/selecting specific rows in a dataframe conditional on 2 columns
3 messages · Aurelie Cosandey Godin, R. Michael Weylandt, Dennis Murphy
Perhaps use tapply() to split by the survey unit and write a little identity function that returns only those rows you want, then patch them all back together with something like simplify2array(). Michael
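The split-and-recombine idea above can be sketched in base R. This is only a sketch under stated assumptions: it swaps in split(), lapply() and do.call(rbind, ...) for tapply() and simplify2array(), because split() hands each group over as a whole data frame. The small RV09 below is toy data copied from the first three columns of the head(RV09) output in the question, and keep.zero is a hypothetical helper name.

```r
# Toy data (assumption): first three columns of head(RV09) from the question
RV09 <- data.frame(
  record.t = c(5, 5, 6, 5, 5),
  trip     = c(913, 913, 913, 913, 913),
  set      = c(1, 2, 2, 3, 4)
)

# Split the rows into one small data frame per survey unit (trip + set)
units <- split(RV09, list(RV09$trip, RV09$set), drop = TRUE)

# Hypothetical helper: return the unit unchanged if it has no type 6
# (species) record, otherwise return NULL so the unit is dropped
keep.zero <- function(d) if (any(d$record.t == 6)) NULL else d

# Apply the helper to every unit, discard the NULLs, and stack the rest
kept  <- Filter(Negate(is.null), lapply(units, keep.zero))
zeros <- do.call(rbind, kept)

zeros
```

On the toy data this keeps the rows for sets 1, 3 and 4 of trip 913, which are the units the question describes as zeros.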
On Tue, Nov 1, 2011 at 1:16 PM, Aurelie Cosandey Godin <godina at dal.ca> wrote:
Dear list,

After reading various mails and blogs and trying a few different pieces of code without any success, I am asking for your help! I have the following data frame, where each row represents a survey unit, with the following variables:
names(RV09)
 [1] "record.t"  "trip"      "set"       "month"     "stratum"   "NAFO"
 [7] "unit.area" "time"      "dur.set"   "distance"  "operation" "mean.d"
[13] "min.d"     "max.d"     "temp.d"    "slat"      "slong"     "spp"
[19] "number"    "weight"    "elat"      "elong"

Each survey unit generates one set record, denoted by a 5 in column "record.t". Each species identified in this particular survey unit generates an additional set record, denoted by a 6.
unique(RV09$record.t)
[1] 5 6

Each survey unit is identified by a specific "trip" and "set" number, so a type 5 record with no associated type 6 records means that no species were observed in that survey unit. I would like to be able to select all and only these survey units, which represent my zeros. For example, in trip number 913, sets 1, 3, and 4 would be part of my "zeros" data.frame, as they appear with no record.t 6.
head(RV09)
    record.t trip set month stratum NAFO unit.area time dur.set distance
585        5  913   1    10     351   3O       R31 1044      17        9
586        5  913   2    10     351   3O       R31 1440      17        9
587        6  913   2    10     351   3O       R31 1440      17        9
588        5  913   3    10     340   3O       Q31 1800      18        9
589        5  913   4    10     340   3O       Q32 2142      17        9

Any tips on how to extract this "zero" data.frame in R? Thank you very much in advance!

Best,
~Aurelie

Aurelie Cosandey-Godin
Ph.D. student, Department of Biology
Industrial Graduate Fellow, WWF-Canada
Dalhousie University | Email: godina at dal.ca
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Does this work?
library('plyr')
# Function to return a data frame if it has one row, else return NULL:
f <- function(d) if(nrow(d) == 1L) d else NULL
# Group by trip and set, which together identify a survey unit:
ddply(RV09, .(trip, set), f)
  record.t trip set month stratum NAFO unit.area time dur.set distance
1        5  913   1    10     351   3O       R31 1044      17        9
2        5  913   3    10     340   3O       Q31 1800      18        9
3        5  913   4    10     340   3O       Q32 2142      17        9

ddply() is an apply-like function that takes a data frame as input and returns a data frame as output (hence the dd). The first argument is the data frame, the second is the set of grouping variables, and the third is the function to be called on each group (f in this application).

HTH,
Dennis
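For comparison, here is a base R sketch that tests directly for the absence of type 6 records rather than counting the rows in each group. It is a sketch under stated assumptions, not a drop-in replacement for the plyr call: the small RV09 is toy data copied from the first three columns of head(RV09) in the question, key and keys.with.species are illustrative names, and the "_" separator assumes that trip and set are plain numbers.

```r
# Toy data (assumption): first three columns of head(RV09) from the question
RV09 <- data.frame(
  record.t = c(5, 5, 6, 5, 5),
  trip     = c(913, 913, 913, 913, 913),
  set      = c(1, 2, 2, 3, 4)
)

# One key per row identifying its survey unit (trip + set)
key <- paste(RV09$trip, RV09$set, sep = "_")

# Keys of the units that have at least one species (type 6) record
keys.with.species <- unique(key[RV09$record.t == 6])

# Keep the type 5 records whose unit never appears with a type 6 record
zeros <- RV09[RV09$record.t == 5 & !(key %in% keys.with.species), ]

zeros
```

The row-count test in f relies on every survey unit having exactly one type 5 record. The version above does not need that assumption, at the cost of building the key by hand.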