Subset dataframe
Hi Jade,
One way that you can do this is using the 'sqldf' package.
LRVS_cpue <- rnorm(100)*10
outing_ID <- rep(c("51801", "51802", "51803", "51804", "51805"), each = 20)
cpueData <- data.frame(outing_ID = outing_ID, LRVS_cpue = LRVS_cpue)
library(sqldf)
# This returns only the max value for each outing_ID and drops all the
# other rows. It sounds like you may want the rows without the max values.
# Also, you'll want to check how missing values might influence your query.
sqldf("SELECT outing_ID, max(LRVS_cpue)
       FROM cpueData
       GROUP BY outing_ID
       ORDER BY outing_ID;")
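If you need the complete rows rather than just the two summary columns, a base-R approach with ave() may also be worth trying. This is only a minimal sketch on made-up toy data (the data frame 'toy' and its values are my own assumption, with column names taken from your example), and I have not timed it on 650 000 rows:

```r
# Toy data with the same column names as in the question (values invented).
toy <- data.frame(
  outing_ID = c(51800, 51801, 51801, 51802),
  LRVS_cpue = c(0, 0, 0.5, 0)
)

# ave() returns, for every row, the maximum LRVS_cpue within that row's outing_ID.
group_max <- ave(toy$LRVS_cpue, toy$outing_ID, FUN = max)

# Keep the rows that hold their group's maximum. which() drops comparisons
# that evaluate to NA, so groups containing missing values return no rows.
toy_max <- toy[which(toy$LRVS_cpue == group_max), ]
```

Note that if several rows in an outing tie for the maximum, all of them are kept, so you may still need to decide how to handle duplicates.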
Hope this helps,
Patrick
On 4/16/13 5:12 AM, Jade Maggs wrote:
Hi list,

I need to subset the dataframe below by selecting rows with maximum LRVS_cpue values for each outing_ID. For example, where outing_ID == 51801, the new dataframe should have only one row with LRVS_cpue = 0.5. LRVS_cpue in all other rows should remain as 0. I have over 650 000 rows, so looping is very slow.

I have tried:

>cpueData1 <- data.frame(unique(cpueData[max(cpueData$LRVS_cpue),]))

but this does not work. Any help would be greatly appreciated.

patrol_ID  outing_ID  num_anglers  hours_fish  ang_hours  LRVS_cpue
51709      51795      2            3.5         7          0
51709      51796      1            0.5         0.5        0
51709      51797      1            1           1          0
51709      51798      1            2           2          0
51709      51799      5            5.5         27.5       0
51709      51800      1            3           3          0
51709      51801      2            1           2          0
51709      51801      2            1           2          0.5
51709      51802      1            1.5         1.5        0
51709      51803      3            1           3          0
51709      51804      4            1           4          0

JADE MAGGS
Assistant Scientist
_______________________________________________
R-sig-ecology mailing list
R-sig-ecology at r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology