Subset dataframe

Hi Jade,

One way to do this is with the 'sqldf' package.

# Example data: 100 simulated CPUE values spread over five outings
LRVS_cpue <- rnorm(100) * 10
outing_ID <- rep(c("51801", "51802", "51803", "51804", "51805"), each = 20)

cpueData <- data.frame(outing_ID = outing_ID, LRVS_cpue = LRVS_cpue)

library(sqldf)
# This returns only the max value for each outing_ID and drops all the other
# rows. It sounds like you may want the rows without the max values.
# Also, check how missing values might influence your query: R's NA values
# arrive in SQLite as NULL, and SQLite's max() skips NULLs.
sqldf("SELECT outing_ID, max(LRVS_cpue) FROM cpueData GROUP BY outing_ID ORDER BY outing_ID;")
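If the goal is to keep (or drop) whole rows rather than get one summary value per outing, base R can do it without SQL. The following is a minimal sketch on the same toy data; the fixed seed, the injected NA, and the names grpMax, isMax, maxRows, otherRows and aggMax are illustrative additions, not part of the original reply. Tied maxima within an outing would all be flagged as max rows.

```r
# Same toy data as above; set.seed() is added only so the sketch is reproducible
set.seed(1)
LRVS_cpue <- rnorm(100) * 10
outing_ID <- rep(c("51801", "51802", "51803", "51804", "51805"), each = 20)
cpueData <- data.frame(outing_ID = outing_ID, LRVS_cpue = LRVS_cpue)

# Hypothetical missing value, to show how NA is handled
cpueData$LRVS_cpue[3] <- NA

# Base-R counterpart of the sqldf query; the formula method of aggregate()
# drops rows with NA by default (na.action = na.omit)
aggMax <- aggregate(LRVS_cpue ~ outing_ID, data = cpueData, FUN = max)

# Per-row copy of its outing's maximum, ignoring NA
grpMax <- ave(cpueData$LRVS_cpue, cpueData$outing_ID,
              FUN = function(x) max(x, na.rm = TRUE))

# TRUE where the row holds its outing's maximum; NA rows become FALSE
isMax <- !is.na(cpueData$LRVS_cpue) & cpueData$LRVS_cpue == grpMax

# Whole rows holding each outing's maximum
maxRows <- cpueData[isMax, ]

# Every other row (max rows dropped); the NA row is kept here
otherRows <- cpueData[!isMax, ]
```

Because `isMax` is a plain logical vector, flipping it with `!` gives the complementary subset, so you can choose either the max rows or everything else from a single comparison.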

Hope this helps,
Patrick
