
Split a list

Hi:

Following the lead of others, here's a reproducible example that I
believe achieves what you want.

# Q1:
set.seed(1)  # fix the random seed so the example is repeatable
L <- lapply(1:3, function(n)
    data.frame(x = rnorm(6), y = rnorm(6), g = rep(1:2, each = 3)))

# Using David's suggestion:
L1 <- lapply(L, function(d) subset(d, g == 1L))
L2 <- lapply(L, function(d) subset(d, g == 2L))
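
With more than two groups, repeating subset() once per level gets tedious. One possible shortcut (my addition, not part of David's suggestion) is base R's split(), which returns every level of g at once. A minimal sketch, assuming each data frame has a grouping column named g as in the example above; the seed is an arbitrary choice:

```r
set.seed(1)  # assumed seed, only so the sketch is repeatable
L <- lapply(1:3, function(n)
    data.frame(x = rnorm(6), y = rnorm(6), g = rep(1:2, each = 3)))

# Each element of S is itself a list holding one data frame per level of g
S <- lapply(L, function(d) split(d, d$g))

# Pull out the pieces that correspond to L1 and L2 above
L1 <- lapply(S, `[[`, "1")
L2 <- lapply(S, `[[`, "2")
```

The inner lists are named after the levels of g, so S[[1]][["2"]] is the g == 2 part of the first data frame.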

# Q2:
# For this small example, retain a data frame when the range of its
# second column exceeds 2.
# Find the range of the second column of each list component:
sapply(L, function(x) diff(range(x[, 2], na.rm = TRUE)))

# The code retains the data frame if the range of the second
# column is > 2, otherwise it is set to NULL:
lapply(L, function(d) if(diff(range(d[, 2], na.rm = TRUE)) > 2) d else NULL)
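
If you would rather drop the failing data frames than keep NULL placeholders, base R's Filter() may be worth a look. This is only a sketch under the same assumptions as above (range of the second column, threshold of 2); the helper name wide_enough is hypothetical and the seed is arbitrary:

```r
set.seed(1)  # assumed seed, only so the sketch is repeatable
L <- lapply(1:3, function(n)
    data.frame(x = rnorm(6), y = rnorm(6), g = rep(1:2, each = 3)))

# Hypothetical helper: TRUE when the range of the second column exceeds 2
wide_enough <- function(d) diff(range(d[, 2], na.rm = TRUE)) > 2

# Keep only the components that pass; nothing is replaced by NULL
kept <- Filter(wide_enough, L)
length(kept)
```

The result is a shorter list, so positions no longer line up with the original L unless the components are named.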

# If you want to collapse the result into a data frame, the base R
# approach would be
do.call('rbind', lapply(L, function(d)
    if(diff(range(d[, 2], na.rm = TRUE)) > 2) d else NULL))

# An equivalent way to do all of this in the plyr package is:
library('plyr')
L1 <- llply(L, function(d) subset(d, g == 1L))
L2 <- llply(L, function(d) subset(d, g == 2L))

ldply(L, function(d) if(diff(range(d[, 2], na.rm = TRUE)) > 2) d else NULL)

There are advantages to naming the list components if this is what you
have in mind, since both ldply() and the rbind from do.call() will then
indicate which component data frame each observation belongs to:
ldply() adds an .id variable holding the list component name, whereas
do.call(rbind, ...) encodes the component name in the row names. For
this example, try

names(L) <- paste('d', 1:3, sep = '')

and run the code above again to see the difference.
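
To make the row-name behaviour concrete, here is a small base R sketch (no plyr needed). It assumes the same simulated list as above with an arbitrary seed, and the column name source is my own choice for illustration, playing the role that .id plays in ldply():

```r
set.seed(1)  # assumed seed, only so the sketch is repeatable
L <- lapply(1:3, function(n)
    data.frame(x = rnorm(6), y = rnorm(6), g = rep(1:2, each = 3)))
names(L) <- paste('d', 1:3, sep = '')

combined <- do.call('rbind', L)

# Row names now carry the component name as a prefix
head(rownames(combined))

# Recover the component name by stripping the trailing ".<row>" suffix
combined$source <- sub('\\.[0-9]+$', '', rownames(combined))
table(combined$source)
```

Having the component name in an ordinary column is usually easier to work with than parsing row names later on.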

HTH,
Dennis
On Fri, Oct 14, 2011 at 6:06 AM, Juliet Ndukum <jpntsang at yahoo.com> wrote: