
how to select rows at random

On Fri, 2009-03-27 at 15:11 -0400, Laura Rodriguez Murillo wrote:
Not sure what you mean by identifiers, but to select a subset of the
2000 cells in that column, you could use sample(). See ?sample for
details, but here is an example.

## choose a random subset of 500 out of 2000 entries
## dummy data
dat <- data.frame(identifiers = sample(2000, 2000), X = rnorm(2000))
## set seed so the same rows are picked on your PC as on mine
## (dat above is generated before the seed, so its values will still differ)
## comment this out if you want a different subset each time you run
set.seed(1234)
## random subset of 500
want <- sample(2000, 500)
## select out that subset
## head to show only first n of the selected
head(dat$identifiers[want])

Gives (the exact values will differ on your machine, because dat is
generated before the seed is set):
[1] 1327  587  835  430 1422 1687
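
If whole rows are wanted rather than just the identifier column, the
same index vector can be used to subset the data frame. A minimal
sketch, reusing the dummy data from above; the name sub is only for
illustration, and the seed is set first here so the data are
reproducible as well:

```r
## dummy data, as above, but with the seed set before it is generated
set.seed(1234)
dat <- data.frame(identifiers = sample(2000, 2000), X = rnorm(2000))
## random row indices; nrow(dat) avoids hard-coding 2000
want <- sample(nrow(dat), 500)
## keep all columns for the selected rows
sub <- dat[want, ]
## 500 rows, 2 columns
dim(sub)
```

sample() draws without replacement by default, so no row is picked
twice.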

This assumes the identifiers are unique.
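
If you are not sure that holds for your data, a quick check along these
lines might help. This is only a sketch on the dummy data (where the
identifiers are unique by construction); ids and want_ids are
illustrative names, not anything from your data:

```r
## dummy data, as above
dat <- data.frame(identifiers = sample(2000, 2000), X = rnorm(2000))
## returns 0 when no identifier is repeated
anyDuplicated(dat$identifiers)
## if there are repeats, sample from the distinct identifiers instead
ids <- unique(dat$identifiers)
n <- min(500, length(ids))
## index into ids rather than calling sample(ids, n) directly,
## which misbehaves when ids is a single number
want_ids <- ids[sample(length(ids), n)]
```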

HTH

G