
How to randomly extract a number of rows in a data frame

On Aug 1, 2014, at 1:58 PM, Stephen HK Wong <honkit at stanford.edu> wrote:

            
If your data frame is called 'DF':

  DF.Rand <- DF[sample(nrow(DF), 1000000), ]

See ?sample. By default, sample() draws without replacement and gives every element an equal chance of being selected, so each row appears at most once in the result.
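Here is a small sketch of the default behaviour described in ?sample. The values are random, so no output is shown. Without replacement every drawn index is distinct. Asking for more values than the population holds is an error, and replace = TRUE removes that limit.

```r
# Default: sampling WITHOUT replacement, so all drawn indices are distinct.
idx <- sample(10, 5)
stopifnot(length(idx) == 5, !anyDuplicated(idx), all(idx %in% 1:10))

# Requesting more values than the population size fails when replace = FALSE;
# capture the error message instead of stopping.
res <- tryCatch(sample(10, 20), error = function(e) conditionMessage(e))
print(res)

# With replace = TRUE the same index may be drawn more than once.
idx_rep <- sample(10, 20, replace = TRUE)
stopifnot(length(idx_rep) == 20, all(idx_rep %in% 1:10))
```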

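In the code above, nrow(DF) returns the number of rows in DF and so defines the sample space 1:nrow(DF), from which 1000000 distinct random integers are drawn and used as row indices. Note that DF must have at least 1000000 rows for this to work; otherwise sample() stops with an error unless you set replace = TRUE.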
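If the data frame might have fewer rows than requested, or only a single column, a small wrapper can guard against both cases. This is only a sketch: random_rows() is a hypothetical helper name, not a base R function, and the test data frame is made up for illustration. The drop = FALSE argument keeps a one-column data frame from being simplified to a plain vector.

```r
# Hypothetical helper (not part of base R): return n randomly chosen rows of DF.
random_rows <- function(DF, n, replace = FALSE) {
  # Give a clearer message than sample()'s own error when n is too large.
  if (!replace && n > nrow(DF)) {
    stop("n (", n, ") exceeds nrow(DF) (", nrow(DF), "); use replace = TRUE")
  }
  # Draw n row indices from 1:nrow(DF), as in DF[sample(nrow(DF), n), ].
  idx <- sample(nrow(DF), n, replace = replace)
  # drop = FALSE keeps the result a data frame even if DF has one column.
  DF[idx, , drop = FALSE]
}

# Made-up example data.
DF <- data.frame(id = 1:100, x = rnorm(100))
DF.Rand <- random_rows(DF, 10)

# A one-column data frame stays a data frame.
one <- random_rows(data.frame(a = 1:5), 3)
```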

Using the built-in 'iris' dataset, select 20 random rows from the 150 total:

  iris[sample(nrow(iris), 20), ]

    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
122          5.6         2.8          4.9         2.0  virginica
79           6.0         2.9          4.5         1.5 versicolor
109          6.7         2.5          5.8         1.8  virginica
106          7.6         3.0          6.6         2.1  virginica
49           5.3         3.7          1.5         0.2     setosa
125          6.7         3.3          5.7         2.1  virginica
1            5.1         3.5          1.4         0.2     setosa
68           5.8         2.7          4.1         1.0 versicolor
84           6.0         2.7          5.1         1.6 versicolor
110          7.2         3.6          6.1         2.5  virginica
113          6.8         3.0          5.5         2.1  virginica
64           6.1         2.9          4.7         1.4 versicolor
102          5.8         2.7          5.1         1.9  virginica
71           5.9         3.2          4.8         1.8 versicolor
69           6.2         2.2          4.5         1.5 versicolor
65           5.6         2.9          3.6         1.3 versicolor
74           6.1         2.8          4.7         1.2 versicolor
99           5.1         2.5          3.0         1.1 versicolor
135          6.1         2.6          5.6         1.4  virginica
41           5.0         3.5          1.3         0.3     setosa
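Because the rows are chosen at random, running the iris example again gives a different set of 20 rows. Here is a minimal sketch of making the draw reproducible with set.seed(). The seed value 42 is arbitrary and chosen only for illustration, so the rows it selects will not match the listing above.

```r
# Fix the random number generator state so the same rows are drawn each time.
set.seed(42)
iris.Rand <- iris[sample(nrow(iris), 20), ]

# Row names keep the original row numbers, showing which rows were picked.
picked <- as.integer(rownames(iris.Rand))

# Resetting the same seed reproduces exactly the same sample.
set.seed(42)
iris.Again <- iris[sample(nrow(iris), 20), ]
stopifnot(identical(iris.Rand, iris.Again))
```

As in the listing above, the row names of the result are the original row numbers, which makes it easy to check which rows were selected.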



Regards,

Marc Schwartz