Using sample to create Training and Test sets
Note that the single split-sample technique is not competitive with other approaches unless the sample size exceeds around 20,000.
Frank
Chris Arthur wrote:
Forgive the newbie question. I want to select random rows from my
data.frame to create a test set (which I can do), but then I want to
create a training set using what's left over.
Example code:
acc <- read.table("accOUT.txt", header=T, sep = ",", row.names=1)
#select 400 random rows in data
training <- acc[sample(1:nrow(acc), 400, replace=TRUE),]
#try to get whats left of acc not in training
testset <- acc[-training, ]
Fails with the following error....
Error: invalid subscript type
In addition: Warning message:
- not meaningful for factors in: Ops.factor(left)
I then try:
testset <- acc[!training, ]
Which gives me the warning message
! not meaningful for factors in: Ops.factor(left)
And if I look at testset, it is 400 rows of NAs, which clearly isn't
right.
Can anyone tell me what I'm doing wrong?
Thanks in advance
Chris
Frank E Harrell Jr
Professor and Chair, Department of Biostatistics
School of Medicine, Vanderbilt University
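
In the quoted code, training is a data frame rather than a vector of row numbers, so -training and !training apply arithmetic and logical operators to its columns (hence the Ops.factor warnings) instead of dropping rows. One way to get the complementary set is to keep the sampled row indices and negate those. The sketch below is only an illustration under assumptions: the data frame is a made-up stand-in for accOUT.txt, the names train_idx, id and grp are hypothetical, and sampling is done without replacement (the quoted code used replace=TRUE, which can pick the same row more than once).

```r
# Hypothetical stand-in for the poster's data (accOUT.txt is not available here)
set.seed(1)
acc <- data.frame(
  id  = 1:1000,
  grp = factor(sample(c("a", "b"), 1000, replace = TRUE))
)

# Sample 400 row *indices* without replacement (assumption: no duplicate rows wanted)
train_idx <- sample(nrow(acc), 400)

# Rows at the sampled indices form one set
training <- acc[train_idx, ]

# Negative integer indices drop those rows, leaving everything else
testset <- acc[-train_idx, ]

nrow(training)  # 400
nrow(testset)   # 600
```

Negating the index vector works because it holds plain integers; negating the data frame itself does not, which is what triggered the original error.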