Message-ID: <39B6DDB9048D0F4DAD42CB26AAFF0AFA0744942C@usctmx1106.merck.com>
Date: 2009-05-15T12:49:38Z
From: Liaw, Andy
Subject: Using sample to create Training and Test sets
In-Reply-To: <4A0D1CFE.20106@bris.ac.uk>
Here's one possibility:
idx <- sample(nrow(acc))
training <- acc[idx[1:400], ]
testset <- acc[-idx[1:400], ]
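
The quoted attempt fails because `training` is a data frame rather than a vector of row numbers, so `-training` and `!training` apply those operators to its columns (hence the Ops.factor warnings). Also, `replace=TRUE` would let the same row be drawn more than once. Below is a self-contained sketch of the approach above. It assumes a made-up data frame in place of accOUT.txt; the column names `x` and `grp`, the 1000-row size, and the helper names `n_train` and `train_rows` are only for illustration.

```r
# Hypothetical stand-in for 'acc' (the real accOUT.txt is not available here)
set.seed(1)  # only so that the split is reproducible
acc <- data.frame(x   = rnorm(1000),
                  grp = factor(sample(c("a", "b"), 1000, replace = TRUE)))

n_train <- 400
idx <- sample(nrow(acc))             # random permutation of the row numbers, no repeats
train_rows <- idx[seq_len(n_train)]  # first 400 row numbers of the permutation

training <- acc[train_rows, ]        # 400 distinct rows
testset  <- acc[-train_rows, ]       # negative integer index drops exactly those rows

# Sanity checks: the sizes add up and no row appears in both sets
stopifnot(nrow(training) == n_train,
          nrow(testset) == nrow(acc) - n_train,
          !any(rownames(training) %in% rownames(testset)))
```

The key point is that the row numbers are kept in their own vector, so the same vector can be used with a minus sign to get the complement.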
Andy
From: Chris Arthur
>
> Forgive the newbie question. I want to select random rows from my
> data.frame to create a test set (which I can do), but then I want to
> create a training set using what's left over.
>
> Example code:
> acc <- read.table("accOUT.txt", header=T, sep = ",", row.names=1)
> #select 400 random rows in data
> training <- acc[sample(1:nrow(acc), 400, replace=TRUE),]
>
> #try to get what's left of acc not in training
> testset <- acc[-training, ]
> Fails with the following error....
> Error: invalid subscript type
> In addition: Warning message:
> - not meaningful for factors in: Ops.factor(left)
>
> I then try.
> testset <- acc[!training, ]
> Which gives me the warning message
> ! not meaningful for factors in: Ops.factor(left)
> And if I look at testset, it is 400 rows of NAs, which clearly isn't
> right.
>
> Can anyone tell me what I'm doing wrong?
>
> Thanks in advance
>
> Chris