calibration/validation sets
2 messages · Peyuco Porras Porras ., Ko-Kang Kevin Wang
Hi,
On Sat, 14 Aug 2004, Peyuco Porras Porras . wrote:
Hi, does anyone know how to create a calibration set and a validation set from a particular dataset? I have a data frame with nearly 20,000 rows, and I would like to randomly select a subset of the original dataset (I found out how to do that) to use as the calibration set. However, I don't know how to remove this "calibration" set from the original data frame in order to get my "validation" set. Any hint will be greatly appreciated.
Here is a really quick way, supposing you want 30% of your dataset as the validation set:
iris.id = sample(nrow(iris), nrow(iris) * 0.3)
iris.valid = iris[iris.id, ]
iris.train = iris[-iris.id, ]
nrow(iris.valid)
[1] 45
nrow(iris.train)
[1] 105

The first line draws a random sample of row indices, 30% of the number of rows in the iris data. The second line subsets the data to those rows: the validation set. The third takes what's left, by dropping the sampled rows with negative indexing: the training set.

This is perhaps not efficient and the code can definitely be simplified...but it's Sunday morning and I haven't had my morning coffee yet :D

Cheers,
Kevin

--------------------------------
Ko-Kang Kevin Wang
PhD Student
Centre for Mathematics and its Applications
Building 27, Room 1004
Mathematical Sciences Institute (MSI)
Australian National University
Canberra, ACT 0200
Australia
Homepage: http://wwwmaths.anu.edu.au/~wangk/
Ph (W): +61-2-6125-2431
Ph (H): +61-2-6125-7407
Ph (M): +61-40-451-8301
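
The same idea can be wrapped in a small function for the calibration/validation wording of the original question. This is only a sketch: the helper name split_calib_valid and its arguments calib_frac and seed are hypothetical and not from the thread, and it uses setdiff() rather than negative indexing, on the assumption that you want the split to behave sensibly even when the sampled index vector is empty (df[-integer(0), ] returns zero rows, not all rows).

```r
# Hypothetical helper (not from the thread): split a data frame into
# calibration and validation sets using random row indices.
split_calib_valid <- function(df, calib_frac = 0.7, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)           # optional, for reproducibility
  n <- nrow(df)
  # Row indices sampled without replacement for the calibration set
  calib_id <- sample(n, size = round(n * calib_frac))
  # All remaining row indices form the validation set
  valid_id <- setdiff(seq_len(n), calib_id)
  list(
    calibration = df[calib_id, , drop = FALSE],
    validation  = df[valid_id, , drop = FALSE]
  )
}

parts <- split_calib_valid(iris, calib_frac = 0.7, seed = 1)
nrow(parts$calibration)  # 105
nrow(parts$validation)   # 45
```

Because the two index vectors are disjoint by construction, no row ends up in both sets, and passing a seed makes the split repeatable between runs.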