Hi,
Since I've had no replies to my previous post about this
problem, I am posting it again in the hope that someone
notices it. The problem is that the randomForest function
doesn't accept a dataset whose instances cover only a
subset of the classes of the response factor. For example,
a dataset whose instances all belong to class "a" or "b",
out of the levels "a", "b" and "c", doesn't work because
no instance has class "c". Is there any way to solve this
problem?
library("randomForest")
# load the iris plant data set
dataset <- iris
# include only instances with Species = setosa or virginica
indices <- which(dataset$Species == "setosa" |
                 dataset$Species == "virginica")
finaldataset <- dataset[indices, ]
# just to let you see the 3 classes
levels(finaldataset$Species)
# create the random forest
randomForest(formula = Species ~ ., data = finaldataset, ntree = 5)
# The error message I get
Error in randomForest.default(m, y, ...) :
Can't have empty classes in y.
# The problem is that finaldataset doesn't contain any
# instances of "versicolor", so I think the only way to
# solve this problem is to change the levels of "Species"
# to only "setosa" and "virginica"; correct me if I'm wrong.
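
A minimal sketch of why the error appears, assuming only base R and the built-in iris data: subsetting the rows of a data frame leaves the factor's level set untouched, so "versicolor" remains a level with a count of zero. The `%in%` subsetting here is a shorthand chosen for the sketch, not the code from the post.

```r
# Subset iris to two species; the factor still remembers all three levels.
finaldataset <- iris[iris$Species %in% c("setosa", "virginica"), ]

# All three levels are still listed, including the unused one.
print(levels(finaldataset$Species))

# The frequency table shows the empty class that randomForest complains about.
print(table(finaldataset$Species))
```

The zero count for "versicolor" in the table is the "empty class" named in the error message.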
# So I tried to change the levels but I got stuck:
# get the possible unique classes
uniqueItems <- unique(levels(finaldataset$Species))
# the problem!
newlevels <- list(uniqueItems[1] = c(uniqueItems[1],
uniqueItems[2]), uniqueItems[3] = uniqueItems[3])
# Error message
Error: syntax error
# In the help they use constant names to rename the
# levels, so this works (but that's not what I want,
# because I don't want to change the code every time I
# use another data set):
newlevels <- list("setosa" = c(uniqueItems[1],
uniqueItems[2]), "virginica" = uniqueItems[3])
levels(finaldataset$Species) <- newlevels
levels(finaldataset$Species)
finaldataset$Species
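
The syntax error arises because argument names inside `list(...)` must be literal names or strings, not expressions such as `uniqueItems[1]`. One possible way around it, sketched here under the assumption that the goal is the same merge as in the working example (first two levels folded into the first, third kept), is to build the list unnamed and attach the names afterwards with `setNames()`:

```r
# Recreate the two-species subset (shorthand subsetting, assumed for the sketch).
finaldataset <- iris[iris$Species %in% c("setosa", "virginica"), ]
uniqueItems <- levels(finaldataset$Species)

# Build the list without names, then name it from the data instead of constants.
newlevels <- setNames(
  list(c(uniqueItems[1], uniqueItems[2]), uniqueItems[3]),
  c(uniqueItems[1], uniqueItems[3])
)

# Assigning a named list merges the old levels under the new names.
levels(finaldataset$Species) <- newlevels
print(levels(finaldataset$Species))
```

Note that this folds the second level into the first, which is harmless here only because the second level has no instances; it still hard-codes which positions are merged.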
---------------------------
Thanks in advance,
Martin
problem with certain data sets when using randomForest
3 messages · Martin Lam, Brian Ripley
Look at ?"[.factor":

finaldataset$Species <- finaldataset$Species[, drop = TRUE]

solves this.
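
A short sketch of the suggestion above, assuming the same two-species subset of iris. Besides the `[, drop = TRUE]` form, base R offers `factor()` and, in later R versions, `droplevels()`, which should give the same result; the guarded randomForest call at the end runs only if that package happens to be installed.

```r
# Recreate the two-species subset (shorthand subsetting, assumed for the sketch).
finaldataset <- iris[iris$Species %in% c("setosa", "virginica"), ]
species <- finaldataset$Species

# The suggested fix: subsetting a factor with drop = TRUE removes unused levels.
dropped <- species[, drop = TRUE]

# Two alternatives that also discard levels with no instances.
refactored <- factor(species)
via_droplevels <- droplevels(species)

finaldataset$Species <- dropped
print(levels(finaldataset$Species))

# Optional check: fit the forest only when randomForest is available.
if (requireNamespace("randomForest", quietly = TRUE)) {
  fit <- randomForest::randomForest(Species ~ ., data = finaldataset, ntree = 5)
  print(fit)
}
```

Because the unused level is dropped from the data rather than renamed, the same line works for any data set without editing level names by hand.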
On Fri, 26 Aug 2005, Martin Lam wrote:
[original message quoted in full; snipped]
______________________________________________ R-help at stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
Thank you for this and earlier help, Mr. Ripley.
Martin