factor level issue after subsetting
First of all, the subsetting line is more complicated than it needs to be; which() is not required. dat.sub<-dat[dat$treat!='cont',] will work just fine. R is doing exactly what you describe. A factor stores its set of levels separately from its values, so removing every 'cont' row from the data does not remove the 'cont' level from the factor:
df<-data.frame(let=factor(sample(letters[1:5],100,replace=T)),num=rnorm(100))
str(df)
'data.frame': 100 obs. of 2 variables:
 $ let: Factor w/ 5 levels "a","b","c","d",..: 1 5 1 4 3 5 2 2 1 3 ...
 $ num: num 0.224 -0.523 0.974 -0.268 -0.61 ...
df.sub<-df[df$let!='a',]
str(df.sub)
'data.frame': 82 obs. of 2 variables:
 $ let: Factor w/ 5 levels "a","b","c","d",..: 5 4 3 5 2 2 3 3 5 3 ...
 $ num: num -0.523 -0.268 -0.61 -1.383 -0.193 ...
unique(df.sub$let)
[1] e d c b
Levels: a b c d e
df.sub$let<-factor(df.sub$let)
unique(df.sub$let)
[1] e d c b
Levels: e d c b
str(df.sub$let)
Factor w/ 4 levels "e","d","c","b": 1 2 3 1 4 4 3 3 1 3 ...
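To see that a level stays attached even when no observations use it, here is a small sketch. The vector f and its values are made up for illustration and are not taken from your data:

```r
# Toy factor (hypothetical values, independent of the df example above).
f <- factor(c("a", "b", "b", "c"))
f.sub <- f[f != "a"]      # removes the observations, not the level

levels(f.sub)             # the level "a" is still listed
table(f.sub)              # "a" is tabulated with a count of 0
nlevels(factor(f.sub))    # re-creating the factor keeps only the used levels
```

That zero-count level is why plot() still reserves a slot for it on the axis.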
By redefining the factor with factor(), you drop the unused levels and eliminate the problem. The other
option, if you don't want factors to begin with, is:
options(stringsAsFactors=FALSE) # to set the global option
or
dat<-read.csv("~/MyFiles/data.csv",stringsAsFactors=FALSE) # to set the option locally for this single read.csv call
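If you want to keep the factor, droplevels() is another way to get the same result as re-calling factor(); as far as I know it has been in base R since 2.12.0, so it should be available in your 2.13.2. The sketch below rebuilds your data by hand from the printout in your message (an assumption, since I don't have your CSV file) and drops the unused level in the same step as the subsetting:

```r
# Data typed in from the printout in the original post (not read from the CSV).
dat <- data.frame(
  treat = factor(rep(c("cont", "10", "30", "60"), each = 4)),
  yield = c(98.7, 97.2, 96.1, 98.1, 103.0, 101.3, 102.1, 101.9,
            121.1, 123.1, 119.7, 118.9, 109.9, 110.1, 113.1, 112.3)
)

# Subset the rows, then drop every factor level that no longer occurs.
dat.sub <- droplevels(dat[dat$treat != "cont", ])

levels(dat.sub$treat)     # "cont" is no longer among the levels
plot(dat.sub$treat, dat.sub$yield)
```

Applied to a whole data frame, droplevels() cleans up every factor column at once, which saves re-calling factor() on each column.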
On Tue, Nov 1, 2011 at 2:28 PM, Schreiber, Stefan
<Stefan.Schreiber at ales.ualberta.ca> wrote:
Dear list,

I cannot figure out why, after sub-setting my data, that particular item which I don't want to plot is still in the newly created subset (please see example below). R somehow remembers what was in the original data set. A work around is exporting and importing the new subset. Then it's all fine; but I don't like this idea and was wondering what am I missing here?

Thanks!
Stefan

P.S. I am using R 2.13.2 for Mac.

dat<-read.csv("~/MyFiles/data.csv")
class(dat$treat)
[1] "factor"
dat
   treat yield
1   cont  98.7
2   cont  97.2
3   cont  96.1
4   cont  98.1
5     10 103.0
6     10 101.3
7     10 102.1
8     10 101.9
9     30 121.1
10    30 123.1
11    30 119.7
12    30 118.9
13    60 109.9
14    60 110.1
15    60 113.1
16    60 112.3
plot(dat$treat,dat$yield)
dat.sub<-dat[which(dat$treat!='cont'),]
class(dat.sub$treat)
[1] "factor"
dat.sub
   treat yield
5     10 103.0
6     10 101.3
7     10 102.1
8     10 101.9
9     30 121.1
10    30 123.1
11    30 119.7
12    30 118.9
13    60 109.9
14    60 110.1
15    60 113.1
16    60 112.3
plot(dat.sub$treat,dat.sub$yield)
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.