Message-ID: <CAFaj53=+MKky5gyT2H8iutdm-z-vAC57c3hL7SrogeE1m4PiWg@mail.gmail.com>
Date: 2011-11-01T21:52:53Z
From: Justin Haynes
Subject: factor level issue after subsetting
In-Reply-To: <70F02259E17B6242B15D81E58EB7EB11064C97C0@afhe-ex.afhe.ualberta.ca>

First of all, the subsetting line is overly complicated; the which()
call is not needed here.

dat.sub<-dat[dat$treat!='cont',]

will work just fine.  R is doing exactly what you describe: a factor
keeps its full set of levels separately from its values, so removing
every 'cont' row from the data does not remove the 'cont' level from
the factor:

> df<-data.frame(let=factor(sample(letters[1:5],100,replace=TRUE)),num=rnorm(100))
> str(df)
'data.frame':	100 obs. of  2 variables:
 $ let: Factor w/ 5 levels "a","b","c","d",..: 1 5 1 4 3 5 2 2 1 3 ...
 $ num: num  0.224 -0.523 0.974 -0.268 -0.61 ...

> df.sub<-df[df$let!='a',]
> str(df.sub)
'data.frame':	82 obs. of  2 variables:
 $ let: Factor w/ 5 levels "a","b","c","d",..: 5 4 3 5 2 2 3 3 5 3 ...
 $ num: num  -0.523 -0.268 -0.61 -1.383 -0.193 ...

> unique(df.sub$let)
[1] e d c b
Levels: a b c d e

> df.sub$let<-factor(df.sub$let)
> unique(df.sub$let)
[1] e d c b
Levels: e d c b

> str(df.sub$let)
 Factor w/ 4 levels "e","d","c","b": 1 2 3 1 4 4 3 3 1 3 ...
>

Redefining the factor with factor(), as above, drops the unused level
and eliminates the problem.  The other option, if you don't want
factors to begin with, is:

options(stringsAsFactors=FALSE)  # to set the global option

or

# to set the option locally for this single read.csv call
dat<-read.csv("~/MyFiles/data.csv",stringsAsFactors=FALSE)
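
A further possibility, not shown in the session above, is droplevels(),
which has been in base R since 2.12.0 and removes unused levels from
every factor in a data frame in one call.  What follows is only a
minimal sketch: it assumes the data frame from the original post is
rebuilt by hand from the printed values rather than read from the CSV
file, and dat.clean is a name made up here for illustration.

```r
# Rebuild the data from the original post (values copied from the printout).
dat <- data.frame(
  treat = factor(rep(c("cont", "10", "30", "60"), each = 4)),
  yield = c(98.7, 97.2, 96.1, 98.1,
            103.0, 101.3, 102.1, 101.9,
            121.1, 123.1, 119.7, 118.9,
            109.9, 110.1, 113.1, 112.3)
)

# Subsetting removes the 'cont' rows but keeps all four levels.
dat.sub <- dat[dat$treat != "cont", ]
nlevels(dat.sub$treat)

# droplevels() returns a copy with the unused 'cont' level removed.
# dat.clean is a hypothetical name used only in this sketch.
dat.clean <- droplevels(dat.sub)
nlevels(dat.clean$treat)
levels(dat.clean$treat)
```

Plotting with plot(dat.clean$treat, dat.clean$yield) should then show
only the three remaining treatments, since the empty 'cont' level is
gone.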


On Tue, Nov 1, 2011 at 2:28 PM, Schreiber, Stefan
<Stefan.Schreiber at ales.ualberta.ca> wrote:
> Dear list,
>
> I cannot figure out why, after sub-setting my data, that particular item
> which I don't want to plot is still in the newly created subset (please
> see example below). R somehow remembers what was in the original data
> set. A work around is exporting and importing the new subset. Then it's
> all fine; but I don't like this idea and was wondering what am I missing
> here?
>
> Thanks!
> Stefan
>
> P.S. I am using R 2.13.2 for Mac.
>
>> dat<-read.csv("~/MyFiles/data.csv")
>> class(dat$treat)
> [1] "factor"
>> dat
>    treat yield
> 1   cont  98.7
> 2   cont  97.2
> 3   cont  96.1
> 4   cont  98.1
> 5     10 103.0
> 6     10 101.3
> 7     10 102.1
> 8     10 101.9
> 9     30 121.1
> 10    30 123.1
> 11    30 119.7
> 12    30 118.9
> 13    60 109.9
> 14    60 110.1
> 15    60 113.1
> 16    60 112.3
>> plot(dat$treat,dat$yield)
>> dat.sub<-dat[which(dat$treat!='cont'),]
>> class(dat.sub$treat)
> [1] "factor"
>> dat.sub
>    treat yield
> 5     10 103.0
> 6     10 101.3
> 7     10 102.1
> 8     10 101.9
> 9     30 121.1
> 10    30 123.1
> 11    30 119.7
> 12    30 118.9
> 13    60 109.9
> 14    60 110.1
> 15    60 113.1
> 16    60 112.3
>> plot(dat.sub$treat,dat.sub$yield)
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>