Subset by family name?

5 messages · Ophelia Wang, Jarrett Byrnes, Peter Solymos +1 more

#
Hi all,

I thought this should be very simple, but I'm not sure where the  
problem is. I have a .txt data file that contains X and Y coordinates  
of trees and their family names:

"X"	"Y"	"Mark"
0	28	"Sapotaceae"
1	30	"Meliaceae"
1	40	"Meliaceae"
1	60	"Mimosaceae"
1	76	"Olacaceae"
1.5	73	"Myristicaceae"
2	34	"Euphorbiaceae"
2	62	"Olacaceae"
2	86	"Mimosaceae"
2.5	36	"Arecaceae"
3	22	"Nyctaginaceae"
3	25	"Moraceae"
3	38	"Rubiaceae"
3	47	"Desconocido "
3	99	"Mimosaceae"
3.5	24	"Anacardiaceae"
3.5	57	"Sapotaceae"
4	1	"Lecythidaceae"

Now I just want to work on one family for various spatial analyses in
ads and spatstat, so I wrote:
Yut <- read.delim("C:/dissertation/data2006/Parcela_1-3/Yutsun_tree.txt",
                  header = TRUE, sep = "\t", quote = "\"", dec = ".",
                  fill = TRUE)

Yut_are <- subset (Yut, Mark="Arecaceae", select=c(X, Y, Mark))

However, the summary of Yut_are still contains trees of other families:

   X                Y                    Mark
  Min.   :  0.00   Min.   : 0.00   Myristicaceae: 65
  1st Qu.: 24.00   1st Qu.:24.00   Lecythidaceae: 60
  Median : 46.00   Median :51.00   Sapotaceae   : 51
  Mean   : 48.07   Mean   :49.72   Moraceae     : 45
  3rd Qu.: 72.50   3rd Qu.:75.50   Arecaceae    : 41
  Max.   :100.00   Max.   :99.00   Mimosaceae   : 34
                                   (Other)      :313

Please tell me how to subset a dataset like this to extract trees
from only one or a few families. Thanks a lot!

Ophelia
#
Sorry to bother everyone---I realized I should have used "==" instead  
of "=" in the subset syntax!
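
For anyone finding this thread later, here is a minimal sketch of why "=" filters nothing. The toy data frame is only a stand-in for Yut (a few rows copied from the first message), and Yut_few is a hypothetical name. With "=", Mark is passed to subset() as an extra named argument rather than as a condition, so every row is kept; with "==" the comparison is evaluated row by row. %in% covers the "a few families" case.

```r
# Toy stand-in for Yut, using a few rows from the original post.
Yut <- data.frame(
  X = c(0, 1, 1, 2.5, 3),
  Y = c(28, 30, 40, 36, 25),
  Mark = factor(c("Sapotaceae", "Meliaceae", "Meliaceae",
                  "Arecaceae", "Moraceae"))
)

# "=": Mark is swallowed as an unused named argument, so nothing is filtered
# and all rows come back.
nrow(subset(Yut, Mark = "Arecaceae"))

# "==": a logical condition, evaluated for each row.
Yut_are <- subset(Yut, Mark == "Arecaceae", select = c(X, Y, Mark))

# %in%: keep trees from several families at once.
Yut_few <- subset(Yut, Mark %in% c("Arecaceae", "Meliaceae"))
```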


Quoting Ophelia Wang <opheliawang at mail.utexas.edu>:
#
This can still be a problem after subsetting, with zombie factor levels
hanging around.  It's particularly annoying when you're boxplotting from a
subset, as you'll have a bunch of empty entries in the plot.  I have a
function I call purgef that eliminates the levels of a factor that I have
subsetted out.

purgef<-function(x){
  x<-as.character(x)
  x<-as.factor(x)
  return(x)
}

Gets rid of those pesky zombie levels.

In your case

Yut_are$Mark<-purgef(Yut_are$Mark)
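
To make the zombie levels visible, here is a small sketch on a made-up factor (Mark and kept are illustrative names, not the real data). It also shows two alternatives that should have the same effect as purgef() here: calling factor() again, and droplevels(), which is only available in more recent versions of R.

```r
# Hypothetical toy factor standing in for Yut$Mark.
Mark <- factor(c("Arecaceae", "Meliaceae", "Arecaceae", "Moraceae"))
kept <- Mark[Mark == "Arecaceae"]

levels(kept)   # still lists all three families
table(kept)    # zero counts for the families that were subsetted out

# factor() re-derives the levels from the values actually present.
levels(factor(kept))

# droplevels() does the same and also accepts a whole data frame.
levels(droplevels(kept))
```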
#
Hi All,

Maybe a more transparent solution for the zombie factor problem
(dropping unused factor levels) for data frames is the following (note
that it applies to all factors in the data frame x):

x[] <- lapply(x, function(x) x[drop = TRUE])

As I recall, on the help page of factor(), there is a slight warning
against character or numeric coercion of factors.
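
A short sketch of both points, on made-up data (the data frame and the factor f are illustrative only). The first part applies the lapply() idiom after a row subset; the second shows the usual pitfall with numeric coercion, where as.numeric() on a factor returns the internal level codes rather than the values printed as labels.

```r
# Illustrative data frame with one numeric column and one factor.
x <- data.frame(
  X = c(1, 2, 3),
  Mark = factor(c("Arecaceae", "Meliaceae", "Arecaceae"))
)
x <- x[x$Mark == "Arecaceae", ]
nlevels(x$Mark)   # unused level "Meliaceae" is still there

# Drop unused levels in every factor column; other columns pass through.
x[] <- lapply(x, function(col) col[drop = TRUE])
nlevels(x$Mark)

# Numeric coercion pitfall: codes versus labels.
f <- factor(c("10", "20", "10"))
as.numeric(f)                 # internal codes
as.numeric(as.character(f))   # the values shown as labels
```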

Cheers,

Peter
On Sat, Nov 29, 2008 at 9:25 AM, <byrnes at msi.ucsb.edu> wrote:
#
Or even easier:

options(stringsAsFactors = FALSE)

You'll never look bet
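
A minimal sketch of what this changes, using a toy data frame in place of the real file. The stringsAsFactors argument used here is the per-call equivalent of the global option; from R 4.0.0 onward FALSE is already the default. With character columns there are no levels to go stale after subsetting, though summary() will no longer tabulate the families, so a column may still need converting with factor() before plotting.

```r
# Toy stand-in for Yut with Mark kept as character, not factor.
Yut <- data.frame(
  X = c(0, 1, 2.5),
  Y = c(28, 30, 36),
  Mark = c("Sapotaceae", "Meliaceae", "Arecaceae"),
  stringsAsFactors = FALSE   # per-call equivalent of the global option
)

Yut_are <- subset(Yut, Mark == "Arecaceae")
class(Yut_are$Mark)    # a plain character vector
unique(Yut_are$Mark)   # only the family that was kept
```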

Hadley
On Sat, Nov 29, 2008 at 10:38 AM, Peter Solymos <solymos at ualberta.ca> wrote: