CART for 0/1 data
Martin,
Sorry, I don't think I read your message carefully enough.
When you say the error message is "+", that would seem to indicate
that you still had an unclosed parenthesis and that R was
waiting for more input.
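
As a rough illustration of the "+" prompt (a sketch only; the formula and the names y and x are placeholders, not Martin's actual call), the check below uses base R's parse() to show that a call with a missing closing parenthesis is syntactically incomplete. At the console, R answers such input with "+" and waits for the rest rather than reporting an error.

```r
# Placeholder call text; y and x are hypothetical names, not from the thread.
incomplete <- "tree(factor(y) ~ x"    # closing ")" is missing
complete   <- "tree(factor(y) ~ x)"

# parse() only checks syntax and does not evaluate anything,
# so the tree package does not need to be installed for this check.
is_complete <- function(code) {
  !inherits(try(parse(text = code), silent = TRUE), "try-error")
}

print(is_complete(incomplete))
print(is_complete(complete))
```

With a long formula of many factor(species) terms, one missing ")" is easy to overlook, so pasting the call into a check like this may help locate it.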
Using a smaller data set (160 samples, 169 rows, only 5 classes) it
did work fine for me. pa = presence/absence dataframe, opt.5$clustering
= cluster IDs.
*********************************************************************
> test <- tree(factor(opt.5$clustering)~pa)
> test
node), split, n, deviance, yval, (yprob)
      * denotes terminal node

1) root 160 371.000 3 ( 0.23750 0.08750 0.57500 0.07500 0.02500 )
  2) pa.symore < 0.5 79 216.500 1 ( 0.48101 0.17722 0.15190 0.13924 0.05063 )
    4) pa.artarb < 0.5 42 123.600 2 ( 0.07143 0.33333 0.26190 0.23810 0.09524 )
      8) pa.macgri < 0.5 31 75.280 2 ( 0.09677 0.45161 0.00000 0.32258 0.12903 )
. . .
. . .
. . .
  3) pa.symore > 0.5 81 10.780 3 ( 0.00000 0.00000 0.98765 0.01235 0.00000 )
    6) pa.carrss < 0.5 11 6.702 3 ( 0.00000 0.00000 0.90909 0.09091 0.00000 ) *
    7) pa.carrss > 0.5 70 0.000 3 ( 0.00000 0.00000 1.00000 0.00000 0.00000 ) *
************************************************************************
I'll try again with a larger dataset and see if it's a memory limitation.
Dave Roberts
Martin Wegmann wrote:
On Friday 23 September 2005 17:08, Dave Roberts wrote:
Martin,
If the data are actually coded 0/1, the tree function would
probably interpret them as integers and try a regression instead of a
classification. If the dependent variable is called "var", try

x <- tree(factor(var)~species)

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
David W. Roberts                office 406-994-4548
Professor and Head              FAX 406-994-3190
Department of Ecology           email droberts at montana.edu
Montana State University
Bozeman, MT 59717-3460

Thanks, but I think I provided too little information. My dependent
variable is the location, given as names (I could transform them to
numbers from 1 to n). The independent variables are 0/1 data (species).
If I do
tree(locations~factor(species1)+factor(species2)+.....+factor(speciesn), sp_data)
I get the same results as without the factor() part. BTW, only a subset
of the locations is displayed, which is pretty weird considering that I
included all locations in the analysis.

Martin

Martin Wegmann wrote:
Dear R-user,

I tried to generate a classification / regression tree with an
absence/presence matrix of species (400) in different locations (50) to
visualise species which are important for splitting up two locations.

Rpart and tree did not work for more than 10 species, which is logical
given the limited number of locations (n=50). However, the error prompt
is a "+" with no specific message, and I am pretty sure that I did not
enter a wrong character by mistake.

Is it allowed at all to use 0/1 data for this statistical technique, and
if so, is there a way or a different method to use all 400 species
entries? Otherwise I would apply a PCA beforehand, but I would prefer to
keep the raw species information.

using R 2.1.1-1 (debian repos.)

regards, Martin
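
On the question of 0/1 data and the factor() suggestion earlier in the thread, here is a minimal sketch using rpart (which Martin also mentions) rather than tree. Everything in it is made up for illustration: the simulated presence/absence matrix pa, the species names sp1 to sp5, and the labels in group. It only shows that an integer response is fitted as a regression by default, while a factor response is fitted as a classification.

```r
library(rpart)  # distributed with R as a recommended package

set.seed(1)
n <- 60

# Hypothetical presence/absence data: 60 sites by 5 species, coded 0/1
pa <- as.data.frame(matrix(rbinom(n * 5, 1, 0.5), nrow = n,
                           dimnames = list(NULL, paste0("sp", 1:5))))

# Hypothetical integer class labels, driven here by the first species
group <- ifelse(pa$sp1 == 1, 1L, 2L)

# Integer response: rpart chooses a regression tree by default
fit_reg <- rpart(group ~ ., data = pa)

# Factor response: rpart chooses a classification tree by default
fit_cls <- rpart(factor(group) ~ ., data = pa)

print(fit_reg$method)
print(fit_cls$method)
```

The 0/1 predictors are left as plain numbers here. A two-valued predictor can only be split one way, so wrapping it in factor() should not change the partition, which would be consistent with Martin seeing the same results with and without factor().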