randomForest
9 messages · Uwe Ligges, Anirudh Kondaveeti, Liaw, Andy
Anirudh Kondaveeti wrote:
Hi! I am working with random forests in R. Is there a way to sample a fixed number of rows from a dataset for each tree in randomForest? To be more precise: my data set contains 1500 rows, and I am growing 500 trees. Is it possible to sample only 500 rows from the data set for each tree, so that each tree in the forest uses a different 500 rows?
See ?randomForest and the argument sampsize. Uwe Ligges
Thanks in advance! Anirudh Kondaveeti
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Anirudh Kondaveeti wrote:
sampsize uses the same sample for all the trees in the random forest.
No. Uwe Ligges
But I want to use a different sample for each of the 500 trees in the random forest. Thanks! Anirudh Kondaveeti
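One way to check the point made in this exchange is to ask randomForest to record which rows each tree drew. The following is only a sketch on simulated data: the objects `x`, `y` and `rf`, the seed, and the choice of `replace = FALSE` (so that each tree gets 500 distinct rows) are assumptions made for illustration, not the poster's setup. It relies on the `keep.inbag` argument documented in ?randomForest.

```r
library(randomForest)

set.seed(1)                                   # arbitrary seed, for reproducibility only
n <- 1500
x <- data.frame(a = rnorm(n), b = rnorm(n))   # made-up predictors
y <- factor(sample(c("c1", "c2"), n, replace = TRUE))

# sampsize = 500 without replacement: a sample of 500 rows is drawn for each tree
rf <- randomForest(x, y, ntree = 500, sampsize = 500,
                   replace = FALSE, keep.inbag = TRUE)

# rf$inbag has one row per observation and one column per tree;
# a positive entry means the row was in that tree's sample
colSums(rf$inbag > 0)[1:5]                    # number of rows used by the first five trees
identical(rf$inbag[, 1], rf$inbag[, 2])       # FALSE when trees 1 and 2 drew different rows
```

If the two columns differ, the sample is being redrawn for every tree rather than fixed once for the whole forest.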
Uwe had been right all along. I don't understand what you don't understand from the documentation. You can use sampsize=c(300, 300) and replace=FALSE to make sure that all 300 class 1 rows are used in every tree, but be warned that this leaves no class 1 rows out-of-bag for the OOB estimate. Andy
From: Anirudh Kondaveeti
To be more clear, my data set contains two classes, Class 1 and Class 2. Class 1 is the original data, with 300 rows. Class 2 is randomly generated data, with 1500 rows. I want to sample a new data set containing all the rows of Class 1 and only 300 of the 1500 rows of Class 2, and then use it in a random forest with 500 trees. Class 2 should also contribute a different 300 rows to each tree in the forest. Thanks! Anirudh Kondaveeti
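Andy's suggestion can be sketched roughly as follows. The data here are simulated stand-ins (300 rows labelled `class1`, 1500 labelled `class2`); the object names, the seed and the use of `keep.inbag = TRUE` to inspect the samples are assumptions for illustration. The arguments `strata`, `sampsize`, `replace` and `keep.inbag` are the ones described in ?randomForest.

```r
library(randomForest)

set.seed(1)                                   # arbitrary seed, for reproducibility only
# Made-up stand-in for the poster's data: 300 "real" rows and 1500 random rows
x <- data.frame(a = c(rnorm(300, mean = 1), rnorm(1500)),
                b = c(rnorm(300, mean = 1), rnorm(1500)))
y <- factor(rep(c("class1", "class2"), times = c(300, 1500)))

rf <- randomForest(x, y, ntree = 500,
                   strata = y,               # stratify each tree's sample by class
                   sampsize = c(300, 300),   # 300 rows from each class, per tree
                   replace = FALSE,          # without replacement: all 300 class1 rows
                   keep.inbag = TRUE)        # record which rows each tree used

in_bag <- rf$inbag > 0
all(in_bag[y == "class1", ])                 # every class1 row is in every tree
sum(rf$oob.times[y == "class1"])             # so class1 rows are never out-of-bag
colSums(in_bag[y == "class2", ])[1:5]        # 300 class2 rows per tree
identical(in_bag[y == "class2", 1],
          in_bag[y == "class2", 2])          # FALSE when trees 1 and 2 differ
```

Because no class 1 row is ever out-of-bag, the OOB error for that class cannot be estimated from this fit; a separate test set would be needed to assess it.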
Anirudh Kondaveeti wrote:
To be more clear, my data set contains two classes, Class 1 and Class 2. Class 1 is the original data, with 300 rows. Class 2 is randomly generated data, with 1500 rows. I want to sample a new data set containing all the rows of Class 1 and only 300 of the 1500 rows of Class 2, and then use it in a random forest with 500 trees. Class 2 should also contribute a different 300 rows to each tree in the forest. Thanks!
Ah, in that case (stratified sampling), combine the arguments "strata" and "sampsize", in principle. But you cannot select ALL rows of one class: that would ignore one of the main ideas of random forests, namely bootstrapping the observations, and randomForest will certainly bootstrap for you. Uwe Ligges
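The bootstrap remark can be made concrete with a little arithmetic. When 300 rows are drawn with replacement from a class of 300 rows, a given row is missed by one tree with probability (1 - 1/300)^300, so only about 63% of the class appears in each tree and the rest stays out-of-bag. The base-R sketch below just simulates that draw; it does not call randomForest, and the names `n1`, `ntree` and `distinct` are made up for the illustration.

```r
set.seed(1)                    # arbitrary seed, for reproducibility only
n1 <- 300                      # class 1 rows, as in the question
ntree <- 500                   # number of trees, as in the question

# For each tree, draw n1 row indices with replacement and count the distinct ones
distinct <- replicate(ntree, length(unique(sample.int(n1, n1, replace = TRUE))))

mean(distinct) / n1            # simulated in-bag fraction per tree
1 - (1 - 1 / n1)^n1            # expected in-bag fraction, about 0.63
```

The roughly 37% of rows left out of each tree are what the OOB estimate is computed from, which is why sampling all rows of a class without replacement removes that estimate for the class.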
In fact, you can also use replace = FALSE, but then, as I said, one of the main ideas of randomForest is ignored. Uwe Ligges