Dear All, I have a data set containing both categorical and numerical variables, which I analyze using Cubist through the caret framework. From the generated rules, it is clear that Cubist does something to the categorical variables and probably uses some dummy coding for them. However, I currently cannot access the data in the form Cubist transforms it into. If caret (or the package) needs to do some dummy coding of the factors, how can I access the newly encoded data set? I suppose this applies to plenty of other packages. Any suggestion is welcome. Cheers, Lorenzo
Caret Internal Data Representation
3 messages · Lorenzo Isella, Bert Gunter, Max Kuhn
I am not familiar with caret/Cubist, but assuming they follow the usual R procedures that encode categorical factors for conditional fitting, you need to do some homework on your own by reading up on the use of contrasts in regression. See ?factor and ?contrasts (and other linked Help as necessary) to see what R's usual procedures are, but you will undoubtedly need to consult outside statistical references -- the help files will point you to some -- to fully understand what's going on. It is not trivial.

Cheers,
Bert

Bert Gunter
"Data is not information. Information is not knowledge. And knowledge is certainly not wisdom." -- Clifford Stoll
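For readers following the ?contrasts pointer, here is a minimal base-R sketch of what the usual encoding looks like. It assumes R's default contrast options and a made-up data frame (the names `d`, `size`, and `color` are hypothetical); it shows what a formula-based fit would build, not necessarily what Cubist itself does internally.

```r
# Toy data; all column names are hypothetical.
d <- data.frame(
  y     = c(1.2, 3.4, 2.2, 5.1, 4.8, 2.9),
  size  = c(10, 12, 9, 15, 14, 11),
  color = factor(c("red", "green", "blue", "red", "blue", "green"))
)

# Coding matrix currently attached to the factor. With default options,
# unordered factors use treatment contrasts, and the first level
# (alphabetically "blue") is the baseline.
contrasts(d$color)

# The numeric design matrix that a formula-based fit would construct.
X <- model.matrix(y ~ size + color, data = d)
colnames(X)
head(X)
```

The matrix `X` is the "encoded data set" in the sense of the original question: one column per non-baseline level of the factor, plus the numeric predictor and the intercept.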
(In reply to Lorenzo Isella <lorenzo.isella at gmail.com>, message of Thu, Nov 5, 2015 at 9:38 AM.)
______________________________________________ R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Providing a reproducible example and the output of `sessionInfo()` will help get your question answered. For example, did you use the formula or the non-formula interface to `train`?
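Why the interface matters can be sketched in base R. The sketch below rests on an assumption, namely that the formula method of `train` expands factors into dummy columns via `model.matrix` before the model sees them, while the non-formula (x/y) method passes the predictors through unchanged. The data frame and its column names are hypothetical, and caret itself is not called here.

```r
# Toy data; all column names are hypothetical.
d <- data.frame(
  y     = c(1.2, 3.4, 2.2, 5.1, 4.8, 2.9),
  size  = c(10, 12, 9, 15, 14, 11),
  color = factor(c("red", "green", "blue", "red", "blue", "green"))
)

# Assumption: roughly what a formula call such as
#   train(y ~ ., data = d, method = "cubist")
# would hand to the model: dummy columns, intercept dropped.
x_formula <- model.matrix(y ~ ., data = d)[, -1, drop = FALSE]
colnames(x_formula)

# Assumption: what a non-formula call such as
#   train(x = d[, c("size", "color")], y = d$y, method = "cubist")
# would hand to the model: the factor stays a single column.
x_raw <- d[, c("size", "color")]
str(x_raw)

# Version information worth including with a question to the list.
sessionInfo()
```

If the rules in the fitted model refer to names like `colorgreen`, the formula route was probably used; if they refer to `color` with sets of levels, the factor was probably passed through as is.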
(In reply to Bert Gunter <bgunter.4567 at gmail.com>, message of Thu, Nov 5, 2015 at 1:10 PM.)