
Caret Internal Data Representation

3 messages · Lorenzo Isella, Bert Gunter, Max Kuhn

#
Dear All,
I have a data set which contains both categorical and numerical
variables, which I analyze using Cubist through the caret framework.
Now, from the generated rules, it is clear that Cubist does something
to the categorical variables and probably uses some dummy coding for
them.
However, I currently cannot access the data in the form into which
Cubist transforms it.
If caret (or the package) needs to do some dummy coding of the factors,
how can I access the newly encoded data set?
I suppose this applies to plenty of other packages.
Any suggestion is welcome.
Cheers

Lorenzo
#
I am not familiar with caret/Cubist, but assuming they follow the
usual R procedures that encode categorical factors for conditional
fitting, you need to do some homework on your own by reading up on the
use of contrasts in regression.

See ?factor and ?contrasts (and other linked Help as necessary) to see
what are R's usual procedures, but you will undoubtedly need to
consult outside statistical references -- the help files will point
you to some -- to fully understand what's going on. It is not trivial.
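
As a small illustration of those usual procedures, here is a minimal base-R sketch. The data frame `d` and its columns are made up for the example, and it assumes the default contrast options are in effect. It shows only what R's standard formula machinery does with a factor, not necessarily what Cubist does internally.

```r
# Toy data, invented for illustration: one three-level factor, one numeric column.
d <- data.frame(
  colour = factor(c("red", "green", "blue", "green")),
  size   = c(1.5, 2.0, 3.2, 0.7)
)

# The contrast matrix R will use for 'colour'
# (treatment contrasts by default for unordered factors).
contrasts(d$colour)

# The design matrix a formula-based fit would see: the factor becomes
# k - 1 = 2 indicator columns, with the first level ("blue") as the baseline.
X <- model.matrix(~ colour + size, data = d)
colnames(X)
```

Printing `X` shows the encoded data set row by row, which is the kind of object the original question is asking about.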

Cheers,
Bert
Bert Gunter

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
   -- Clifford Stoll
On Thu, Nov 5, 2015 at 9:38 AM, Lorenzo Isella <lorenzo.isella at gmail.com> wrote:
#
Providing a reproducible example and the output of `sessionInfo()` will help
get your question answered. For example, did you use the formula or
non-formula interface to `train`, and so on?
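
The distinction matters because the two routes can hand the model different predictors. The sketch below uses base R only, with a made-up data frame `d`. It rests on the assumption, not confirmed in this thread, that the formula interface expands factors through something like `model.matrix` while the non-formula interface passes the data frame through unchanged.

```r
# Toy data, invented for illustration.
d <- data.frame(
  y      = c(10.1, 12.3, 9.8, 11.0),
  region = factor(c("north", "south", "east", "south")),
  x1     = c(0.2, 1.4, 0.9, 2.1)
)

# Formula-style route (assumed): expand the predictors into a numeric
# matrix with dummy columns, then drop the intercept column.
x_formula <- model.matrix(y ~ ., data = d)[, -1, drop = FALSE]
colnames(x_formula)

# Non-formula route (assumed): predictors stay a data frame and the
# factor column is left intact.
x_plain <- d[, c("region", "x1")]
sapply(x_plain, class)
```

If the formula route was used, the rules would refer to dummy columns such as `regionnorth`; with the non-formula route they would refer to the factor `region` itself.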
On Thu, Nov 5, 2015 at 1:10 PM, Bert Gunter <bgunter.4567 at gmail.com> wrote: