Question about Cubist Model
2 messages · Lorenzo Isella, Mxkuhn
On Jan 12, 2017, at 5:37 PM, Lorenzo Isella <lorenzo.isella at gmail.com> wrote:

Dear All,
I am fine-tuning a Cubist model (see https://cran.r-project.org/web/packages/Cubist/index.html). I am a bit puzzled by its output: on a dataset that contains 275 cases, I get rules that are not mutually exclusive. For example, in the output below, rules 2 and 3 together cover all 275 cases of the data set, and rule 1 partially overlaps them. Am I misunderstanding something?
It is doing the right thing. The rules are first derived from a regression tree, and the process of pruning them can leave rules whose conditions overlap. When several rules apply to the same case, the regression output is the average of the predictions from the active rules.

Thanks,
Max
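To make the averaging concrete, here is a minimal Python sketch (not the Cubist package itself) that encodes the three rules from the quoted Cubist output and averages the linear models of whichever rules are active. It assumes a plain unweighted mean, as described in the reply, and ignores any further adjustment Cubist may apply. The `predict` helper and the two example cases are hypothetical and are not taken from the data set.

```python
# Each rule is a (condition, linear model) pair copied from the Cubist output.
RULES = [
    (lambda c: c["home_copub_after_all"] <= 0.7142857
               and c["host_copub_after_all"] <= 1.833333,
     lambda c: 0.1666667 + 0.9 * c["home_copub_after_all"]
               + 0.11 * c["home_copub_before_all"]),
    (lambda c: c["host_copub_after_all"] <= 1.833333,
     lambda c: 0.0433333 + 0.75 * c["home_copub_after_all"]
               + 0.33 * c["host_copub_after_all"]
               + 0.37 * c["top_10_after_all"]),
    (lambda c: c["host_copub_after_all"] > 1.833333,
     lambda c: 1.595 + 1.03 * c["top_10_after_all"]
               + 0.45 * c["home_copub_after_all"]),
]


def predict(case):
    """Average the outputs of all rules whose conditions hold (assumed simple mean)."""
    outputs = [model(case) for cond, model in RULES if cond(case)]
    return sum(outputs) / len(outputs)


# Hypothetical case where rules 1 and 2 are both active.
overlap_case = {
    "home_copub_after_all": 0.5,
    "host_copub_after_all": 1.0,
    "top_10_after_all": 0.2,
    "home_copub_before_all": 0.3,
}

# Hypothetical case where only rule 3 is active.
single_case = dict(overlap_case, host_copub_after_all=2.0)

print(predict(overlap_case))  # mean of rule 1 (0.6496667) and rule 2 (0.8223333)
print(predict(single_case))   # rule 3 alone
```

Because rules 2 and 3 split on the same threshold of host_copub_after_all, every case activates at least one rule, so the average is always defined in this sketch.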
Many thanks
Lorenzo
Cubist [Release 2.07 GPL Edition] Thu Jan 12 23:10:40 2017
---------------------------------
Target attribute `outcome'
Read 275 cases (21 attributes) from undefined.data
Model:
  Rule 1: [204 cases, mean 0.5393324, range 0 to 2.285714, est err 0.2598495]

    if
        home_copub_after_all <= 0.7142857
        host_copub_after_all <= 1.833333
    then
        outcome = 0.1666667 + 0.9 home_copub_after_all
                  + 0.11 home_copub_before_all

  Rule 2: [259 cases, mean 0.7445303, range 0 to 3.166667, est err 0.1866440]

    if
        host_copub_after_all <= 1.833333
    then
        outcome = 0.0433333 + 0.75 home_copub_after_all
                  + 0.33 host_copub_after_all + 0.37 top_10_after_all

  Rule 3: [16 cases, mean 4.4285712, range 2.142857 to 8.857142, est err 1.0346190]

    if
        host_copub_after_all > 1.833333
    then
        outcome = 1.595 + 1.03 top_10_after_all + 0.45 home_copub_after_all

Evaluation on training data (275 cases):
Average |error| 0.2678023
Relative |error| 0.38
Correlation coefficient 0.94

Attribute usage:
    Conds  Model
    100%    54%    host_copub_after_all
     43%   100%    home_copub_after_all
            57%    top_10_after_all
            43%    home_copub_before_all

Time: 0.0 secs
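
The overlap is also visible in the attribute usage figures. One reading that reproduces every percentage in this output (an assumption that is not confirmed anywhere in the thread) is that each figure is the share of rule-covered cases, counted once per rule, belonging to rules that use the attribute in their conditions or in their linear model. The Python sketch below checks that arithmetic; the dictionaries and the `usage_percent` helper are hypothetical names introduced only for illustration.

```python
# Case counts per rule, copied from the Cubist output.
rule_cases = {1: 204, 2: 259, 3: 16}

# Rules that mention each attribute in their conditions or linear models,
# read off the rule listing.
conds = {
    "host_copub_after_all": [1, 2, 3],
    "home_copub_after_all": [1],
}
model = {
    "host_copub_after_all": [2],
    "home_copub_after_all": [1, 2, 3],
    "top_10_after_all": [2, 3],
    "home_copub_before_all": [1],
}


def usage_percent(rules):
    """Share of rule-covered cases (overlaps counted once per rule), in percent."""
    total = sum(rule_cases.values())  # 204 + 259 + 16 = 479, more than 275
    covered = sum(rule_cases[r] for r in rules)
    return round(100 * covered / total)


conds_usage = {attr: usage_percent(r) for attr, r in conds.items()}
model_usage = {attr: usage_percent(r) for attr, r in model.items()}

print(conds_usage)
print(model_usage)
```

The denominator is 479 rather than 275 because the 204 cases of rule 1 also satisfy the condition of rule 2, so they are counted under both rules.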