I am trying to understand "deviance" in the classification tree output
from the tree package.
library(tree)
set.seed(911)
mydf <- data.frame(
name = as.factor(rep(c("A", "B"), c(10, 10))),
x = c(rnorm(10, -1), rnorm(10, 1)),
y = c(rnorm(10, 1), rnorm(10, -1)))
mytree <- tree(name ~ ., data = mydf)
mytree
# node), split, n, deviance, yval, (yprob)
#       * denotes terminal node
#
# 1) root 20 27.730 A ( 0.5 0.5 )
#   2) y < -0.00467067 10 6.502 B ( 0.1 0.9 )
#     4) x < 1.50596 5 5.004 B ( 0.2 0.8 ) *
#     5) x > 1.50596 5 0.000 B ( 0.0 1.0 ) *
#   3) y > -0.00467067 10 6.502 A ( 0.9 0.1 )
#     6) x < -0.578851 5 0.000 A ( 1.0 0.0 ) *
#     7) x > -0.578851 5 5.004 A ( 0.8 0.2 ) *
# Replicate results for node 2
# Probabilities tie out
with(subset(mydf, y < -0.00467067), table(name))
# name
# A B
# 1 9
# Cannot replicate deviance = -1 * sum(p_mk * log(p_mk))
-1 * (0.1 * log(0.1) + 0.9 * log(0.9))
# [1] 0.325083
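For comparison, here is a minimal sketch under the assumption that the reported deviance is minus twice the multinomial log-likelihood of the node, i.e. -2 * sum(n_mk * log(p_mk)), where n_mk is the count of class k in node m. This formula is an assumption and is not taken from the package documentation, and node_deviance is a hypothetical helper name. It uses only the class counts, so it does not need the tree package.

```r
# Assumed formula (hypothetical helper, not part of the tree package):
#   deviance = -2 * sum(n_mk * log(p_mk)), with 0 * log(0) treated as 0
node_deviance <- function(counts) {
  counts <- counts[counts > 0]   # drop empty classes so log(0) never occurs
  p <- counts / sum(counts)      # class proportions within the node
  -2 * sum(counts * log(p))
}

node_deviance(c(A = 1, B = 9))    # node 2: about 6.502
node_deviance(c(A = 10, B = 10))  # root:   about 27.73
node_deviance(c(A = 1, B = 4))    # node 4: about 5.004
node_deviance(c(A = 0, B = 5))    # node 5: 0
```

Under this assumption the values agree with the printed tree, and the node 2 figure is 2 * 10 * 0.325083, that is, the per-observation entropy computed above scaled by twice the node size.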
1. In the documentation, is it possible to find the definition of
deviance?
2. Is it possible to see the code where it calculates deviance?
Thanks,
Naresh