Problem while predicting in regression trees

Please find the sample dataset attached along with R code pasted below to reproduce the issue.


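#Loading the required packages
library(caTools)      #sample.split
library(caret)        #trainControl, train
library(rpart)        #fitting engine behind method="rpart"
library(rpart.plot)   #prp

#Loading the data frame
pfi <- read.csv("pfi_data.csv")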

#Splitting the data into training and test sets
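#sample.split expects the outcome vector (one flag per row), not the whole data frame
split <- sample.split(pfi$project_delay, SplitRatio = 0.7)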
trainPFI <- subset(pfi, split == TRUE)
testPFI <- subset(pfi, split == FALSE)

#Cross validating the decision trees
tr.control <- trainControl(method="repeatedcv", number=20)
cp.grid <- expand.grid(.cp = (0:10)*0.001)
tr_m <- train(project_delay ~ project_lon + project_lat + project_duration +
                sector + contract_type + capital_value,
              data = trainPFI, method = "rpart",
              trControl = tr.control, tuneGrid = cp.grid)

#Displaying the train results
tr_m

#Fetching the best tree
best_tree <- tr_m$finalModel

#Plotting the best tree
prp(best_tree)

#Using the best tree to make predictions [This command raises the error]
best_tree_pred <- predict(best_tree, newdata = testPFI)

#Calculating the SSE
best_tree_pred.sse <- sum((best_tree_pred - testPFI$project_delay)^2)

#Displaying the SSE
best_tree_pred.sse

...


Many thanks and kind regards,



--
Muhammad Bilal
Research Fellow and Doctoral Researcher,
Bristol Enterprise, Research, and Innovation Centre (BERIC),
University of the West of England (UWE),
Frenchay Campus,
Bristol,
BS16 1QY

muhammad2.bilal at live.uwe.ac.uk