Hello,
I have read the documentation and searched online for an answer, but the results I found left me more confused than before.
My problem is apparently simple. I fitted a GLM to data from a 2^k experiment, and I would now like to predict the response variable (Throughput) for unseen factor levels.
When I try to predict I get the following error:
throughput.pred <- predict(throughput.fit,experiments,type="response")
Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) :
  factor 'No_databases' has new level(s) 200, 400, 600, 800, 1000
Of course these are new factor levels; that is exactly what I am trying to achieve, i.e. to extrapolate the values of Throughput.
Can anyone please advise? I include all the details below.
Thanks in advance,
Best regards,
Giovanni
# define the extreme (factors and levels)
experiments <- expand.grid(No_databases   = seq(1000,100,by=-200),
                           Partitioning   = c("sharding", "replication"),
                           No_middlewares = seq(500,100,by=-100),
                           Queue_size     = c(100))
experiments$No_databases <- as.factor(experiments$No_databases)
experiments$Partitioning <- as.factor(experiments$Partitioning)
experiments$No_middlewares <- as.factor(experiments$No_middlewares)
experiments$Queue_size <- as.factor(experiments$Queue_size)
str(experiments)
'data.frame':   50 obs. of  4 variables:
 $ No_databases  : Factor w/ 5 levels "200","400","600",..: 5 4 3 2 1 5 4 3 2 1 ...
 $ Partitioning  : Factor w/ 2 levels "sharding","replication": 1 1 1 1 1 2 2 2 2 2 ...
 $ No_middlewares: Factor w/ 5 levels "100","200","300",..: 5 5 5 5 5 5 5 5 5 5 ...
 $ Queue_size    : Factor w/ 1 level "100": 1 1 1 1 1 1 1 1 1 1 ...
 - attr(*, "out.attrs")=List of 2
  ..$ dim     : Named int  5 2 5 1
  .. ..- attr(*, "names")= chr  "No_databases" "Partitioning" "No_middlewares" "Queue_size"
  ..$ dimnames:List of 4
  .. ..$ No_databases  : chr  "No_databases=1000" "No_databases= 800" "No_databases= 600" "No_databases= 400" ...
  .. ..$ Partitioning  : chr  "Partitioning=sharding" "Partitioning=replication"
  .. ..$ No_middlewares: chr  "No_middlewares=500" "No_middlewares=400" "No_middlewares=300" "No_middlewares=200" ...
  .. ..$ Queue_size    : chr "Queue_size=100"
  No_databases Partitioning No_middlewares Queue_size
1         1000     sharding            500        100
2          800     sharding            500        100
3          600     sharding            500        100
4          400     sharding            500        100
5          200     sharding            500        100
6         1000  replication            500        100
# or
throughput.fit <- glm(log(Throughput)~(No_databases*No_middlewares)+Partitioning+Queue_size,
                      data=throughput)
Call:
glm(formula = log(Throughput) ~ (No_databases * No_middlewares) +
    Partitioning + Queue_size, data = throughput)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.5966  -0.6612  -0.1944   0.5548   3.2136

Coefficients:
                              Estimate Std. Error t value Pr(>|t|)
(Intercept)                    5.74701    0.09127  62.970  < 2e-16 ***
No_databases4                  0.43309    0.10985   3.943 8.66e-05 ***
No_middlewares2               -1.99374    0.11035 -18.067  < 2e-16 ***
No_middlewares4               -1.23004    0.10969 -11.214  < 2e-16 ***
Partitioningreplication        0.33291    0.06181   5.386 9.15e-08 ***
Queue_size100                  0.15850    0.06181   2.564   0.0105 *
No_databases4:No_middlewares2  2.71525    0.15262  17.791  < 2e-16 ***
No_databases4:No_middlewares4  1.94191    0.15226  12.754  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for gaussian family taken to be 0.8921778)

    Null deviance: 2175.58  on 936  degrees of freedom
Residual deviance:  828.83  on 929  degrees of freedom
AIC: 2562.2

Number of Fisher Scoring iterations: 2
throughput.pred <- predict(throughput.fit,experiments,type="response")
Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) :
  factor 'No_databases' has new level(s) 200, 400, 600, 800, 1000
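
For reference, the situation above can be reproduced on a small scale. The sketch below is only an illustration under stated assumptions: the data frame toy, its design levels (1 and 4 databases; 1, 2 and 4 middlewares) and the simulated Throughput values are all invented, since the real throughput data are not shown. It contrasts a fit with factor predictors, which only has coefficients for the levels seen during fitting, with a fit where the same columns are kept numeric, which may be one way to obtain predictions at new values if a linear effect on log(Throughput) is a reasonable assumption.

```r
# Toy data standing in for the real 'throughput' data frame (values are invented).
set.seed(42)
toy <- expand.grid(No_databases   = c(1, 4),
                   No_middlewares = c(1, 2, 4),
                   rep            = 1:10)
toy$Throughput <- exp(5 + 0.1 * toy$No_databases - 0.2 * toy$No_middlewares +
                      rnorm(nrow(toy), sd = 0.1))

# Case 1: predictors stored as factors, as in the session above.
toy_f <- transform(toy,
                   No_databases   = factor(No_databases),
                   No_middlewares = factor(No_middlewares))
fit_factor <- glm(log(Throughput) ~ No_databases * No_middlewares, data = toy_f)

# The levels the fitted model knows about; anything else has no coefficient.
print(fit_factor$xlevels)

# Predicting at an unseen level fails; capture the error message instead of stopping.
new_f <- data.frame(No_databases = factor(200), No_middlewares = factor(2))
msg <- tryCatch(predict(fit_factor, new_f),
                error = function(e) conditionMessage(e))
print(msg)

# Case 2: the same predictors kept numeric, so the model estimates slopes
# and can be evaluated at values that were not in the design.
fit_num <- glm(log(Throughput) ~ No_databases * No_middlewares, data = toy)
new_n <- data.frame(No_databases = c(200, 400), No_middlewares = c(2, 2))

# The response in the formula is log(Throughput), so even type = "response"
# returns predictions on the log scale; exp() maps them back.
pred_log <- predict(fit_num, new_n, type = "response")
pred     <- exp(pred_log)
print(pred)
```

Note that the numeric fit extrapolates a straight line on the log scale far beyond the design points, so such predictions should be treated with caution.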