queue waiting times comparison
If those values represent response times in a system, it depends on how you are planning to use the data. When I was responsible for characterizing what a system would do from the viewpoint of an SLA (service level agreement) with its customers, we usually specified that "90% of the transactions would have a response time of --- or less". This took care of most "long tails". We usually monitored the 90th or 95th percentile to see how a system was operating day to day.
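A minimal sketch of that kind of percentile check, using the first-year waiting times posted further down in this thread. The threshold `sla_limit` is a hypothetical value chosen only for illustration; it does not come from the thread.

```r
# First-year waiting times (y1) as posted later in this thread
y1 <- c(10, 9, 9, 10, 8, 20, 16, 47, 4, 7, 15,
        18, 36, 5, 24, 15, 40, 10)

# Hypothetical SLA limit (an assumption, not from the thread)
sla_limit <- 40

# 90th and 95th percentiles (default quantile type 7)
p <- quantile(y1, c(0.90, 0.95))

# Does the 90th percentile stay within the limit?
meets_sla <- p[["90%"]] <= sla_limit

# Share of observations at or below the limit
share_within <- mean(y1 <= sla_limit)

print(p)
print(meets_sla)
print(share_within)
```

The percentile ignores how extreme the slowest few values are, which is why it is less sensitive to a long tail than the mean.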
On Thu, Aug 18, 2011 at 8:52 AM, Petr PIKAL <petr.pikal at precheza.cz> wrote:
Hello Jim,
Thank you; see my comments within the text.
jim holtman <jholtman at gmail.com> wrote on 18.08.2011 14:09:11:
I am not sure why you say that "lapply(ml, mean)" shows (incorrectly) that the second year has a larger average; it is correct for the data:
lapply(ml, my.func)
$y1
    Count      Mean        SD       Min    Median       90%       95%       Max       Sum
 18.00000  16.83333  12.42980   4.00000  12.50000  37.20000  41.05000  47.00000 303.00000
$y2
    Count      Mean        SD       Min    Median       90%       95%       Max       Sum
 15.00000  20.06667  25.27694   4.00000  11.00000  45.80000  70.40000  97.00000 301.00000
You have a larger "outlier" in the second year that causes the mean to be higher. The median is lower, but I usually look at the 90th percentile if I am looking at response time from a system, and again the second year has a higher value. So exactly why do you not "trust" your data?
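The definition of my.func is not shown in the thread. A summary function along the following lines would produce output with the same columns; treat it as a hypothetical reconstruction, not Jim's actual code.

```r
# Waiting times as posted in this thread
ml <- list(
  y1 = c(10, 9, 9, 10, 8, 20, 16, 47, 4, 7, 15,
         18, 36, 5, 24, 15, 40, 10),
  y2 = c(97, 10, 26, 11, 11, 10, 5,
         13, 19, 5, 5, 59, 4, 16, 10)
)

# Hypothetical reconstruction of my.func (the original is not in the thread):
# count, mean, SD, min, median, 90th/95th percentiles, max and sum.
my.func <- function(x) {
  c(Count  = length(x),
    Mean   = mean(x),
    SD     = sd(x),
    Min    = min(x),
    Median = median(x),
    quantile(x, c(0.90, 0.95)),  # contributes the names "90%" and "95%"
    Max    = max(x),
    Sum    = sum(x))
}

res <- lapply(ml, my.func)
print(res)
```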
Well, I trust them; however, the mean is a "correct" central value only when the data are normally distributed or at least symmetrical. As the values are heavily skewed, I feel that I should not use the mean for comparing such sets. Anyway, t.test tells me that there is no significant difference between y2 and y1.
t.test(ml[[1]], ml[[2]])
        Welch Two Sample t-test
data:  ml[[1]] and ml[[2]]
t = -0.452, df = 19.557, p-value = 0.6563
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -18.17781  11.71115
sample estimates:
mean of x mean of y
 16.83333  20.06667
So based on this I will probably never get a conclusive result, as the sd will be quite high due to the "outliers".
When I do
plot(ecdf(ml[[2]]))
plot(ecdf(ml[[1]]), add=TRUE, col=2)
it seems to me that both sets are almost the same and that they differ substantially only in those "outlier" values.
If I decrease the small values of y2, e.g.
ml[[2]][ml[[2]]<20] <- ml[[2]][ml[[2]]<20]/2
I get nearly the same mean
lapply(ml, mean)
$y1
[1] 16.83333
$y2
[1] 16.1
and t.test tells me that there is no difference between those two sets, although I know that most events take half the time and only a few last longer. For me such a set is better: we improved performance most of the time, although there are still rare events that take a long time.
plot(ecdf(ml[[2]]))
plot(ecdf(ml[[1]]), add=TRUE, col=2)
So the question remains: what procedure should I use to compare two or more sets with such a long-tailed distribution? Trimmed mean? Median? ...
Thanks.
Regards
Petr
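For reference, the candidates named in the question (median, trimmed mean) can be computed side by side with the 90th percentile, as in the rough sketch below. The 10% trim fraction is an arbitrary choice, and the Wilcoxon rank-sum test is included only as one common rank-based alternative to t.test; neither is recommended anywhere in this thread.

```r
# Waiting times as posted in this thread
y1 <- c(10, 9, 9, 10, 8, 20, 16, 47, 4, 7, 15,
        18, 36, 5, 24, 15, 40, 10)
y2 <- c(97, 10, 26, 11, 11, 10, 5,
        13, 19, 5, 5, 59, 4, 16, 10)

# Location summaries that are less sensitive to a long tail than the mean.
# trim = 0.1 (drop 10% from each end) is an assumption for illustration.
robust <- function(x) {
  c(median  = median(x),
    trimmed = mean(x, trim = 0.1),
    q90     = unname(quantile(x, 0.90)))
}

tab <- sapply(list(y1 = y1, y2 = y2), robust)
print(tab)

# Rank-based two-sample comparison; exact = FALSE because the data have ties
wt <- wilcox.test(y1, y2, exact = FALSE)
print(wt$p.value)
```

Which summary is appropriate depends on whether the rare long waits matter for the comparison: the median and trimmed mean deliberately discount them, while an upper percentile keeps them in view.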
On Thu, Aug 18, 2011 at 7:49 AM, Petr PIKAL <petr.pikal at precheza.cz> wrote:
Hello all,
I am trying to find a way to compare sets of waiting times during different periods. I tried to learn something from queueing theory and also used R search. There are plenty of ways, but I need to find the easiest and simplest one.
Here is a list with actual waiting times.
ml <- structure(list(y1 = c(10, 9, 9, 10, 8, 20, 16, 47, 4, 7, 15,
18, 36, 5, 24, 15, 40, 10), y2 = c(97, 10, 26, 11, 11, 10, 5,
13, 19, 5, 5, 59, 4, 16, 10)), .Names = c("y1", "y2"))
par(mfrow=c(1,2))
lapply(ml, hist)
shows that there are more long waiting times in the first year.
lapply(ml, mean)
shows (incorrectly) that the average waiting time is longer in the second year.
lapply(ml, mean)
gives me completely reversed values.
Can you please give me some hints on what to use for a "correct" and "simple" comparison of waiting times in two or more periods?
Thank you
Petr
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
--
Jim Holtman
Data Munger Guru
What is the problem that you are trying to solve?