Skip to content

Kolmogorov-Smirnov test and the plot of max distance between two ecdf curves

6 messages · Massimo Bressan, Rui Barradas

#
Hi all, 

given this example 

#start 

a<-c(0,70,50,100,70,650,1300,6900,1780,4930,1120,700,190,940, 
   
760,100,300,36270,5610,249680,1760,4040,164890,17230,75140,1870,22380,5890,2430) 
length(a)

b<-c(0,0,10,30,50,440,1000,140,70,90,60,60,20,90,180,30,90, 
     3220,490,20790,290,740,5350,940,3910,0,640,850,260) 
length(b)

out<-ks.test(log10(a+1),log10(b+1)) 

# max distance D 
out$statistic 

f.a<-ecdf(log10(a+1)) 
f.b<-ecdf(log10(b+1)) 

plot(f.a, verticals=TRUE, do.points=FALSE, col="red") 
plot(f.b, verticals=TRUE, do.points=FALSE, col="green", add=TRUE) 

#inverse of ecdf a
x.a<-get("x", environment(f.a))
y.a<-get("y", environment(f.a))

# inverse of ecdf b
x.b<-get("x", environment(f.b))
y.b<-get("y", environment(f.b))


#end

I want to plot the max distance between the two ecdf curves as in the above
given chart

Is that possible and how? 


Thanks for your help

PS: this is an amended version of a previous thread (but no reply followed)
that I?ve deleted from Nabble repository because I realised it was not
enough clear (now I hope it?s a little better, sorry for that)


--
View this message in context: http://r.789695.n4.nabble.com/Kolmogorov-Smirnov-test-and-the-plot-of-max-distance-between-two-ecdf-curves-tp4631437.html
Sent from the R help mailing list archive at Nabble.com.
#
Hello,

Try the following.
(i've changed the color of the first ecdf.)


loga <- log10(a+1) # do this
logb <- log10(b+1) # only once

f.a <- ecdf(loga)
f.b <- ecdf(logb)
# (2) max distance D

x <- seq(min(loga, logb), max(loga, logb), length.out=length(loga))
x0 <- x[which( abs(f.a(x) - f.b(x)) == max(abs(f.a(x) - f.b(x))) )]
y0 <- f.a(x0)
y1 <- f.b(x0)

plot(f.a, verticals=TRUE, do.points=FALSE, col="blue")
plot(f.b, verticals=TRUE, do.points=FALSE, col="green", add=TRUE)
## alternatine, use standard R plot of ecdf
#plot(f.a, col="blue")
#lines(f.b, col="green")

points(c(x0, x0), c(y0, y1), pch=16, col="red")
segments(x0, y0, x0, y1, col="red", lty="dotted")
## alternative, down to x axis
#segments(x0, 0, x0, y1, col="red", lty="dotted")


Hope this helps,

Rui Barradas

maxbre wrote
--
View this message in context: http://r.789695.n4.nabble.com/Kolmogorov-Smirnov-test-and-the-plot-of-max-distance-between-two-ecdf-curves-tp4631437p4631438.html
Sent from the R help mailing list archive at Nabble.com.
1 day later
#
thanks rui

that's what I was looking for

I have another related question: 
- why of the difference between the max distance D calculated with ks.test()
and the max distance D  ?manually? calculated as in (2)?

I guess it has something to do with the fact that KS is obtained with a
maximisation that depends on the range of x values not necessarly coincident
in the two different approaches

...any thought about this?

maxbre


--
View this message in context: http://r.789695.n4.nabble.com/Kolmogorov-Smirnov-test-and-the-plot-of-max-distance-between-two-ecdf-curves-tp4631437p4631564.html
Sent from the R help mailing list archive at Nabble.com.
#
Hello,

That's a very difficult question. See

Marsaglia, Tsang, Wang (2003)
http://www.jstatsoft.org/v08/i18/

Simard, L'Ecuyer (2011)
http://www.jstatsoft.org/v39/i11

R's ks functions are a port of Marsaglia et al. to the .C interface.

Rui Barradas

maxbre wrote
--
View this message in context: http://r.789695.n4.nabble.com/Kolmogorov-Smirnov-test-and-the-plot-of-max-distance-between-two-ecdf-curves-tp4631437p4631571.html
Sent from the R help mailing list archive at Nabble.com.