help speeding up simple Theil regression function

I understand, thank you.   I think I wanted the output to look similar to
that from mblm for quick comparison; that is obviously a problem.  

William Dunlap wrote
My comments have nothing to do with speed of your code,
but with the correctness.

np.lm <-function(dat, X, Y, ...){
	# Ch 9.2: Slope est. (X) for Thiel statistic
...
			num[[i]]  <- dat[j.s[i],Y] - dat[i.s[i],Y]
			dom[[i]]  <- dat[j.s[i],X] - dat[i.s[i],X]
Up to here is looks like X and Y are used to indicate which columns
of dat are the predictor and response, respectively.

..
	X <- median( sort( do.call(c, num) / do.call(c, dom) ) )
Now X is used to mean the slope of the regression.

	# Ch 9.4: Intercept est. for Thiel statistic
	Intercept <- median(dat[,"Y"] - X*dat[,"X"])
Now the predictor column must be named "X" and the response "Y",
no matter what your input X and Y were.  Hence the function will fail
to give the correct answer in most cases.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

-----Original Message-----
From: 
r-help-bounces@
 [mailto:
r-help-bounces@
] On Behalf
Of Berend Hasselman
Sent: Sunday, October 21, 2012 11:59 AM
To: Brad Schneid
Cc: 
r-help@
Subject: Re: [R] help speeding up simple Theil regression function

On 21-10-2012, at 20:06, Brad Schneid wrote:

Hello,

I am working on a simple non-parametric (Theil) regression function and
and
am following Hollander and Wolfe 1999 text.  I would like some help
making
my function faster.  I have compared with pre-packaged version from
"MBLM",
which isnt very fast either, but it appears mine is faster with N =
1000
(see results below).  I plan on running this function repeatedly, and I
generally have data lengths of ~ N = 6000 or more.

# My function following Hollander and Wolfe text, Chapter 9
np.lm <-function(dat, X, Y, ...){
	# Ch 9.2: Slope est. (X) for Thiel statistic
	combos <- combn(nrow(dat), 2)
	i.s <- combos[1,]
	j.s <- combos[2,]
	num <- vector("list", length=length(i.s))
	dom <- vector("list", length=length(i.s))

		for(i in 1:length(i.s)){
			num[[i]]  <- dat[j.s[i],Y] - dat[i.s[i],Y]
			dom[[i]]  <- dat[j.s[i],X] - dat[i.s[i],X]
	        	 	}

	X <- median( sort( do.call(c, num) / do.call(c, dom) ) )
	# Ch 9.4: Intercept est. for Thiel statistic
	Intercept <- median(dat[,"Y"] - X*dat[,"X"])
	out <- data.frame(Intercept, X)
	return(out)
		}   # usage: np.lm(dat, X=1, Y=2)
################################################################

library("mblm") # I will compare to mblm() function

X <- rnorm(1000)
Y <- rnorm(1000)
dat <- data.frame(X, Y)

system.time(np.lm(dat, X=1, Y=2) )
  user  system elapsed
118.610   0.130 119.144
109.000   0.040 109.416 # ran it twice
86.190   0.100  86.589 # 3rd time
Alternative function without your i loop (it isn't needed and can be
vectorized):

np.lm.alt <-function(dat, X, Y, ...){
	# Ch 9.2: Slope est. (X) for Thiel statistic ==> (Pedantic comment: it
is Theil (swap
the i and e)
	combos <- combn(nrow(dat), 2)
	i.s <- combos[1,]
	j.s <- combos[2,]

	Y.num <- dat[j.s,Y] - dat[i.s,Y]
	X.dom <- dat[j.s,X] - dat[i.s,X]
	X <- median( Y.num / X.dom)
	# Ch 9.4: Intercept est. for Thiel statistic ==> (Pedantic comment: it
is Theil (swap
the i and e)
	Intercept <- median(dat[,"Y"] - X*dat[,"X"])
	out <- data.frame(Intercept, X)
	return(out)
}   # usage: np.lm(dat, X=1, Y=2)

Try the compiler package on you original function:

library(compiler)
np.lm.c <- cmpfun(np.lm)

Test speed and correct results:

X <- rnorm(500)
Y <- rnorm(500)
dat <- data.frame(X, Y)

system.time(npout.c <- np.lm.c(dat, X=1, Y=2) )
system.time(npout.1 <- np.lm(dat, X=1, Y=2) )
system.time(npout.a <- np.lm.alt(dat, X=1, Y=2) )
identical(npout.1,npout.c)
identical(npout.1,npout.a)

Results:

system.time(npout.c <- np.lm.c(dat, X=1, Y=2) )
   user  system elapsed
 21.442   0.066  21.517
system.time(npout.1 <- np.lm(dat, X=1, Y=2) )
   user  system elapsed
 21.068   0.073  21.161
system.time(npout.a <- np.lm.alt(dat, X=1, Y=2) )
   user  system elapsed
  0.303   0.010   0.313
identical(npout.1,npout.c)
[1] TRUE
identical(npout.1,npout.a)
[1] TRUE

You may try and test this with larger data lengths.

Berend

______________________________________________

R-help@
 mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

______________________________________________

R-help@
 mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
--
View this message in context: http://r.789695.n4.nabble.com/help-speeding-up-simple-Theil-regression-function-tp4646923p4646935.html
Sent from the R help mailing list archive at Nabble.com.

help speeding up simple Theil regression function

Thread (6 messages)