
vector angle

5 messages · Evan Zane Macosko, Brian Ripley, Laurent Gautier

#
Hi everyone,


I'm translating into R some programs I worked through in Matlab to
calculate the angle between two vectors (very large--like 6200 rows in
each vector).  In Matlab, I used a series of nested for loops, because I
was calculating the angles between many pairs of vectors.  I know for
loops are not desirable in R code, so I was wondering if anyone could
recommend a faster way to complete this task.  Also, I have NAs in my
vectors--I've had trouble performing various operations on my vectors in R
because of these NAs.

Any advice on this would be greatly appreciated.

Thanks!
Evan Macosko


-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
#
Evan Zane Macosko wrote:

            
As far as I know, the use of apply (sapply and lapply) would make things run
faster than 'for' loops.


As for the NAs, you may want to ignore the vectors that have an NA coordinate, or
maybe do something else...
To get started, you may try the help for the functions 'is.na' and
'na.action'.
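
A minimal sketch of one way to use 'is.na' here, assuming it is acceptable to drop every coordinate that is missing in either vector (this changes which coordinates enter the angle, so it may not suit every analysis). The data and the helper name vec_angle are hypothetical, for illustration only.

```r
# Hypothetical example data: two vectors with missing values.
x <- c(1, 0, NA, 2)
y <- c(0, 1, 3, NA)

# Keep only the coordinates observed in both vectors.
ok <- !is.na(x) & !is.na(y)

# Angle (in radians) between two NA-free vectors.
# vec_angle is a hypothetical helper, not a base R function.
vec_angle <- function(a, b) {
  cosine <- sum(a * b) / sqrt(sum(a^2) * sum(b^2))
  acos(pmin(pmax(cosine, -1), 1))  # clamp to [-1, 1] against rounding error
}

theta <- vec_angle(x[ok], y[ok])
```

Here only the first two coordinates survive the filter, so the angle is computed between (1, 0) and (0, 1).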



I hope it helps,




Laurent



--
Laurent Gautier                 CBS, Building 208, DTU
PhD. Student                    D-2800 Lyngby,Denmark
tel: +45 45 25 24 85            http://www.cbs.dtu.dk/laurent


#
On Tue, 17 Jul 2001, Laurent Gautier wrote:

            
Not very much faster in R (and apply itself is basically a for loop).
Because for loops were slow in S3, the message seems to have got
transferred to S4 and R.  Often the best approach is to see if a loop is
fast enough, first.  (In S-PLUS 5.0 lapply was actually slower than a for
loop.)

The key to speed is usually vectorization (often at the expense of
memory, but these are not very large vectors).  Here it looks as if
outer() might be the key.
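
One possible reading of the vectorization suggestion, as a sketch only: it assumes the p vectors are stored as the columns of an NA-free matrix X (a layout chosen here for illustration), takes all dot products at once with crossprod(), and uses outer() to build the matrix of length products.

```r
# Hypothetical data: p vectors of length m, one per column, no NAs.
set.seed(1)
m <- 50
p <- 4
X <- matrix(rnorm(m * p), nrow = m, ncol = p)

G   <- crossprod(X)          # p x p matrix of dot products, t(X) %*% X
len <- sqrt(diag(G))         # Euclidean length of each vector
C   <- G / outer(len, len)   # cosines of the angles between all pairs
C   <- pmin(pmax(C, -1), 1)  # clamp to [-1, 1] against rounding error
theta <- acos(C)             # p x p matrix of angles in radians
```

No explicit loop over pairs appears in the R code; the looping happens inside the matrix routines.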

I am not clear what is really required here, but if you want the
angles between all pairs of p vectors of length m, say, I would
try

- normalizing them to unit length
- using dist() to compute the Euclidean distances between the pairs,
  since for unit-length vectors x and y ||x - y||^2 = 2 - 2x.y
  = 2(1 - cos th) where th is the angle between them.

That just makes use of for loops written in the C internals of dist,
and doing the computations oneself in C is often a fairly simple option.
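
The two steps above can be sketched as follows, assuming the vectors are the columns of an NA-free matrix X (a hypothetical layout for illustration). dist() measures distances between rows, so the normalized matrix is transposed first. With NAs present, dist() drops the affected coordinates and rescales the sum, so the identity above would no longer hold exactly.

```r
# Hypothetical data: p vectors of length m, one per column, no NAs.
set.seed(2)
m <- 50
p <- 4
X <- matrix(rnorm(m * p), nrow = m, ncol = p)

# Step 1: scale each column to unit length.
U <- sweep(X, 2, sqrt(colSums(X^2)), "/")

# Step 2: Euclidean distances between all pairs (dist() works on rows).
d <- as.matrix(dist(t(U)))

# Invert ||x - y||^2 = 2(1 - cos th) to recover the angles in radians.
cosine <- 1 - d^2 / 2
theta_dist <- acos(pmin(pmax(cosine, -1), 1))  # clamp against rounding error
```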
#
Prof Brian Ripley wrote:

            
And I lived all these years in ignorance, thinking I was doing good while I was
making things worse.....


I haven't looked at the R introduction manuals for a while now, so maybe the following
remark is already stated in them. But when I started with R, the modern
statistics with S-plus was the main reference, and being a poooor student at that
time, I learned things through the internet. That proved to be a bad thing, since
at that time there was a rumour about these functions being faster than loops
(as the 'map' function is said to be faster than the for loop in Python, for
example).
I just looked at what I would get as an answer using a webcrawler, and the rumour
seems to be still alive
(see http://www.math.yorku.ca/Who/Faculty/Monette/S-news/2531.html , at the
bottom of the page, or
http://www.usc.edu/isd/doc/statistics/splus/faq/v5/newinv5.shtml ), but had I
better followed what has been said on this list, I would have known it was not
the case
(see http://www.ens.gu.edu.au/robertk/R/help/00a/1999.html).


Thanks for pointing out the mistake I made,




Laurent



#
On Tue, 17 Jul 2001, Laurent Gautier wrote:

            
Not a mistake, but a misconception perhaps.

Bill Venables and I deliberately illustrated a number of approaches in
Chapter 7 of S Programming using S+3.4, S+2000, S+5.1 and R 0.90.1 to show
that (p. 152)

  Since our aim is not to compare systems, the timings here using
  different engines were done on different systems, all of which had
  ample RAM.  It is worth noting that the different S implementations
  used here do differ, sometimes radically, in their ordering of
  approaches, and the ordering might be different again on machines with
  less RAM available.

One example (using a for loop) shows a 17x speed up in S+5.1, and a 50%
speed-up in R.

Also, later versions of R show some differences, especially due to the
new memory management system (and lapply has been altered since those
timings too).