Skip to content

regression methods for circular(?) data.

7 messages · nwew, (Ted Harding), Wolski

#
Dear R-users,

I have the following data

x <- runif(300,min=1,max=230)

y <- x*0.005 + 0.2
y <- y+rnorm(100,mean=0,sd=0.1)
y <- y%%1 #  <------- modulo operation
plot(x,y)

and would like to recapture the slope (0.005) and intercept(0.2). I wonder if 
there are any clever algorithms to do this. I was looking at the function 
lm.cirucalar. Is this the method to use? If, which of the references is best 
too look at?

Eryk
#
On 26-Sep-05 nwew wrote:
Hi Eryk,

If you know the modulus (in your case 1.0) and you get data that
look like the result of your "plot(x,y)", then I wouldn't mess
about.

I would simply do something like

y1<-y
ix <- ix<-(y < 0.9*(x-50)/200)
y1[ix] <- y1[ix]+1.0
lm(y1~x)

(the constants 0.9/200, -50 being chosen to give a good separation
on the graph).

On the other hand, if there are good reasons why this very simple
approach is not suitable, then if we knew what they were a more
helpful reply would be easier to formulate!

Best wishes,
Ted.


--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 26-Sep-05                                       Time: 15:56:48
------------------------------ XFMail ------------------------------
#
Hi,

I do not know the intercept and slope.
And you have to know them in order to do something like:
ix<-(y < 0.9*(x-50)/200

I am right?

cheers
(Ted Harding) wrote:
#
On 26-Sep-05 Witold Eryk Wolski wrote:
Although I really knew them from the way you generated the data,
I "pretended" I did not know them.

Read below: "If you know the modulus (in your case 1.0)" -- I did
assume that this was known, i.e. that the data "wrap round" to 0
above 1.0. Also: "the constants 0.9/200, -50 being chosen to give
a good separation on the graph" -- I plotted the data, and saw that
the "wrapped" data were well separated, and that 0.9*(x-50)/200
was an adequate discriminant function. This was estimated purely by
eye, by looking at the graph, to find some line that went between
the two groups of data; no attempt was made to calculate anything
precisely. Apart from assuming that the modulus was 1.0, and that
the well-separated data at the bottom right of the graph were
"wrapped round" data, no other information was used by me!

So the question remains: If you can assume that the modulus is 1.0,
and that the wrapped-round data will be well separated, then all
is simple. All you need to do is to "unwrap" the "wrapped" data
by adding 1.0, having first identified them by virtue of their
obvious separation. Then you can estimate the slope by using 'lm'.

But:-- if you, Witold, can not assume these two things for your
real data, what can we assume in considering your question?
Is the modulus unknown, for instance? Is the scatter so large that
the groups are not well separated? Might we have "twice-wrapped"
data (i.e. original y > 2)? 

In short, do your real data look like the data you sent us, and
are they wrapped at 1.0? or what?

With thanks, and best wishes,
Ted.
--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 26-Sep-05                                       Time: 18:08:28
------------------------------ XFMail ------------------------------
#
Ted,

I agree with you that if you unwrap the data you can use lm.
And you can separate the data in the way you describe. However, if you 
have thousands of such datasets I do not want to do it by "looking at 
the graph".

Yes the scatter may be larger as in the example and range(y) may be 
larger than 2.

And as you said in order to unwrap the data you have to separate them 
first. It would be easy to do it using for example single linkage 
clustering if they were no overlaps (but they do sometimes). So I were 
just wondering if there are no more fancy methods to do this.

Thanks,

cheers
(Ted Harding) wrote:
#
On 26-Sep-05 Witold Eryk Wolski wrote:
OK, the problems are now clearer! So we cannot rely on separation
(though there would be ways to detect this automatically if it could
be relied on).

This is where real experts on unwrapping circular data should step in,
but my immediate suggestion would be that developing something out
of the following should be useful.

First, generate the data so that we have something to work with:

  x <- runif(300,min=1,max=230)
  y <- x*0.005 + 0.2
  y <- y+rnorm(100,mean=0,sd=0.1)
  y0 <- y%%1 #  <------- modulo operation

(I've called the wrapped data "y0").

Now, assume

A. That we know the modulus is 1.0

B. That we are looking for a model y0 = (a*x + b)%%1.0

C. The we do have some idea about a range of values for a and b,
   say 0 < a < 0.01 and 0 < b < 1.0

Now try the following and inspect what you get:

  M<-numeric(101)

  for(i in (0:100)){v<-(i*0.01/100);
    M[i+1]<-max(Mod(fft((y0-v*x-0.0)%%1)*2*pi))
  }
  plot(0.01*(0:100)/100,M,ylim=c(0,1000))

  for(j in 0.5*(0:10)/10){
    for(i in (0:100)){
      v<-(i*0.01/100);
      M[i+1]<-max(Mod(fft((y0-v*x-j)%%1)*2*pi))
    }
    points(0.01*(0:100)/100,M)
  }

This gives an indication that a good value for 'a' ('i' in the
plots) is about 0.5 (or slightly larger) for some value of 'b'
('j' in the plots), from which, conditioning on this, a value
for b could be obtained similarly. The plots from the above do
not distinguish between the curves for different values of 'b';
a method of indicating this would be useful.

Just a suggestion. There may be, in some R package, a function
which implements this approach in a better way.

Over to the gurus at this point!

Best wishes,
Ted.


--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 26-Sep-05                                       Time: 20:54:50
------------------------------ XFMail ------------------------------
#
I retract the siggestion I proposed last night -- it was based
on a bad hunch! Sorry for wasting time.

Best wishes,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 27-Sep-05                                       Time: 09:59:29
------------------------------ XFMail ------------------------------