Dear R-users, I have the following data x <- runif(300,min=1,max=230) y <- x*0.005 + 0.2 y <- y+rnorm(100,mean=0,sd=0.1) y <- y%%1 # <------- modulo operation plot(x,y) and would like to recapture the slope (0.005) and intercept(0.2). I wonder if there are any clever algorithms to do this. I was looking at the function lm.cirucalar. Is this the method to use? If, which of the references is best too look at? Eryk
regression methods for circular(?) data.
7 messages · nwew, (Ted Harding), Wolski
On 26-Sep-05 nwew wrote:
Dear R-users, I have the following data x <- runif(300,min=1,max=230) y <- x*0.005 + 0.2 y <- y+rnorm(100,mean=0,sd=0.1) y <- y%%1 # <------- modulo operation plot(x,y) and would like to recapture the slope (0.005) and intercept(0.2). I wonder if there are any clever algorithms to do this. I was looking at the function lm.cirucalar. Is this the method to use? If, which of the references is best too look at? Eryk
Hi Eryk, If you know the modulus (in your case 1.0) and you get data that look like the result of your "plot(x,y)", then I wouldn't mess about. I would simply do something like y1<-y ix <- ix<-(y < 0.9*(x-50)/200) y1[ix] <- y1[ix]+1.0 lm(y1~x) (the constants 0.9/200, -50 being chosen to give a good separation on the graph). On the other hand, if there are good reasons why this very simple approach is not suitable, then if we knew what they were a more helpful reply would be easier to formulate! Best wishes, Ted. -------------------------------------------------------------------- E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk> Fax-to-email: +44 (0)870 094 0861 Date: 26-Sep-05 Time: 15:56:48 ------------------------------ XFMail ------------------------------
Hi, I do not know the intercept and slope. And you have to know them in order to do something like: ix<-(y < 0.9*(x-50)/200 I am right? cheers
(Ted Harding) wrote:
On 26-Sep-05 nwew wrote:
Dear R-users, I have the following data x <- runif(300,min=1,max=230) y <- x*0.005 + 0.2 y <- y+rnorm(100,mean=0,sd=0.1) y <- y%%1 # <------- modulo operation plot(x,y) and would like to recapture the slope (0.005) and intercept(0.2). I wonder if there are any clever algorithms to do this. I was looking at the function lm.cirucalar. Is this the method to use? If, which of the references is best too look at? Eryk
Hi Eryk, If you know the modulus (in your case 1.0) and you get data that look like the result of your "plot(x,y)", then I wouldn't mess about. I would simply do something like y1<-y ix <- ix<-(y < 0.9*(x-50)/200) y1[ix] <- y1[ix]+1.0 lm(y1~x) (the constants 0.9/200, -50 being chosen to give a good separation on the graph). On the other hand, if there are good reasons why this very simple approach is not suitable, then if we knew what they were a more helpful reply would be easier to formulate! Best wishes, Ted. -------------------------------------------------------------------- E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk> Fax-to-email: +44 (0)870 094 0861 Date: 26-Sep-05 Time: 15:56:48 ------------------------------ XFMail ------------------------------
______________________________________________ R-help at stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
On 26-Sep-05 Witold Eryk Wolski wrote:
Hi, I do not know the intercept and slope. And you have to know them in order to do something like: ix<-(y < 0.9*(x-50)/200 I am right? cheers
Although I really knew them from the way you generated the data, I "pretended" I did not know them. Read below: "If you know the modulus (in your case 1.0)" -- I did assume that this was known, i.e. that the data "wrap round" to 0 above 1.0. Also: "the constants 0.9/200, -50 being chosen to give a good separation on the graph" -- I plotted the data, and saw that the "wrapped" data were well separated, and that 0.9*(x-50)/200 was an adequate discriminant function. This was estimated purely by eye, by looking at the graph, to find some line that went between the two groups of data; no attempt was made to calculate anything precisely. Apart from assuming that the modulus was 1.0, and that the well-separated data at the bottom right of the graph were "wrapped round" data, no other information was used by me! So the question remains: If you can assume that the modulus is 1.0, and that the wrapped-round data will be well separated, then all is simple. All you need to do is to "unwrap" the "wrapped" data by adding 1.0, having first identified them by virtue of their obvious separation. Then you can estimate the slope by using 'lm'. But:-- if you, Witold, can not assume these two things for your real data, what can we assume in considering your question? Is the modulus unknown, for instance? Is the scatter so large that the groups are not well separated? Might we have "twice-wrapped" data (i.e. original y > 2)? In short, do your real data look like the data you sent us, and are they wrapped at 1.0? or what? With thanks, and best wishes, Ted.
(Ted Harding) wrote:
On 26-Sep-05 nwew wrote:
Dear R-users, I have the following data x <- runif(300,min=1,max=230) y <- x*0.005 + 0.2 y <- y+rnorm(100,mean=0,sd=0.1) y <- y%%1 # <------- modulo operation plot(x,y) and would like to recapture the slope (0.005) and intercept(0.2). I wonder if there are any clever algorithms to do this. I was looking at the function lm.cirucalar. Is this the method to use? If, which of the references is best too look at? Eryk
Hi Eryk, If you know the modulus (in your case 1.0) and you get data that look like the result of your "plot(x,y)", then I wouldn't mess about. I would simply do something like y1<-y ix <- ix<-(y < 0.9*(x-50)/200) y1[ix] <- y1[ix]+1.0 lm(y1~x) (the constants 0.9/200, -50 being chosen to give a good separation on the graph). On the other hand, if there are good reasons why this very simple approach is not suitable, then if we knew what they were a more helpful reply would be easier to formulate! Best wishes, Ted. -------------------------------------------------------------------- E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk> Fax-to-email: +44 (0)870 094 0861 Date: 26-Sep-05 Time: 15:56:48 ------------------------------ XFMail ------------------------------
______________________________________________ R-help at stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
-------------------------------------------------------------------- E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk> Fax-to-email: +44 (0)870 094 0861 Date: 26-Sep-05 Time: 18:08:28 ------------------------------ XFMail ------------------------------
Ted, I agree with you that if you unwrap the data you can use lm. And you can separate the data in the way you describe. However, if you have thousands of such datasets I do not want to do it by "looking at the graph". Yes the scatter may be larger as in the example and range(y) may be larger than 2. And as you said in order to unwrap the data you have to separate them first. It would be easy to do it using for example single linkage clustering if they were no overlaps (but they do sometimes). So I were just wondering if there are no more fancy methods to do this. Thanks, cheers
(Ted Harding) wrote:
On 26-Sep-05 Witold Eryk Wolski wrote:
Hi, I do not know the intercept and slope. And you have to know them in order to do something like: ix<-(y < 0.9*(x-50)/200 I am right? cheers
Although I really knew them from the way you generated the data, I "pretended" I did not know them. Read below: "If you know the modulus (in your case 1.0)" -- I did assume that this was known, i.e. that the data "wrap round" to 0 above 1.0. Also: "the constants 0.9/200, -50 being chosen to give a good separation on the graph" -- I plotted the data, and saw that the "wrapped" data were well separated, and that 0.9*(x-50)/200 was an adequate discriminant function. This was estimated purely by eye, by looking at the graph, to find some line that went between the two groups of data; no attempt was made to calculate anything precisely. Apart from assuming that the modulus was 1.0, and that the well-separated data at the bottom right of the graph were "wrapped round" data, no other information was used by me! So the question remains: If you can assume that the modulus is 1.0, and that the wrapped-round data will be well separated, then all is simple. All you need to do is to "unwrap" the "wrapped" data by adding 1.0, having first identified them by virtue of their obvious separation. Then you can estimate the slope by using 'lm'. But:-- if you, Witold, can not assume these two things for your real data, what can we assume in considering your question? Is the modulus unknown, for instance? Is the scatter so large that the groups are not well separated? Might we have "twice-wrapped" data (i.e. original y > 2)? In short, do your real data look like the data you sent us, and are they wrapped at 1.0? or what? With thanks, and best wishes, Ted.
(Ted Harding) wrote:
On 26-Sep-05 nwew wrote:
Dear R-users, I have the following data x <- runif(300,min=1,max=230) y <- x*0.005 + 0.2 y <- y+rnorm(100,mean=0,sd=0.1) y <- y%%1 # <------- modulo operation plot(x,y) and would like to recapture the slope (0.005) and intercept(0.2). I wonder if there are any clever algorithms to do this. I was looking at the function lm.cirucalar. Is this the method to use? If, which of the references is best too look at? Eryk
Hi Eryk, If you know the modulus (in your case 1.0) and you get data that look like the result of your "plot(x,y)", then I wouldn't mess about. I would simply do something like y1<-y ix <- ix<-(y < 0.9*(x-50)/200) y1[ix] <- y1[ix]+1.0 lm(y1~x) (the constants 0.9/200, -50 being chosen to give a good separation on the graph). On the other hand, if there are good reasons why this very simple approach is not suitable, then if we knew what they were a more helpful reply would be easier to formulate! Best wishes, Ted. -------------------------------------------------------------------- E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk> Fax-to-email: +44 (0)870 094 0861 Date: 26-Sep-05 Time: 15:56:48 ------------------------------ XFMail ------------------------------
______________________________________________ R-help at stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
-------------------------------------------------------------------- E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk> Fax-to-email: +44 (0)870 094 0861 Date: 26-Sep-05 Time: 18:08:28 ------------------------------ XFMail ------------------------------
On 26-Sep-05 Witold Eryk Wolski wrote:
Ted, I agree with you that if you unwrap the data you can use lm. And you can separate the data in the way you describe. However, if you have thousands of such datasets I do not want to do it by "looking at the graph". Yes the scatter may be larger as in the example and range(y) may be larger than 2. And as you said in order to unwrap the data you have to separate them first. It would be easy to do it using for example single linkage clustering if they were no overlaps (but they do sometimes). So I were just wondering if there are no more fancy methods to do this.
OK, the problems are now clearer! So we cannot rely on separation
(though there would be ways to detect this automatically if it could
be relied on).
This is where real experts on unwrapping circular data should step in,
but my immediate suggestion would be that developing something out
of the following should be useful.
First, generate the data so that we have something to work with:
x <- runif(300,min=1,max=230)
y <- x*0.005 + 0.2
y <- y+rnorm(100,mean=0,sd=0.1)
y0 <- y%%1 # <------- modulo operation
(I've called the wrapped data "y0").
Now, assume
A. That we know the modulus is 1.0
B. That we are looking for a model y0 = (a*x + b)%%1.0
C. The we do have some idea about a range of values for a and b,
say 0 < a < 0.01 and 0 < b < 1.0
Now try the following and inspect what you get:
M<-numeric(101)
for(i in (0:100)){v<-(i*0.01/100);
M[i+1]<-max(Mod(fft((y0-v*x-0.0)%%1)*2*pi))
}
plot(0.01*(0:100)/100,M,ylim=c(0,1000))
for(j in 0.5*(0:10)/10){
for(i in (0:100)){
v<-(i*0.01/100);
M[i+1]<-max(Mod(fft((y0-v*x-j)%%1)*2*pi))
}
points(0.01*(0:100)/100,M)
}
This gives an indication that a good value for 'a' ('i' in the
plots) is about 0.5 (or slightly larger) for some value of 'b'
('j' in the plots), from which, conditioning on this, a value
for b could be obtained similarly. The plots from the above do
not distinguish between the curves for different values of 'b';
a method of indicating this would be useful.
Just a suggestion. There may be, in some R package, a function
which implements this approach in a better way.
Over to the gurus at this point!
Best wishes,
Ted.
--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 26-Sep-05 Time: 20:54:50
------------------------------ XFMail ------------------------------
I retract the siggestion I proposed last night -- it was based on a bad hunch! Sorry for wasting time. Best wishes, Ted. -------------------------------------------------------------------- E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk> Fax-to-email: +44 (0)870 094 0861 Date: 27-Sep-05 Time: 09:59:29 ------------------------------ XFMail ------------------------------