
Automatically fix big jumps in one variable due to anomalies

3 messages · Cesar Terrer Moreno, Duncan Mackay, PIKAL Petr

#
Hi,
I am attaching a plot in which you can see a few "jumps" (plots 1, 4,
5 and 6) caused by incidents with the measuring sensors (basically someone
touching the sensor). I need to revert those shifts to get a plot without
unrealistic measurements, i.e. to make those fragments return to their
original pattern before the jump.

I have used the function cpt.mean() from the changepoint package to identify
the jumps and the mean of each segment. Now I don't know how to revert the
jumps automatically, probably by subtracting from each shifted segment the
difference between its mean and the mean of the previous segment. Does that
make sense?
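The mean-difference idea can be sketched as follows. This is a minimal sketch, assuming the breakpoints are already known: `shift_segments()` is a hypothetical helper, and `cpts` is assumed to hold the index of the last observation of every segment except the final one (as far as I know this is the form returned by changepoint's `cpts()` accessor, but check it against your version).

```r
# Hypothetical helper: remove level shifts by subtracting, for every segment
# after a breakpoint, the difference between its mean and the (already
# corrected) mean of the previous segment. NAs are ignored in the means.
shift_segments <- function(x, cpts) {
  starts <- c(1, cpts + 1)          # first index of each segment
  ends   <- c(cpts, length(x))      # last index of each segment
  out <- x
  for (k in seq_along(starts)[-1]) {
    prev <- starts[k - 1]:ends[k - 1]
    cur  <- starts[k]:ends[k]
    offset <- mean(out[cur], na.rm = TRUE) - mean(out[prev], na.rm = TRUE)
    out[cur] <- out[cur] - offset
  }
  out
}

# Diameter readings from the example below; the jump happens after position 6.
diameter <- c(NA, NA, 246, 251, 250, 255, 5987, 5991, 5994, 5999)
shift_segments(diameter, cpts = 6)
# gives NA NA 246 251 250 255 244.75 248.75 251.75 256.75
```

Because the segment means also contain the genuine growth of the stem, this removes the jump but not exactly at the right level (244.75 instead of the 255 wanted below); correcting by the size of the jump step itself avoids that.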

Example of the data set:

              TIMESTAMP variable diameter
38  2012-06-21 13:45:00     r4_3       NA
86  2012-06-21 14:00:00     r4_3       NA
134 2012-06-21 14:15:00     r4_3      246
182 2012-06-21 14:30:00     r4_3      251
230 2012-06-21 14:45:00     r4_3      250
278 2012-06-21 15:00:00     r4_3      255
326 2012-06-21 15:15:00     r4_3     5987
374 2012-06-21 15:30:00     r4_3     5991
422 2012-06-21 15:45:00     r4_3     5994
470 2012-06-21 16:00:00     r4_3     5999
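Since the data are in long format (one `variable` column per sensor), jumps should be looked for within each sensor separately. A minimal sketch that rebuilds the rows shown above and flags large steps per sensor; the UTC time zone, the column names `step` and `jump`, and the threshold of 50 are assumptions for illustration, and the rows are assumed to be sorted by time within each sensor.

```r
# Rebuild the example rows (only sensor "r4_3"; the real data presumably
# contain more sensors in the same layout). Time zone assumed to be UTC.
df <- data.frame(
  TIMESTAMP = as.POSIXct("2012-06-21 13:45:00", tz = "UTC") +
    seq(0, by = 15 * 60, length.out = 10),
  variable  = "r4_3",
  diameter  = c(NA, NA, 246, 251, 250, 255, 5987, 5991, 5994, 5999)
)

# Step between consecutive readings, computed within each sensor so that a
# difference is never taken across two different sensors.
df$step <- ave(df$diameter, df$variable, FUN = function(v) c(NA, diff(v)))

# Flag steps larger than a hypothetical threshold of 50 units.
df$jump <- !is.na(df$step) & abs(df$step) > 50
df[df$jump, ]   # the 15:15 reading, with a step of 5732
```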

As an example, this is the current diameter data:
NA-NA-246-251-250-255-5987-5991-5994-5999

I need this series without the big jump, i.e. with the jump removed while
following the increase/decrease pattern, for example:
NA-NA-246-251-250-255-255-259-262-267 
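The wanted series can be written as a formula. This is a sketch, assuming a jump threshold $\tau$ that the question does not specify; any value between the normal step size (at most 5 here) and the jump works, e.g. $\tau = 10$. With $d_t = x_t - x_{t-1}$, every reading is corrected by the sum of the flagged steps up to that point:

$$\tilde{x}_t = x_t - \sum_{s \le t,\; |d_s| > \tau} d_s$$

In the example only $d_7 = 5987 - 255 = 5732$ exceeds $\tau$ (the undefined differences next to the leading NAs are skipped), so

$$\tilde{x}_7 = 5987 - 5732 = 255,\quad \tilde{x}_8 = 5991 - 5732 = 259,\quad \tilde{x}_9 = 5994 - 5732 = 262,\quad \tilde{x}_{10} = 5999 - 5732 = 267,$$

which reproduces the series above.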

Any other ideas are welcome.
#
Hi Cesar,

I am not sure what you actually want to accomplish, but

?rle may give you some ideas, e.g. (I have added some values at the end so
that the series returns to the good section):

> x <- c(246, 251, 250, 255, 5987, 5991, 5994, 599, 255, 259, 262, 267)
> xdiff <- diff(x)
> xdiff
 [1]     5    -1     5  5732     4     3 -5395  -344     4     3     5
> rle(xdiff)
Run Length Encoding
  lengths: int [1:11] 1 1 1 1 1 1 1 1 1 1 ...
  values : num [1:11] 5 -1 5 5732 4 3 -5395 -344 4 3 ...
> which(abs(rle(xdiff)$values) > 50)
[1] 4 7 8
> rle(xdiff)$values[abs(rle(xdiff)$values) > 50]
[1]  5732 -5395  -344

It is then a matter of removing the affected sequences, applying a function
to them, or substituting values (from memory, see ?zoo::na.approx).
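The "substitute values" route can be sketched like this. It is a minimal sketch, assuming a single contiguous anomaly that ends when the sensor returns to normal; the rule "everything between the first and the last large step is bad" and the threshold of 50 are illustrative assumptions. Base R's `approx()` is used for the linear interpolation so that no package is needed; `zoo::na.approx()` would be the packaged equivalent.

```r
# Duncan's extended series (as posted): the readings jump at position 5 and
# are back to normal from position 9 on.
x <- c(246, 251, 250, 255, 5987, 5991, 5994, 599, 255, 259, 262, 267)

# Hypothetical rule: treat everything between the first and the last large
# step as bad readings and blank them out.
big <- which(abs(diff(x)) > 50)     # 4 7 8
bad <- (min(big) + 1):max(big)      # positions 5:8
x[bad] <- NA

# Linear interpolation across the gap, using only the valid points as knots.
idx   <- seq_along(x)
ok    <- !is.na(x)
fixed <- approx(idx[ok], x[ok], xout = idx)$y
fixed
# gives 246 251 250 255 255 255 255 255 255 259 262 267
```

Interpolation only works here because the series comes back to the good level; the stem growth during the anomaly is flattened to a straight line.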

HTH

Duncan

Duncan Mackay
Department of Agronomy and Soil Science
University of New England
Armidale NSW 2351
Email: home: mackay at northnet.com.au
At 09:13 5/03/2013, you wrote:
2 days later
#
Hi

Not sure if it solves every possible sensor misbehaviour, but setting the
differences at the jumps to NA (or 0), summing the differences and adding
them to the starting value can help you polish your data:

> x <- c(NA, NA, 246, 251, 250, 255, 5987, 5991, 5994, 5999)
> x
 [1]   NA   NA  246  251  250  255 5987 5991 5994 5999
> xd <- diff(x)
> xd[xd > 10] <- NA
> xd[is.na(xd)] <- 0
> cumsum(xd)
[1]  0  0  5  4  9  9 13 16 21
> x[3] + cumsum(xd)
[1] 246 246 251 250 255 255 259 262 267
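The same cumulative-difference idea can be wrapped so that the result keeps the original length and the leading NAs. A minimal sketch: `remove_jumps()` and its default threshold of 10 are hypothetical, and unlike the transcript above it tests `abs(xd)`, so a downward jump (e.g. when the sensor is released again) is removed as well.

```r
# Hypothetical helper: zero out steps larger than `threshold` and rebuild the
# series from its first valid reading, keeping length and leading NAs.
remove_jumps <- function(x, threshold = 10) {
  first <- which(!is.na(x))[1]               # first real measurement
  if (is.na(first)) return(x)                # nothing to correct
  idx <- first:length(x)
  xd <- diff(x[idx])                         # steps from that point on
  xd[is.na(xd) | abs(xd) > threshold] <- 0   # jumps and gaps count as no change
  out <- x
  out[idx] <- x[first] + c(0, cumsum(xd))    # start value plus summed steps
  out
}

x <- c(NA, NA, 246, 251, 250, 255, 5987, 5991, 5994, 5999)
remove_jumps(x)
# gives NA NA 246 251 250 255 255 259 262 267
```

Using `abs()` matters when the sensor jumps back: keeping only the upward jump at zero would leave the downward return in the summed steps and push the corrected series far below its true level.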

Regards
Petr