Need a faster function to replace missing data

Here are two functions, which.just.above and which.just.below,
which may help you.  For each element of the main dataset (x), they
return the index of the element of a reference dataset that is
just above (or just below) it.  They return NA if there is no reference
element above (or below) an element of x.  The strict argument
lets you say whether the inequalities are strict or whether equality
is acceptable.
They are vectorized, so they are pretty quick.

E.g.,
   > which.just.below(c(14,14.5), 11:15, strict=TRUE)
   [1] 3 4
   > which.just.above(c(14,14.5), 11:15, strict=FALSE)
   [1] 4 5
They should work with any class of data that order() and sort()
work on.  In particular, POSIXct times work.  The attached file
has a demonstration function called 'test' with some examples.
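
As a cross-check on the two results above, here is a small sketch using base R's findInterval().  This is not how which.just.above and which.just.below work internally; it is an independent way of getting the same indices, and it assumes the reference vector is already sorted in increasing order (findInterval() requires that) and an R version recent enough to have the left.open argument.  The variable names are just for illustration.

```r
reference <- 11:15          # must already be sorted for findInterval()
x <- c(14, 14.5)

# With left.open = TRUE, findInterval() counts the reference values
# strictly less than each x.  That count is the index of the element
# just below x (strict); 0 means nothing is below.
below_strict <- findInterval(x, reference, left.open = TRUE)
below_strict[below_strict == 0] <- NA

# The element at or above x is the next one along.
above_or_equal <- findInterval(x, reference, left.open = TRUE) + 1
above_or_equal[above_or_equal > length(reference)] <- NA

below_strict
above_or_equal
```

These should agree with the strict=TRUE and strict=FALSE calls shown in the example.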

In your case the 'reference' data would be the times at which your
backup measurements were taken and the 'x' data would be the
times of the pings.  You can look at the elements of 'reference' just
before and just after each ping (or just the pings that are missing
locations) and decide how to combine the data from the bracketing
reference elements to impute a location for the ping.
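
One way to combine the bracketing elements is linear interpolation in time.  The sketch below is only an illustration: the data and the names (ref_time, ref_lat, ping) are made up, it handles one coordinate, and it uses base R's findInterval() to find the bracketing indices so that it runs on its own.  That assumes ref_time is sorted; the functions in this message could supply the indices instead and do not need sorted input.

```r
# Hypothetical data: backup fixes (reference) and ping times (x).
t0       <- as.POSIXct("2009-05-01 00:00:00", tz = "UTC")
ref_time <- t0 + c(0, 600, 1200, 1800)
ref_lat  <- c(10.0, 10.2, 10.6, 11.0)
ping     <- t0 + c(300, 900, 1200, 2400)

# Index of the last reference time <= each ping (0 if there is none).
below <- findInterval(as.numeric(ping), as.numeric(ref_time))
# The next reference element; NA when the ping is after the last one.
above <- below + 1L
above[above > length(ref_time)] <- NA
below[below == 0L] <- NA

# Fraction of the way from the 'below' time to the 'above' time.
w <- as.numeric(difftime(ping, ref_time[below], units = "secs")) /
     as.numeric(difftime(ref_time[above], ref_time[below], units = "secs"))

# Interpolated latitude; NA where the ping is not bracketed.
ping_lat <- ref_lat[below] + w * (ref_lat[above] - ref_lat[below])
ping_lat
```

Pings that fall outside the range of the reference times come out as NA here; whether to carry the nearest fix forward instead is a decision about your data.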

Here are the functions, in case the attachment doesn't make it
through.  I'm sure some mailer will throw in some newlines so
it will be corrupted.

"which.just.above" <-
function(x, reference, strict = TRUE)
{
        # output[k] will be index of smallest value in reference vector
        # larger than x[k].  If strict=FALSE, replace 'larger than' by
        # 'larger than or equal to'.
        # We should allow NA's in x (but we don't). NA's in reference
        # should not be allowed.
        if(any(is.na(x)) || any(is.na(reference))) stop("NA's in input")
        if(strict)
                i <- c(rep(TRUE, length(reference)), rep(FALSE, length(x)))[
                        order(c(reference, x))]
        else i <- c(rep(FALSE, length(x)), rep(TRUE, length(reference)))[
                        order(c(x, reference))]
        i <- cumsum(i)[!i] + 1
        i[i > length(reference)] <- NA
        # i is length of x and has values in range 1:length(reference) or NA
        # following needed if reference is not sorted
        i <- order(reference)[i]
        # following needed if x is not sorted
        i[order(order(x))]
}

"which.just.below" <-
function(x, reference, strict = TRUE)
{
        # output[k] will be index of largest value in reference vector
        # less than x[k].  If strict=FALSE, replace 'less than' by
        # 'less than or equal to'.  Neither x nor reference need be
        # sorted, although they should not have NA's (in theory, NA's
        # in x are ok, but not in reference).
        if(any(is.na(x)) || any(is.na(reference))) stop("NA's in input")
        if(!strict)
                i <- c(rep(TRUE, length(reference)), rep(FALSE, length(x)))[
                        order(c(reference, x))]
        else i <- c(rep(FALSE, length(x)), rep(TRUE, length(reference)))[
                        order(c(x, reference))]
        i <- cumsum(i)[!i]
        i[i <= 0] <- NA
        # i is length of x and has values in range 1:length(reference) or NA
        # following needed if reference is not sorted
        i <- order(reference)[i]
        # following needed if x is not sorted
        i[order(order(x))]
}
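
In case the order/cumsum step in both functions looks cryptic, here it is traced by hand on the small example from the top of this message.  The variable names (is_ref, flags, n_le) are only for this illustration.

```r
reference <- c(11, 12, 13, 14, 15)
x <- c(14, 14.5)

# Flag reference elements TRUE and x elements FALSE, then put the
# flags in the sorted order of the combined values.  order() keeps
# ties in their original order, so with reference listed first a
# reference value sorts ahead of an equal x value.
is_ref <- c(rep(TRUE, length(reference)), rep(FALSE, length(x)))
flags  <- is_ref[order(c(reference, x))]

# Running count of reference elements, read off at the x positions:
# the number of reference values <= each x.
n_le <- cumsum(flags)[!flags]

flags
n_le
```

With reference first, n_le is the non-strict 'just below' index and n_le + 1 is the strict 'just above' index.  Listing x first in the combined vector puts x ahead on ties, which gives the other two cases.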

Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com
