fill 0-row data.frame with 1 line of NAs

Wed, Jul 11, 2012 3:37 PM
In that case, I think that using a subscript of NA is the
best way to go.  It works for both matrices and data.frames
(unlike an integer larger than nrow(data)) and its meaning
is pretty clear.
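
As a minimal sketch of the NA subscript (the objects `empty` and `m` are made up for illustration and are not from the original question):

```r
# Hypothetical 0-row data.frame: columns exist, but there are no rows
empty <- data.frame(long = numeric(0), lat = numeric(0))

# An NA subscript gives exactly one row, with NA in every column
oneNA <- empty[NA_integer_, ]

# The same subscript works on a matrix (hypothetical 3-row example)
m <- matrix(1:6, nrow = 3, dimnames = list(NULL, c("a", "b")))
mNA <- m[NA_integer_, , drop = FALSE]

# An out-of-range integer is an error for the matrix
outOfRange <- tryCatch(m[4, ], error = function(e) "error")
```

Using NA_integer_ rather than a bare NA avoids the recycling rules that apply to logical subscripts.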

Also, you will probably get better results if the function
in your call to apply() returns the index (perhaps NA) of a row
of the data.frame instead of the row itself.  Then subscript the data.frame
once with the output of apply() rather than subscripting it many
times and rbinding the results back together.  This is natural
if you use match(), as it returns NA for no match (merge() does
this sort of thing).
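
A small sketch of the index-then-subscript idea with match(); the table `lookup` and the vector `keys` are invented here purely for illustration:

```r
# Hypothetical lookup table and keys to look up
lookup <- data.frame(id = c("a", "b", "c"), value = c(10, 20, 30))
keys <- c("b", "z", "a")

# match() returns a row index per key, NA where there is no match
idx <- match(keys, lookup$id)
idx
# [1]  2 NA  1

# Subscript the data.frame once; the unmatched key becomes a row of NAs
result <- lookup[idx, ]
```

The result has one row per key, in the order of the keys, so it can be cbind()ed to the original data without any rbind() loop.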

Here is an example of this approach using a non-standard kind of
match.  The following matches a long/lat pair to the nearest city
in the table, but returns NA if the point is too far from
any city:

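nearestTo <- function (x, table, limit = 1) 
{
    # x: named vector with "long" and "lat"; table: data.frame with those columns
    stopifnot(all(is.element(c("long", "lat"), names(x))),
        all(is.element(c("long", "lat"), names(table))))
    # Straight-line distance, in degrees, from x to every row of table
    dists <- sqrt((x["lat"] - table[, "lat"])^2 + (x["long"] - 
        table[, "long"])^2)
    # Index of the closest row, or NA if even that row is farther than limit
    retval <- which.min(dists)
    if (dists[retval] > limit) {
        retval <- NA_integer_
    }
    retval
}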

cities <- data.frame(
     long = c(-117.833, -116.217, -123.083, -123.9, -121.733, 
        -117.033, -122.683, -122.333, -117.433),
     lat = c(44.7833, 43.6, 44.05, 46.9833, 42.1667, 
        46.4, 45.5167, 47.6167, 47.6667),
     row.names = c("Baker", "Boise", "Eugene", "Hoquiam", 
        "Klamath Falls", "Lewiston", "Portland", 
        "Seattle", "Spokane")
)

df <- data.frame(
     long = c(-116.77, -123.68, -122.96, -120.81, -116.26, 
        -123.54, -121.22, -115.12),
     lat = c(47.3, 44.53, 44.35, 45.99, 46.75, 43.78, 
        42.71, 46.66))

whichCity <- apply(df, 1, nearestTo, cities, limit=1)
whichCity
# [1]  9  3  3 NA  6  3  5 NA
cbind(df, nearbyCity = rownames(cities)[whichCity])
#      long   lat    nearbyCity
# 1 -116.77 47.30       Spokane
# 2 -123.68 44.53        Eugene
# 3 -122.96 44.35        Eugene
# 4 -120.81 45.99          <NA>
# 5 -116.26 46.75      Lewiston
# 6 -123.54 43.78        Eugene
# 7 -121.22 42.71 Klamath Falls
# 8 -115.12 46.66          <NA>


Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com