Best way to do temporal joins in R?

I am assuming that, for each species record and each station_id, you want
the value from the temp record whose date/time is closest to the
date/time of the species record, along with the identifying information
(species, station_id) and the date/time of the species record.  That
interpretation gives the same answer as the fused data set you posted.

First we read in temp and use the chron package to convert its
date/times to chron objects.  We do the same for species.
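
As a small illustration of the conversion step, here is a sketch on
made-up values.  It uses base R's as.POSIXct rather than chron (a
swapped-in technique, chosen only so the example runs without extra
packages).  The column names date and hour follow the code further down;
the sample values, the name toy and the zero-padding of hour are
assumptions.

```r
# Hypothetical temperature rows; the real data come from the CSV file.
toy <- data.frame(date = c(20080623L, 20080623L), hour = c(900L, 1400L))

# Paste date and hour into one string and parse it.  sprintf zero-pads
# the hour (900 -> "0900") so that "%H%M" always sees four digits.
toy$dt <- as.POSIXct(paste(toy$date, sprintf("%04d", toy$hour)),
                     format = "%Y%m%d %H%M", tz = "UTC")

# Show the parsed date/times in a compact form.
print(format(toy$dt, "%m/%d/%y %H:%M"))
```

Fixing the time zone (here "UTC") keeps the differences between
date/times free of daylight-saving jumps.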

Then we define a function mydist which measures the "distance" between
two date/times, and another function f which takes a species rowname
and, for each station, picks the temp row closest in time to that
species row.  Finally we lapply f over the rownames of species and
rbind the results.
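
The nearest-in-time idea can be sketched on a tiny made-up data set.
This is only an illustration in base R (POSIXct instead of chron); the
names temp_toy, sp_time and time_dist and all the values are
hypothetical and not taken from the posted files.

```r
# Hypothetical data: two stations with hourly readings, one species time.
temp_toy <- data.frame(
  station_id = c("ANH", "ANH", "BDT", "BDT"),
  dt = as.POSIXct(c("2008-06-23 13:00", "2008-06-23 14:00",
                    "2008-06-23 13:00", "2008-06-23 14:00"), tz = "UTC"),
  value = c(2.10, 2.25, 3.90, 3.82)
)
sp_time <- as.POSIXct("2008-06-23 13:55:11", tz = "UTC")

# "Distance" between date/times: absolute difference in seconds.
time_dist <- function(x, y) abs(as.numeric(difftime(x, y, units = "secs")))

# For each station, keep the reading whose time is closest to sp_time.
pieces <- lapply(split(temp_toy, temp_toy$station_id), function(x) {
  x[which.min(time_dist(x$dt, sp_time)), ]
})
nearest <- do.call(rbind, pieces)
print(nearest)
```

Note that which.min returns the first minimum, so if two readings are
equally close the earlier row wins.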

library(chron)

temp <- read.csv("temperature_data_Rexample.csv")
temp$dt <- as.chron(paste(temp$date, temp$hour), "%Y%m%d %H%M")

species <- read.csv("species_data_Rexample.csv")
ds <- species$Date_Sampled
species$dt <- chron(sub(" .*", "", ds), gsub("[apm]+$|^.* ", "", ds)) +
	(regexpr("pm", ds) > 0)/2  # add half a day if it's pm
	# (note: 12:xx am/pm times are not special-cased here)

mydist <- function(x, y) abs(as.numeric(x-y))

f <- function(r) {
	s <- species[r, ]
	out <- by(temp, temp$station_id, function(x) {
		imin <- which.min(mydist(x$dt, s$dt))
		data.frame(Species = s$Species, Date = s$dt,
			station_id = x[imin, "station_id"], value = x[imin, "value"])
	})
	do.call(rbind, out)
}

do.call(rbind, lapply(rownames(species), f))

Result of last line is:

   Species                Date station_id value
1 SpeciesB (06/23/08 13:55:11)        ANH  2.25
2 SpeciesA (06/23/08 13:43:11)        ANH  2.25
3 SpeciesC (06/23/08 13:55:11)        ANH  2.25
4 SpeciesB (06/23/08 13:55:11)        BDT  3.82
5 SpeciesA (06/23/08 13:43:11)        BDT  3.90
6 SpeciesC (06/23/08 13:55:11)        BDT  3.82



On Mon, Mar 16, 2009 at 7:41 PM, Jonathan Greenberg
<greenberg at ucdavis.edu> wrote: