Need a faster function to replace missing data
Many thanks to Jim, Bill, and Carl. Using indexes instead of the for loop gave me my answer in minutes instead of hours! Thanks for all of your great suggestions!

Aloha,

Tim

Tim Clark
Department of Zoology
University of Hawaii
--- On Fri, 5/22/09, jim holtman <jholtman at gmail.com> wrote:
From: jim holtman <jholtman at gmail.com>
Subject: Re: [R] Need a faster function to replace missing data
To: "Tim Clark" <mudiver1200 at yahoo.com>
Cc: r-help at r-project.org
Date: Friday, May 22, 2009, 4:59 PM

Here is a modification that should now find the closest:
myvscan<-data.frame(c(1,NA,1.5),as.POSIXct(c("12:00:00","12:14:00","12:20:00"),
    format="%H:%M:%S"))
# convert to numeric
names(myvscan)<-c("Latitude","DateTime")
myvscan$tn <- as.numeric(myvscan$DateTime)  # numeric for findInterval
mygarmin<-data.frame(c(20,30,40),as.POSIXct(c("12:00:00","12:10:00","12:15:00"),
    format="%H:%M:%S"))
names(mygarmin)<-c("Latitude","DateTime")
mygarmin$tn <- as.numeric(mygarmin$DateTime)
# use 'findInterval'
na.indx <- which(is.na(myvscan$Latitude))  # find NAs
# create matrix of values to test the range
indices <- findInterval(myvscan$tn[na.indx],mygarmin$tn)
x <- cbind(indices,
           abs(myvscan$tn[na.indx] - mygarmin$tn[indices]),      # lower
           abs(myvscan$tn[na.indx] - mygarmin$tn[indices + 1]))  # higher
# now determine which index is closer
closest <- x[,1] + (x[,2] > x[,3])  # determine the proper index
# replace with garmin latitude
myvscan$Latitude[na.indx] <- mygarmin$Latitude[closest]
myvscan
  Latitude            DateTime         tn
1      1.0 2009-05-23 12:00:00 1243080000
2     40.0 2009-05-23 12:14:00 1243080840
3      1.5 2009-05-23 12:20:00 1243081200

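A note on the boundaries of the approach above: findInterval returns 0 for a time earlier than the first garmin record, and indices + 1 runs past the end of the vector for a time later than the last one. The following is only a sketch of one way to guard against both cases by clamping the two candidate neighbours. The helper name nearest_index is made up for illustration, and it assumes the reference times are sorted in increasing order.

```r
# Hypothetical helper (not from the thread): for each x, the index of the
# nearest value in the sorted vector `ref`.
nearest_index <- function(x, ref) {
  stopifnot(length(ref) >= 1, !is.unsorted(ref))
  n  <- length(ref)
  lo <- findInterval(x, ref)   # 0 if x < ref[1], n if x >= ref[n]
  lower <- pmax(lo, 1L)        # clamp the lower neighbour to a valid index
  upper <- pmin(lo + 1L, n)    # clamp the upper neighbour to a valid index
  # take the upper neighbour only when it is strictly closer
  ifelse(abs(x - ref[upper]) < abs(x - ref[lower]), upper, lower)
}

# garmin times from the example, as seconds after 12:00
garmin_tn <- c(0, 600, 900)
# one time before the first record, 12:14, and one after the last record
nearest <- nearest_index(c(-30, 840, 1200), garmin_tn)
nearest
```

With this helper, the replacement step would read mygarmin$Latitude[nearest_index(myvscan$tn[na.indx], mygarmin$tn)].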
On Fri, May 22, 2009 at 7:39 PM, Tim Clark <mudiver1200 at yahoo.com> wrote:

Jim,

Thanks! I like the way you use indexing instead of the loops. However, the findInterval function does not give the right result. I have been playing with it and it seems to give the closest number that is less than the one of interest. In this case, the correct replacement should have been 40, not 30, since 12:15 from mygarmin is closer to 12:14 in myvscan than 12:10 is. Is there a way to get the function to find the closest in value instead of the next smaller value? I was trying to use which.min to get the closest date but can't seem to get it to work right either.

Aloha,

Tim

Tim Clark
Department of Zoology
University of Hawaii

--- On Fri, 5/22/09, jim holtman <jholtman at gmail.com> wrote:

From: jim holtman <jholtman at gmail.com>
Subject: Re: [R] Need a faster function to replace missing data
To: "Tim Clark" <mudiver1200 at yahoo.com>
Cc: r-help at r-project.org
Date: Friday, May 22, 2009, 7:24 AM

I think this does what you want. It uses 'findInterval' to determine where a possible match is:
myvscan<-data.frame(c(1,NA,1.5),as.POSIXct(c("12:00:00","12:14:00","12:20:00"),
    format="%H:%M:%S"))
# convert to numeric
names(myvscan)<-c("Latitude","DateTime")
myvscan$tn <- as.numeric(myvscan$DateTime)  # numeric for findInterval
mygarmin<-data.frame(c(20,30,40),as.POSIXct(c("12:00:00","12:10:00","12:15:00"),
    format="%H:%M:%S"))
names(mygarmin)<-c("Latitude","DateTime")
mygarmin$tn <- as.numeric(mygarmin$DateTime)
# use 'findInterval'
na.indx <- which(is.na(myvscan$Latitude))  # find NAs
# replace with garmin latitude
myvscan$Latitude[na.indx] <-
    mygarmin$Latitude[findInterval(myvscan$tn[na.indx], mygarmin$tn)]
myvscan
  Latitude            DateTime         tn
1      1.0 2009-05-22 12:00:00 1243008000
2     30.0 2009-05-22 12:14:00 1243008840
3      1.5 2009-05-22 12:20:00 1243009200

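The 30 in the output above comes from how findInterval works: for sorted breakpoints it returns the index of the last breakpoint that is less than or equal to the query, not the index of the nearest one. A small illustration, with the garmin times rewritten as seconds after 12:00 (a simplification made here for readability, not something from the thread):

```r
garmin_tn <- c(0, 600, 900)   # 12:00, 12:10, 12:15 as seconds after 12:00
query     <- 840              # 12:14

# index of the last garmin time <= query: the 12:10 record
floor_idx <- findInterval(query, garmin_tn)

# index of the garmin time with the smallest absolute difference: 12:15
nearest_idx <- which.min(abs(query - garmin_tn))

c(floor = floor_idx, nearest = nearest_idx)
```

Here floor_idx is 2 and nearest_idx is 3, which matches the 30 versus 40 discrepancy discussed in the thread.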
On Fri, May 22, 2009 at 12:45 AM, Tim Clark <mudiver1200 at yahoo.com> wrote:

Dear List,

I need some help in coming up with a function that will take two data sets, determine if a value is missing in one, find a value in the second that was taken at about the same time, and substitute the second value in for where the first should have been.

My problem is from a fish tracking study. We put acoustic tags in fish and track them for several days. Location data is supposed to be automatically recorded every time we detect a "ping" from the fish. Unfortunately the GPS had some problems and sometimes the fish's depth was recorded but not its location. I fortunately had a back-up GPS that was taking location data every five minutes. I would like to merge the two files, replacing the missing value in the vscan (automatic) file with the location from the garmin file. Since we were getting vscan records every 1-2 seconds and garmin records every 5 minutes, I need to find the right place in the vscan file to place the garmin record, i.e. the closest in time, but not greater than 5 minutes.

I have written a function that does this. However, it works with my test data but locks up my computer with my real data. I have several million vscan records and several thousand garmin records. Is there a better way to do this?

My function and test data:

library(chron)  # times() comes from the chron package

myvscan<-data.frame(c(1,NA,1.5),times(c("12:00:00","12:14:00","12:20:00")))
names(myvscan)<-c("Latitude","DateTime")
mygarmin<-data.frame(c(20,30,40),times(c("12:00:00","12:10:00","12:15:00")))
names(mygarmin)<-c("Latitude","DateTime")
minute.diff<-1/24/12   #Time diff is in days, so this is 5 minutes

for (k in 1:nrow(myvscan))
{
  if (is.na(myvscan$Latitude[k]))
  {
    if ((min(abs(mygarmin$DateTime-myvscan$DateTime[k]))) < minute.diff )
    {
      index.min.date<-which.min(abs(mygarmin$DateTime-myvscan$DateTime[k]))
      myvscan$Latitude[k]<-mygarmin$Latitude[index.min.date]
    }
  }
}

I appreciate your help and advice.

Aloha,

Tim

Tim Clark
Department of Zoology
University of Hawaii

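The loop in the original post also enforces a five-minute limit, which the findInterval replies in this thread do not. As a rough sketch only, the same rule can be written without a loop by cutting the time axis at the midpoints between consecutive garmin times, so that a single findInterval call yields the nearest record, and then filtering on the gap. The names fill_from_nearest and day_frac are invented for this example. Times are plain fractions of a day (the unit the loop's minute.diff assumes), no times are assumed to be missing, and an exact tie at a midpoint goes to the later garmin record.

```r
# Hypothetical helper (not from the thread): fill NAs in `target` with the
# `source` value nearest in time, but only when the gap is below `max_gap`.
fill_from_nearest <- function(target, t_target, source, t_source, max_gap) {
  ord      <- order(t_source)      # findInterval needs increasing breakpoints
  t_source <- t_source[ord]
  source   <- source[ord]
  miss <- which(is.na(target))
  # Midpoints between consecutive source times: everything between two
  # neighbouring midpoints is nearest to the source record between them.
  mids <- (head(t_source, -1) + tail(t_source, -1)) / 2
  idx  <- findInterval(t_target[miss], mids) + 1L
  ok   <- abs(t_target[miss] - t_source[idx]) < max_gap
  target[miss[ok]] <- source[idx[ok]]
  target
}

# Convert hours and minutes to a fraction of a day
day_frac <- function(h, m) (h * 60 + m) / 1440

vscan_lat  <- c(1, NA, 1.5, NA)
vscan_t    <- day_frac(12, c(0, 14, 20, 40))   # 12:00, 12:14, 12:20, 12:40
garmin_lat <- c(20, 30, 40)
garmin_t   <- day_frac(12, c(0, 10, 15))       # 12:00, 12:10, 12:15

filled <- fill_from_nearest(vscan_lat, vscan_t, garmin_lat, garmin_t,
                            max_gap = 1/24/12)  # 5 minutes, in days
filled
```

The 12:14 gap is filled from the 12:15 garmin record, while the 12:40 gap stays NA because the nearest garmin record is 25 minutes away.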
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?