Message-ID: <971536df0904060452n67b7bde1i29735ade62ed6142@mail.gmail.com>
Date: 2009-04-06T11:52:39Z
From: Gabor Grothendieck
Subject: time-series data and time-invariant missing values
In-Reply-To: <33A1BBE99423A247A16CC54ECE08D865C8B49C@c1bzaek>
Check out na.locf in the zoo package. Here we fill in
NAs going forward and, in case there are NAs right at
the beginning, we fill them in backward as well.
library(zoo)
x <- as.Date(c(NA, "2000-01-01", NA))
x2 <- na.locf(x, na.rm = FALSE)
x2 <- na.locf(x2, fromLast = TRUE, na.rm = FALSE)
gives:
> x2
[1] "2000-01-01" "2000-01-01" "2000-01-01"
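The example above fills a single vector. For the data frame in the question, the same fill would presumably need to be applied within each id, so that values do not leak from one individual to the next. A minimal sketch, assuming a data frame d with columns id, birth and sex as in the question (the sample rows, the helper fill_both and the output columns birth2 and sex2 are made up for illustration):

```r
library(zoo)

# Hypothetical sample data in the layout described in the question
d <- data.frame(
  id    = c(1, 1, 1, 2, 2),
  birth = as.Date(c(NA, "1976-01-01", NA, "1980-05-17", NA)),
  sex   = c(NA, "male", NA, "female", NA),
  year  = c(2006, 2007, 2008, 2006, 2007),
  stringsAsFactors = FALSE
)

# Illustrative helper: carry values forward, then backward,
# so leading NAs are filled as well
fill_both <- function(v) {
  v <- na.locf(v, na.rm = FALSE)
  na.locf(v, fromLast = TRUE, na.rm = FALSE)
}

# ave() applies the helper separately within each id
d$birth2 <- ave(d$birth, d$id, FUN = fill_both)
d$sex2   <- ave(d$sex,   d$id, FUN = fill_both)
```

ave returns its result in the original row order, so no explicit loop over the rows is needed. An id with no observed value at all simply stays NA.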
On Mon, Apr 6, 2009 at 7:13 AM, Kunzler, Andreas <a.kunzler at bzaek.de> wrote:
> Dear list,
>
> I have some problems with time-series data and missing values of time-invariant information like sex or the birth date.
>
> Assume a data (d) structure like
>
> id   birth        sex    year of observation
> 1    NA           NA     2006
> 1    1976-01-01   male   2007
> 1    NA           NA     2008
>
> I am looking for a way to replace the missing values.
>
> Right now my solution to this problem slows down R:
>
>
>
> d$birth2 <- as.Date(NA) # create the result column first so it keeps the Date class
> for (i in seq_len(nrow(d))) { # for all observations
>
>     if (!is.na(d$birth[i])) { # check that birth of observation i is not missing
>         d$birth2[i] <- as.Date(d$birth[i], "%d.%m.%Y")
>     } else {
>         # if birth of observation i is missing, take an observation from another year
>         d$birth2[i] <- as.Date(d$birth[d$id[i] == d$id & !is.na(d$birth)], "%d.%m.%Y")[1]
>     }
> }
>
> Result:
>
>
> id   birth        sex    year of observation   birth2
> 1    NA           NA     2006                  1976-01-01
> 1    01.01.1976   male   2007                  1976-01-01
> 1    NA           NA     2008                  1976-01-01
>
> Unfortunately, the data consist of over 20000 observations a year.
>
> Does anybody know a better way?
>
> Thanks
>
> Kind regards
>
> Andreas Kunzler
> ____________________________
> Bundeszahnärztekammer (BZÄK)
> Chausseestraße 13
> 10115 Berlin
>
> Tel.: 030 40005-113
> Fax: 030 40005-119
>
> E-Mail: a.kunzler at bzaek.de
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>