
time-series data and time-invariant missing values

2 messages · Kunzler, Andreas, Gabor Grothendieck

#
Dear list,

I have some problems with time-series data and missing values of time-invariant information such as sex or birth date.

Assume a data frame (d) structured like

id	birth		sex	year of observation
1	NA		NA	2006
1	1976-01-01	male	2007
1	NA		NA	2008

I am looking for a way to replace the missing values.

Right now my solution to this problem is very slow in R:



d$birth2 <- as.Date(rep(NA, nrow(d))) # initialise the result column as Date

for (i in seq_len(nrow(d))) { # for all observations

    if (!is.na(d$birth[i])) { # birth of observation i is present
        d$birth2[i] <- as.Date(d$birth[i], "%d.%m.%Y")
    } else {
        # birth of observation i is missing: take an observation of the same id from another year
        d$birth2[i] <- as.Date(d$birth[d$id == d$id[i] & !is.na(d$birth)], "%d.%m.%Y")[1]
    }
}

Result:


id	birth		sex	year of observation	birth2
1	NA		NA	2006			1976-01-01
1	01.01.1976	male	2007			1976-01-01
1	NA		NA	2008			1976-01-01

Unfortunately, the data consist of over 20,000 observations per year.

Does anybody know a better way?

Thanks

Kind regards

Andreas Kunzler
____________________________
Bundeszahnärztekammer (BZÄK)
Chausseestraße 13
10115 Berlin

Tel.: 030 40005-113
Fax:  030 40005-119

E-Mail: a.kunzler at bzaek.de
#
Check out na.locf in the zoo package.  Here we fill in
NAs going forward and just in case there were NAs
right at the beginning we fill them in backward as well.

library(zoo)
x <- as.Date(c(NA, "2000-01-01", NA))
x2 <- na.locf(x, na.rm = FALSE)
x2 <- na.locf(x2, fromLast = TRUE, na.rm = FALSE)

gives:
[1] "2000-01-01" "2000-01-01" "2000-01-01"
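
The vector example above does not yet take the id column from the question into account. As a rough sketch only, the same two na.locf calls could be applied within each id using base R's ave. The example data, the helper name fill_both and the columns birth2 and birth3 are assumptions made for illustration, not part of the original posts. Since the value is time-invariant, a lookup with match that needs no extra package is shown as an alternative.

```r
library(zoo)

# Hypothetical example data: two ids, birth stored as "dd.mm.yyyy" text
d <- data.frame(
  id    = c(1, 1, 1, 2, 2),
  birth = c(NA, "01.01.1976", NA, "17.05.1980", NA),
  year  = c(2006, 2007, 2008, 2006, 2007),
  stringsAsFactors = FALSE
)

# Convert the whole column once instead of row by row
birth <- as.Date(d$birth, "%d.%m.%Y")

# Hypothetical helper: fill forward, then backward, as in the vector example
fill_both <- function(x) {
  x <- na.locf(x, na.rm = FALSE)
  na.locf(x, fromLast = TRUE, na.rm = FALSE)
}

# Apply the helper separately within each id
d$birth2 <- ave(birth, d$id, FUN = fill_both)

# Alternative without zoo: look up the first known birth date of each id
known <- !is.na(birth)
d$birth3 <- birth[known][match(d$id, d$id[known])]
```

Both columns should agree as long as each id has only one distinct birth date. Neither version loops over rows, so they should scale to tens of thousands of observations far better than the explicit for loop.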
On Mon, Apr 6, 2009 at 7:13 AM, Kunzler, Andreas <a.kunzler at bzaek.de> wrote: