I have a data frame with three columns:
client ID | date | value
For each client ID I want to determine the min date and max date, and for any dates in between that are missing I want to insert a row:
client ID | date | NA
Any help would be appreciated.
filling up holes
3 messages · Bill Venables, analyst41 at hotmail.com
Dear 'analyst41' (it would be a courtesy to know who you are),

Here is a low-level way to do it. First create some dummy data:
allDates <- seq(as.Date("2010-01-01"), by = 1, length.out = 50)
client_ID <- sample(LETTERS[1:5], 50, replace = TRUE)
value <- 1:50
date <- sample(allDates)
clientData <- data.frame(client_ID, date, value)
At this point clientData has 50 rows, with 5 clients, each with a sample of dates. Everything is in random order except "value". Now write a little function to fill out a subset of the data consisting of one client's data only:
fixClient <- function(cData) {
  # every calendar day between this client's first and last date
  dateRange <- range(cData$date)
  dates <- seq(dateRange[1], dateRange[2], by = 1)
  # one row per day, with value missing by default
  fullSet <- data.frame(client_ID = as.character(cData$client_ID[1]),
                        date = dates, value = NA)

  # copy the observed values into the rows for their dates
  fullSet$value[match(cData$date, dates)] <- cData$value
  fullSet
}
Now split up the data, apply the fixClient function to each section and re-combine them again:
allData <- do.call(rbind,
                   lapply(split(clientData, clientData$client_ID), fixClient))
Check:
head(allData)
    client_ID       date value
A.1         A 2010-01-04    36
A.2         A 2010-01-05    18
A.3         A 2010-01-06    NA
A.4         A 2010-01-07    NA
A.5         A 2010-01-08    NA
A.6         A 2010-01-09    49
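Beyond eyeballing head(), the result can also be checked programmatically. The following is only a sketch, not part of the original reply: it rebuilds the dummy data so that it runs on its own, and set.seed(1) and the names noGaps, valuesKept and naCountOK are assumptions introduced here for illustration.

```r
# Rebuild the example so this block is self-contained.
# set.seed() is an addition for reproducibility; the original post did not use it.
set.seed(1)
allDates <- seq(as.Date("2010-01-01"), by = 1, length.out = 50)
clientData <- data.frame(client_ID = sample(LETTERS[1:5], 50, replace = TRUE),
                         date = sample(allDates),
                         value = 1:50)

fixClient <- function(cData) {
  dateRange <- range(cData$date)
  dates <- seq(dateRange[1], dateRange[2], by = 1)
  fullSet <- data.frame(client_ID = as.character(cData$client_ID[1]),
                        date = dates, value = NA)
  fullSet$value[match(cData$date, dates)] <- cData$value
  fullSet
}
allData <- do.call(rbind,
                   lapply(split(clientData, clientData$client_ID), fixClient))

# Check 1: within each client, consecutive dates differ by exactly one day.
noGaps <- sapply(split(allData$date, allData$client_ID),
                 function(d) all(diff(as.numeric(d)) == 1))

# Check 2: every original (client, date) pair is still there with its value.
merged <- merge(clientData, allData, by = c("client_ID", "date"))
valuesKept <- nrow(merged) == nrow(clientData) &&
  all(merged$value.x == merged$value.y)

# Check 3: the inserted rows are exactly the rows with a missing value.
naCountOK <- sum(is.na(allData$value)) == nrow(allData) - nrow(clientData)

print(noGaps)
print(c(valuesKept = valuesKept, naCountOK = naCountOK))
```

If all three checks come back TRUE, every client has an unbroken run of dates and no original observation was lost or altered.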
Seems OK. At this point the data are in sorted order by client and date, but that should not matter.

Bill Venables.

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of analyst41 at hotmail.com
Sent: Wednesday, 29 December 2010 10:45 AM
To: r-help at r-project.org
Subject: [R] filling up holes

I have a data frame with three columns client ID | date | value For each client ID I want to determine Min date and Max date and for any dates in between that are missing I want to insert a row Client ID | date | NA Any help would be appreciated.

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
On Dec 28, 10:27 pm, <Bill.Venab... at csiro.au> wrote:
> Dear 'analyst41' (it would be a courtesy to know who you are) Here is a low-level way to do it. [...]
It is of course a great honor to receive a reply from you (but please allow me to continue to be an anonymous source of bits and bytes over the net). This is a neat solution, but please watch this space to see my dumber version (the code might eventually need to be ported to a procedural language). Thank you.
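For readers wondering what a more procedural version might look like, here is one possible loop-based sketch. It is not the poster's code (that was never shown in this thread); the tiny input data and the names ids, days, vals and filled are made up for illustration. It uses explicit loops and growing vectors, so it should translate fairly directly to another language, at the cost of being slow on large data.

```r
# Hypothetical input: client A has no rows for 2010-01-02 and 2010-01-03.
clientData <- data.frame(
  client_ID = c("A", "A", "B", "B"),
  date      = as.Date(c("2010-01-01", "2010-01-04", "2010-01-02", "2010-01-03")),
  value     = c(10, 40, 20, 30),
  stringsAsFactors = FALSE
)

# Output columns, grown one element at a time (simple, but not fast).
ids  <- character(0)
days <- numeric(0)   # dates stored as day counts since 1970-01-01
vals <- numeric(0)

for (id in unique(clientData$client_ID)) {
  rows    <- which(clientData$client_ID == id)
  dayNums <- as.numeric(clientData$date[rows])
  # walk every day from this client's first date to its last date
  for (d in min(dayNums):max(dayNums)) {
    hit  <- rows[dayNums == d]
    ids  <- c(ids, id)
    days <- c(days, d)
    # keep the observed value if the day exists, otherwise insert NA
    vals <- c(vals, if (length(hit) == 1) clientData$value[hit] else NA)
  }
}

filled <- data.frame(client_ID = ids,
                     date = as.Date(days, origin = "1970-01-01"),
                     value = vals,
                     stringsAsFactors = FALSE)
print(filled)
```

The sketch assumes at most one row per client and date, as in the dummy data earlier in the thread; duplicated dates would need a rule for which value to keep.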