Splitting row in function of time
Try this (it would have been easier if you had posted your data with 'dput'):
x <- read.table('/temp/example.txt', skip = 1, as.is = TRUE)
# convert to POSIXct
x$beg <- as.POSIXct(paste(x$V4, x$V5))
x$end <- as.POSIXct(paste(x$V6, x$V7))
# flag phases whose beginning and end fall on different calendar dates
x$over <- format(x$beg, "%Y-%m-%d") != format(x$end, "%Y-%m-%d")
x$V1 <- x$V4 <- x$V5 <- x$V6 <- x$V7 <- NULL # remove extra columns
# put names on columns
names(x) <- c('phaseno', 'activity', 'phasetime', 'beg', 'end', 'over')
# extract records that extend over midnight
overSet <- subset(x, over)
normalSet <- subset(x, !over)
newSet <- do.call(rbind, lapply(seq_along(overSet$over), function(.row){
# for each row, make two copies so you can change them individually
.data <- overSet[c(.row, .row), ] # two copies of the row
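# first copy ends at midnight at the start of the end date
# (assumes the phase crosses at most one midnight)
.data$end[1] <- trunc(.data$end[1], units = 'days')
.data$phasetime[1] <- as.numeric(.data$end[1]) - as.numeric(.data$beg[1])  # seconds
# second copy starts at that same midnight and keeps the original end
.data$beg[2] <- .data$end[1]
.data$phasetime[2] <- as.numeric(.data$end[2]) - as.numeric(.data$beg[2])  # seconds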
.data
}))
# combine the data and then sort by 'beg'
result <- rbind(normalSet, newSet)
result <- result[order(result$beg), ]
Output:
    phaseno activity phasetime                 beg                 end  over
1         1        L     61033 2010-06-01 00:21:00 2010-06-01 17:18:13 FALSE
2         2        D      7907 2010-06-01 17:18:14 2010-06-01 19:30:01 FALSE
3         3        L       395 2010-06-01 19:30:02 2010-06-01 19:36:37 FALSE
4         4        D     15802 2010-06-01 19:36:38 2010-06-02 00:00:00  TRUE
4.1       4        D      2693 2010-06-02 00:00:00 2010-06-02 00:44:53  TRUE
5         5        W        40 2010-06-02 00:44:54 2010-06-02 00:45:34 FALSE
6         6        D      6425 2010-06-02 00:45:35 2010-06-02 02:32:40 FALSE
7         7        L       379 2010-06-02 02:32:41 2010-06-02 02:39:00 FALSE
8         8        D      1414 2010-06-02 02:39:01 2010-06-02 03:02:35 FALSE
9         9        W        73 2010-06-02 03:02:36 2010-06-02 03:03:49 FALSE
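The script splits each crossing phase into exactly two rows, which is enough as long as no phase spans more than one midnight. If some phases might run longer than a day, a variant along the following lines could cut at every midnight instead. This is only a sketch: `split_at_midnights` is a hypothetical helper (not part of the script above), it assumes `beg` and `end` are POSIXct columns, and the example treats the times as UTC purely so the result is reproducible. The last lines tabulate, per day, how many phases and how many seconds fall under each activity, which is the comparison asked about in the original message.

```r
# Hypothetical helper (not part of the script above): split every phase at
# each midnight it crosses, so a phase spanning several days yields one row
# per calendar day. Assumes POSIXct columns 'beg' and 'end'.
split_at_midnights <- function(df, tz = "") {
  pieces <- lapply(seq_len(nrow(df)), function(i) {
    row <- df[i, ]
    # Calendar dates touched by this phase, in time zone 'tz'
    days <- seq(as.Date(row$beg, tz = tz), as.Date(row$end, tz = tz), by = "day")
    # Midnights after the first date become cut points; drop one that
    # coincides with 'end' so no zero-length piece is created
    cuts <- as.POSIXct(format(days[-1]), format = "%Y-%m-%d", tz = tz)
    cuts <- cuts[cuts < row$end]
    starts <- c(row$beg, cuts)
    ends   <- c(cuts, row$end)
    # One copy of the row per piece; other columns (phaseno, activity, ...)
    # are copied unchanged
    out <- row[rep(1, length(starts)), ]
    out$beg <- starts
    out$end <- ends
    out$phasetime <- as.numeric(difftime(ends, starts, units = "secs"))
    out
  })
  do.call(rbind, pieces)
}

# Phases 3 to 5 rebuilt from the output above (phase 4 before splitting);
# times are treated as UTC here only to make the example reproducible
phases <- data.frame(phaseno = 3:5, activity = c("L", "D", "W"),
                     stringsAsFactors = FALSE)
phases$beg <- as.POSIXct(c("2010-06-01 19:30:02", "2010-06-01 19:36:38",
                           "2010-06-02 00:44:54"), tz = "UTC")
phases$end <- as.POSIXct(c("2010-06-01 19:36:37", "2010-06-02 00:44:53",
                           "2010-06-02 00:45:34"), tz = "UTC")
daily <- split_at_midnights(phases, tz = "UTC")
print(daily$phasetime)  # 395 15802 2693 40

# Per-day comparison: number of phases and seconds spent in each activity
daily$day <- format(daily$beg, "%Y-%m-%d", tz = "UTC")
counts <- table(daily$day, daily$activity)
secs   <- xtabs(phasetime ~ day + activity, data = daily)
print(counts)
print(secs)
```

Using seq() over calendar dates, rather than adding 86400 seconds, keeps the cut points at true midnights even across daylight-saving changes; pass tz = "" to work in local time as the script above does.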
On Wed, Nov 16, 2011 at 2:41 PM, PEL <pierre-etienne.lessard.1 at ulaval.ca> wrote:
Hello all,

I have a data frame that looks like this: http://r.789695.n4.nabble.com/file/n4077622/Capture.png

I would like to know if it is possible to split a single row into two rows when the interval between "beg" and "end" spans midnight. I want to compare the frequency of each activity for each day, so a row for a phase that spans two dates unbalances the graphs I create from this data. For example:
From the original row:
http://r.789695.n4.nabble.com/file/n4077622/Capture2.png
Note: "phasetime" is simply the difftime between "end" and "beg"; "phaseno" and "activity" should stay the same in the two new rows.

Here is a sample of my data that covers a few days: http://r.789695.n4.nabble.com/file/n4077622/example.txt

Thank you to anyone who takes the time to read this; any ideas are welcome.

PEL

--
View this message in context: http://r.789695.n4.nabble.com/Splitting-row-in-function-of-time-tp4077622p4077622.html
Sent from the R help mailing list archive at Nabble.com.
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.