
Counting consecutive events in R

3 messages · Abhinaba Roy, Johannes Hüsing, Sarah Goslee

#
Hi,

I have the following data frame:

structure(list(Type = c("QRS", "QRS", "QRS", "QRS", "QRS", "QRS",
"QRS", "QRS", "QRS", "QRS", "QRS", "QRS", "RR", "RR", "RR", "PP",
"PP", "PP", "PP", "PP", "PP", "PP", "PP", "PP", "QTc", "QTc",
"QTc", "QTc", "QTc", "QTc", "QTc", "QTc", "QTc", "QTc", "QTc",
"QTc", "QTc", "QTc", "QTc"), Time_Point_Start = c("2015-04-01 14:57:15.0.0312",
"2015-04-01 14:57:15.0.7839", "2015-04-01 14:57:16.0.5343",
"2015-04-01 14:57:17.0.2573",
"2015-04-01 14:57:18.0.0234", "2015-04-01 14:57:18.0.7722",
"2015-04-01 14:57:19.0.5265",
"2015-04-01 14:57:24.0.0195", "2015-04-01 14:57:24.0.7839",
"2015-04-01 14:57:25.0.5343",
"2015-04-01 14:57:26.0.2768", "2015-04-01 14:57:27.0.0273",
"2015-04-01 14:58:03.0.0702",
"2015-04-01 14:58:03.0.8190", "2015-04-01 14:58:04.0.5694",
"2015-04-01 14:57:58.0.4134",
"2015-04-01 14:57:59.0.1637", "2015-04-01 14:57:59.0.9126",
"2015-04-01 14:58:00.0.6630",
"2015-04-01 14:58:01.0.4134", "2015-04-01 14:58:02.0.1637",
"2015-04-01 14:58:02.0.9126",
"2015-04-01 14:58:03.0.6630", "2015-04-01 14:58:04.0.4134",
"2015-04-01 14:57:07.0.4212",
"2015-04-01 14:57:08.0.1715", "2015-04-01 14:57:08.0.9204",
"2015-04-01 14:57:09.0.6864",
"2015-04-01 14:57:10.0.4368", "2015-04-01 14:57:11.0.1871",
"2015-04-01 14:57:11.0.9360",
"2015-04-01 14:57:12.0.6591", "2015-04-01 14:57:13.0.4251",
"2015-04-01 14:57:14.0.1754",
"2015-04-01 14:57:14.0.9243", "2015-04-01 14:57:15.0.6903",
"2015-04-01 14:57:16.0.4407",
"2015-04-01 14:57:17.0.1676", "2015-04-01 14:57:17.0.9321"),
    Time_Point_End = c("2015-04-01 14:57:15.0.0858", "2015-04-01 14:57:15.0.8346",
    "2015-04-01 14:57:16.0.6006", "2015-04-01 14:57:17.0.0351",
    "2015-04-01 14:57:18.0.1403", "2015-04-01 14:57:18.0.8385",
    "2015-04-01 14:57:19.0.5889", "2015-04-01 14:57:24.0.0858",
    "2015-04-01 14:57:24.0.8346", "2015-04-01 14:57:25.0.5772",
    "2015-04-01 14:57:26.0.3939", "2015-04-01 14:57:27.0.0936",
    "2015-04-01 14:58:03.0.8190", "2015-04-01 14:58:04.0.5694",
    "2015-04-01 14:58:05.0.3197", "2015-04-01 14:57:59.0.1637",
    "2015-04-01 14:57:59.0.9126", "2015-04-01 14:58:00.0.6630",
    "2015-04-01 14:58:01.0.4134", "2015-04-01 14:58:02.0.1637",
    "2015-04-01 14:58:02.0.9126", "2015-04-01 14:58:03.0.6630",
    "2015-04-01 14:58:04.0.4134", "2015-04-01 14:58:05.0.1793",
    "2015-04-01 14:57:07.0.8775", "2015-04-01 14:57:08.0.6435",
    "2015-04-01 14:57:09.0.3705", "2015-04-01 14:57:10.0.1209",
    "2015-04-01 14:57:10.0.8697", "2015-04-01 14:57:11.0.6201",
    "2015-04-01 14:57:12.0.3861", "2015-04-01 14:57:13.0.1364",
    "2015-04-01 14:57:13.0.8853", "2015-04-01 14:57:14.0.6513",
    "2015-04-01 14:57:15.0.4017", "2015-04-01 14:57:16.0.1248",
    "2015-04-01 14:57:16.0.9165", "2015-04-01 14:57:17.0.6162",
    "2015-04-01 14:57:18.0.3900"), Value = c(0.0546, 0.0507,
    0.0663, 0.0936, 0.117, 0.0663, 0.0624, 0.0663, 0.0507, 0.0429,
    0.117, 0.0663, 0.7488, 0.7488, 0.7488, 0.7488, 0.7488, 0.7488,
    0.7488, 0.7488, 0.7488, 0.7488, 0.7488, 0.7644, 0.033103481,
    0.034056449, 0.032367699, 0.031000613, 0.031405867, 0.031241866,
    0.032367699, 0.034337907, 0.033125921, 0.034337907, 0.034337907,
    0.031241866, 0.034337907, 0.032367699, 0.032930616), Score = c(0L,
    0L, 0L, 0L, 3L, 0L, 0L, 0L, 0L, 0L, 3L, 0L, 0L, 0L, 0L, 0L,
    0L, 2L, 2L, 2L, 2L, 2L, 0L, 0L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
    3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), Type_Desc = c(NA, NA, NA,
    NA, 1L, NA, NA, NA, NA, NA, 1L, NA, NA, NA, NA, NA, NA, 1L,
    1L, 1L, 1L, 1L, NA, NA, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
    0L, 0L, 0L, 0L, 0L, 0L), Pat_id = c(4L, 4L, 4L, 4L, 4L, 4L,
    4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
    4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
    4L, 4L, 4L)), .Names = c("Type", "Time_Point_Start", "Time_Point_End",
"Value", "Score", "Type_Desc", "Pat_id"), class = "data.frame",
row.names = c(NA,
-39L))


For each unique value in column 'Type', I want to check for a run of 5
consecutive rows (if any) with 'Score' > 0.

If there are five consecutive rows with Score > 0 and 'Type_Desc' = 0,
then we print "Type_low" (e.g. "QTc_low"); if 'Type_Desc' = 1, we print
"Type_high" (e.g. "PP_high"). The search should stop once 5 consecutive
rows have been found.

So, for this data frame we will have two statements, as follows:

1. PP_high
   (reason: 5 consecutive rows with Score > 0 and 'Type_Desc' = 1)

2. QTc_low
   (reason: 5 consecutive rows with Score > 0 and 'Type_Desc' = 0)

How can this problem be tackled in R?

Thanks,

Abhinaba
#
I normally use rle() for these problems; see ?rle.

For instance:

k <- rbinom(999, 1, .5)
series <- function(run) {
  r <- rle(run)
  # runs of 1s that are at least 5 long; returned visibly so series(k) prints
  which(r$lengths >= 5 & r$values == 1)
}
series(k)


returns the indices (positions within the rle() output, not element
positions in k) of the runs of 1s that have length 5 or longer.
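To turn run indices into positions in the original vector, the cumulative sum of the run lengths gives the last element of each run, and subtracting the run length (plus one) gives the first. A minimal sketch; `run_positions` and its `min_len` argument are hypothetical names chosen for illustration:

```r
# Hypothetical helper: report where runs of 1s of length >= min_len
# start and end in the original 0/1 vector x.
run_positions <- function(x, min_len = 5) {
  r <- rle(x)
  ends <- cumsum(r$lengths)        # index of the last element of each run
  starts <- ends - r$lengths + 1   # index of the first element of each run
  keep <- r$lengths >= min_len & r$values == 1
  data.frame(start = starts[keep], end = ends[keep], length = r$lengths[keep])
}

run_positions(c(0, 1, 1, 1, 1, 1, 0, 1, 1))
# one qualifying run: start = 2, end = 6, length = 5

k <- rbinom(999, 1, .5)
run_positions(k)
```

The same start/end positions can then be used to index rows of a data frame, which is handy when you need to know where a run occurred and not only whether one exists.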
      

Abhinaba Roy <abhinabaroy09 at gmail.com> [Thu, May 14, 2015 at 02:16:31PM CEST]:

  
    
#
Assuming I understand the problem correctly, you want to check for
runs of length at least five where both Score and Type_Desc take
particular values. You don't care where they are or what other data
are associated; you just want to know whether at least one such run
exists in your data frame.

Here's a function that does that:


checkruns <- function(testdata) {

  # 1 where Score > 0 and Type_Desc == 1 (NA Type_Desc never matches), else 0
  test1 <- ifelse(testdata$Score > 0 & testdata$Type_Desc == 1 &
                    !is.na(testdata$Type_Desc), 1, 0)
  # same indicator for Type_Desc == 0
  test0 <- ifelse(testdata$Score > 0 & testdata$Type_Desc == 0 &
                    !is.na(testdata$Type_Desc), 1, 0)

  test1.rle <- rle(test1)
  test0.rle <- rle(test0)

  # report if any run of 1s is at least 5 rows long
  if (any(test1.rle$lengths >= 5 & test1.rle$values == 1))
    cat("Type_high\n")
  if (any(test0.rle$lengths >= 5 & test0.rle$values == 1))
    cat("Type_low\n")

  invisible()
}
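The function above checks the whole data frame at once and prints the fixed strings Type_high and Type_low. To get per-Type labels such as PP_high and QTc_low, as asked in the question, the same rle() idea can be applied within each Type. A minimal sketch, assuming that a qualifying run needs a constant Type_Desc, that rows with NA Type_Desc never qualify, and that only the first qualifying run per Type is reported; `first_runs_by_type` is a hypothetical name:

```r
# Hypothetical helper: for each Type, label each row "high" (Score > 0 and
# Type_Desc == 1), "low" (Score > 0 and Type_Desc == 0) or NA, then report
# the first run of at least min_len identical non-NA labels as "<Type>_<label>".
first_runs_by_type <- function(df, min_len = 5) {
  out <- character(0)
  for (tp in unique(df$Type)) {
    sub <- df[which(df$Type == tp), , drop = FALSE]
    pos <- sub$Score > 0 & !is.na(sub$Type_Desc)
    lab <- rep(NA_character_, nrow(sub))
    lab[which(pos & sub$Type_Desc == 1)] <- "high"
    lab[which(pos & sub$Type_Desc == 0)] <- "low"
    # rle() treats each NA as its own run, so NA rows never form a long run
    r <- rle(lab)
    hit <- which(r$lengths >= min_len & !is.na(r$values))
    if (length(hit) > 0) {
      # stop at the first qualifying run for this Type
      out <- c(out, paste(tp, r$values[hit[1]], sep = "_"))
    }
  }
  out
}
```

Applied to the data frame from the question (assigned to, say, `df`), `first_runs_by_type(df)` should return c("PP_high", "QTc_low"). Labelling rows first and running rle() on the labels means a run of "high" rows immediately followed by "low" rows is split into two runs, which matches the requirement that all five rows share the same Type_Desc.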

Sarah
On Thu, May 14, 2015 at 8:16 AM, Abhinaba Roy <abhinabaroy09 at gmail.com> wrote: