Hi,
I have the following dataframe
structure(list(Type = c("QRS", "QRS", "QRS", "QRS", "QRS", "QRS",
"QRS", "QRS", "QRS", "QRS", "QRS", "QRS", "RR", "RR", "RR", "PP",
"PP", "PP", "PP", "PP", "PP", "PP", "PP", "PP", "QTc", "QTc",
"QTc", "QTc", "QTc", "QTc", "QTc", "QTc", "QTc", "QTc", "QTc",
"QTc", "QTc", "QTc", "QTc"), Time_Point_Start = c("2015-04-01 14:57:15.0.0312",
"2015-04-01 14:57:15.0.7839", "2015-04-01 14:57:16.0.5343",
"2015-04-01 14:57:17.0.2573",
"2015-04-01 14:57:18.0.0234", "2015-04-01 14:57:18.0.7722",
"2015-04-01 14:57:19.0.5265",
"2015-04-01 14:57:24.0.0195", "2015-04-01 14:57:24.0.7839",
"2015-04-01 14:57:25.0.5343",
"2015-04-01 14:57:26.0.2768", "2015-04-01 14:57:27.0.0273",
"2015-04-01 14:58:03.0.0702",
"2015-04-01 14:58:03.0.8190", "2015-04-01 14:58:04.0.5694",
"2015-04-01 14:57:58.0.4134",
"2015-04-01 14:57:59.0.1637", "2015-04-01 14:57:59.0.9126",
"2015-04-01 14:58:00.0.6630",
"2015-04-01 14:58:01.0.4134", "2015-04-01 14:58:02.0.1637",
"2015-04-01 14:58:02.0.9126",
"2015-04-01 14:58:03.0.6630", "2015-04-01 14:58:04.0.4134",
"2015-04-01 14:57:07.0.4212",
"2015-04-01 14:57:08.0.1715", "2015-04-01 14:57:08.0.9204",
"2015-04-01 14:57:09.0.6864",
"2015-04-01 14:57:10.0.4368", "2015-04-01 14:57:11.0.1871",
"2015-04-01 14:57:11.0.9360",
"2015-04-01 14:57:12.0.6591", "2015-04-01 14:57:13.0.4251",
"2015-04-01 14:57:14.0.1754",
"2015-04-01 14:57:14.0.9243", "2015-04-01 14:57:15.0.6903",
"2015-04-01 14:57:16.0.4407",
"2015-04-01 14:57:17.0.1676", "2015-04-01 14:57:17.0.9321"),
Time_Point_End = c("2015-04-01 14:57:15.0.0858",
"2015-04-01 14:57:15.0.8346",
"2015-04-01 14:57:16.0.6006", "2015-04-01 14:57:17.0.0351",
"2015-04-01 14:57:18.0.1403", "2015-04-01 14:57:18.0.8385",
"2015-04-01 14:57:19.0.5889", "2015-04-01 14:57:24.0.0858",
"2015-04-01 14:57:24.0.8346", "2015-04-01 14:57:25.0.5772",
"2015-04-01 14:57:26.0.3939", "2015-04-01 14:57:27.0.0936",
"2015-04-01 14:58:03.0.8190", "2015-04-01 14:58:04.0.5694",
"2015-04-01 14:58:05.0.3197", "2015-04-01 14:57:59.0.1637",
"2015-04-01 14:57:59.0.9126", "2015-04-01 14:58:00.0.6630",
"2015-04-01 14:58:01.0.4134", "2015-04-01 14:58:02.0.1637",
"2015-04-01 14:58:02.0.9126", "2015-04-01 14:58:03.0.6630",
"2015-04-01 14:58:04.0.4134", "2015-04-01 14:58:05.0.1793",
"2015-04-01 14:57:07.0.8775", "2015-04-01 14:57:08.0.6435",
"2015-04-01 14:57:09.0.3705", "2015-04-01 14:57:10.0.1209",
"2015-04-01 14:57:10.0.8697", "2015-04-01 14:57:11.0.6201",
"2015-04-01 14:57:12.0.3861", "2015-04-01 14:57:13.0.1364",
"2015-04-01 14:57:13.0.8853", "2015-04-01 14:57:14.0.6513",
"2015-04-01 14:57:15.0.4017", "2015-04-01 14:57:16.0.1248",
"2015-04-01 14:57:16.0.9165", "2015-04-01 14:57:17.0.6162",
"2015-04-01 14:57:18.0.3900"), Value = c(0.0546, 0.0507,
0.0663, 0.0936, 0.117, 0.0663, 0.0624, 0.0663, 0.0507, 0.0429,
0.117, 0.0663, 0.7488, 0.7488, 0.7488, 0.7488, 0.7488, 0.7488,
0.7488, 0.7488, 0.7488, 0.7488, 0.7488, 0.7644, 0.033103481,
0.034056449, 0.032367699, 0.031000613, 0.031405867, 0.031241866,
0.032367699, 0.034337907, 0.033125921, 0.034337907, 0.034337907,
0.031241866, 0.034337907, 0.032367699, 0.032930616), Score = c(0L,
0L, 0L, 0L, 3L, 0L, 0L, 0L, 0L, 0L, 3L, 0L, 0L, 0L, 0L, 0L,
0L, 2L, 2L, 2L, 2L, 2L, 0L, 0L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), Type_Desc = c(NA, NA, NA,
NA, 1L, NA, NA, NA, NA, NA, 1L, NA, NA, NA, NA, NA, NA, 1L,
1L, 1L, 1L, 1L, NA, NA, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L), Pat_id = c(4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L)), .Names = c("Type", "Time_Point_Start", "Time_Point_End",
"Value", "Score", "Type_Desc", "Pat_id"), class = "data.frame",
row.names = c(NA,
-39L))
For each unique value in column 'Type', I want to check for
5 consecutive rows (if any) with 'Score' > 0.
If there are five consecutive rows with Score > 0 and 'Type_Desc'
= 0, then we print "Type_low" (with the actual Type value in place of
"Type"); else if 'Type_Desc' = 1, we print "Type_high". The search
should end once 5 consecutive rows have been found.
So, for this data frame we will have two statements as follows:
1. PP_high
(reason: 5 consecutive rows with Score > 0 and 'Type_Desc' = 1)
2. QTc_low
(reason: 5 consecutive rows with Score > 0 and 'Type_Desc' = 0)
How can this problem be tackled in R?
Thanks,
Abhinaba
Counting consecutive events in R
3 messages · Abhinaba Roy, Johannes Hüsing, Sarah Goslee
I normally use rle() for these problems; see ?rle.
For instance,
k <- rbinom(999, 1, .5)
series <- function(run) {
  r <- rle(run)
  which(r$lengths >= 5 & r$values == 1)
}
series(k)
returns the positions, within the rle() output, of the runs of 1s that have length 5 or longer.
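To make the rle() output concrete, here is a minimal sketch on a small fixed vector rather than random draws. The names x, run_pos, run_start and run_end are only illustrative; the point is that which() gives positions within the run-length encoding, and cumsum() of the lengths converts them back to row indices in the original vector.

```r
# Small deterministic vector so the result is easy to check by eye
x <- c(0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1)

r <- rle(x)
# r$lengths: 1 5 2 6 1 1
# r$values:  0 1 0 1 0 1

# Positions within the rle output of runs of 1s with length >= 5
run_pos <- which(r$lengths >= 5 & r$values == 1)

# Last and first index of each such run in the original vector
run_end   <- cumsum(r$lengths)[run_pos]
run_start <- run_end - r$lengths[run_pos] + 1

data.frame(start = run_start, end = run_end, length = r$lengths[run_pos])
```

Here the two qualifying runs cover elements 2 to 6 and 9 to 14 of x.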
In reply to Abhinaba Roy <abhinabaroy09 at gmail.com> [Thu, May 14, 2015 at 02:16:31PM CEST]
______________________________________________ R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Johannes Hüsing
mailto:johannes at huesing.name
http://derwisch.wikidot.com
"There is something fascinating about science. One gets such wholesale returns of conjecture from such a trifling investment of fact." (Mark Twain, "Life on the Mississippi")
Assuming I understand the problem correctly, you want to check for
runs of at least length five where both Score and Type_Desc assume
particular values. You don't care where they are or what other data
are associated; you just want to know whether at least one such run
exists in your data frame.
Here's a function that does that:
checkruns <- function(testdata) {
  # 1 where Score > 0 and Type_Desc is 1 (NA never matches), else 0
  test1 <- ifelse(testdata$Score > 0 & testdata$Type_Desc == 1 &
                  !is.na(testdata$Type_Desc), 1, 0)
  # same indicator for Type_Desc == 0
  test0 <- ifelse(testdata$Score > 0 & testdata$Type_Desc == 0 &
                  !is.na(testdata$Type_Desc), 1, 0)
  # run-length encode each indicator
  test1.rle <- rle(test1)
  test0.rle <- rle(test0)
  # report if any run of 1s has length 5 or more
  if (any(test1.rle$lengths >= 5 & test1.rle$values == 1))
    cat("Type_high\n")
  if (any(test0.rle$lengths >= 5 & test0.rle$values == 1))
    cat("Type_low\n")
  invisible()
}
Sarah
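The original question asked for one label per Type (for example "PP_high"), so the same rle() idea can be applied within each Type separately. The following is only a sketch under some assumptions: the data frame name dat and the helper label_runs are hypothetical, the three columns are condensed from the posted data, rows are assumed to be already in time order within each Type, and only the first qualifying run per Type is reported.

```r
# Type, Score and Type_Desc condensed from the posted data frame
dat <- data.frame(
  Type      = rep(c("QRS", "RR", "PP", "QTc"), times = c(12, 3, 9, 15)),
  Score     = c(0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 3, 0,   # QRS
                0, 0, 0,                              # RR
                0, 0, 2, 2, 2, 2, 2, 0, 0,            # PP
                rep(3, 15)),                          # QTc
  Type_Desc = c(NA, NA, NA, NA, 1, NA, NA, NA, NA, NA, 1, NA,
                NA, NA, NA,
                NA, NA, 1, 1, 1, 1, 1, NA, NA,
                rep(0, 15)),
  stringsAsFactors = FALSE
)

# Hypothetical helper: label for the first run of at least n rows
# with Score > 0 and a constant, non-missing Type_Desc; NULL if none
label_runs <- function(d, n = 5) {
  hit <- d$Score > 0 & !is.na(d$Type_Desc)
  state <- ifelse(hit & d$Type_Desc == 1, "high",
           ifelse(hit & d$Type_Desc == 0, "low", "none"))
  r <- rle(state)
  ok <- which(r$lengths >= n & r$values != "none")
  if (length(ok) == 0) return(NULL)
  paste(d$Type[1], r$values[ok[1]], sep = "_")
}

# Split by Type in order of first appearance and collect the labels
groups <- split(dat, factor(dat$Type, levels = unique(dat$Type)))
out <- unlist(lapply(groups, label_runs), use.names = FALSE)
cat(out, sep = "\n")
```

On these columns the sketch prints PP_high and QTc_low, matching the two statements expected in the question. Splitting first also prevents a run from spanning the boundary between two Types.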