Locate Patients who have multiple high blood pressure readings
5 messages · Weijia Wang, Bert Gunter, William Dunlap +2 more
Well, since no one has responded....

Please use ?dput to provide data in your posts.

There are likely zillions of ways to go about this. Following is one way based on ?duplicated that I think works, but I make no claims for either elegance or efficiency. Others may do lots better. But maybe it suffices.

## Untested
## I assume the data is provided in a data frame named dd.

## All PT_ID's with >=1 high readings in SBP or in DBP
hiS <- with(dd, PT_ID[OBS_TYPE == "SBP" & Blood_Pressure >= 140])
hiD <- with(dd, PT_ID[OBS_TYPE == "DBP" & Blood_Pressure >= 90])

## id's that appear more than once in either
union(unique(hiS[duplicated(hiS)]), unique(hiD[duplicated(hiD)]))

## you can subset your data frame to match just these, e.g. via %in%, if you like.

Cheers,
Bert
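The %in% step mentioned above might look like the following. This is only a sketch: it rebuilds the example data from the question (three columns only), and the names flagged and flagged_rows are made up for illustration.

```r
# Example data, same values as the data frame posted in the question.
dd <- data.frame(
  PT_ID = c(1900L, 1900L, 2900L, 2900L, 3900L, 3900L, 3900L),
  Blood_Pressure = c(90, 90, 140, 130, 120, 150, 90),
  OBS_TYPE = c("DBP", "DBP", "SBP", "SBP", "SBP", "SBP", "DBP")
)

# Patient IDs of the high readings, one entry per high reading.
hiS <- with(dd, PT_ID[OBS_TYPE == "SBP" & Blood_Pressure >= 140])
hiD <- with(dd, PT_ID[OBS_TYPE == "DBP" & Blood_Pressure >= 90])

# IDs that occur at least twice within one reading type.
# 'flagged' is a hypothetical name, not from the original post.
flagged <- union(unique(hiS[duplicated(hiS)]), unique(hiD[duplicated(hiD)]))

# Keep every row that belongs to a flagged patient.
flagged_rows <- dd[dd$PT_ID %in% flagged, ]
print(flagged_rows)
```

Note that this keeps all rows of a flagged patient, including normal readings; add the threshold conditions to the row index if only the high readings are wanted.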
On Thu, Jan 31, 2013 at 7:51 AM, Weijia Wang <wwang.nyu at gmail.com> wrote:
On Thu, Jan 31, 2013 at 10:29 AM, Weijia Wang <wwang.nyu at gmail.com> wrote:
Hi,
I have a new question about subsetting in R.
Say we have this data frame:
    PT_ID Blood_Pressure OBS_TYPE
92   1900           90.0      DBP
94   1900           90.0      DBP
174  2900          140.0      SBP
176  2900          130.0      SBP
180  3900          120.0      SBP
268  3900          150.0      SBP
268  3900           90.0      DBP
I need to obtain those with 2+ DBP>=90 or 2+ SBP>=140.
PT_ID=1900, he has 2 DBP>=90, so he will be included.
PT_ID=2900, he has 1 SBP>=140, so he will NOT be included.
PT_ID=3900, he has 1 SBP>=140 and 1 DBP>=90, so he will still NOT be
included.
So, the condition requires TWO OR MORE values higher than the threshold.
It could be either SBP or DBP or both of them.
I have tried ddply, but I don't know how to add the condition 2+ inside
ddply.
Any help is appreciated!!
Weijia
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Bert Gunter Genentech Nonclinical Biostatistics Internal Contact Info: Phone: 467-7374 Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
dd <- # from dput()
structure(list(ColA = c(92L, 94L, 174L, 176L, 180L, 268L, 268L
), PT_ID = c(1900L, 1900L, 2900L, 2900L, 3900L, 3900L, 3900L),
Blood_Pressure = c(90, 90, 140, 130, 120, 150, 90), OBS_TYPE = structure(c(1L,
1L, 2L, 2L, 2L, 2L, 1L), .Label = c("DBP", "SBP"), class = "factor")), .Names = c("ColA",
"PT_ID", "Blood_Pressure", "OBS_TYPE"), class = "data.frame", row.names = c(NA,
-7L))
library(plyr)
ddply(dd, .(PT_ID), summarize,
      Include = sum(OBS_TYPE == "DBP" & Blood_Pressure >= 90) >= 2 ||
                sum(OBS_TYPE == "SBP" & Blood_Pressure >= 140) >= 2)
  PT_ID Include
1  1900    TRUE
2  2900   FALSE
3  3900   FALSE

sum(logicalVector) tells how many TRUE's are in logicalVector.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
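For readers without plyr, the same sum-of-logicals idea can probably be written with base R's tapply. The following is a sketch under that assumption, not part of the original reply; the names n_high_dbp, n_high_sbp and include are made up for illustration.

```r
# Example data, same values as the dput() output above (ColA omitted).
dd <- data.frame(
  PT_ID = c(1900L, 1900L, 2900L, 2900L, 3900L, 3900L, 3900L),
  Blood_Pressure = c(90, 90, 140, 130, 120, 150, 90),
  OBS_TYPE = c("DBP", "DBP", "SBP", "SBP", "SBP", "SBP", "DBP")
)

# sum() of a logical vector counts its TRUE values, so summing the
# "high reading" condition within each PT_ID gives a per-patient count.
n_high_dbp <- with(dd, tapply(OBS_TYPE == "DBP" & Blood_Pressure >= 90, PT_ID, sum))
n_high_sbp <- with(dd, tapply(OBS_TYPE == "SBP" & Blood_Pressure >= 140, PT_ID, sum))

# A patient qualifies with two or more high readings of either type.
include <- n_high_dbp >= 2 | n_high_sbp >= 2
print(include)

# IDs of the qualifying patients, as character strings.
ids <- names(include)[as.vector(include)]
print(ids)
```

Because the condition is evaluated on every row before grouping, patients with no high readings still get a count of 0 rather than dropping out of the result.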
On Thu, Jan 31, 2013 at 10:51 AM, Weijia Wang <wwang.nyu at gmail.com> wrote:
I have tried ddply, but I don't know how to add the condition 2+ inside
ddply.
This can be specified in a reasonably natural fashion using SQL. Here DF is the input data frame:

library(sqldf)
sqldf("select
    PT_ID,
    sum(Blood_Pressure >= 90 and OBS_TYPE == 'DBP') DBP,
    sum(Blood_Pressure >= 140 and OBS_TYPE == 'SBP') SBP
  from DF
  group by PT_ID
  having DBP >= 2 or SBP >= 2")
  PT_ID DBP SBP
1  1900   2   0
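If sqldf is not available, the GROUP BY / HAVING structure of the query above can be mirrored roughly in base R, with aggregate() playing the role of GROUP BY and subset() the role of HAVING. This is a sketch only; flags, counts and result are made-up names, and DF is rebuilt here from the values in the question.

```r
# Example data, same values as in the question; named DF as in the SQL above.
DF <- data.frame(
  PT_ID = c(1900L, 1900L, 2900L, 2900L, 3900L, 3900L, 3900L),
  Blood_Pressure = c(90, 90, 140, 130, 120, 150, 90),
  OBS_TYPE = c("DBP", "DBP", "SBP", "SBP", "SBP", "SBP", "DBP")
)

# One 0/1 indicator per row for each reading type (the terms inside sum(...)).
flags <- data.frame(
  PT_ID = DF$PT_ID,
  DBP = as.integer(DF$OBS_TYPE == "DBP" & DF$Blood_Pressure >= 90),
  SBP = as.integer(DF$OBS_TYPE == "SBP" & DF$Blood_Pressure >= 140)
)

# "group by PT_ID": sum the indicators within each patient.
counts <- aggregate(flags[c("DBP", "SBP")], by = list(PT_ID = flags$PT_ID), FUN = sum)

# "having DBP >= 2 or SBP >= 2": keep only the qualifying patients.
result <- subset(counts, DBP >= 2 | SBP >= 2)
print(result)
```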
Hi,

Maybe this helps:

#dd
res <- data.frame(Include = with(
  subset(dd, OBS_TYPE == "SBP" & Blood_Pressure >= 140 |
             OBS_TYPE == "DBP" & Blood_Pressure >= 90),
  apply(tapply(Blood_Pressure, list(PT_ID, OBS_TYPE), length) >= 2,
        1, any, na.rm = TRUE)))
res
#     Include
#1900    TRUE
#2900   FALSE
#3900   FALSE

A.K.