Locate Patients who have multiple high blood pressure readings
On Thu, Jan 31, 2013 at 10:51 AM, Weijia Wang <wwang.nyu at gmail.com> wrote:
On Thu, Jan 31, 2013 at 10:29 AM, Weijia Wang <wwang.nyu at gmail.com> wrote:
Hi,
I have a new question about subsetting in R.
Say we have this data frame:
    PT_ID Blood_Pressure OBS_TYPE
92   1900           90.0      DBP
94   1900           90.0      DBP
174  2900          140.0      SBP
176  2900          130.0      SBP
180  3900          120.0      SBP
268  3900          150.0      SBP
268  3900           90.0      DBP
I need to obtain those with 2+ DBP>=90 or 2+ SBP>=140.
PT_ID=1900, he has 2 DBP>=90, so he will be included.
PT_ID=2900, he has 1 SBP>=140, so he will NOT be included.
PT_ID=3900, he has 1 SBP>=140 and 1 DBP>=90, so he will still NOT be
included.
So, the condition requires TWO OR MORE values higher than the threshold.
It could be either SBP or DBP or both of them.
I have tried ddply, but I don't know how to add the condition 2+ inside
ddply.
This can be specified in a reasonably natural fashion using SQL. Here DF is the input data frame:
> library(sqldf)
> sqldf("select
+   PT_ID,
+   sum(Blood_Pressure >= 90 and OBS_TYPE == 'DBP') DBP,
+   sum(Blood_Pressure >= 140 and OBS_TYPE == 'SBP') SBP
+   from DF
+   group by PT_ID
+   having DBP >= 2 or SBP >= 2")
  PT_ID DBP SBP
1  1900   2   0
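For anyone who would rather avoid the sqldf dependency, the same per-patient counting can probably be done in base R. The following is only a sketch, not the method given above: DF is rebuilt by hand from the example data in the question (row names omitted), and the names high_dbp, high_sbp, n_dbp, n_sbp, keep and result are made up for illustration.

```r
# Rebuild the example data from the question (row names omitted; assumption).
DF <- data.frame(
  PT_ID = c(1900, 1900, 2900, 2900, 3900, 3900, 3900),
  Blood_Pressure = c(90, 90, 140, 130, 120, 150, 90),
  OBS_TYPE = c("DBP", "DBP", "SBP", "SBP", "SBP", "SBP", "DBP"),
  stringsAsFactors = FALSE
)

# Flag each reading that meets the threshold for its own type.
high_dbp <- DF$OBS_TYPE == "DBP" & DF$Blood_Pressure >= 90
high_sbp <- DF$OBS_TYPE == "SBP" & DF$Blood_Pressure >= 140

# Count the flagged readings per patient (summing logicals counts the TRUEs).
n_dbp <- tapply(high_dbp, DF$PT_ID, sum)
n_sbp <- tapply(high_sbp, DF$PT_ID, sum)

# Keep patients with two or more high readings of the same type.
keep <- n_dbp >= 2 | n_sbp >= 2
result <- data.frame(
  PT_ID = as.numeric(names(n_dbp)[keep]),
  DBP = as.vector(n_dbp[keep]),
  SBP = as.vector(n_sbp[keep])
)
print(result)
```

As in the SQL version, the two counts are kept separate, so a patient with one high SBP and one high DBP (PT_ID 3900 here) is not selected.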