subsets

Hello Den,

your problem is not as simple as it may seem, so Ivan's suggestion is only a partial answer. I see that each patient can have
more than one diagnosis, and I take it that you want to isolate patients based on particular combinations of conditions.
Thus, simply looking for "ah" or "ihd" as Ivan suggests will yield patients who have either of those, but not
necessarily patients who have both.

Instead, what one must do is apply the condition to the whole set of diagnoses associated with each patient.
I think that it's done best with the aggregate function. This function splits the data according to some
factor (in our case it will be the patient id) and performs a routine on each subset (in our case it will be
a condition test). Pick whichever of these matches the condition you need:

# patients with both "ah" and "ihd"
ids <- aggregate(diagnosis ~ id, df, function(x) "ah" %in% x && "ihd" %in% x)
# patients with "ah" but not "ihd"
ids <- aggregate(diagnosis ~ id, df, function(x) "ah" %in% x && !("ihd" %in% x))
# patients with "ihd" but not "ah"
ids <- aggregate(diagnosis ~ id, df, function(x) !("ah" %in% x) && "ihd" %in% x)

Now, ids will contain a data frame like:

id	diagnosis
1	TRUE
2	FALSE
3	FALSE
...

which shows which patients meet the condition you tested. You can then pull these
patients out of the original data with something like:

subset(df, id %in% subset(ids, diagnosis == TRUE)$id)

This first picks the patients in the 'ids' data frame for which the condition holds, and then extracts
all rows for those patients (their full diagnosis sets) from the original 'df' data frame.
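
To make the two steps concrete, here is a minimal sketch on made-up data. It assumes your data frame is called 'df' and has the columns 'id' and 'diagnosis'; the values below (including the "stroke" diagnosis) are invented purely for illustration, so adjust the names to your own data.

```r
# Toy data (hypothetical): one row per diagnosis, several rows per patient
df <- data.frame(
  id        = c(1, 1, 2, 3, 3, 4),
  diagnosis = c("ah", "ihd", "ah", "ihd", "stroke", "stroke"),
  stringsAsFactors = FALSE
)

# Step 1: one logical value per patient, TRUE when the patient's
# set of diagnoses contains both "ah" and "ihd"
ids <- aggregate(diagnosis ~ id, df, function(x) "ah" %in% x && "ihd" %in% x)

# Step 2: keep every row of the original data for the matching patients
both <- subset(df, id %in% ids$id[ids$diagnosis])
print(both)
```

In this toy data only patient 1 has both diagnoses, so 'both' holds that patient's two rows. Indexing with ids$id[ids$diagnosis] is just a shorter way of writing the nested subset() call above.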

Hope it helps,

Taras
