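I don't think Ivan's solution meets the OP's needs: it selects rows by
their own diagnosis, whereas the OP wants patients (ids) selected by the
combination of diagnoses they have.
I think you could do it using %in% and the appropriate logical operations,
e.g.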
aDF <- data.frame(id=c(1,2,2,2,3,3,4,4,4,5),
                  diagnosis=c("ah", "ah", "ihd", "im", "ah", "stroke",
                              "ah", "ihd", "angina", "ihd"))
# 1. patients with ah and ihd
aDF[with(aDF, (id %in% id[diagnosis=="ah"]) & (id %in% id[diagnosis=="ihd"])),]
# 2. patients with ah but no ihd
aDF[with(aDF, (id %in% id[diagnosis=="ah"]) & !(id %in% id[diagnosis=="ihd"])),]
# 3. patients with ihd but no ah
aDF[with(aDF, !(id %in% id[diagnosis=="ah"]) & (id %in% id[diagnosis=="ihd"])),]
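For what it's worth, on this toy data the three subsets should pick out
patients 2 and 4, patients 1 and 3, and patient 5 respectively. Here is a
rough sketch to check that, computing each %in% test only once. The names
has_ah, has_ihd, both, ah_only and ihd_only are just mine for illustration.

```r
aDF <- data.frame(id = c(1, 2, 2, 2, 3, 3, 4, 4, 4, 5),
                  diagnosis = c("ah", "ah", "ihd", "im", "ah", "stroke",
                                "ah", "ihd", "angina", "ihd"))

# Row-level flags: TRUE if this row's patient has the diagnosis on ANY row
has_ah  <- with(aDF, id %in% id[diagnosis == "ah"])
has_ihd <- with(aDF, id %in% id[diagnosis == "ihd"])

both     <- aDF[has_ah & has_ihd, ]    # 1. ah and ihd
ah_only  <- aDF[has_ah & !has_ihd, ]   # 2. ah but no ihd
ihd_only <- aDF[!has_ah & has_ihd, ]   # 3. ihd but no ah

unique(both$id)      # 2 4
unique(ah_only$id)   # 1 3
unique(ihd_only$id)  # 5
```

Note that each subset keeps all rows for the selected patients (e.g. the
"im" row for patient 2), not only the ah/ihd rows.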
Repeating those %in% tests starts to feel a bit fiddly to me. You might
want to look at the sqldf package.
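As a rough idea of what that could look like, here is a minimal sketch,
assuming sqldf is installed and using the same toy aDF as above. The
queries are my own guess at a reasonable formulation (sub-selects on id),
not something taken from the sqldf documentation.

```r
library(sqldf)  # assumption: the sqldf package is installed

aDF <- data.frame(id = c(1, 2, 2, 2, 3, 3, 4, 4, 4, 5),
                  diagnosis = c("ah", "ah", "ihd", "im", "ah", "stroke",
                                "ah", "ihd", "angina", "ihd"))

# 1. patients with ah and ihd
both <- sqldf("SELECT * FROM aDF
               WHERE id IN (SELECT id FROM aDF WHERE diagnosis = 'ah')
                 AND id IN (SELECT id FROM aDF WHERE diagnosis = 'ihd')")

# 2. patients with ah but no ihd
ah_only <- sqldf("SELECT * FROM aDF
                  WHERE id IN (SELECT id FROM aDF WHERE diagnosis = 'ah')
                    AND id NOT IN (SELECT id FROM aDF WHERE diagnosis = 'ihd')")

# 3. patients with ihd but no ah
ihd_only <- sqldf("SELECT * FROM aDF
                   WHERE id NOT IN (SELECT id FROM aDF WHERE diagnosis = 'ah')
                     AND id IN (SELECT id FROM aDF WHERE diagnosis = 'ihd')")
```

Whether this reads better than the %in% version is a matter of taste; it
mainly helps if you already think in SQL.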
HTH
Keith J
--------------------------
"Ivan Calandra" <ivan.calandra at uni-hamburg.de> wrote in message
news:4D37FBEA.5070100 at uni-hamburg.de...
Hi!
I think you should read the intro to R, as well as ?"[" and ?subset. It
should help you to understand.
Let's say your data is in a data.frame called df:
# 1. ah and ihd
## the "|" is the boolean OR (you want one OR the other). Note the last comma
df_ah_ihd <- df[df$diagnosis=="ah" | df$diagnosis=="ihd", ]
# 2. ah
df_ah <- df[df$diagnosis=="ah", ]
# 3. ihd
df_ihd <- df[df$diagnosis=="ihd", ]
You could do the same using subset() if you feel better with this function.
HTH,
Ivan
On 1/20/2011 09:53, Den wrote:
Dear R people
Could you please help.
Basically, there are two variables in my data set. Each patient ('id')
may have one or more diseases ('diagnosis'). It looks like
id diagnosis
1 ah
2 ah
2 ihd
2 im
3 ah
3 stroke
4 ah
4 ihd
4 angina
5 ihd
..............
Q: How to make three data sets:
1. Patients with ah and ihd
2. Patients with ah but no ihd
3. Patients with ihd but no ah?
If you have any ideas, could you just point me to what I should look for?
Is it subset, or aggregate, or loops, or something else? I am a bit lost.
(F1 F1 F1 !!!:)
Thank you