subsets

Petr Savicky · 2011-01-20T13:29:33Z

On Thu, Jan 20, 2011 at 10:53:01AM +0200, Den wrote:
> Dear R people
> Could you please help.
>
> Basically, there are two variables in my data set. Each patient ('id')
> may have one or more diseases ('diagnosis'). It looks like
>
> id diagnosis
> 1 ah
> 2 ah
> 2 ihd
> 2 im
> 3 ah
> 3 stroke
> 4 ah
> 4 ihd
> 4 angina
> 5 ihd
> ..............
> Q: How to make three data sets:
> 1. Patients with ah and ihd
> 2. Patients with ah but no ihd
> 3. Patients with ihd but no ah?

This may be understood as a two-step procedure:
1. Split the ids into disjoint groups according to the above criteria.
2. Split the data cases into the groups from step 1.

If this is what you want, then the function table() may be used to
collect information on each id.

  df <- structure(list(id = c(1L, 2L, 2L, 2L, 3L, 3L, 4L, 4L, 4L, 5L),
      diagnosis = structure(c(1L, 1L, 3L, 4L, 1L, 5L, 1L, 3L, 2L, 3L),
      .Label = c("ah", "angina", "ihd", "im", "stroke"), class = "factor")),
      .Names = c("id", "diagnosis"), class = "data.frame", row.names = c(NA, -10L))

  tab <- table(df$id, df$diagnosis)

Then, for example, the data cases for "2. Patients with ah but no ihd"
may be obtained as follows:

  sel <- tab[, "ah"] != 0 & tab[, "ihd"] == 0
  ah.noihd <- dimnames(tab)[[1]][sel] # [1] "1" "3"
  df[df$id %in% ah.noihd, ]
  #   id diagnosis
  # 1  1        ah
  # 5  3        ah
  # 6  3    stroke
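
The other two groups from the question can presumably be obtained the same
way. The following is only a sketch under the assumption that the data look
like the example above. The names has.ah, has.ihd, grp.both, grp.ah.only and
grp.ihd.only are made up for illustration, and the data frame is rebuilt
with data.frame() so that the block runs on its own.

```r
# Rebuild the example data so the block is self-contained.
df <- data.frame(
  id = c(1L, 2L, 2L, 2L, 3L, 3L, 4L, 4L, 4L, 5L),
  diagnosis = factor(c("ah", "ah", "ihd", "im", "ah", "stroke",
                       "ah", "ihd", "angina", "ihd"))
)

# One row per id, one column per diagnosis, entries are counts.
tab <- table(df$id, df$diagnosis)
ids <- rownames(tab)

# One logical flag per id (illustrative names, not from the original post).
has.ah  <- tab[, "ah"]  > 0
has.ihd <- tab[, "ihd"] > 0

# Step 1 picks the ids of each group; step 2 keeps all rows of those ids.
grp.both     <- df[df$id %in% ids[has.ah & has.ihd], ]   # 1. ah and ihd
grp.ah.only  <- df[df$id %in% ids[has.ah & !has.ihd], ]  # 2. ah but no ihd
grp.ihd.only <- df[df$id %in% ids[!has.ah & has.ihd], ]  # 3. ihd but no ah
```

Each subset keeps every row of the selected patients, including diagnoses
other than ah and ihd. If only the ah/ihd rows are wanted, a further filter
on the diagnosis column would be needed.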

I hope this helps.

Petr Savicky.
