subsets

On Thu, Jan 20, 2011 at 10:53:01AM +0200, Den wrote:
This may be understood as a two step procedure:
1. Split the ids into disjoint groups according to the above criteria.
2. Split the data cases into the groups from step 1.

If this is what you want, then the function table() may be used to
count, for each id, how often each diagnosis occurs.

  df <- structure(list(id = c(1L, 2L, 2L, 2L, 3L, 3L, 4L, 4L, 4L, 5L),
      diagnosis = structure(c(1L, 1L, 3L, 4L, 1L, 5L, 1L, 3L, 2L, 3L),
      .Label = c("ah", "angina", "ihd", "im", "stroke"), class = "factor")),
      .Names = c("id", "diagnosis"), class = "data.frame", row.names = c(NA, -10L))

  # Spell out the column name instead of relying on partial matching of df$diag.
  tab <- table(df$id, df$diagnosis)
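
For orientation, tab has one row per id and one column per diagnosis, and each cell counts how often that diagnosis occurs for that id. The following is only a small sketch for inspecting it. It rebuilds the same ten cases with data.frame() so that it runs on its own, and the name `has` is an illustrative choice, not part of the original code.

```r
# Same ten cases as above, rebuilt with data.frame() so the snippet runs alone.
df <- data.frame(
  id = c(1L, 2L, 2L, 2L, 3L, 3L, 4L, 4L, 4L, 5L),
  diagnosis = factor(c("ah", "ah", "ihd", "im", "ah", "stroke",
                       "ah", "ihd", "angina", "ihd"))
)

tab <- table(df$id, df$diagnosis)

dim(tab)       # 5 ids by 5 diagnoses
colnames(tab)  # "ah" "angina" "ihd" "im" "stroke"

# Presence/absence view (illustrative name): TRUE where the id has the
# diagnosis at least once.
has <- unclass(tab) > 0
has["2", ]     # id 2 has ah, ihd and im
```

Working with presence/absence rather than raw counts keeps the later conditions readable, since an id may carry the same diagnosis more than once.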

Then, for example, the data cases for "2. Patients with ah but no ihd"
may be obtained as follows:

  sel <- tab[, "ah"] != 0 & tab[, "ihd"] == 0
  ah.noihd <- dimnames(tab)[[1]][sel] # [1] "1" "3"
  df[df$id %in% ah.noihd, ]
  #   id diagnosis
  # 1  1        ah
  # 5  3        ah
  # 6  3    stroke
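
The same two steps can be wrapped in a small function if several groups of this kind are needed. This is only a sketch under assumptions: the function name select_cases and its arguments present and absent are hypothetical, and it assumes the columns are called id and diagnosis as in the example data.

```r
# Same ten cases as above, rebuilt so the snippet runs on its own.
df <- data.frame(
  id = c(1L, 2L, 2L, 2L, 3L, 3L, 4L, 4L, 4L, 5L),
  diagnosis = factor(c("ah", "ah", "ihd", "im", "ah", "stroke",
                       "ah", "ihd", "angina", "ihd"))
)

# Hypothetical helper: return the rows of `data` for ids that have every
# diagnosis named in `present` and none of those named in `absent`.
select_cases <- function(data, present = character(0), absent = character(0)) {
  # Step 1: classify the ids using the id-by-diagnosis table.
  tab <- table(data$id, data$diagnosis)
  has <- unclass(tab) > 0
  keep <- rowSums(!has[, present, drop = FALSE]) == 0 &
          rowSums(has[, absent, drop = FALSE]) == 0
  # Step 2: pick the data cases belonging to the selected ids.
  data[data$id %in% rownames(tab)[keep], ]
}

# "Patients with ah but no ihd": the same rows as obtained above.
ah.noihd <- select_cases(df, present = "ah", absent = "ihd")

# Patients with both ah and ihd.
ah.ihd <- select_cases(df, present = c("ah", "ihd"))
```

A diagnosis name that does not occur in the data makes the column lookup fail with a subscript error, so the names passed in should be levels of the diagnosis factor.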

I hope this helps.

Petr Savicky.