Not missing at random

Hi Blaz,

See below.

x <- matrix(c(1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,3,3,3,4),
            nrow = 7, ncol = 7, byrow = TRUE)  #### matrix

pMiss <- 30     ####percent of missing values

N <- dim(x)[1]   ####number of cases

#### I want to sample all cases with at least 1 value lower than 3,
#### so I have to find candidates
candidate <- which(x[,1]<3 | x[,2]<3 | x[,3]<3 | x[,4]<3 | x[,5]<3 | x[,6]<3 |
                   x[,7]<3)

## easier to use this
## find all x < 3 and return their row and column indices
## select only row indices, and then find unique
candidate <- unique(which(x < 3, arr.ind = TRUE)[, "row"])
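
As a quick sanity check, here is a small sketch comparing the arr.ind approach with two other ways of finding the same rows. The rowSums() and apply() variants are my additions, not part of the original question, and rep(1:5, 9) is just shorthand for the same 7 x 7 matrix.

```r
## Same 7 x 7 matrix as above, written with rep() as shorthand
x <- matrix(c(rep(1:5, 9), 3, 3, 3, 4), nrow = 7, ncol = 7, byrow = TRUE)

## 1) row indices of all cells < 3, then unique (order follows the columns)
cand_arr <- unique(which(x < 3, arr.ind = TRUE)[, "row"])

## 2) count cells < 3 per row and keep rows with at least one
cand_sum <- which(rowSums(x < 3) > 0)

## 3) ask row by row whether any cell is < 3
cand_app <- which(apply(x < 3, 1, any))

## all three pick the same rows; only the ordering may differ
stopifnot(setequal(cand_arr, cand_sum), setequal(cand_sum, cand_app))
sort(cand_arr)
```

Row 7 contains no value below 3 in this matrix, so it is never a candidate.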

idMiss <- sample(candidate, N * pMiss / 100)  #### I sampled cases

## within the sampled rows (the cases that will get missing values),
## find all values that are < 3 and set them to NA
x[idMiss, ][x[idMiss, ] < 3] <- NA
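
To see what these steps do end to end, here is a minimal sketch that reruns them and checks the result. The set.seed() call, the copy x0, and the explicit floor() are my assumptions for illustration; I use floor() to make the rounding of 7 * 30 / 100 = 2.1 explicit rather than leaving it to sample().

```r
x <- matrix(c(rep(1:5, 9), 3, 3, 3, 4), nrow = 7, ncol = 7, byrow = TRUE)
x0 <- x                                   # untouched copy for comparison
pMiss <- 30
N <- nrow(x)

set.seed(1)                               # only so the run is reproducible
candidate <- unique(which(x < 3, arr.ind = TRUE)[, "row"])
nMiss <- floor(N * pMiss / 100)           # whole number of rows to sample
idMiss <- sample(candidate, nMiss)

## inside the sampled rows only, blank out the values < 3
x[idMiss, ][x[idMiss, ] < 3] <- NA

## the rows that now contain NA are exactly the sampled rows
rowsWithNA <- which(rowSums(is.na(x)) > 0)
stopifnot(setequal(rowsWithNA, idMiss))

## all other rows are unchanged
stopifnot(identical(x[-idMiss, ], x0[-idMiss, ]))
```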

## If you are going to do this a lot, consider a function
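nmar <- function(x, op = "<", value = 3, p = 30) {
  ## turn the operator name (e.g., "<") into the function itself
  op <- match.fun(op)
  ## rows with at least one value meeting the condition
  candidate <- unique(which(op(x, value), arr.ind = TRUE)[, "row"])
  ## sample p percent of the cases from the candidates
  idMiss <- sample(candidate, nrow(x) * p / 100)
  ## in the sampled rows, set the values meeting the condition to NA
  x[idMiss, ][op(x[idMiss, ], value)] <- NA
  return(x)
}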

nmar(x)
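
One thing to watch: if only a single row qualifies, sample(candidate, ...) treats that one number as "sample from 1:candidate", and if fewer rows qualify than requested, sample() stops with an error. Below is a more defensive sketch under my own assumptions; the name nmar_safe and the choice to cap the sample at the number of candidates are hypothetical, not something from the original question.

```r
## Hypothetical defensive variant of nmar(); a sketch, not a drop-in standard
nmar_safe <- function(x, op = "<", value = 3, p = 30) {
  op <- match.fun(op)
  hit <- op(x, value)                         # logical matrix of matching cells
  candidate <- which(rowSums(hit, na.rm = TRUE) > 0)
  ## assumption: never ask for more rows than there are candidates
  k <- min(floor(nrow(x) * p / 100), length(candidate))
  if (k == 0) return(x)                       # nothing to blank out
  ## index into candidate so a single candidate is not read as 1:candidate
  idMiss <- candidate[sample.int(length(candidate), k)]
  ## keep matches only in the sampled rows
  hit[!(seq_len(nrow(x)) %in% idMiss), ] <- FALSE
  x[which(hit)] <- NA
  x
}

## a matrix where only row 3 has a value < 3
y <- matrix(5, nrow = 4, ncol = 4)
y[3, 2] <- 1
nmar_safe(y, p = 50)   # asks for 2 rows, but only row 3 qualifies
```

With the single-candidate matrix y, only y[3, 2] becomes NA; rows that do not meet the condition are never touched.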

## A function has the advantage that you can easily change
## p, the cutoff value, and the operator (e.g., "<", ">", "<=", etc.)
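
For example, a short usage sketch with other operators. The function is restated here (with match.fun in place of get) so the snippet runs on its own; the 10 x 5 matrix y and the seed are made up for illustration, chosen so that p percent of the rows is a whole number.

```r
nmar <- function(x, op = "<", value = 3, p = 30) {
  op <- match.fun(op)
  candidate <- unique(which(op(x, value), arr.ind = TRUE)[, "row"])
  idMiss <- sample(candidate, nrow(x) * p / 100)
  x[idMiss, ][op(x[idMiss, ], value)] <- NA
  return(x)
}

## hypothetical data: 10 cases, each row is 1 2 3 4 5
y <- matrix(rep(1:5, 10), nrow = 10, byrow = TRUE)

set.seed(123)
## values above 4 go missing in 20 percent of the cases (2 rows)
out_hi <- nmar(y, op = ">", value = 4, p = 20)

## values of 2 or less go missing in 50 percent of the cases (5 rows)
out_lo <- nmar(y, op = "<=", value = 2, p = 50)

colSums(is.na(out_hi))   # NAs per column, high values removed
colSums(is.na(out_lo))   # NAs per column, low values removed
```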

Cheers,

Josh
On Sun, Jun 5, 2011 at 11:17 PM, Blaz Simcic <blazsimcic at yahoo.com> wrote: