Imputation of missing values

Wed, Jul 10, 2013 3:26 AM

Hi all,

I would like to impute missing values in a data set based on the distribution of the observed (non-missing) values of the same variable.

Suppose that 30% of the observed values are 1, 20% are 2 and 50% are 3. In effect, I'd like to do the following:

# pseudocode: each assignment should apply to only a share of the NAs
df$var[is.na(df$var)] <- 1  # for 30% of the NA occurrences
df$var[is.na(df$var)] <- 2  # for 20% of the NA occurrences
df$var[is.na(df$var)] <- 3  # for 50% of the NA occurrences
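
To make the goal concrete, here is a rough sketch of the kind of thing I am after. It is not necessarily the right or idiomatic way to do it. The example data frame is made up for illustration, and I am assuming it is acceptable to fill each NA with a random draw from the observed values, so that the 30/20/50 split holds only on average rather than exactly:

```r
set.seed(42)  # only so that the example is reproducible

# Hypothetical example data: 3 ones, 2 twos, 5 threes and 4 missing values
df <- data.frame(var = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 3, NA, NA, NA, NA))

na_idx   <- which(is.na(df$var))     # positions of the missing values
observed <- df$var[!is.na(df$var)]   # the non-missing values

# Draw one observed value (with replacement) for each NA. Each distinct
# value is picked with probability equal to its observed share.
# sample.int() on the positions avoids the surprise that sample(x)
# treats a single number x as 1:x.
draws <- sample.int(length(observed), length(na_idx), replace = TRUE)
df$var[na_idx] <- observed[draws]
```

With only a handful of NAs the imputed values can deviate noticeably from the target proportions. If the split has to be exact, a different approach would be needed.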

Can anybody help?

John