On Wed, Apr 17, 2013 at 4:35 PM, mauvela <mauricioandresvela at gmail.com> wrote:
I need to interpolate PM10 data to a set of locations (schools). I have daily data from about 50 stations and have to interpolate for every day, but my problem is the missing values: many stations are missing on many days. For example, on one day I might have data for 10 stations, and on another day for all 50. When I ignore the missing data and interpolate each day with ordinary kriging, the results for each school vary a lot depending on which stations have data that day. For example, the estimate for a school near a station changes a lot on days when that station is missing. What is the best way to deal with these missing values? Is there an imputation method that takes into account both the temporal and the spatial variability of the data?
Off the top of my head: do multiple imputations of the missing values, drawing each one from a distribution with the mean and sd of the non-missing values at that site. You'll then end up with a number (100, say) of kriged maps. You can probably take the mean over those as your map. For the variance you'll have to combine the kriging variance with the imputation variance.

This is probably valid as long as the dropouts are random. It also doesn't take into account any temporal correlation, which might get you a better estimate of your imputed values.

What you do may also depend on what you are doing with the data. If it's just to produce pretty maps, then you might not need something so sophisticated. If you are computing the number of days that PM10 at some location exceeds some threshold, then you may have to give it some more thought.

Barry
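
A rough sketch of the imputation and pooling steps described above, in Python with only the standard library. It assumes Rubin's rules as one common way to combine the kriging variance with the between-imputation variance; the reply itself does not name a method. The function names `impute_draws` and `pool` and all numbers are made up for illustration, and the kriging step is left out: each imputed data set would be kriged separately to give one prediction and one kriging variance per school.

```python
import random
import statistics


def impute_draws(observed, m, rng):
    """Draw m candidate values for one missing day at one station.

    Assumption: a normal distribution with the station's own mean and sd
    over the days it did report. This ignores temporal correlation and
    can produce negative values, which are not physical for PM10.
    """
    mu = statistics.mean(observed)
    sd = statistics.stdev(observed)
    return [rng.gauss(mu, sd) for _ in range(m)]


def pool(predictions, kriging_variances):
    """Pool m kriged predictions at one location (Rubin's rules, assumed).

    predictions[i] and kriging_variances[i] come from kriging the i-th
    imputed data set.
    """
    m = len(predictions)
    pooled_mean = statistics.mean(predictions)
    within = statistics.mean(kriging_variances)   # average kriging variance
    between = statistics.variance(predictions)    # spread across imputations
    total_variance = within + (1 + 1 / m) * between
    return pooled_mean, total_variance


# Illustrative numbers only, not real PM10 data.
rng = random.Random(1)
draws = impute_draws([31.0, 28.5, 40.2, 35.1], m=100, rng=rng)

# Pretend three imputed data sets were kriged to one school.
pooled_mean, total_variance = pool([30.0, 32.0, 34.0], [4.0, 5.0, 6.0])
```

The pooled variance is larger than the average kriging variance whenever the imputations disagree, which is the effect seen when a nearby station drops out.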