Normalization and missing values

4 messages · Rolf Turner, Achim Zeileis, Weiwei Shi +1 more

#
Bert Gunter wrote:

            
That should ***definitely*** go into the fortune package
	data base!!!

				cheers,

					Rolf Turner
					rolf at math.unb.ca
#
On Wed, 13 Apr 2005 14:33:25 -0300 (ADT) Rolf Turner wrote:

            
:-) added for the next release.
Z
#
The way of scaling, IMHO, really depends on the distribution of each
column in your original files. If each column in your data follows a
normal distribution, then a standard "normalization" will fit your
requirement.
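
A minimal sketch of one common reading of that standard "normalization":
centering each column on its mean and dividing by its sample standard
deviation (a z-score). The message does not spell out which formula is
meant, so the formula, the function name zscore_columns and the toy data
are all assumptions made for illustration.

```python
import statistics

def zscore_columns(rows):
    """Standardize each column to mean 0 and sample standard deviation 1.

    Assumes every column has at least two values and is not constant;
    a constant column has SD 0 and would raise ZeroDivisionError.
    """
    columns = list(zip(*rows))
    scaled_columns = []
    for col in columns:
        mean = statistics.fmean(col)
        sd = statistics.stdev(col)  # sample SD (n - 1 denominator)
        scaled_columns.append([(x - mean) / sd for x in col])
    # Transpose back so the result has the same row layout as the input.
    return [list(row) for row in zip(*scaled_columns)]

# Toy data (hypothetical): two columns on very different scales.
data = [[1.0, 100.0], [2.0, 300.0], [3.0, 200.0]]
scaled = zscore_columns(data)
```

After scaling, both columns are on the same footing regardless of their
original units, which is the point of the step when the columns are
roughly normal.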

My previous research on microarray data suggests that a simple "linear
standardization" might be good enough for some purposes.

If your columns differ in magnitude, then a data transformation such
as a log transform might be needed first.
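
As a rough sketch of the last two points: the snippet below log-transforms
a column that spans several orders of magnitude and then rescales it
linearly. Reading "linear standardization" as min-max rescaling to [0, 1]
is an assumption, as are the natural log, the function names and the toy
column.

```python
import math

def log_transform(values):
    """Natural log of each value; only valid for strictly positive data."""
    return [math.log(x) for x in values]

def minmax_scale(values):
    """Linearly rescale values to [0, 1]; assumes they are not all equal."""
    lo, hi = min(values), max(values)
    return [(x - lo) / (hi - lo) for x in values]

# Toy column (hypothetical) spanning three orders of magnitude.
column = [1.0, 10.0, 100.0, 1000.0]

raw_scaled = minmax_scale(column)                 # dominated by the largest value
log_scaled = minmax_scale(log_transform(column))  # evenly spread after the log
```

Without the log, the three smallest values are squeezed close to 0; with
it, the four values end up evenly spaced between 0 and 1.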

Ed
On 4/13/05, Achim Zeileis <Achim.Zeileis at wu-wien.ac.at> wrote:
#
What is the best missing value imputation? It depends on how the
missing values were generated (e.g. missing at random, informative
missingness) and on the type of data (e.g. counts, continuous).

If you are interested in this, you could either:

1) Take the dataset of complete cases and introduce missing values
according to the pattern of missingness you see in the whole data. Then
apply different imputation techniques and see which one gives the best
results.

2) Or look for studies that have evaluated different techniques in your
_field_ and apply the best one.
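
Option 1 can be sketched roughly as follows: start from complete data,
hide some known values, impute them, and score each method against the
hidden truth. Everything specific here is an assumption for illustration:
the simulated log-normal column, values removed completely at random
(real data may show a different pattern of missingness), the 20% missing
fraction, mean and median as the two candidate techniques, and RMSE as
the score.

```python
import math
import random
import statistics

def mask_at_random(values, fraction, rng):
    """Return a copy with a fixed fraction of entries replaced by None."""
    n_missing = max(1, round(fraction * len(values)))
    missing_idx = set(rng.sample(range(len(values)), n_missing))
    return [None if i in missing_idx else v for i, v in enumerate(values)]

def impute(values, fill):
    """Fill each None with fill(observed values), e.g. their mean or median."""
    observed = [v for v in values if v is not None]
    fill_value = fill(observed)
    return [fill_value if v is None else v for v in values]

def rmse_on_masked(truth, masked, imputed):
    """Root mean squared error, computed only where values were hidden."""
    errors = [(t - p) ** 2
              for t, m, p in zip(truth, masked, imputed) if m is None]
    return math.sqrt(sum(errors) / len(errors))

rng = random.Random(0)  # fixed seed so the run is reproducible
complete = [rng.lognormvariate(0.0, 1.0) for _ in range(200)]  # simulated data
masked = mask_at_random(complete, 0.2, rng)

methods = {"mean": statistics.fmean, "median": statistics.median}
scores = {name: rmse_on_masked(complete, masked, impute(masked, fill))
          for name, fill in methods.items()}
```

The method with the lowest score in `scores` would be the one to prefer
for this particular data and missingness pattern; repeating the masking
with several seeds gives a steadier comparison than a single run.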

Regards, Adai
On Wed, 2005-04-13 at 13:36 -0500, WeiWei Shi wrote: