Normalization and missing values

Bert Gunter wrote:

You can't expect statistical procedures to rescue you from poor
data.

That should ***definitely*** go into the fortune package
data base!!!

				cheers,

					Rolf Turner
					rolf at math.unb.ca

Bert Gunter wrote:

You can't expect statistical procedures to rescue you from poor
data.
	That should ***definitely*** go into the fortune package
	data base!!!
:-) added for the next release.
Z

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html

The way of scaling, IMHO, really depends on the distribution of each
column in your original files. If each column in your data follows a
normal distribution, then a standard "normalization" will fit your
requirement.
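
To make the standard "normalization" above concrete, here is a minimal
Python sketch. It assumes the term means centering each column on its
mean and dividing by its sample standard deviation (z-scores). The
function name standardize_columns, the use of None for missing entries
and the toy data are all made up for illustration. Missing entries are
skipped when computing the mean and SD, and stay missing in the result.

```python
import statistics


def standardize_columns(rows):
    """Return a copy of rows with each column centered and scaled.

    Assumption: "normalization" here means z-scores, computed per
    column from the observed (non-None) entries only.
    """
    n_cols = len(rows[0])
    result = [list(r) for r in rows]  # copy, so the input is not modified
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        mean = statistics.mean(observed)
        sd = statistics.stdev(observed)  # sample SD (n - 1 denominator)
        for i, r in enumerate(rows):
            if r[j] is not None:
                result[i][j] = (r[j] - mean) / sd
    return result


# Hypothetical toy data: two columns on different scales, one missing value.
data = [
    [1.0, 100.0],
    [2.0, None],
    [3.0, 300.0],
    [4.0, 200.0],
]
scaled = standardize_columns(data)
```

A column with no spread (all observed values equal) would lead to a
division by zero here, so such columns need separate handling.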

My previous research on microarray data suggests that a simple "linear
standardization" may be good enough for some purposes.

If your columns differ in magnitude, then a data transformation such
as a log transform might be needed first.
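
As a rough illustration of why a log transform can help when columns
differ in magnitude, the Python sketch below log-transforms a column
and then standardizes it. The base-2 logarithm, the function name
log_then_standardize and the two toy columns are assumptions chosen
for the example, not something prescribed in this thread.

```python
import math
import statistics


def log_then_standardize(column):
    """Log-transform a column of positive values, then convert to z-scores."""
    if any(x <= 0 for x in column):
        raise ValueError("log transform needs strictly positive values")
    logged = [math.log2(x) for x in column]  # base 2 is an assumption
    mean = statistics.mean(logged)
    sd = statistics.stdev(logged)
    return [(x - mean) / sd for x in logged]


# Hypothetical columns with the same fold changes but different magnitudes.
small = [1.0, 2.0, 4.0, 8.0]
large = [1000.0, 2000.0, 4000.0, 8000.0]

z_small = log_then_standardize(small)
z_large = log_then_standardize(large)
```

On the log scale a constant multiplicative factor becomes a constant
offset, which the centering step removes, so the two columns end up on
the same scale.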

Ed
On Wed, 13 Apr 2005 14:33:25 -0300 (ADT) Rolf Turner wrote:

Bert Gunter wrote:

You can't expect statistical procedures to rescue you from poor
data.
      That should ***definitely*** go into the fortune package
      data base!!!
:-) added for the next release.
Z

What is the best missing value imputation? It depends on how the
missing values arose (e.g. missing at random, informatively missing)
and on the type of data (e.g. counts, continuous).

If you are interested in this you could either:

1) Take the dataset of complete cases and remove values artificially,
following the pattern of missingness you see in the whole data. Then
apply different imputation techniques and see which one comes closest
to the values you removed.

2) Or look for studies that have evaluated different techniques in your
_field_ and apply the best one.
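
Option 1 could look roughly like the Python sketch below. It is only a
sketch under simplifying assumptions: the values are hidden completely
at random (your real missingness pattern may well differ), only mean
and median imputation are compared, and the error measure is RMSE on
the hidden positions. All function names and the toy column are
hypothetical.

```python
import math
import random
import statistics


def mask_at_random(values, fraction, rng):
    """Hide a fraction of known values; return masked list and hidden indices."""
    n_hidden = max(1, round(fraction * len(values)))
    hidden = sorted(rng.sample(range(len(values)), n_hidden))
    masked = [None if i in hidden else v for i, v in enumerate(values)]
    return masked, hidden


def impute_constant(masked, summary):
    """Fill every missing entry with summary(observed values)."""
    observed = [v for v in masked if v is not None]
    fill = summary(observed)
    return [fill if v is None else v for v in masked]


def rmse(truth, imputed, hidden):
    """Root mean squared error, computed on the hidden positions only."""
    return math.sqrt(
        sum((truth[i] - imputed[i]) ** 2 for i in hidden) / len(hidden)
    )


# Hypothetical complete-case column (continuous data with one outlier).
complete = [2.1, 2.4, 1.9, 2.2, 2.0, 2.3, 9.5, 2.1, 2.2, 2.0]

rng = random.Random(0)  # fixed seed so the masking is reproducible
masked, hidden = mask_at_random(complete, 0.3, rng)

methods = [("mean", statistics.mean), ("median", statistics.median)]
scores = {
    name: rmse(complete, impute_constant(masked, summary), hidden)
    for name, summary in methods
}
best = min(scores, key=scores.get)  # method with the smallest error
```

In practice you would repeat the masking many times and average the
scores, since a single random mask can favour either method by chance.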

Regards, Adai

On 4/13/05, Achim Zeileis <Achim.Zeileis at wu-wien.ac.at> wrote:
:-) added for the next release.
Z
