Bert Gunter wrote:
You can't expect statistical procedures to rescue you from poor data.
That should ***definitely*** go into the fortune package data base!!!

cheers,
Rolf Turner
rolf at math.unb.ca
4 messages · Rolf Turner, Achim Zeileis, Weiwei Shi +1 more
On Wed, 13 Apr 2005 14:33:25 -0300 (ADT) Rolf Turner wrote:
Bert Gunter wrote:
You can't expect statistical procedures to rescue you from poor data.
That should ***definitely*** go into the fortune package
data base!!!
:-) added for the next release. Z
cheers,
Rolf Turner
rolf at math.unb.ca
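For anyone reading this in the archive: once a fortunes release containing the quote is out, something along these lines should pull it up (the search word here is only a guess at a term in the quote; fortune() can also be called with no argument for a random quote):

install.packages("fortunes")   # if not already installed
library("fortunes")
fortune("rescue")              # search the fortune data base for a matching quote
fortune()                      # or draw a random fortune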
What is the best missing value imputation? It depends on how the values were generated (e.g. missing at random, informatively missing) and on the type of data (e.g. counts, continuous). If you are interested in this you could either:

1) Take the data set of complete cases and impute missing values according to the pattern of missingness you see in the whole data. Then apply different imputation techniques and see which one gives the best results.

2) Or look for studies that have evaluated different techniques in your _field_ and apply the best one.

Regards, Adai
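A minimal sketch of option (1) in base R; the built-in airquality data, the roughly 10% missing-at-random pattern, and the two imputation rules (column mean vs. column median) are just placeholders for your own data and candidate techniques:

set.seed(1)
full <- na.omit(airquality[, c("Ozone", "Wind", "Temp")])   # complete cases only

holey <- as.matrix(full)
miss  <- sample(length(holey), size = round(0.1 * length(holey)))  # knock out ~10% of cells
holey[miss] <- NA

## impute each column's NAs with a summary of the observed values in that column
impute <- function(x, fun)
  apply(x, 2, function(col) { col[is.na(col)] <- fun(col, na.rm = TRUE); col })

## compare the candidates against the known truth on the held-out cells
rmse <- function(imp) sqrt(mean((imp[miss] - as.matrix(full)[miss])^2))
rmse(impute(holey, mean))
rmse(impute(holey, median))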
The way of scaling, IMHO, really depends on the distribution of each column in your original files. If each column in your data follows a normal distribution, then a standard "normalization" will fit your requirement. My previous research on microarray data showed me that a simple "linear standardization" might be good enough for some purposes. If your columns differ in magnitude, then some data transformation (such as log) might be needed first.

Ed

On 4/13/05, Achim Zeileis <Achim.Zeileis at wu-wien.ac.at> wrote:
On Wed, 13 Apr 2005 14:33:25 -0300 (ADT) Rolf Turner wrote:
Bert Gunter wrote:
You can't expect statistical procedures to rescue you from poor data.
That should ***definitely*** go into the fortune package
data base!!!
:-) added for the next release. Z
cheers,
Rolf Turner
rolf at math.unb.ca
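To make the scaling options above concrete, a small illustration on made-up data; the two columns, the use of scale() for the standard normalization, and the min-max rescaling as one possible reading of a simple "linear standardization" are all assumptions, not anything from the original post:

set.seed(1)
x <- cbind(a = rnorm(100, mean = 50, sd = 5),
           b = rexp(100, rate = 0.001))    # columns on very different scales

z <- scale(x)            # centre each column to mean 0 and rescale to sd 1
round(colMeans(z), 10)   # approximately 0
apply(z, 2, sd)          # 1 1

## one possible reading of a simple "linear standardization": rescale to [0, 1]
minmax <- apply(x, 2, function(col) (col - min(col)) / (max(col) - min(col)))

## if magnitudes differ wildly, a log transform first (positive values only)
z_log <- scale(log(x))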