Dear group,

My question is perhaps more of a statistical question that uses R. I have a data matrix (400 x 400, normally distributed) with data points ranging from -1 to +1. For certain clustering algorithms, I suspect the tight data range is not helping to resolve the clusters. Is there a way to transform the data, something similar to logit, where I dont lose normality of the data and yet I can better expand the data ranges?

Thanks,
Adrian
data transformation
5 messages · Adrian Johnson, David L Carlson, Richard M. Heiberger +1 more
I apologize, I forgot to mention another key detail. In my matrix, values from -1 to <0 have one meaning, while values from >0 to 1 have a different meaning. So if I do a logit transformation, some of the positives become negative (values < 0.5, etc.). In that case, the resulting transformed matrix is incorrect. I want to transform the numbers ranging from -1 to <0 and the numbers between >0 and 1 independently.

Thanks
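The problem described above can be reproduced in a few lines. This is only an illustrative sketch: the vector `p` is made-up data, and `qlogis()` from base R is used as the logit, log(p / (1 - p)).

```r
# The logit is defined on (0, 1) and is centred at 0.5, not at 0.
p <- c(0.1, 0.3, 0.5, 0.7, 0.9)   # hypothetical positive entries
logit_p <- qlogis(p)              # same as log(p / (1 - p))

# Positive inputs below 0.5 come out negative, so the sign of the
# transformed value no longer separates the two groups of entries.
print(data.frame(p = p, logit = round(logit_p, 3)))

# Negative inputs are outside the domain of the logit altogether.
logit_neg <- suppressWarnings(qlogis(-0.3))   # NaN
```

So a plain logit both flips the sign of small positive values and fails on the negative half of the matrix.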
I don't think you have given us enough information. For example, is the 400 x 400 matrix a distance matrix, or does it represent 400 columns of information about 400 rows of observations? If it is a distance matrix, how is distance being measured? Your clarification suggests it may be a distance matrix of correlation coefficients? If distance has different meanings between -1 and 0 and between 0 and +1, getting interpretable results from cluster analysis will be difficult, but it is not clear what you mean by that.

-------------------------------------------------
David L. Carlson
Department of Anthropology
Texas A&M University

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
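If the matrix does turn out to hold correlation coefficients, one common way to cluster on it is to convert correlations to dissimilarities first. The sketch below is an assumption about the data, not something the original poster confirmed: `x` is simulated raw data, and the two conversions (`1 - r` and `1 - abs(r)`) are just two frequently used choices that treat negative correlations differently.

```r
set.seed(1)
# Hypothetical raw data: 20 observations on 8 variables.
x <- matrix(rnorm(20 * 8), nrow = 20, ncol = 8)
r <- cor(x)                          # 8 x 8 matrix with entries in [-1, 1]

# Choice A: negative correlation means "far apart" (distance up to 2).
d_signed <- as.dist(1 - r)
# Choice B: strong negative correlation counts as "close" (distance near 0).
d_unsigned <- as.dist(1 - abs(r))

# Hierarchical clustering on the signed dissimilarity.
hc <- hclust(d_signed, method = "average")
groups <- cutree(hc, k = 2)          # cluster label for each variable
```

Which conversion is appropriate depends on what the negative and positive values mean, which is exactly the information asked for above.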
This might work for you:

    newy <- sign(oldy)*f(abs(oldy))

where f() is a monotonic transformation, perhaps a power function.

On Sun, Jan 20, 2019 at 11:08 AM Adrian Johnson <oriolebaltimore at gmail.com> wrote:
I apologize, I forgot to mention another key operation. in my matrix -1 to <0 has a different meaning while values between >0 to 1 has a different set of meaning. So If I do logit transformation some of the positives becomes negative (values < 0.5 etc.). In such case, the resulting transformed matrix is incorrect. I want to transform numbers ranging from -1 to <0 and numbers between >0 and 1 independently. Thanks
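A minimal sketch of the `sign(oldy) * f(abs(oldy))` suggestion follows. The helper name `signed_transform`, the simulated matrix, and the two choices of `f()` are assumptions made for illustration: `sqrt` is a power function with exponent 0.5, and `atanh` is included because it is a logit-like function on (-1, 1), with atanh(x) equal to half the logit of (x + 1) / 2.

```r
# Apply a monotonic f() to the magnitude and restore the original sign.
signed_transform <- function(oldy, f) sign(oldy) * f(abs(oldy))

set.seed(42)
# Hypothetical stand-in for the 400 x 400 matrix, clipped into (-1, 1).
oldy <- matrix(pmax(pmin(rnorm(400 * 400, sd = 0.3), 0.99), -0.99),
               nrow = 400)

# Choice 1 (assumed): square root. The range stays inside (-1, 1),
# but small magnitudes are pushed away from zero.
newy_sqrt <- signed_transform(oldy, sqrt)

# Choice 2 (assumed): atanh. Values near -1 or +1 are stretched
# toward -Inf or +Inf, so the overall range expands.
newy_atanh <- signed_transform(oldy, atanh)

range(oldy)
range(newy_sqrt)
range(newy_atanh)
```

Because both choices of `f()` are increasing and map 0 to 0, negative entries stay negative and positive entries stay positive. Note that any nonlinear transformation of this kind changes the shape of the distribution, so the transformed values will not in general remain normally distributed.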
There is no "perhaps" about it. Nonsense phrases like "similar to logit, where I dont [sic] lose normality of the data" that lead into off-topic discussions of why one introduces transformations in the first place are perfect examples of why questions like this belong on a statistical theory discussion forum like StackExchange rather than here where the topic is the R language.
On January 20, 2019 6:02:15 AM PST, Adrian Johnson <oriolebaltimore at gmail.com> wrote:
Dear group, My question, perhaps is more of a statistical question using R I have a data matrix ( 400 x 400 normally distributed) with data points ranging from -1 to +1.. For certain clustering algorithms, I suspect the tight data range is not helping resolving the clusters. Is there a way to transform the data something similar to logit, where I dont lose normality of the data and yet I can better expand the data ranges. Thanks Adrian
Sent from my phone. Please excuse my brevity.