How to make our data normally distributed in R

7 messages · Neha gupta, Jin Li, Rui Barradas

#
Hi

I have regression data for which I get the following RMSE results:

SVM=3500
ANN=4600
R.Forest=2900

I want to know how I can rescale these values so that they lie between 0 and 1.

I plotted boxplots of the RMSE values and used ylim=c(0,1). The boxplots
display fine for RMSE values like 3500, but when I use ylim=c(0,1) all the
boxplots suddenly disappear. What should I do about it?

Thanks
#
Hello,

To rescale data so that their values are between 0 and 1, use this function:


scale01 <- function(x, na.rm = FALSE){
   (x - min(x, na.rm = na.rm)) /
      (max(x, na.rm = na.rm) - min(x, na.rm = na.rm))
}

x <- c(SVM=3500,
        ANN=4600,
        R.Forest=2900)

scale01(x)
#      SVM       ANN  R.Forest
#0.3529412 1.0000000 0.0000000


See base R function ?scale for another way of scaling data.
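
A minimal sketch of how scale() could give the same 0-1 rescaling, assuming the minimum is passed as center and the range as scale. scale() returns a one-column matrix, so the as.numeric()/setNames() step is just one possible way to get a named vector back.

```r
x <- c(SVM = 3500, ANN = 4600, R.Forest = 2900)

# scale() subtracts `center` and then divides by `scale`;
# with center = min and scale = max - min this is min-max rescaling.
scaled <- scale(x, center = min(x), scale = max(x) - min(x))

# scale() returns a one-column matrix; flatten it and restore the names.
x01 <- setNames(as.numeric(scaled), names(x))
print(x01)
```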

As for the second question, if your RMSE vector had values in the range 
2900 to 4600 and the y axis limits are c(0, 1), how can you expect to 
see anything?
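
To illustrate, here is a small sketch with made-up RMSE values (simulated for illustration only, not the data from the question). The first plot uses y axis limits that contain the data. The second rescales the values first, so that limits of c(0, 1) make sense. Note that ylim must be a vector built with c().

```r
set.seed(1)
# Hypothetical per-resample RMSE values, simulated for illustration only
rmse <- data.frame(
  SVM      = rnorm(10, mean = 3500, sd = 100),
  ANN      = rnorm(10, mean = 4600, sd = 100),
  R.Forest = rnorm(10, mean = 2900, sd = 100)
)

# Limits that cover the data: the boxes are visible
ylim_raw <- c(0, max(rmse))
boxplot(rmse, ylim = ylim_raw, ylab = "RMSE")

# If a 0-1 scale is really required, rescale all values jointly first;
# then ylim = c(0, 1) contains the boxes
rng <- range(rmse)
rmse01 <- (rmse - rng[1]) / (rng[2] - rng[1])
boxplot(rmse01, ylim = c(0, 1), ylab = "RMSE (rescaled to 0-1)")
```

Rescaling jointly (one minimum and maximum for all three models) keeps the models comparable; rescaling each column separately would stretch every box over the full 0-1 range.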

Hope this helps,

Rui Barradas


At 21:08 on 12/03/20, Neha gupta wrote:
#
Thanks Hasan and Rui

Rui, as you mentioned:

As for the second question, if your RMSE vector had values in the range
2900 to 4600 and the y axis limits are c(0, 1), how can you expect to
see anything?

Then what should the values of ylim be in the boxplots? I need to show them as
boxplots on a 0-1, 1-10, or even 10-100 scale, because it will look very
awkward if the boxplots show values like 3500.

Regards
On Thu, Mar 12, 2020 at 11:51 PM Rui Barradas <ruipbarradas at sapo.pt> wrote:
#
Hi,
Why do you want to rescale RMSE to 0-1? You can change ylim=c(0,1) to
ylim=c(0, 4600). Alternatively, you may use VEcv (variance explained by
predictive models based on cross-validation), which is a percentage with a
maximum of 100%. It can be calculated with the vecv function in library(spm),
or you can convert RMSE to VEcv with tovecv in spm.
Hope this helps,
Jin
On Fri, Mar 13, 2020 at 8:08 AM Neha gupta <neha.bologna90 at gmail.com> wrote:
#
Hello,

Why would it be awkward to show values like 4600? If those are the 
values, show them. When there is a large difference, orders of 
magnitude, you can plot logs by setting parameter log = "y" as in

boxplot(10^(0:5), log = "y")

But I don't see why having values in the range 2900-4600 (the same order of
magnitude) is a reason to alter ylim.

Hope this helps,

Rui Barradas

At 22:58 on 12/03/20, Neha gupta wrote:
#
Thanks a lot, Jin.

If my total number of observations is 500, then
n will be 500,
mu will be the average of the 500 observations,
s will be their sd,
and m will be the RMSE value, i.e. 4500 in this case?

tovecv(n=500, mu=average (500), s=sd, m=4500, measure="rmse")
On Fri, Mar 13, 2020 at 12:46 AM Jin Li <jinli68 at gmail.com> wrote:
2 days later
#
Please note that mu and s are the mean and standard deviation of the
validation samples. You may use pred.acc in spm to calculate a number of
error and accuracy measures, including RMSE and VEcv, directly from the
observed and predicted values.
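
As a rough sketch of what these two measures are, the following computes RMSE and VEcv by hand in base R rather than calling spm. The obs and pred vectors are made up for illustration. VEcv is assumed here to be (1 - SSE/SST) * 100, and the RMSE-to-VEcv conversion follows from MSE = SSE/n and s^2 = SST/(n - 1); this has not been checked against the output of tovecv or pred.acc.

```r
set.seed(42)
# Hypothetical validation data, simulated for illustration only
obs  <- rnorm(500, mean = 20000, sd = 6000)
pred <- obs + rnorm(500, mean = 0, sd = 4500)

n    <- length(obs)
rmse <- sqrt(mean((obs - pred)^2))

# VEcv directly from the observed and predicted values
vecv_direct <- (1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)) * 100

# VEcv from RMSE plus the standard deviation of the validation samples
s <- sd(obs)
vecv_from_rmse <- (1 - (n * rmse^2) / ((n - 1) * s^2)) * 100

print(c(RMSE = rmse, VEcv = vecv_direct, VEcv_from_RMSE = vecv_from_rmse))
```

Both routes give the same VEcv, which is why s has to be the standard deviation of the validation samples themselves and not some other quantity.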
On Sat, Mar 14, 2020 at 2:07 AM Neha gupta <neha.bologna90 at gmail.com> wrote: