
Add Gauss normal curve?

5 messages · David Winsemius, Peter Dalgaard, varin sacha +1 more

#
Dear R-experts,

Below is my reproducible example. I would like to fit/add a Gaussian (normal) curve to these data.
I can't get it to work: there is no error message, but I don't get what I am looking for.
Many thanks for your help.

############################################################
mydates <- as.Date(c("2020-03-15", "2020-03-16","2020-03-17","2020-03-18","2020-03-19","2020-03-20","2020-03-21","2020-03-22","2020-03-23","2020-03-24","2020-03-25","2020-03-26","2020-03-27","2020-03-28","2020-03-29","2020-03-30","2020-03-31","2020-04-01","2020-04-02","2020-04-03","2020-04-04","2020-04-05","2020-04-06","2020-04-07","2020-04-08","2020-04-09","2020-04-10"))

nc<-c(1,1,2,7,3,6,6,20,17,46,67,71,56,70,85,93,301,339,325,226,608,546,1069,1264,1340,813,608)

plot(as.Date(mydates),nc,pch=16,type="o",col="blue",ylim=c(1,1400), xlim=c(min(as.Date(mydates)),max(as.Date(mydates))))

x <- seq(min(mydates), max(mydates), 0.1) 

curve(dnorm(x, mean(nc), sd(nc)), add=TRUE, col="red", lwd=2)
############################################################
#
On 4/11/20 7:00 AM, varin sacha via R-help wrote:
(I infer that) the values in the `nc` vector are not observations that 
can be interpreted as independent samples from a continuous random 
variable. They are counts, i.e. "new cases".


Furthermore, `nc` is not the "x" variable in your plot but rather the 
"y" vector, so even if it were appropriate to fit a Normal curve, you 
would need to treat the `nc` vector as a density along the time axis.

You could probably do as well by "eyeballing" where you want the 
"normal" curve to sit, since there would be no theoretical support for 
more refined curve fitting efforts. You might also need to scale the 
density values so they would appear as something other than a flat line.

And the `curve` function does need an expression in `x`, but with 
`mean(nc)` and `sd(nc)` as its parameters the density would sit far to 
the left of your current plotting range, which is set by the integer 
values of those dates, i.e. values in the tens of thousands. Use the 
`lines` function for better control.
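A quick check of this point, as a minimal sketch using the data from the original post: with `add = TRUE`, `curve()` evaluates its expression over the current plot's x range, and for a Date axis that range is in day numbers since 1970-01-01. A normal density with mean `mean(nc)` (about 296) and standard deviation `sd(nc)` (about 405) is numerically zero there, which is why the red curve comes out as a flat line. The variable `day_numbers` is introduced only for illustration.

```r
# Data from the original post; seq() by day gives the same 27 dates
mydates <- seq(as.Date("2020-03-15"), as.Date("2020-04-10"), by = "day")
nc <- c(1, 1, 2, 7, 3, 6, 6, 20, 17, 46, 67, 71, 56, 70, 85, 93,
        301, 339, 325, 226, 608, 546, 1069, 1264, 1340, 813, 608)

# Dates are stored as days since 1970-01-01, so the plot's x axis
# covers day numbers 18336 to 18362
day_numbers <- as.numeric(mydates)
print(range(day_numbers))

# The density the original curve() call evaluates over that range:
# mean(nc) and sd(nc) are counts, so these x values lie dozens of
# standard deviations from the mean and dnorm() underflows
d <- dnorm(day_numbers, mean = mean(nc), sd = sd(nc))
print(max(d))
```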


lines(x = as.numeric(mydates),
      # 3000 was an eyeball guess as to a scaling factor that might work,
      # but a larger number was needed to make the curves commensurate
      y = 10000 * dnorm(x = as.numeric(mydates),                   # set a proper x scale
                        mean = as.numeric(mydates[which.max(nc)]),  # use location of max
                        sd = 7))


You might need a smaller value for the "standard deviation" and a higher 
scaling factor to improve the eyeball fit. You might like a value of 
sd = 4, but it would remain an unsupportable effort from a statistical 
viewpoint.
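One way to pick the scaling factor less by eye, as a sketch that assumes you want the curve's peak to equal the largest daily count: the peak of `k * dnorm(x, mu, s)` is `k / (s * sqrt(2 * pi))`, so `k = max(nc) * s * sqrt(2 * pi)`. This only automates the eyeballing; it does not make the fit any more statistically meaningful. The names `x_num`, `mu`, `s`, `k` and `xs` are illustrative.

```r
mydates <- seq(as.Date("2020-03-15"), as.Date("2020-04-10"), by = "day")
nc <- c(1, 1, 2, 7, 3, 6, 6, 20, 17, 46, 67, 71, 56, 70, 85, 93,
        301, 339, 325, 226, 608, 546, 1069, 1264, 1340, 813, 608)

x_num <- as.numeric(mydates)
mu <- x_num[which.max(nc)]        # centre on the day with the most new cases
s  <- 4                           # the suggested "standard deviation", in days
k  <- max(nc) * s * sqrt(2 * pi)  # peak of k * dnorm() is k / (s * sqrt(2 * pi))

plot(mydates, nc, pch = 16, type = "o", col = "blue", ylim = c(1, 1400))
# Evaluate on a 0.1-day grid so the curve is smooth between the daily points
xs <- seq(min(x_num), max(x_num), by = 0.1)
lines(xs, k * dnorm(xs, mean = mu, sd = s), col = "red", lwd = 2)
```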
#
Two obvious problems: 

1. mean(nc) is a count, not a date; likewise sd(nc)
2. the scale of dnorm() is density, not count
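To make point 2 concrete, a short derivation under the assumption that the cases arrive according to a normal density $\varphi_{\mu,\sigma}$ over time measured in days: the expected number of new cases in the one-day bin around day $t$ is the total count times the probability mass in that bin, which is approximately the density times the bin width of 1 day.

```latex
\mathbb{E}[\mathrm{nc}(t)]
  = n \int_{t-1/2}^{t+1/2} \varphi_{\mu,\sigma}(u)\,du
  \approx n\,\varphi_{\mu,\sigma}(t),
\qquad n = \sum_i \mathrm{nc}_i
```

Here $\mu$ and $\sigma$ are estimated by the case-weighted mean and standard deviation of the dates, and $n$ corresponds to `sum(nc)` in the `curve()` call that follows.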

So (slightly inefficient, but who cares...):

y <- rep(mydates, nc)
n <- sum(nc)
curve(n*dnorm(x, mean(y), sd(y)), add=TRUE, col="red", lwd=2)

-pd
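`rep(mydates, nc)` builds one element per case, which is the "slightly inefficient" part. A sketch of an equivalent that avoids the expansion uses frequency-weighted moments: `weighted.mean()` for the centre and a hand-written weighted variance with the same n - 1 denominator that `sd()` applies to the expanded vector. The name `days` is illustrative.

```r
mydates <- seq(as.Date("2020-03-15"), as.Date("2020-04-10"), by = "day")
nc <- c(1, 1, 2, 7, 3, 6, 6, 20, 17, 46, 67, 71, 56, 70, 85, 93,
        301, 339, 325, 226, 608, 546, 1069, 1264, 1340, 813, 608)

days <- as.numeric(mydates)                  # day numbers since 1970-01-01
n <- sum(nc)                                 # total number of cases
m <- weighted.mean(days, nc)                 # same as mean(rep(mydates, nc)), as a number
s <- sqrt(sum(nc * (days - m)^2) / (n - 1))  # same as sd(rep(mydates, nc))

plot(mydates, nc, pch = 16, type = "o", col = "blue", ylim = c(1, 1400))
curve(n * dnorm(x, m, s), add = TRUE, col = "red", lwd = 2)
print(as.Date(m, origin = "1970-01-01"))    # the date the curve is centred on
```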

#
Dear Peter, 
Dear David,

Many thanks for your responses. 
Indeed, counts do not follow a Gaussian distribution, even though their distribution is sometimes approximated by a Gaussian one, usually by appealing to the Central Limit Theorem.

Below is the reproducible example. 
One last thing: I would like to move my red Gaussian curve to the right or to the left. On the graph I can see that the Gaussian curve is centered around the 5th of April.

Is it possible to shift the Gaussian curve so that it is centered on the 30th of March, for example? If so, how?

############################################################
mydates <- as.Date(c("2020-03-15", "2020-03-16","2020-03-17","2020-03-18","2020-03-19","2020-03-20","2020-03-21","2020-03-22","2020-03-23","2020-03-24","2020-03-25","2020-03-26","2020-03-27","2020-03-28","2020-03-29","2020-03-30","2020-03-31","2020-04-01","2020-04-02","2020-04-03","2020-04-04","2020-04-05","2020-04-06","2020-04-07","2020-04-08","2020-04-09","2020-04-10"))

nc<-c(1,1,2,7,3,6,6,20,17,46,67,71,56,70,85,93,301,339,325,226,608,546,1069,1264,1340,813,608)

plot(as.Date(mydates),nc,pch=16,type="o",col="blue",ylim=c(1,1400), xlim=c(min(as.Date(mydates)),max(as.Date(mydates))))

y <- rep(mydates, nc)
n <- sum(nc)
curve(n*dnorm(x, mean(y), sd(y)), add=TRUE, col="red", lwd=2)
############################################################

On Saturday, 11 April 2020, 17:02:36 UTC+2, peter dalgaard <pdalgd at gmail.com> wrote: 
#
Just replace 'mean(y)' in your curve() call with 
as.Date("whatever-date"), for example:

curve(n*dnorm(x, as.Date("2020-03-30"), sd(y)), add=TRUE, col="red", lwd=2)
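This change only swaps the mean, so the curve keeps its width and height and simply slides along the time axis. A small sketch wrapping that in a function; `add_normal_curve` is a hypothetical helper name, not an existing R function. It relies on `curve()` evaluating its expression in the caller's environment, so `mu`, `sd_days` and `total` are visible to it.

```r
mydates <- seq(as.Date("2020-03-15"), as.Date("2020-04-10"), by = "day")
nc <- c(1, 1, 2, 7, 3, 6, 6, 20, 17, 46, 67, 71, 56, 70, 85, 93,
        301, 339, 325, 226, 608, 546, 1069, 1264, 1340, 813, 608)
y <- rep(mydates, nc)
n <- sum(nc)
sd_y <- sd(as.numeric(y))   # spread in days, as in Peter's version

# Hypothetical helper: overlay total * dnorm() centred on `center`
add_normal_curve <- function(center, sd_days, total, ...) {
  mu <- as.numeric(as.Date(center))  # accepts a Date or a "YYYY-MM-DD" string
  curve(total * dnorm(x, mean = mu, sd = sd_days), add = TRUE, ...)
  invisible(mu)
}

plot(mydates, nc, pch = 16, type = "o", col = "blue", ylim = c(1, 1400))
add_normal_curve(mean(y), sd_y, n, col = "red", lwd = 2)             # centred on mean(y)
add_normal_curve("2020-03-30", sd_y, n, col = "darkgreen", lwd = 2)  # shifted to 30 March
```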

Dan
On 4/11/2020 8:36 AM, varin sacha via R-help wrote: