Dear Group,
I am trying to simulate a dataset of 200 individuals with randomly assigned
Sex (1, 0) and Weight drawn from a Sex-specific lognormal distribution. I am
puzzled by the behavior of the rlnorm function when it is used to impute
Weight values from the specified distribution. Here is the code:
ID <- 1:200
Sex <- sample(c(0, 1), 200, replace = TRUE, prob = c(0.4, 0.6))
fulldata <- data.frame(ID, Sex)
# Sex-specific lognormal weights (each rlnorm() call returns 100 values)
fulldata$Wt <- ifelse(fulldata$Sex == 1,
                      rlnorm(100, meanlog = log(85.1), sdlog = sqrt(0.0329)),
                      rlnorm(100, meanlog = log(73), sdlog = sqrt(0.0442)))
mean(fulldata$Wt[fulldata$Sex == 0])  # to check the mean is close to 73
mean(fulldata$Wt[fulldata$Sex == 1])  # to check the mean is close to 85
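A side note on the two checks above: with meanlog = log(73), the value 73 is the median of the lognormal, not its mean. The mean is exp(meanlog + sdlog^2/2), which is slightly larger. A small sketch comparing the two (the variable names and the seed are illustrative only):

```r
# Parameters used in the question
meanlog_f <- log(73);   sdlog_f <- sqrt(0.0442)   # Sex == 0
meanlog_m <- log(85.1); sdlog_m <- sqrt(0.0329)   # Sex == 1

# Median of a lognormal is exp(meanlog); mean is exp(meanlog + sdlog^2 / 2)
median_f <- exp(meanlog_f)                    # 73 (up to rounding)
mean_f   <- exp(meanlog_f + sdlog_f^2 / 2)    # about 74.6
median_m <- exp(meanlog_m)                    # 85.1 (up to rounding)
mean_m   <- exp(meanlog_m + sdlog_m^2 / 2)    # about 86.5

# With a very large sample the simulated mean approaches mean_f, not 73
set.seed(1)
sim_mean_f <- mean(rlnorm(1e6, meanlog = meanlog_f, sdlog = sdlog_f))
print(c(median_f = median_f, mean_f = mean_f, sim_mean_f = sim_mean_f))
```

So "close to 73" is better checked with median() than with mean(), or against about 74.6 when using mean().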
I see that the number of simulated values affects the mean calculated after
imputation. That is, using rlnorm(100, meanlog = log(73), sdlog =
sqrt(0.0442)) in the ifelse statement above gives a much better match than
using rlnorm(1, meanlog = log(73), sdlog = sqrt(0.0442)).
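The rlnorm(1, ...) case can be seen directly: rlnorm(1, ...) returns a single number, and ifelse() recycles it to the length of the condition. Every individual of a given Sex therefore gets the same weight, and each group "mean" is just one random draw. A sketch (the seed and variable names are arbitrary):

```r
set.seed(42)
Sex <- sample(c(0, 1), 200, replace = TRUE, prob = c(0.4, 0.6))

# Each rlnorm(1, ...) call yields ONE value; ifelse() recycles it to length 200
Wt1 <- ifelse(Sex == 1,
              rlnorm(1, meanlog = log(85.1), sdlog = sqrt(0.0329)),
              rlnorm(1, meanlog = log(73),   sdlog = sqrt(0.0442)))

# Count distinct weights within each group: only one value per group
n_unique_f <- length(unique(Wt1[Sex == 0]))
n_unique_m <- length(unique(Wt1[Sex == 1]))
print(c(n_unique_f, n_unique_m))   # 1 1
```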
My understanding was that ifelse imputes a single value for each element
where the condition is met. I would appreciate any insight into why
increasing the number of simulated values improves the match.
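The rlnorm(100, ...) case matches better for a different reason. ifelse() does not draw a new value where the condition is met; it takes yes[i] or no[i] by position i, after recycling yes and no to the length of the condition. With 200 rows and 100 draws, rows i and i + 100 reuse the same draw, so each group mean comes from many distinct draws (some of them counted twice) rather than from one. A sketch that reproduces ifelse() by hand with base R's rep_len() (the seed is arbitrary):

```r
set.seed(7)
cond <- sample(c(TRUE, FALSE), 200, replace = TRUE)
yes  <- rlnorm(100, meanlog = log(85.1), sdlog = sqrt(0.0329))
no   <- rlnorm(100, meanlog = log(73),   sdlog = sqrt(0.0442))
out  <- ifelse(cond, yes, no)

# Rebuild the result manually: positions 1..100 are reused for rows 101..200
idx    <- rep_len(1:100, 200)
manual <- ifelse(cond, yes[idx], no[idx])
print(identical(out, manual))   # TRUE

# Rows j and j + 100 duplicate each other whenever they fall in the same group
n_same <- sum(cond[1:100] == cond[101:200])
print(c(distinct = length(unique(out)), expected = 200 - n_same))
```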
Regards,
Ayyappa
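One way to get exactly one independent draw per individual is to avoid recycling altogether. This minimal sketch relies on rlnorm() recycling its meanlog and sdlog arguments to length n, so passing one parameter per row gives one draw per row; it is an illustrative approach, not one taken from the thread:

```r
set.seed(123)
n   <- 200
ID  <- 1:n
Sex <- sample(c(0, 1), n, replace = TRUE, prob = c(0.4, 0.6))
fulldata <- data.frame(ID, Sex)

# Per-row lognormal parameters, chosen by Sex
ml <- ifelse(fulldata$Sex == 1, log(85.1),    log(73))
sl <- ifelse(fulldata$Sex == 1, sqrt(0.0329), sqrt(0.0442))

# One independent draw per individual (n values, n parameter pairs)
fulldata$Wt <- rlnorm(n, meanlog = ml, sdlog = sl)

# No recycled copies: all n weights are distinct
print(length(unique(fulldata$Wt)))
# Group means (compare with about 74.6 and 86.5, the lognormal means)
print(tapply(fulldata$Wt, fulldata$Sex, mean))
```

Drawing sum(Sex == 1) values for the Sex == 1 rows and sum(Sex == 0) values for the remaining rows, then assigning them by subsetting, would work equally well.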
rlnorm behaviour
2 messages · Ayyappa Chaturvedula, Thierry Onkelinx
Dear Ayyappa,

ifelse works on a vector. See the example below.

ifelse(
  sample(c(TRUE, FALSE), size = length(letters), replace = TRUE),
  letters,
  LETTERS
)

However, note that it will recycle short vectors when they are not of equal length.

ifelse(
  sample(c(TRUE, FALSE), size = 2 * length(letters), replace = TRUE),
  letters,
  LETTERS
)

In your code the length of the condition vector is 200, while the length of the two other vectors is 100.

Best regards,

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium

To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey

2016-06-14 17:02 GMT+02:00 Ayyappa Chaturvedula <ayyappach at gmail.com>:
______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.