An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20080620/36f6696a/attachment.pl>
Unexpected Behavior (potentially) in t.test
4 messages · Russell Pierce, PIKAL Petr, Peter Dalgaard
2 days later
Hi I tried your code with R 2.8.0 devel, gave up after about 4 hours (more or less t1N was 18) but until i interrupted it manually there was no problem neither error. Maybe you could try it with new R version. Besides I did not debug or profile your code so maybe you could try to debug it yourself. Especially 3 nested loops seems to me quite ineffective together with incremental Cgroup and Tgroup increasing. Regards Petr petr.pikal at precheza.cz 724008364, 581252140, 581252257 r-help-bounces at r-project.org napsal dne 20.06.2008 20:55:08:
Greetings, I have stumbled across some unexpected behavior (potential a bug) in,
what I
suspect to be R's (2.6.2 on Ubuntu Linux) t.test function; then again
the
problem may exist in my code. I have shutdown R and started it back up, re-run the code and re-experienced the error. I have searched on Google
for
the abnormal termination error message "(stderr < 10 *
.Machine$double.eps *
max(abs(mx), abs(my))) stop("data are essentially constant")" but only
found
,
but the discussion there did not seem particularly helpful. I've included all of my code, amateurish though it may be. I have not isolated the faulty part, and to me it all looks pretty simple, so I'm
not
sure where I'm going wrong. For background, the goal of this code is to
run
a simulation to explore the problem space of inflation of Type I error
when
decisions to run or not to run more participants are made by preliminary looks at the data (as in Wagenmakers, 2007). This code is meant to
examine
the problem space given that there is no true difference between the
groups
(as is the case when both a generated from random draws from the normal distribution). I run an initial number of subjects in two groups (t1N)
then,
if p is < .25 on the t-test I add t2N more subjects to each group. Then
I
perform the t.test again. If the p was > .25 at time 1 I stop. Plainp is simply storing the p-values from t2 (if it was performed) or from t1 (if
t2
was not performed). In the code I provide t1 starts at 16 since this is about when the problem becomes more frequent. Please note that it takes quite a long while to fail, and depending on what the true cause is it
may
not fail at all. On my system it is failing before t1N advances to 17.
Any suggestions as to how to avoid the error and instructions as to the
cause of it would be appreciated. Thank you for your input and patience.
logit <- function(p)
{
# compute and return logit of p;
# if p=.5 then logit==0 else sign(logit)==(p>.5)
return( log(p/(1-p)) )
}
antilogit <- function(x)
{
# compute and return antilogit of x;
# this returns a proportion p for which logit(p)==x;
return( exp(x)/(1+exp(x)) )
}
plainp <- c() #Clear the plainp value
t1Nsim <- (100/5) * 1000 * 10 # random chance should provide 10000 cases
at
t1
contthreshold <- .25 #p value below which we run more subjects
t1pvals <- rep(NA,t1Nsim) #clear the pvalues
t2pvals <- rep(NA,t1Nsim) #clear the pvalues
t1N <- 10 #for debugging
t2N <- 5 #for debugging
for (t1N in 16:50) #Outer loop testing possible values for t1N
for (t2N in 1:50) #Inner loop testing possible values for t2N
{
print(paste("Checking with ",t1N," initial samples and ",t2N," extra
samples",sep="")) #feedback
for (lcv in 1:t1Nsim) #Run simulation t1Nsim times...
{
if (lcv %% 20000 == 0) {print(paste((lcv/t1Nsim)*100,"%",sep=""))}
#feedback
Cgroup <- rnorm(t1N) #Initial random draw for Group1
Tgroup <- rnorm(t1N) #Initial random draw for Group 2
currentp <- t.test(Cgroup,Tgroup)[["p.value"]] #Get t1 p value
t1pvals[lcv] <- currentp #Store t1 p value
#If p >= .05 or <= continue threshold then run more subjects
if ((currentp <= contthreshold) & (currentp >= .05)) {
Cgroup <- c(Cgroup,rnorm(t2N)) #Add t2N subjects to group 1
Tgroup <- c(Tgroup,rnorm(t2N)) #Add t2N subjects to group 2
currentp <- t.test(Cgroup,Tgroup)[["p.value"]] #Get t2 p value
t2pvals[lcv] <- currentp #store t2 p value
}
}
plainp <- ifelse(!is.na(t2pvals),{t2pvals},{t1pvals}) #Make sure we are
looking at the right ps
table(t1pvals <= .05); round(summary(t1pvals),4) #debugging
hist(t1pvals) #debugging
table(plainp <= .05); round(summary(plainp),4) #debugging
hist(plainp, probability=TRUE,main=paste(t1N,"then",t2N)) #Histogram of
interest
abline(a=1.00,b=0) #Baseline probability
dev.copy(jpeg,filename=paste("Sim with ",t1N, "start samples and ",t2N,"
extra samples.jpg",sep=""),height=600,width=800,bg="white") # Create the
image
dev.off() #Save the image
chi <- rbind(table(t1pvals <= .05),table(plainp <= .05)) #debugging
chisq.test(chi) #debugging
explore <- data.frame(t1=t1pvals,t2=t2pvals,picked=plainp) #debugging
t.test(explore$picked,explore$t1) #debugging
t.test(logit(explore$picked),logit(explore$t1)) #debugging
}
---
Russell Pierce
Psychology Department
Graduate Student - Cognitive
(951) 827-2553
University of California, Riverside, 92521
[[alternative HTML version deleted]]
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Petr PIKAL wrote:
Hi I tried your code with R 2.8.0 devel, gave up after about 4 hours (more or less t1N was 18) but until i interrupted it manually there was no problem neither error. Maybe you could try it with new R version. Besides I did not debug or profile your code so maybe you could try to debug it yourself. Especially 3 nested loops seems to me quite ineffective together with incremental Cgroup and Tgroup increasing.
Yes. I had this running over a large part of the weekend (started it in
a window and forgot about it...) with no error.
So you (Russell, not Petr) need to do some more work. You have the
problem, and you can do things like printing the data vector(s) that
t.test complains about. Notice that try() allows you to catch the error
and do something before exiting.
-p
Regards Petr petr.pikal at precheza.cz 724008364, 581252140, 581252257 r-help-bounces at r-project.org napsal dne 20.06.2008 20:55:08:
Greetings,
I have stumbled across some unexpected behavior (potential a bug) in,
what I
suspect to be R's (2.6.2 on Ubuntu Linux) t.test function; then again
the
problem may exist in my code. I have shutdown R and started it back up,
re-run the code and re-experienced the error. I have searched on Google
for
the abnormal termination error message "(stderr < 10 *
.Machine$double.eps *
max(abs(mx), abs(my))) stop("data are essentially constant")" but only
found
,
but the discussion there did not seem particularly helpful.
I've included all of my code, amateurish though it may be. I have not
isolated the faulty part, and to me it all looks pretty simple, so I'm
not
sure where I'm going wrong. For background, the goal of this code is to
run
a simulation to explore the problem space of inflation of Type I error
when
decisions to run or not to run more participants are made by preliminary
looks at the data (as in Wagenmakers, 2007). This code is meant to
examine
the problem space given that there is no true difference between the
groups
(as is the case when both a generated from random draws from the normal
distribution). I run an initial number of subjects in two groups (t1N)
then,
if p is < .25 on the t-test I add t2N more subjects to each group. Then
I
perform the t.test again. If the p was > .25 at time 1 I stop. Plainp is
simply storing the p-values from t2 (if it was performed) or from t1 (if
t2
was not performed). In the code I provide t1 starts at 16 since this is
about when the problem becomes more frequent. Please note that it takes
quite a long while to fail, and depending on what the true cause is it
may
not fail at all. On my system it is failing before t1N advances to 17.
Any suggestions as to how to avoid the error and instructions as to the
cause of it would be appreciated. Thank you for your input and patience.
logit <- function(p)
{
# compute and return logit of p;
# if p=.5 then logit==0 else sign(logit)==(p>.5)
return( log(p/(1-p)) )
}
antilogit <- function(x)
{
# compute and return antilogit of x;
# this returns a proportion p for which logit(p)==x;
return( exp(x)/(1+exp(x)) )
}
plainp <- c() #Clear the plainp value
t1Nsim <- (100/5) * 1000 * 10 # random chance should provide 10000 cases
at
t1
contthreshold <- .25 #p value below which we run more subjects
t1pvals <- rep(NA,t1Nsim) #clear the pvalues
t2pvals <- rep(NA,t1Nsim) #clear the pvalues
t1N <- 10 #for debugging
t2N <- 5 #for debugging
for (t1N in 16:50) #Outer loop testing possible values for t1N
for (t2N in 1:50) #Inner loop testing possible values for t2N
{
print(paste("Checking with ",t1N," initial samples and ",t2N," extra
samples",sep="")) #feedback
for (lcv in 1:t1Nsim) #Run simulation t1Nsim times...
{
if (lcv %% 20000 == 0) {print(paste((lcv/t1Nsim)*100,"%",sep=""))}
#feedback
Cgroup <- rnorm(t1N) #Initial random draw for Group1
Tgroup <- rnorm(t1N) #Initial random draw for Group 2
currentp <- t.test(Cgroup,Tgroup)[["p.value"]] #Get t1 p value
t1pvals[lcv] <- currentp #Store t1 p value
#If p >= .05 or <= continue threshold then run more subjects
if ((currentp <= contthreshold) & (currentp >= .05)) {
Cgroup <- c(Cgroup,rnorm(t2N)) #Add t2N subjects to group 1
Tgroup <- c(Tgroup,rnorm(t2N)) #Add t2N subjects to group 2
currentp <- t.test(Cgroup,Tgroup)[["p.value"]] #Get t2 p value
t2pvals[lcv] <- currentp #store t2 p value
}
}
plainp <- ifelse(!is.na(t2pvals),{t2pvals},{t1pvals}) #Make sure we are
looking at the right ps
table(t1pvals <= .05); round(summary(t1pvals),4) #debugging
hist(t1pvals) #debugging
table(plainp <= .05); round(summary(plainp),4) #debugging
hist(plainp, probability=TRUE,main=paste(t1N,"then",t2N)) #Histogram of
interest
abline(a=1.00,b=0) #Baseline probability
dev.copy(jpeg,filename=paste("Sim with ",t1N, "start samples and ",t2N,"
extra samples.jpg",sep=""),height=600,width=800,bg="white") # Create the
image
dev.off() #Save the image
chi <- rbind(table(t1pvals <= .05),table(plainp <= .05)) #debugging
chisq.test(chi) #debugging
explore <- data.frame(t1=t1pvals,t2=t2pvals,picked=plainp) #debugging
t.test(explore$picked,explore$t1) #debugging
t.test(logit(explore$picked),logit(explore$t1)) #debugging
}
---
Russell Pierce
Psychology Department
Graduate Student - Cognitive
(951) 827-2553
University of California, Riverside, 92521
[[alternative HTML version deleted]]
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. ______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
O__ ---- Peter Dalgaard ?ster Farimagsgade 5, Entr.B c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907
An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20080623/96c89eeb/attachment.pl>