Automating R for Hypothesis Testing

7 messages · meredith, Rui Barradas

meredith

Tue, May 8, 2012 1:41 PM #

R Users-
  I have been trying to automate a manual code that I have developed for
calling in a .csv file, isolating certain rows and columns that correspond
to specified months:
something to the effect
i=name.csv
N=length(i$month)
iphos1=0
iphos2=0
isphos3=0
for i=1,N
 if month=1
    iphos1=iphos+1
    iphos1(iphos1)=i

an so on to call out the months into there own arrays (unless there is a way
I can wrap it into the next automation)

Next: I would like to run a simple linear regression combining each of the
months 1 by 1:
for instance I want to run a regression on a combined model from months 1
and 2 and a dummy model for 1 and 2, compare them using a Chi-sq
distribution, if Chi-sq is less than the Critical value, we accept and go on
to test another set of models with both 1 and 2. If it rejects, then we
proceed to months 2 and 3.  If we move on to the second set on months 1 and
2, and the critical value is accepted, I want to print an accept or reject
and move on to months 2 and 3, until finally comparing months 12-1 at the
end.
Is there a way to loop or automate this in R?

Thanks
Meredith


--
View this message in context: http://r.789695.n4.nabble.com/Automating-R-for-Hypothesis-Testing-tp4618653.html
Sent from the R help mailing list archive at Nabble.com.

Rui Barradas

Tue, May 8, 2012 6:44 PM #

Hello,

I'm not at all sure if I understand your problem. Does this describe it?


test first model for months 1 and 2
if test statistic less than critical value{
	test second model for months 1 and 2
	print results of the first and second tests? just one of them?
}
move on to months 2 and 3
etc, until months 12 and 1


Please post example data using dput(dataset).
Just copy it's output and paste it in your post.

And example code, what you're already doing.
(Possibly simplified)

Rui Barradas


meredith wrote

--
View this message in context: http://r.789695.n4.nabble.com/Automating-R-for-Hypothesis-Testing-tp4618653p4619195.html
Sent from the R help mailing list archive at Nabble.com.

meredith

Wed, May 9, 2012 4:46 AM #

dput:  http://r.789695.n4.nabble.com/file/n4620188/milwaukeephos.csv
milwaukeephos.csv 

# Feb-march

(Intercept) lffeb_march 
  -2.429890    1.172821

(Intercept)   X1feb_mar lffeb_march 
 -2.8957776  -0.5272793   1.3016303

# Test Stat Equation for Chisq

if less than CV move onto to slope modification

(Intercept) X3feb_march 
   5.342130    1.172821

(Intercept) X3feb_march X4feb_march 
  5.2936263   1.0353752   0.2407557 
# Test Stat

# Record Chisq value

Does this help?
Here lffeb_march is the combination of Feb and March log flows
and llfeb_march is the combination of Feb and March log loads
X3: lffeb_march-mean(feb_march)
X4: X1*X3

Thanks

Rui Barradas wrote

Rui Barradas wrote

--
View this message in context: http://r.789695.n4.nabble.com/Automating-R-for-Hypothesis-Testing-tp4618653p4620188.html
Sent from the R help mailing list archive at Nabble.com.

Rui Barradas

Wed, May 9, 2012 7:55 AM #

Hello,

Yes, it does help. Now we can see your data and what you're doing.
What follows is a suggestion on what you could do, not full solution.
(You forgot to say what X1 is, but I don't think it's important to
understand the suggestion.)
(If I'm wrong, say something.)


milwaukeephos <- read.csv("milwaukeephos.csv", header=TRUE,
stringsAsFactors=FALSE)
# list of data.frames, one per month
ls1 <- split(milwaukeephos, milwaukeephos$month)

#--------- if you want to keep the models, not needed if you don't.
#          (yoy probably don't)
modelH <- vector("list", 12)
modelHa <- vector("list", 12)
modelH2 <- vector("list", 12)
modelH2a <- vector("list", 12)
#--------- values to record, these are needed, create them beforehand.
chi_fm <- numeric(12)
chi_fms <- numeric(12)
#
seq_months <- c(1:12, 1) # wrap months around.
for(i in 1:12){
	month_this <- seq_months[i]
	month_next <- seq_months[i + 1]

	lload <- c(ls1[[month_this]]$load_kg, ls1[[month_next]]$load_kg)
	lflow <- c(ls1[[month_this]]$flow, ls1[[month_next]]$flow)
	modelH[[i]] <- lm(lload ~ lflow)
	# If you don't want to keep the models, use modelH only
	# ( without [[i]] )
	# and do the same with X1

	# rest of your code for first test goes here
	chi_fm[i] <- bfm %*% var_fm %*% (bunres_fm - bres_fm)

	# and the same for the second test
	chi_fms[i] <- ...etc...
}


Hope this helps,

Rui Barradas


meredith wrote

dput:  http://r.789695.n4.nabble.com/file/n4620188/milwaukeephos.csv
milwaukeephos.csv 

# Feb-march

modelH_febmarch<-lm(llfeb_march~lffeb_march)
modelHa_febmarch<-lm(llfeb_march~X1feb_mar+lffeb_march)
anova(modelHa_febmarch)
coefficients(modelH_febmarch)

(Intercept) lffeb_march 
  -2.429890    1.172821

coefficients(modelHa_febmarch)

(Intercept)   X1feb_mar lffeb_march 
 -2.8957776  -0.5272793   1.3016303

bres_fm<-matrix(c(-2.429890,0,1.172821),nrow=3)
bunres_fm<-matrix(c(-2.8957776,-0.5272793,1.3016303),nrow=3)
bfm<-t(bunres_fm-bres_fm)
fmvect<-seq(1,1,length=34)
X1a_febmar<-seq(0,0,length=9) # dummy variable step 1
X1b_febmar<-seq(1,1,length=25) # dummy variable step 2
X1feb_mar<-c(X1a_febmar,X1b_febmar) #dummy variable creation

# Test Stat Equation for Chisq

fmxx<-cbind(fmvect,X1feb_mar,lffeb_march)
tfmx<-t(fmxx)
xcom_fm<-(tfmx %*% fmxx)
xinv_fm<-ginv(xcom_fm)
var_fm<-xinv_fm*0.307
chi_fm<-bfm %*% var_fm %*% (bunres_fm-bres_fm)
chi_fm # chisq value for recording

if less than CV move onto to slope modification

modelH2_febmarch<-lm(llfeb_march~X3feb_march)
modelH2a_febmarch<-lm(llfeb_march~X3feb_march+X4feb_march)
anova(modelH2a_febmarch)
coefficients(modelH2_febmarch) # get coefficients to make beta vectors
for test

(Intercept) X3feb_march 
   5.342130    1.172821

coefficients(modelH2a_febmarch)

(Intercept) X3feb_march X4feb_march 
  5.2936263   1.0353752   0.2407557 
# Test Stat

bsres_fm<-matrix(c(5.342130,1.172821,0),nrow=3)
bsunres_fm<-matrix(c(5.2936263,1.0353752,0.2407557),nrow=3)
bsfm<-t(bsunres_fm-bsres_fm)
#X matrix
fmxs<-cbind(fmvect,X3feb_march,X4feb_march)
tfmxs<-t(fmxs)
xcoms_fm<-(tfmxs %*% fmxs)
xinvs_fm<-ginv(xcoms_fm)
var_fms<-xinvs_fm*0.341
chi_fms<-bsfm %*% var_fms %*% (bsunres_fm-bsres_fm)
chi_fms

# Record Chisq value

Does this help?
Here lffeb_march is the combination of Feb and March log flows
and llfeb_march is the combination of Feb and March log loads
X3: lffeb_march-mean(feb_march)
X4: X1*X3

Thanks

Rui Barradas wrote

Hello,

I'm not at all sure if I understand your problem. Does this describe it?


test first model for months 1 and 2
if test statistic less than critical value{
	test second model for months 1 and 2
	print results of the first and second tests? just one of them?
}
move on to months 2 and 3
etc, until months 12 and 1


Please post example data using dput(dataset).
Just copy it's output and paste it in your post.

And example code, what you're already doing.
(Possibly simplified)

Rui Barradas


meredith wrote

R Users-
  I have been trying to automate a manual code that I have developed for
calling in a .csv file, isolating certain rows and columns that
correspond to specified months:
something to the effect
i=name.csv
N=length(i$month)
iphos1=0
iphos2=0
isphos3=0
for i=1,N
 if month=1
    iphos1=iphos+1
    iphos1(iphos1)=i

an so on to call out the months into there own arrays (unless there is a
way I can wrap it into the next automation)

Next: I would like to run a simple linear regression combining each of
the months 1 by 1:
for instance I want to run a regression on a combined model from months
1 and 2 and a dummy model for 1 and 2, compare them using a Chi-sq
distribution, if Chi-sq is less than the Critical value, we accept and
go on to test another set of models with both 1 and 2. If it rejects,
then we proceed to months 2 and 3.  If we move on to the second set on
months 1 and 2, and the critical value is accepted, I want to print an
accept or reject and move on to months 2 and 3, until finally comparing
months 12-1 at the end.
Is there a way to loop or automate this in R?

Thanks
Meredith

Rui Barradas wrote

Hello,

I'm not at all sure if I understand your problem. Does this describe it?


test first model for months 1 and 2
if test statistic less than critical value{
	test second model for months 1 and 2
	print results of the first and second tests? just one of them?
}
move on to months 2 and 3
etc, until months 12 and 1


Please post example data using dput(dataset).
Just copy it's output and paste it in your post.

And example code, what you're already doing.
(Possibly simplified)

Rui Barradas


meredith wrote

R Users-
  I have been trying to automate a manual code that I have developed for
calling in a .csv file, isolating certain rows and columns that
correspond to specified months:
something to the effect
i=name.csv
N=length(i$month)
iphos1=0
iphos2=0
isphos3=0
for i=1,N
 if month=1
    iphos1=iphos+1
    iphos1(iphos1)=i

an so on to call out the months into there own arrays (unless there is a
way I can wrap it into the next automation)

Next: I would like to run a simple linear regression combining each of
the months 1 by 1:
for instance I want to run a regression on a combined model from months
1 and 2 and a dummy model for 1 and 2, compare them using a Chi-sq
distribution, if Chi-sq is less than the Critical value, we accept and
go on to test another set of models with both 1 and 2. If it rejects,
then we proceed to months 2 and 3.  If we move on to the second set on
months 1 and 2, and the critical value is accepted, I want to print an
accept or reject and move on to months 2 and 3, until finally comparing
months 12-1 at the end.
Is there a way to loop or automate this in R?

Thanks
Meredith

--
View this message in context: http://r.789695.n4.nabble.com/Automating-R-for-Hypothesis-Testing-tp4618653p4620696.html
Sent from the R help mailing list archive at Nabble.com.

meredith

Thu, May 10, 2012 7:40 AM #

Rui-
  Thanks this definitely helps, just one quick question. How would you code
the values of chi-fm and chi-fms to change based on the degrees of freedom
of each model H(i)?

Meredith


Rui Barradas wrote

Hello,

Yes, it does help. Now we can see your data and what you're doing.
What follows is a suggestion on what you could do, not full solution.
(You forgot to say what X1 is, but I don't think it's important to
understand the suggestion.)
(If I'm wrong, say something.)


milwaukeephos <- read.csv("milwaukeephos.csv", header=TRUE,
stringsAsFactors=FALSE)
# list of data.frames, one per month
ls1 <- split(milwaukeephos, milwaukeephos$month)

#--------- if you want to keep the models, not needed if you don't.
#          (yoy probably don't)
modelH <- vector("list", 12)
modelHa <- vector("list", 12)
modelH2 <- vector("list", 12)
modelH2a <- vector("list", 12)
#--------- values to record, these are needed, create them beforehand.
chi_fm <- numeric(12)
chi_fms <- numeric(12)
#
seq_months <- c(1:12, 1) # wrap months around.
for(i in 1:12){
	month_this <- seq_months[i]
	month_next <- seq_months[i + 1]

	lload <- c(ls1[[month_this]]$load_kg, ls1[[month_next]]$load_kg)
	lflow <- c(ls1[[month_this]]$flow, ls1[[month_next]]$flow)
	modelH[[i]] <- lm(lload ~ lflow)
	# If you don't want to keep the models, use modelH only
	# ( without [[i]] )
	# and do the same with X1

	# rest of your code for first test goes here
	chi_fm[i] <- bfm %*% var_fm %*% (bunres_fm - bres_fm)

	# and the same for the second test
	chi_fms[i] <- ...etc...
}


Hope this helps,

Rui Barradas


meredith wrote

dput:  http://r.789695.n4.nabble.com/file/n4620188/milwaukeephos.csv
milwaukeephos.csv 

# Feb-march

modelH_febmarch<-lm(llfeb_march~lffeb_march)
modelHa_febmarch<-lm(llfeb_march~X1feb_mar+lffeb_march)
anova(modelHa_febmarch)
coefficients(modelH_febmarch)

(Intercept) lffeb_march 
  -2.429890    1.172821

coefficients(modelHa_febmarch)

(Intercept)   X1feb_mar lffeb_march 
 -2.8957776  -0.5272793   1.3016303

bres_fm<-matrix(c(-2.429890,0,1.172821),nrow=3)
bunres_fm<-matrix(c(-2.8957776,-0.5272793,1.3016303),nrow=3)
bfm<-t(bunres_fm-bres_fm)
fmvect<-seq(1,1,length=34)
X1a_febmar<-seq(0,0,length=9) # dummy variable step 1
X1b_febmar<-seq(1,1,length=25) # dummy variable step 2
X1feb_mar<-c(X1a_febmar,X1b_febmar) #dummy variable creation

# Test Stat Equation for Chisq

fmxx<-cbind(fmvect,X1feb_mar,lffeb_march)
tfmx<-t(fmxx)
xcom_fm<-(tfmx %*% fmxx)
xinv_fm<-ginv(xcom_fm)
var_fm<-xinv_fm*0.307
chi_fm<-bfm %*% var_fm %*% (bunres_fm-bres_fm)
chi_fm # chisq value for recording

if less than CV move onto to slope modification

modelH2_febmarch<-lm(llfeb_march~X3feb_march)
modelH2a_febmarch<-lm(llfeb_march~X3feb_march+X4feb_march)
anova(modelH2a_febmarch)
coefficients(modelH2_febmarch) # get coefficients to make beta vectors
for test

(Intercept) X3feb_march 
   5.342130    1.172821

coefficients(modelH2a_febmarch)

(Intercept) X3feb_march X4feb_march 
  5.2936263   1.0353752   0.2407557 
# Test Stat

bsres_fm<-matrix(c(5.342130,1.172821,0),nrow=3)
bsunres_fm<-matrix(c(5.2936263,1.0353752,0.2407557),nrow=3)
bsfm<-t(bsunres_fm-bsres_fm)
#X matrix
fmxs<-cbind(fmvect,X3feb_march,X4feb_march)
tfmxs<-t(fmxs)
xcoms_fm<-(tfmxs %*% fmxs)
xinvs_fm<-ginv(xcoms_fm)
var_fms<-xinvs_fm*0.341
chi_fms<-bsfm %*% var_fms %*% (bsunres_fm-bsres_fm)
chi_fms

# Record Chisq value

Does this help?
Here lffeb_march is the combination of Feb and March log flows
and llfeb_march is the combination of Feb and March log loads
X3: lffeb_march-mean(feb_march)
X4: X1*X3

Thanks

Rui Barradas wrote

Hello,

I'm not at all sure if I understand your problem. Does this describe it?


test first model for months 1 and 2
if test statistic less than critical value{
	test second model for months 1 and 2
	print results of the first and second tests? just one of them?
}
move on to months 2 and 3
etc, until months 12 and 1


Please post example data using dput(dataset).
Just copy it's output and paste it in your post.

And example code, what you're already doing.
(Possibly simplified)

Rui Barradas


meredith wrote

R Users-
  I have been trying to automate a manual code that I have developed
for calling in a .csv file, isolating certain rows and columns that
correspond to specified months:
something to the effect
i=name.csv
N=length(i$month)
iphos1=0
iphos2=0
isphos3=0
for i=1,N
 if month=1
    iphos1=iphos+1
    iphos1(iphos1)=i

an so on to call out the months into there own arrays (unless there is
a way I can wrap it into the next automation)

Next: I would like to run a simple linear regression combining each of
the months 1 by 1:
for instance I want to run a regression on a combined model from months
1 and 2 and a dummy model for 1 and 2, compare them using a Chi-sq
distribution, if Chi-sq is less than the Critical value, we accept and
go on to test another set of models with both 1 and 2. If it rejects,
then we proceed to months 2 and 3.  If we move on to the second set on
months 1 and 2, and the critical value is accepted, I want to print an
accept or reject and move on to months 2 and 3, until finally comparing
months 12-1 at the end.
Is there a way to loop or automate this in R?

Thanks
Meredith

Rui Barradas wrote

Hello,

I'm not at all sure if I understand your problem. Does this describe it?


test first model for months 1 and 2
if test statistic less than critical value{
	test second model for months 1 and 2
	print results of the first and second tests? just one of them?
}
move on to months 2 and 3
etc, until months 12 and 1


Please post example data using dput(dataset).
Just copy it's output and paste it in your post.

And example code, what you're already doing.
(Possibly simplified)

Rui Barradas


meredith wrote

R Users-
  I have been trying to automate a manual code that I have developed
for calling in a .csv file, isolating certain rows and columns that
correspond to specified months:
something to the effect
i=name.csv
N=length(i$month)
iphos1=0
iphos2=0
isphos3=0
for i=1,N
 if month=1
    iphos1=iphos+1
    iphos1(iphos1)=i

an so on to call out the months into there own arrays (unless there is
a way I can wrap it into the next automation)

Next: I would like to run a simple linear regression combining each of
the months 1 by 1:
for instance I want to run a regression on a combined model from months
1 and 2 and a dummy model for 1 and 2, compare them using a Chi-sq
distribution, if Chi-sq is less than the Critical value, we accept and
go on to test another set of models with both 1 and 2. If it rejects,
then we proceed to months 2 and 3.  If we move on to the second set on
months 1 and 2, and the critical value is accepted, I want to print an
accept or reject and move on to months 2 and 3, until finally comparing
months 12-1 at the end.
Is there a way to loop or automate this in R?

Thanks
Meredith

--
View this message in context: http://r.789695.n4.nabble.com/Automating-R-for-Hypothesis-Testing-tp4618653p4623689.html
Sent from the R help mailing list archive at Nabble.com.

Rui Barradas

Thu, May 10, 2012 8:30 AM #

Hello,

I'm glad it helped.

As for your second question, I don't know, but I'm not very comfortable with
the way you're doing things.
Why subtract the coefficients of model 1 from model 2?
And why the dummy? Why set model 1 to zero?

Isn't it better to use anova's F? After all, it's designed for it, for the
linear model...
And if you really want/need the dummy, wouldn't a nested anova do it? (F
statistic, once again.)

anova(model1, model2)

is simple and statistically speaking seems to me much better. (I specially
don't like the subtraction bit.)

Rui Barradas

meredith wrote

Rui-
  Thanks this definitely helps, just one quick question. How would you
code the values of chi-fm and chi-fms to change based on the degrees of
freedom of each model H(i)?

Meredith


Rui Barradas wrote

Hello,

Yes, it does help. Now we can see your data and what you're doing.
What follows is a suggestion on what you could do, not full solution.
(You forgot to say what X1 is, but I don't think it's important to
understand the suggestion.)
(If I'm wrong, say something.)


milwaukeephos <- read.csv("milwaukeephos.csv", header=TRUE,
stringsAsFactors=FALSE)
# list of data.frames, one per month
ls1 <- split(milwaukeephos, milwaukeephos$month)

#--------- if you want to keep the models, not needed if you don't.
#          (yoy probably don't)
modelH <- vector("list", 12)
modelHa <- vector("list", 12)
modelH2 <- vector("list", 12)
modelH2a <- vector("list", 12)
#--------- values to record, these are needed, create them beforehand.
chi_fm <- numeric(12)
chi_fms <- numeric(12)
#
seq_months <- c(1:12, 1) # wrap months around.
for(i in 1:12){
	month_this <- seq_months[i]
	month_next <- seq_months[i + 1]

	lload <- c(ls1[[month_this]]$load_kg, ls1[[month_next]]$load_kg)
	lflow <- c(ls1[[month_this]]$flow, ls1[[month_next]]$flow)
	modelH[[i]] <- lm(lload ~ lflow)
	# If you don't want to keep the models, use modelH only
	# ( without [[i]] )
	# and do the same with X1

	# rest of your code for first test goes here
	chi_fm[i] <- bfm %*% var_fm %*% (bunres_fm - bres_fm)

	# and the same for the second test
	chi_fms[i] <- ...etc...
}


Hope this helps,

Rui Barradas


meredith wrote

dput:  http://r.789695.n4.nabble.com/file/n4620188/milwaukeephos.csv
milwaukeephos.csv 

# Feb-march

modelH_febmarch<-lm(llfeb_march~lffeb_march)
modelHa_febmarch<-lm(llfeb_march~X1feb_mar+lffeb_march)
anova(modelHa_febmarch)
coefficients(modelH_febmarch)

(Intercept) lffeb_march 
  -2.429890    1.172821

coefficients(modelHa_febmarch)

(Intercept)   X1feb_mar lffeb_march 
 -2.8957776  -0.5272793   1.3016303

bres_fm<-matrix(c(-2.429890,0,1.172821),nrow=3)
bunres_fm<-matrix(c(-2.8957776,-0.5272793,1.3016303),nrow=3)
bfm<-t(bunres_fm-bres_fm)
fmvect<-seq(1,1,length=34)
X1a_febmar<-seq(0,0,length=9) # dummy variable step 1
X1b_febmar<-seq(1,1,length=25) # dummy variable step 2
X1feb_mar<-c(X1a_febmar,X1b_febmar) #dummy variable creation

# Test Stat Equation for Chisq

fmxx<-cbind(fmvect,X1feb_mar,lffeb_march)
tfmx<-t(fmxx)
xcom_fm<-(tfmx %*% fmxx)
xinv_fm<-ginv(xcom_fm)
var_fm<-xinv_fm*0.307
chi_fm<-bfm %*% var_fm %*% (bunres_fm-bres_fm)
chi_fm # chisq value for recording

if less than CV move onto to slope modification

modelH2_febmarch<-lm(llfeb_march~X3feb_march)
modelH2a_febmarch<-lm(llfeb_march~X3feb_march+X4feb_march)
anova(modelH2a_febmarch)
coefficients(modelH2_febmarch) # get coefficients to make beta vectors
for test

(Intercept) X3feb_march 
   5.342130    1.172821

coefficients(modelH2a_febmarch)

(Intercept) X3feb_march X4feb_march 
  5.2936263   1.0353752   0.2407557 
# Test Stat

bsres_fm<-matrix(c(5.342130,1.172821,0),nrow=3)
bsunres_fm<-matrix(c(5.2936263,1.0353752,0.2407557),nrow=3)
bsfm<-t(bsunres_fm-bsres_fm)
#X matrix
fmxs<-cbind(fmvect,X3feb_march,X4feb_march)
tfmxs<-t(fmxs)
xcoms_fm<-(tfmxs %*% fmxs)
xinvs_fm<-ginv(xcoms_fm)
var_fms<-xinvs_fm*0.341
chi_fms<-bsfm %*% var_fms %*% (bsunres_fm-bsres_fm)
chi_fms

# Record Chisq value

Does this help?
Here lffeb_march is the combination of Feb and March log flows
and llfeb_march is the combination of Feb and March log loads
X3: lffeb_march-mean(feb_march)
X4: X1*X3

Thanks

Rui Barradas wrote

Hello,

I'm not at all sure if I understand your problem. Does this describe
it?


test first model for months 1 and 2
if test statistic less than critical value{
	test second model for months 1 and 2
	print results of the first and second tests? just one of them?
}
move on to months 2 and 3
etc, until months 12 and 1


Please post example data using dput(dataset).
Just copy it's output and paste it in your post.

And example code, what you're already doing.
(Possibly simplified)

Rui Barradas


meredith wrote

R Users-
  I have been trying to automate a manual code that I have developed
for calling in a .csv file, isolating certain rows and columns that
correspond to specified months:
something to the effect
i=name.csv
N=length(i$month)
iphos1=0
iphos2=0
isphos3=0
for i=1,N
 if month=1
    iphos1=iphos+1
    iphos1(iphos1)=i

an so on to call out the months into there own arrays (unless there is
a way I can wrap it into the next automation)

Next: I would like to run a simple linear regression combining each of
the months 1 by 1:
for instance I want to run a regression on a combined model from
months 1 and 2 and a dummy model for 1 and 2, compare them using a
Chi-sq distribution, if Chi-sq is less than the Critical value, we
accept and go on to test another set of models with both 1 and 2. If
it rejects, then we proceed to months 2 and 3.  If we move on to the
second set on months 1 and 2, and the critical value is accepted, I
want to print an accept or reject and move on to months 2 and 3, until
finally comparing months 12-1 at the end.
Is there a way to loop or automate this in R?

Thanks
Meredith

Rui Barradas wrote

Hello,

I'm not at all sure if I understand your problem. Does this describe
it?


test first model for months 1 and 2
if test statistic less than critical value{
	test second model for months 1 and 2
	print results of the first and second tests? just one of them?
}
move on to months 2 and 3
etc, until months 12 and 1


Please post example data using dput(dataset).
Just copy it's output and paste it in your post.

And example code, what you're already doing.
(Possibly simplified)

Rui Barradas


meredith wrote

R Users-
  I have been trying to automate a manual code that I have developed
for calling in a .csv file, isolating certain rows and columns that
correspond to specified months:
something to the effect
i=name.csv
N=length(i$month)
iphos1=0
iphos2=0
isphos3=0
for i=1,N
 if month=1
    iphos1=iphos+1
    iphos1(iphos1)=i

an so on to call out the months into there own arrays (unless there is
a way I can wrap it into the next automation)

Next: I would like to run a simple linear regression combining each of
the months 1 by 1:
for instance I want to run a regression on a combined model from
months 1 and 2 and a dummy model for 1 and 2, compare them using a
Chi-sq distribution, if Chi-sq is less than the Critical value, we
accept and go on to test another set of models with both 1 and 2. If
it rejects, then we proceed to months 2 and 3.  If we move on to the
second set on months 1 and 2, and the critical value is accepted, I
want to print an accept or reject and move on to months 2 and 3, until
finally comparing months 12-1 at the end.
Is there a way to loop or automate this in R?

Thanks
Meredith

--
View this message in context: http://r.789695.n4.nabble.com/Automating-R-for-Hypothesis-Testing-tp4618653p4623790.html
Sent from the R help mailing list archive at Nabble.com.

5 days later

meredith

Wed, May 16, 2012 6:34 AM #

Rui-
  Just a quick question. I understand your comment on using ANOVA, but
doesn't this only test for similarities of the mean. We are trying to see if
a the same model can fit for two or three months, therefore have the similar
slope and intercept. The ANOVA would only do one part of this correct, with
the F-Test?

Thanks!
Meredith

Rui Barradas wrote

Hello,

I'm glad it helped.

As for your second question, I don't know, but I'm not very comfortable
with the way you're doing things.
Why subtract the coefficients of model 1 from model 2?
And why the dummy? Why set model 1 to zero?

Isn't it better to use anova's F? After all, it's designed for it, for the
linear model...
And if you really want/need the dummy, wouldn't a nested anova do it? (F
statistic, once again.)

anova(model1, model2)

is simple and statistically speaking seems to me much better. (I specially
don't like the subtraction bit.)

Rui Barradas

meredith wrote

Rui-
  Thanks this definitely helps, just one quick question. How would you
code the values of chi-fm and chi-fms to change based on the degrees of
freedom of each model H(i)?

Meredith


Rui Barradas wrote

Hello,

Yes, it does help. Now we can see your data and what you're doing.
What follows is a suggestion on what you could do, not full solution.
(You forgot to say what X1 is, but I don't think it's important to
understand the suggestion.)
(If I'm wrong, say something.)


milwaukeephos <- read.csv("milwaukeephos.csv", header=TRUE,
stringsAsFactors=FALSE)
# list of data.frames, one per month
ls1 <- split(milwaukeephos, milwaukeephos$month)

#--------- if you want to keep the models, not needed if you don't.
#          (yoy probably don't)
modelH <- vector("list", 12)
modelHa <- vector("list", 12)
modelH2 <- vector("list", 12)
modelH2a <- vector("list", 12)
#--------- values to record, these are needed, create them beforehand.
chi_fm <- numeric(12)
chi_fms <- numeric(12)
#
seq_months <- c(1:12, 1) # wrap months around.
for(i in 1:12){
	month_this <- seq_months[i]
	month_next <- seq_months[i + 1]

	lload <- c(ls1[[month_this]]$load_kg, ls1[[month_next]]$load_kg)
	lflow <- c(ls1[[month_this]]$flow, ls1[[month_next]]$flow)
	modelH[[i]] <- lm(lload ~ lflow)
	# If you don't want to keep the models, use modelH only
	# ( without [[i]] )
	# and do the same with X1

	# rest of your code for first test goes here
	chi_fm[i] <- bfm %*% var_fm %*% (bunres_fm - bres_fm)

	# and the same for the second test
	chi_fms[i] <- ...etc...
}


Hope this helps,

Rui Barradas


meredith wrote

dput:  http://r.789695.n4.nabble.com/file/n4620188/milwaukeephos.csv
milwaukeephos.csv 

# Feb-march

modelH_febmarch<-lm(llfeb_march~lffeb_march)
modelHa_febmarch<-lm(llfeb_march~X1feb_mar+lffeb_march)
anova(modelHa_febmarch)
coefficients(modelH_febmarch)

(Intercept) lffeb_march 
  -2.429890    1.172821

coefficients(modelHa_febmarch)

(Intercept)   X1feb_mar lffeb_march 
 -2.8957776  -0.5272793   1.3016303

bres_fm<-matrix(c(-2.429890,0,1.172821),nrow=3)
bunres_fm<-matrix(c(-2.8957776,-0.5272793,1.3016303),nrow=3)
bfm<-t(bunres_fm-bres_fm)
fmvect<-seq(1,1,length=34)
X1a_febmar<-seq(0,0,length=9) # dummy variable step 1
X1b_febmar<-seq(1,1,length=25) # dummy variable step 2
X1feb_mar<-c(X1a_febmar,X1b_febmar) #dummy variable creation

# Test Stat Equation for Chisq

fmxx<-cbind(fmvect,X1feb_mar,lffeb_march)
tfmx<-t(fmxx)
xcom_fm<-(tfmx %*% fmxx)
xinv_fm<-ginv(xcom_fm)
var_fm<-xinv_fm*0.307
chi_fm<-bfm %*% var_fm %*% (bunres_fm-bres_fm)
chi_fm # chisq value for recording

if less than CV move onto to slope modification

modelH2_febmarch<-lm(llfeb_march~X3feb_march)
modelH2a_febmarch<-lm(llfeb_march~X3feb_march+X4feb_march)
anova(modelH2a_febmarch)
coefficients(modelH2_febmarch) # get coefficients to make beta vectors
for test

(Intercept) X3feb_march 
   5.342130    1.172821

coefficients(modelH2a_febmarch)

(Intercept) X3feb_march X4feb_march 
  5.2936263   1.0353752   0.2407557 
# Test Stat

bsres_fm<-matrix(c(5.342130,1.172821,0),nrow=3)
bsunres_fm<-matrix(c(5.2936263,1.0353752,0.2407557),nrow=3)
bsfm<-t(bsunres_fm-bsres_fm)
#X matrix
fmxs<-cbind(fmvect,X3feb_march,X4feb_march)
tfmxs<-t(fmxs)
xcoms_fm<-(tfmxs %*% fmxs)
xinvs_fm<-ginv(xcoms_fm)
var_fms<-xinvs_fm*0.341
chi_fms<-bsfm %*% var_fms %*% (bsunres_fm-bsres_fm)
chi_fms

# Record Chisq value

Does this help?
Here lffeb_march is the combination of Feb and March log flows
and llfeb_march is the combination of Feb and March log loads
X3: lffeb_march-mean(feb_march)
X4: X1*X3

Thanks

Rui Barradas wrote

Hello,

I'm not at all sure if I understand your problem. Does this describe
it?


test first model for months 1 and 2
if test statistic less than critical value{
	test second model for months 1 and 2
	print results of the first and second tests? just one of them?
}
move on to months 2 and 3
etc, until months 12 and 1


Please post example data using dput(dataset).
Just copy it's output and paste it in your post.

And example code, what you're already doing.
(Possibly simplified)

Rui Barradas


meredith wrote

R Users-
  I have been trying to automate a manual code that I have developed
for calling in a .csv file, isolating certain rows and columns that
correspond to specified months:
something to the effect
i=name.csv
N=length(i$month)
iphos1=0
iphos2=0
isphos3=0
for i=1,N
 if month=1
    iphos1=iphos+1
    iphos1(iphos1)=i

an so on to call out the months into there own arrays (unless there
is a way I can wrap it into the next automation)

Next: I would like to run a simple linear regression combining each
of the months 1 by 1:
for instance I want to run a regression on a combined model from
months 1 and 2 and a dummy model for 1 and 2, compare them using a
Chi-sq distribution, if Chi-sq is less than the Critical value, we
accept and go on to test another set of models with both 1 and 2. If
it rejects, then we proceed to months 2 and 3.  If we move on to the
second set on months 1 and 2, and the critical value is accepted, I
want to print an accept or reject and move on to months 2 and 3,
until finally comparing months 12-1 at the end.
Is there a way to loop or automate this in R?

Thanks
Meredith

Rui Barradas wrote

Hello,

I'm not at all sure if I understand your problem. Does this describe
it?


test first model for months 1 and 2
if test statistic less than critical value{
	test second model for months 1 and 2
	print results of the first and second tests? just one of them?
}
move on to months 2 and 3
etc, until months 12 and 1


Please post example data using dput(dataset).
Just copy it's output and paste it in your post.

And example code, what you're already doing.
(Possibly simplified)

Rui Barradas


meredith wrote

R Users-
  I have been trying to automate a manual code that I have developed
for calling in a .csv file, isolating certain rows and columns that
correspond to specified months:
something to the effect
i=name.csv
N=length(i$month)
iphos1=0
iphos2=0
isphos3=0
for i=1,N
 if month=1
    iphos1=iphos+1
    iphos1(iphos1)=i

an so on to call out the months into there own arrays (unless there
is a way I can wrap it into the next automation)

Next: I would like to run a simple linear regression combining each
of the months 1 by 1:
for instance I want to run a regression on a combined model from
months 1 and 2 and a dummy model for 1 and 2, compare them using a
Chi-sq distribution, if Chi-sq is less than the Critical value, we
accept and go on to test another set of models with both 1 and 2. If
it rejects, then we proceed to months 2 and 3.  If we move on to the
second set on months 1 and 2, and the critical value is accepted, I
want to print an accept or reject and move on to months 2 and 3,
until finally comparing months 12-1 at the end.
Is there a way to loop or automate this in R?

Thanks
Meredith

--
View this message in context: http://r.789695.n4.nabble.com/Automating-R-for-Hypothesis-Testing-tp4618653p4630243.html
Sent from the R help mailing list archive at Nabble.com.