
Rsquared for anova

9 messages · Dorien Herremans, Dieter Menne, Peter Ehlers +1 more

#
On 2011-04-15 14:36, Dorien Herremans wrote:
You might find working through "An Introduction to R"
enlightening. It's certain to be a more efficient
method than a guess-and-hope approach to modeling
syntax.
Why do you have the extra parentheses? They cause lm()
to think that _everything_ inside the inner parens is
the 'formula' argument to lm(), including the ", data=..."
part.
Well, now you've left off the 'data=' argument and R can't
find 'tos'. Isn't the message pretty clear?
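
[Editor's sketch of the two syntax points above, on a made-up stand-in for the poster's data; the column names are hypothetical:]

```r
## hypothetical stand-in for the poster's data frame
tos <- data.frame(score = rnorm(20), nh1 = gl(2, 10), nh2 = gl(2, 1, 20))

fit <- lm(score ~ nh1 + nh2, data = tos)  # correct: formula first, then data=
## lm((score ~ nh1 + nh2, data = tos))    # error: the extra parens make R parse
##                                        # everything inside them as one
##                                        # (invalid) 'formula' argument
## lm(score ~ nh1 + nh2)                  # error: without data=, R cannot
##                                        # find 'score', 'nh1', 'nh2'
```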
Are you seriously contemplating up to 10-way interactions?
I hope that you have a great deal of data and much patience
as you attempt to interpret those interactions.

Peter Ehlers
#
dorien wrote:
Peter's point is the important one: too many interactions, and even with +
instead of * you might be running into problems.

But anyway: if you don't let us access 

/home/dorien/UA/meta-music/optimuse/optimuse1-build-desktop/results/results_processedCP

you cannot expect a better answer; any answer will depend on the
structure of the data set.

Dieter



--
View this message in context: http://r.789695.n4.nabble.com/Rsquared-for-anova-tp3452399p3453719.html
Sent from the R help mailing list archive at Nabble.com.
1 day later
#
Thanks for your remarks. I've been reading about R for the last two days,
but I don't really get when I should use lm() or aov().

I have attached the dataset, feel free to take a look at it.

So far, running it with all the combinations did not take too long, and
there seem to be some interaction effects between the parameters. However,
2x2 combinations might suffice.

Thanks for any help, or a pointer to some good documentation,

Dorien
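
[A minimal sketch, on made-up data, of the lm()/aov() relationship asked about above: aov() fits the same model as lm() but prints a classical ANOVA-style summary, and the R-squared of the thread title lives in the lm-style summary.]

```r
d <- data.frame(y = rnorm(40), f = gl(4, 10), g = gl(2, 5, 40))

fit.lm  <- lm(y ~ f + g, data = d)    # regression-style fit
fit.aov <- aov(y ~ f + g, data = d)   # same model, ANOVA-style summary

summary(fit.aov)            # classical ANOVA table (F tests, p-values)
anova(fit.lm)               # the same table, obtained from the lm fit
summary(fit.lm)$r.squared   # the R^2 for the fitted model
```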
On 16 April 2011 10:13, Dieter Menne <dieter.menne at menne-biomed.de> wrote:
#
(Did this message make it through the lists as rich text? Hotmail
didn't seem to think it was plain text.)

Anyway, having come in in the middle of this, it isn't clear
whether your issues are with R or stats or both. Usually the
hard-core stats people punt the stats questions to other places,
but both can be addressed somewhat.
In any case, exploratory work is a good way to learn both, and I
always like looking at new data. If you have one or
a few dependent variables and many independent variables,
it would probably help if you could visualize a
surface with the response as a function of the input
variables; then, maybe with the input of prior information or
anecdotes, you have some idea what tests or
analyses would make sense.

just some thoughts "for illustration only"

df <- read.table("results_processedCP.txt", header = TRUE)


first it helps to make sure everything went OK and to do some quick
checks, for example,

str(df)
unique(df$nh1)
unique(df$nh2)
unique(df$nh3)
unique(df$randsize)
unique(df$aweights)
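
[The per-column checks above can also be done in one call, assuming those are the actual column names in the file:]

```r
lapply(df[c("nh1", "nh2", "nh3", "randsize", "aweights")], unique)
```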


now, personally, lots of binary variables confuse me, and
I can munge them all together since I expect I can
later identify issues in the following plots. So, with
this data you can create a composite variable like this
(now, I have not checked any of this for accuracy,
and typos and other problems may render the results useless):

x <- df$nh1 + 2*df$nh2 + 4*df$nh3 + 2*df$randsize + 32*df$aweights
df2 <- cbind(df, x)
str(df2)

not sure if "time" was an input or an output, but you could
see if there is any obvious trend or periodicity of
time against your new made-up variable,

plot(df2$time,df2$x)

Apparently x is a num rather than an int; it can be changed for
illustration, but it's probably of no consequence,

xi <- as.integer(x)
str(xi)

and then you can add color based on this variable,

min(xi)
cols <- rainbow(56)   # renamed from 'c' to avoid masking base::c
cx <- cols[xi + 1]
str(cx)

and make color-coded scatter plots. Now, if you
got lucky and guessed right, you may see some patterns
that you want to test,

plot(df2$tos,df2$tws,col=cx)

in this case, I get a cool red-yellow-green line along the bottom (a very
compelling linear-fit question) and scattered magenta (pink-red? LOL) and
blue points everywhere, with a cluster near the origin and nothing in the
top right quadrant. Also note a few blue lines above the red-yellow-green
line, but much shorter.

And in fact, presumably you already knew this, as it looks like it was
designed in; if you just plot the red and green points, the fit looks
perfect for a line. Now if you look at the results of a fit of the "Good"
points vs. all points, it isn't clear that anything like this would
emerge from just looking at summaries of a linear fit,


## 'good' is assumed to be a logical or integer index selecting the
## "Good" (red/green) points; its definition was not shown in the thread
td <- df2$tos[good]
ti <- df2$tws[good]
lm(td ~ ti)
lm(df2$tos ~ df2$tws)
summary(lm(td ~ ti))
summary(lm(df2$tos ~ df2$tws))





Now, of course, "tests" need to be considered ahead of time, or else
it is easy to go shopping for the answer you want. Anything post hoc
needs to be very complete, and you should at least try to rationalize
test results you don't happen to like (assuming you are trying to
understand the system from which the data were measured, rather than
to justify some particular outcome).




Date: Sun, 17 Apr 2011 11:34:14 +0200
From: dorien.herremans at ua.ac.be
To: dieter.menne at menne-biomed.de
CC: r-help at r-project.org
Subject: Re: [R] Rsquared for anova

#
On 2011-04-17 02:34, Dorien Herremans wrote:
I don't think that reading about R is the answer at this stage.
It appears to me that you need to learn more about regression.
There are many good introductory books. If you want to learn
the R way at the same time, you could look at the books section
on CRAN. Perhaps Peter Dalgaard's Introductory Statistics with R or
An R Companion to Applied Regression by J. Fox and S. Weisberg,
or the books by Verzani or Heiberger/Holland.

After that, you'll find that the R documentation is actually
quite good. Most complaints about R's documentation seem to
amount to complaints that it doesn't teach statistics. That's
a good thing.

About your data: I'm fairly sure that several, if not most, of
your predictors should be factors.
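
[A minimal sketch of that conversion in one pass; the column names are guesses from earlier in the thread:]

```r
## convert the presumed 0/1 predictor columns to factors in one pass
facs <- c("nh1", "nh2", "nh3", "randsize", "aweights")
df[facs] <- lapply(df[facs], factor)
str(df)   # the converted columns should now show as factors
```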

Peter Ehlers
#
Thanks everyone.

Yes Peter, I already added nh1 = factor(nh1) to the 'routine'. Mostly,
my colleagues are helping me work out the results, and they know more
about regression; it has been a while for me... They just asked if I
could also provide an R2, to see how well the model fits... hence
the question. I already have the p-values for each factor. I thought it
might be a simple command that I overlooked, such as summary(fit) :-)
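
[For an lm() fit it is indeed roughly that simple; the summary object carries the R^2 directly. Model and data names here are hypothetical:]

```r
fit <- lm(tos ~ nh1 * nh2, data = mydata)  # hypothetical model and data

summary(fit)                # prints Multiple R-squared and Adjusted R-squared
summary(fit)$r.squared      # extract R^2 programmatically
summary(fit)$adj.r.squared  # adjusted R^2
```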

Mike, I will study what you propose first thing tomorrow morning when
I am back at the office!

Thanks a lot,

Dorien
On 17 April 2011 19:43, Peter Ehlers <ehlers at ucalgary.ca> wrote: