
Teaching statistics to ecology undergraduates

3 messages · Carsten Dormann, Graham Smith

#
If, like me, you have only a few hours (11 hours over 3 years in my 
case) to try and teach statistics to ecology undergraduates, how do you 
do it?

Every introductory statistics text seems to assume more time and more 
mathematical ability than is available in practice.

Although I emphasise graphical techniques and the use of confidence 
intervals, and how these might help us understand the ecological process 
being studied, I still spend a large chunk of precious time on 
hypothesis testing.

The more I have been thinking about this, and the more I search for a 
suitable textbook, the more I realise how hopelessly confusing the 
average textbook is, with t-tests, ANOVA, MANOVA, ANCOVA, OLS 
regression, Poisson regression, logistic regression, GLM, etc. Yes, I know 
that not all of these will appear in the average introductory text.

I have reached the stage where I am wondering whether I should just 
teach GLM. This would give the students a single flexible method capable 
of tackling a wide range of ecological problems. It would also, I think, 
provide a better framework for approaching ecological questions than 
simple hypothesis testing.
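One way to see why a single GLM framework can replace a list of separate tests: a classic two-sample comparison is just a linear model (a Gaussian GLM with identity link) with a 0/1 group indicator as the predictor. The sketch below fits that model by ordinary least squares using only the Python standard library; the tree-height data and the helper name `ols_fit` are invented for illustration. The intercept recovers the mean of the first group and the slope recovers the difference in means; the slope's t-statistic (not computed here) is the same as the equal-variance two-sample t-test.

```python
from statistics import mean

# Hypothetical data: tree heights (m) measured in two woods (values invented)
wood_a = [12.0, 15.0, 11.0, 14.0]
wood_b = [18.0, 16.0, 19.0, 17.0]

# Stack into one response vector plus a 0/1 indicator for "is in wood B"
y = wood_a + wood_b
x = [0.0] * len(wood_a) + [1.0] * len(wood_b)

def ols_fit(x, y):
    """Least-squares intercept and slope for a single predictor (hypothetical helper)."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

intercept, slope = ols_fit(x, y)
print(f"intercept = {intercept:.2f} (mean of wood A = {mean(wood_a):.2f})")
print(f"slope     = {slope:.2f} (difference in means = {mean(wood_b) - mean(wood_a):.2f})")
```

Swapping the Gaussian assumption for, say, a Poisson distribution with a log link turns the same model structure into a comparison of counts, which is the flexibility a GLM-only course would lean on.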

I admit that this email is really just me thinking out loud, but does 
anyone who teaches statistics to ecologists, or indeed anyone at all 
really, have any views about how best to spend my 11 hours (which I may 
be able to increase to 13 hours)?

I should point out that at the moment I also spend some of this time on 
good practice in data management, a bit on scientific method, and a bit 
on the importance of random sampling, but nothing really on experimental 
design.

Graham
#
Dear Graham,

11 hours is short, there's no mistaking that. I teach (among other things) a 
6-day stats course for beginners, and find that I need the first 3 days 
to get the students to "think straight". For a couple of years I tried to 
teach "only" GLM, as you suggest. I "waste" one full day explaining 
what a distribution is, what the parameters of distributions are, and on what 
grounds to suspect that data derive from a certain distribution. That 
alone would be at least 5 of your 11 hours. The next half day goes into 
explaining likelihood and its maximisation (and running examples). It 
is a good way to start, I find, and eventually students are very 
comfortable using glm rather than aov and friends.
Teaching only GLM is clear (and Ben Bolker's book sets the right tone, 
albeit at far too high a level for beginners). At the same time, the 
learning curve is VERY steep: 30% of the participants fall by the 
wayside. Is that acceptable? Maybe it is me, not the GLM.
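A classroom example of "likelihood and its maximisation" might look like the sketch below, assuming plant counts per quadrat follow a Poisson distribution. The counts are invented for illustration, and a crude grid search stands in for a proper optimiser so students can see what "maximising" means. For the Poisson the maximum-likelihood estimate of the mean is known analytically to be the sample mean, so the two printed numbers should agree.

```python
import math

# Hypothetical counts of one plant species in 8 quadrats (invented data)
counts = [2, 0, 3, 1, 4, 2, 1, 3]

def poisson_log_likelihood(lam, data):
    """Log-likelihood of a Poisson mean `lam` given the observed counts."""
    return sum(k * math.log(lam) - lam - math.lgamma(k + 1) for k in data)

# Crude maximisation: evaluate the log-likelihood over a grid of candidate means
grid = [i / 100 for i in range(1, 1001)]  # 0.01 to 10.00
best = max(grid, key=lambda lam: poisson_log_likelihood(lam, counts))

print(f"grid-search MLE = {best:.2f}")
print(f"sample mean     = {sum(counts) / len(counts):.2f}")
```

Plotting `poisson_log_likelihood` against `grid` gives students a picture of the likelihood surface, which is the same idea a GLM fitting routine exploits with a smarter search.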

However, I think you have to be very realistic about what you can 
achieve (and I have heard people speak highly of your courses, so I am sure 
you are doing the right things). Giving the students a "feeling" for 
what the idea of a "fit" is and what lies behind comparisons of samples is 
rather independent of distributional assumptions, and is a very general 
point they can take away from a short course.
Also, as you said, visualising the data, getting a feeling for it, is SO 
important, particularly when a student has little idea what to expect from 
an experiment/observation.

In my little 6-day course, I spend roughly 2 days on introducing R, 
distributions and (maximum) likelihood (half of this time the 
participants run examples). Another 2 days are devoted to multiple 
regression (going wildly through different distributions to make them 
comfortable with GLM) and issues such as collinearity and model 
selection. Then I throw in a day of design of experiments (randomised 
block, nested, split-plot, survey design, stratification, sample size 
estimation) and run some simple (?) mixed models to illustrate the 
practical problems attached to DOE. On the final two days we run largish 
examples (such as Harrell's Titanic data set), touch very superficially on 
multivariate methods (PCA, CA and CCA) and end with some 
miscellaneous issues such as randomisation and bootstrapping.

If I had to reduce it to 11 hours: unless the students are likely to do 
experiments (which seem to have fallen out of funding), I would ditch 
DOE and focus on GLM plus a few sexy but tricky examples. I love the 
Titanic study, because you can get the students to identify with the 
passengers. Whether that leads them to transfer their newly gained knowledge 
to ecological work is a different question. If you additionally make 
them buy a good book (I always recommend Quinn & Keough, having myself 
been "raised" on Sokal and Rohlf and always hated it, because it never 
addressed my type of non-Gaussian problems), I think they should be set 
up for the next level.

I shall stop now (and prepare a stats course for next week), otherwise I 
would also have a word to say about Crawley's approach, which I find 
enchanting and confusing.

Carsten
#
Carsten
It is useful to know that you had 6 days.
I suspect that regardless of what you do, a percentage are just never 
going to get it.
My guess is that you are speaking of the Highstat courses, which I'm not 
actually involved with; I'm asking about the basic undergraduate 
introductory stats lectures.
My main aim, I think, is that they gain some understanding of 
statistical thinking rather than specific techniques, which most will 
forget almost immediately.
Yes, I strongly agree with this.
In my case, the stats is part of a broader module, which focusses on a 
woodland ecology study, where the students collect vegetation data, and 
carry out some soil analysis. I introduce the stats, they then complete 
a stats exercise, I give them feedback on this, and they are then meant 
to apply the stats to help explain the vegetation characteristics found 
in the two woods they sampled.
Yes, a wonderful book.
:-)

Graham