I should have noted this at the beginning of the thread rather than
now, but for the record, there is an R special-interest-group mailing
list called R-sig-teaching where this might also be of interest.
albyn
On Fri, Mar 06, 2009 at 11:28:41AM -0600, Andrew Zieffler wrote:
Hello Everyone,
I hope this email finds you all well. I have been asked to write a paper
that discusses some suggested practices based on learning theory and
cognition research for using R in teaching statistics. In thinking about
framing this paper I have been considering all of the instructional
choices that have to be made. For example, should one use the base
graphics, lattice, ggplots, etc? Should there be instructional sessions
just devoted to R or should it be completely integrated and students
introduced to functions and the like as they need it? What additional
supplemental materials should be made available to students to help them
learn R? And there are many more of these types of questions and
decisions that need to be made.
As I have looked at many of the texts that have incorporated R they all
seem to have a similar approach: introducing simple operators such as
addition and subtraction, then moving to assignment, the idea of
vectors, functions, and so on. It is unclear to me whether there is a
reason for this pattern or whether it is simply tradition. Maybe this lends itself to
developing better skills for students who will go on and do more
programming in R, but --- at least in our courses --- there are also a
host of students who will only ever use R as a data analysis tool.
All of this is a very long-winded way of asking for your help. I would
love to hear your thoughts on the following:
1) What are the instructional decisions that a person needs to make if
they are going to be teaching statistics using R?
2) What decisions have you yourself made and what were your reasons?
3) How do you teach with R? Do you have sessions on R and other sessions
where content is taught? Is the computing fully integrated with the
content? Or some combination?
4) If you have a heterogeneous group of students (some going on to
program in R, others just trying to get through, etc.) how do we deal
with this? Do we need to have different types of assignments and
materials for the different students?
Thank you in advance.
Andy
--
Andrew Zieffler, Ph.D.
Educational Psychology
University of Minnesota
167 Educational Sciences Building
56 East River Road
Minneapolis, MN 55455
Email: zief0002 at umn.edu
http://www.tc.umn.edu/~zief0002
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "Workshop on Integrating Computing into the Statistics Curricula" group.
To post to this group, send email to computing-statistics-curricula at googlegroups.com
To unsubscribe from this group, send email to computing-statistics-curricula+unsubscribe at googlegroups.com
For more options, visit this group at http://groups.google.com/group/computing-statistics-curricula?hl=en
-~----------~----~----~----~------~----~------~--~---
Dear Andrew,
As it seems to me that you will be interested in teaching data analysis
and/or regression analysis, let me point you to
Julian Faraway's excellent and free book 'Practical Regression and
ANOVA using R'.
Faraway uses worked examples for the various data analysis tools
within R and comments on their use, rather than
talking too much about variables, functions...
That might be a good start for you. There's also a package 'faraway'
that includes the examples; great work!
You can find all of that directly on CRAN, or just google 'faraway book'.
Cheers, Markus
Andrew,
I teach an intro statistics course to "science" students and some "general education" students and a "biometry" course to "natural resources" and a handful of other students at Northland College. I have been using R in both of these statistics courses, and in my fisheries science course, for four or five years now. Below are my answers to your questions. I would be happy to expand on these if you needed me to (though now coming back to re-read I see that I have typed quite a bit).
1) What are the instructional decisions that a person needs to make if they are going to be teaching statistics using R?
In general, I don't really think that these decisions are unique to R. No matter the software I believe that an instructor, especially of an introductory class, has to make a decision of whether learning the software is one of the outcomes of the course or not. When I taught with other software (Minitab) I chose not to have learning the software as an outcome. However, when I began using R I decided that at least some understanding of the software should be an outcome because I felt that knowing R was adding value to the student. I believe that this added value was especially important in my upper-level courses so it became important to me to make sure that students in the intro class were gaining some knowledge of the software (R).
Once an instructor chooses to use R, I believe they must decide whether to use one of the GUIs, whether to use an external editor, which "graphics system" (base, ggplot2, lattice) to use, or whether to use package-specific or base functions.
2) What decisions have you yourself made and what were your reasons?
Of the latter items mentioned above I chose NOT to use a GUI. I am familiar with Rcmdr, for example, but, personally, I think the power of R rests in the command line. I do use Tinn-R as an external editor because I like the ability for students to save their commands and recycle them for future problems. I have not seen any "cost" to the student of using Tinn-R (it is simple to learn). I chose to use base graphics because of the simplicity (in my mind) of their functions (for doing the basic graphics needed in most intro classes).
I have also written a package of R functions that streamlines some of the base R functions. For example, I have written a Subset() function that combines the base subset() and drop.levels() so that I don't have to explain to students the subtleties of why subset() does not drop the level from the list of possible levels for a factor variable after subsetting. I have also written a function that can be used to provide a graphical display of probability calculations on a suite of probability distributions (motivated by a post by Dr. Bates on this list last year). I did not want to create a large number of special purpose functions so I attempted to judiciously choose functions that simplified complexities or subtleties that I did not want students to be concerned with or that provided specific pedagogical advantages. <BTW, my package is surely not up to the standards of other package developers but if anyone is interested it is available at www.rforge.net/NCStats. A newer version using namespace will be up there when my semester is over in April.> I also use some of the functions in the TeachingDemos package.
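A minimal sketch of the idea behind such a Subset() function (the actual NCStats implementation surely differs in details; this version leans on base R's droplevels(), the modern base equivalent of drop.levels()):

```r
# Sketch only: subset a data frame, then drop any factor levels that no
# longer occur in the result, so later tables and plots don't show empty
# levels. The condition is forwarded via ... so that subset()'s
# non-standard evaluation of the condition still works.
Subset <- function(x, ...) {
  droplevels(subset(x, ...))
}

setosa <- Subset(iris, Species == "setosa")
levels(setosa$Species)   # "setosa" only; plain subset() would keep all three
```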
Finally, I made the conscious decision of not using the phrases "R programming" or "R coding." It is my experience that many students do not consider themselves capable of "computer programming." I explain that the functions are simply replacements for menu'd commands but that they can be saved and reproduced. At most, I refer to "R scripts" but never "R programs."
3) How do you teach with R? Do you have sessions on R and other sessions where content is taught? Is the computing fully integrated with the content? Or some combination?
At Northland, I teach two 2-hour sessions a week for the intro class. Generally, I use some portion of this time for a traditional lecture, some portion to teach "doing statistics with R", and then some time for the students to work independently with R and ask me questions. The traditional lecture and the "doing statistics with R" sections follow each other closely so I would consider them to be "integrated." In the "doing statistics with R" section I have found it better to provide students with a handout (easily accomplished by the instructor with Sweave()) rather than demonstrating with the computer or letting them type commands into the computer. The two main advantages to this are that it is easier to keep students in roughly the same spot (if they are typing things in themselves then invariably someone forgot a comma, or misspelled, etc, and you spend all of your time troubleshooting individual students while others sit there or start surfing the net) and it lets the students see a correct set of commands on which they can take detailed notes (if they are typing in commands themselves they are just typing and not thinking and not taking notes that they can return to).
In the upper-level course, I fully integrate R into the notes and lecture.
You can see what I do, if interested, because all of my class materials ("book", handouts, lecture slides) are available at www.ncfaculty.net/dogle/ and then follow the intro stats, biometry, or fisheries science "buttons."
4) If you have a heterogeneous group of students (some going on to program in R, others just trying to get through, etc.) how do we deal with this? Do we need to have different types of assignments and materials for the different students?
Again, I don't see this as a specific issue with using R -- i.e., even if we did all calculations by hand there would be heterogeneity amongst those that will continue with other statistics courses and those "just trying to get through."
5) A few more comments.
When I first started teaching with R I was just learning the program myself. I was terrible at teaching with it (and the students hated it) but I believed that it was the correct software to use. Now, five years later, I am a much better teacher of statistics and R and, most of my students either "enjoy" R (I have seen students become more accepting of typing in a command line rather than a GUI -- I believe that this is due to the amount of texting and chatting that they do relative to students from just a few years ago) or are, at least, accepting of the idea of "R". I even more firmly believe that this is the proper software for students to learn. Be firm in your commitment to R if you choose to teach with it but also be patient with yourself and your students during the first few years.
I do spend some time trying to teach students the "lexicon of R" as this makes communication about specific functions easier. For example, my students will know about constructor and extractor functions, named and positional arguments, objects, assignment operators, vectors, data.frames, etc. This has been a "cost" of using R in the sense that it does take time and, thus, "something else" had to go from my curriculum.
I also think that R is especially beneficial for my biometry course which focuses on regression (simple, multiple, and indicator variable), ANOVA (one- and two-way), and logistic regression because the vast majority of these topics use a common set of functions -- lm(), anova(), summary(), confint(), etc. In addition, I have added a few functions (in my NCStats package) to make fitted line plots, residual plots, and to compare all slopes. Thus, students can learn a vast number of topics with very few commands in R. The same can be said in the intro class for t.test() (one-sample, two-sample, matched-pairs) and chisq.test() (goodness-of-fit and general chi-square). This efficiency is very convenient and powerful.
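For example, with R's built-in cars data (not my course data), one fitted model feeds the whole family of extractors:

```r
# One formula/data call fits the model...
fit <- lm(dist ~ speed, data = cars)

# ...and a handful of extractor functions do the rest.
summary(fit)   # coefficient table, standard errors, R-squared
anova(fit)     # the ANOVA table for the regression
confint(fit)   # confidence intervals for the coefficients
coef(fit)      # the fitted coefficients themselves
```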
Finally, I have found R to be useful because students "like" free software, open-source concepts (at least at my environmental liberal arts college), and being on the "cutting edge." The more students like in your class the more likely they are to learn.
Dr. Derek H. Ogle
Associate Professor of Mathematical Sciences and Natural Resources
Northland College
1411 Ellis Avenue
Box 112
Ashland, WI
715.682.1300
www.ncfaculty.net/dogle/
Please consider the environment before printing this email.
I agree with what Derek, my neighbor to the north, has said.
I teach introductory engineering statistics using R and have done so
for several years, although I am never completely satisfied with how R
blends with the text in such a course. I have tried using a standard
introductory engineering text, specifically Devore's "Probability and
Statistics for Engineering and the Sciences", supplemented with
material on R (see the Devore6 package on CRAN, which John Verzani
updated to Devore7 for the 7th edition), Peter Dalgaard's
"Introductory Statistics with R" and now Cohen and Cohen's "Statistics
and Data with R". I have also looked at "Probability and Statistics
with R" by Ugarte et al.
With the exception of Peter's book I found myself fighting the text.
That is, I found myself saying "the text presents this material this
way but it is unnecessary and confusing. Do things this other way."
In the case of Peter's book I could agree with his presentation but
the book is clearly oriented toward biostatistics and has little
coverage of probability. It came about as a supplement to another
text used in a course and reads like that so it has to be supplemented
extensively, especially if your audience is not from medical fields.
I would dearly love to see an approach to teaching statistics that
takes advantage of the graphical and computational capabilities of R
to remove redundant topics from the typical introductory course.
Sadly the last two texts I list (Cohen and Cohen, 2008; Ugarte et al,
2008) do exactly the opposite. Instead of using R to simplify an
approach to statistics they complicate an introductory course by
adding page after page of confusing R code.
What do I mean by simplify? There are many topics in an introductory
statistics course that are ingrained in the curriculum but really are
there for the sake of approximation or computational simplification.
How many introductory texts still describe how to approximate a
"difficult" distribution by a "simpler" distribution (hypergeometric
by binomial, binomial by Poisson or Gaussian, etc.)? When you can
calculate the exact probability why do you want to waste time teaching
an approximation and rules like "when np > 5 ..."? Even a basic
graphical presentation, the histogram, is outmoded. The purpose of
the histogram is to give us a picture of the density. Why not use a
density plot for this? There is a great advantage in that you can
easily overlay density plots from different groups, not to mention the
fact that it shows a smooth approximation to the density. In the past
we used histograms because it was comparatively simple to choose bins
and count the observations in the bins then produce a bar chart. We
can do better than that now.
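To make the point concrete (a sketch using nothing beyond base R and a built-in data set): the exact binomial probability is a single function call, and overlaid density plots are barely more work than a histogram.

```r
# Exact probability: P(X <= 40) for X ~ Binomial(n = 100, p = 0.5).
# No "np > 5" rule, no table lookup.
exact <- pbinom(40, size = 100, prob = 0.5)
# The classical normal approximation (with continuity correction),
# shown only for comparison:
approx <- pnorm(40.5, mean = 50, sd = sqrt(100 * 0.5 * 0.5))
c(exact = exact, approx = approx)

# Overlaid density plots in place of side-by-side histograms:
d1 <- density(iris$Sepal.Length[iris$Species == "setosa"])
d2 <- density(iris$Sepal.Length[iris$Species == "virginica"])
plot(d1, main = "Sepal length", xlim = range(d1$x, d2$x))
lines(d2, lty = 2)
```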
Think carefully about the graphics. Deepayan Sarkar (lattice) and
Hadley Wickham (ggplot2) have provided powerful techniques for
exploring data. Students should benefit from that if they can do so
without needing to learn many, many details of the language.
When teaching the principles of hypothesis testing I describe a
p-value as "the probability of seeing the data that we did or
something more unusual when the null hypothesis (usually meaning "no
change") is true". The closer that probability is to "impossible",
the stronger the evidence against the null hypothesis in favor of the
alternative. The point is that we should go directly to the p-value.
All the confusing material about picking a level and calculating the
rejection region is there because we couldn't calculate that
probability when I took an introductory course more than 40 years ago.
All we had then were slide rules, pencil and paper, and a few tables
in a book. We can do better than that now.
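In R the p-value is simply there for the reading; for example (using standard built-in data, not any particular course's material):

```r
# A two-sample t test reports the p-value directly...
t.test(extra ~ group, data = sleep)$p.value
# ...and for discrete data, binom.test() computes the exact probability
# of the observed result "or something more unusual" under the null:
binom.test(60, 100, p = 0.5)$p.value
```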
Do we need to describe computational formulas in a text book? It
turns out that just about every formula in an introductory text,
beyond the calculation of the sample mean, is not really the way that
the calculation is done. Most of us know that the "short cut" formula
for the sample variance has bad numerical properties and a few might
know that regression coefficients are not really evaluated by
inverting X'X. Why teach a formula that is only good for a simplified
situation, like a simple linear regression model? Why not say that we
minimize the residual sum of squares and leave it at that? Pay more
attention to model building and examining residuals.
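A quick demonstration of why the short-cut formula is numerically bad (the particular numbers here are mine, chosen to force the cancellation):

```r
# Three values with a large common offset and a tiny spread; the true
# sample variance is exactly 1.
x <- 1e8 + c(1, 2, 3)

# "Short cut" formula: subtracts two nearly equal huge numbers, so most
# significant digits cancel and the result can be badly wrong.
shortcut <- (sum(x^2) - length(x) * mean(x)^2) / (length(x) - 1)

# Two-pass formula: centers the data first, so no cancellation occurs.
twopass <- sum((x - mean(x))^2) / (length(x) - 1)

c(shortcut = shortcut, twopass = twopass, var = var(x))
# twopass and var() give 1; the short-cut value is typically not even close.
```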
In teaching I think it is important to strive for simplicity and
consistency in the use of R. Keep the R code as concise as possible.
I prefer to teach lattice graphics because I think the graphics are
informative and because all the lattice functions can be called with a
formula/data pair of arguments, just as t.test, aov, lm, glm, nls,
etc. can be called with formula/data. I use Sweave and the beamer
LaTeX class to generate the slides for my classes so that I can
extract the R code and make that available on the course web site.
The slides and class presentations describe the graphics calls
succinctly, if at all, but the detailed code is available for
examination if the students want to delve deeper.
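The consistency I mean looks like this (lattice ships with R, so the example is self-contained):

```r
library(lattice)

# The same formula/data idiom drives graphics, models, and tests alike:
xyplot(dist ~ speed, data = cars)       # lattice scatterplot
lm(dist ~ speed, data = cars)           # the matching linear model
bwplot(extra ~ group, data = sleep)     # grouped box-and-whisker plot
t.test(extra ~ group, data = sleep)     # the corresponding two-sample test
```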
In short, the worst way to use R in an introductory course is to teach
the same-old-same-old material augmented with page after page of
confusing R code. Try to use the power of the computer and the
software to aid insight into data and to simplify the ideas of
statistics.
I have over the years produced slides for classes based first on
Devore's books then on Peter's book and now on the Cohen and Cohen
book. I am willing to make these available, including the source
code, so others can borrow code or presentation approaches if they
wish. I am not familiar with open documentation licenses like
Creative Commons. If it would help to stimulate discussion I will
make them available without copyright. I would be particularly
interested in corresponding with potential text book authors on some
of the techniques that I think can be used to simplify presentation of
R code and graphics. I don't have plans to embark on writing a text
myself.
What do I mean by simplify? There are many topics in an introductory
statistics course that are ingrained in the curriculum but really are
there for the sake of approximation or computational simplification.
How many introductory texts still describe how to approximate a
"difficult" distribution by a "simpler" distribution (hypergeometric
by binomial, binomial by Poisson or Gaussian, etc.)? When you can
calculate the exact probability why do you want to waste time teaching
an approximation and rules like "when np > 5 ..."?
Even knowing how to look up numbers in a table is an outdated skill!
Even a basic graphical presentation, the histogram, is outmoded. The
purpose of the histogram is to give us a picture of the density. Why
not use a density plot for this? There is a great advantage in that
you can easily overlay density plots from different groups, not to
mention the fact that it shows a smooth approximation to the density.
In the past we used histograms because it was comparatively simple to
choose bins and count the observations in the bins then produce a bar
chart. We can do better than that now.
I agree 100% with your points apart from this one. I'm not a big fan
of density estimates because most real-life distributions are not
smooth, continuous and unbounded, like most density estimators assume
they are. It's also much harder to understand how a density plot is
made, and while I don't think students need to understand the
motivations and theory for every tool they use, I think they should
understand how their basic graphical tools work. A happy intermediate
is the frequency polygon, which has more favourable theoretical
properties than the histogram, but is equally easy to understand (and
you can overlay them like densities).
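A base-R sketch of the idea (hist() already computes everything a frequency polygon needs; a ggplot2 version would use geom_freqpoly):

```r
# Draw a frequency polygon by connecting the midpoints of hist() bins.
# As easy to explain as a histogram, but overlayable like a density.
freqpoly <- function(x, ...) {
  h <- hist(x, plot = FALSE)
  lines(h$mids, h$counts, ...)
  invisible(h)
}

sl <- split(iris$Sepal.Length, iris$Species)
plot(range(iris$Sepal.Length), c(0, 20), type = "n",
     xlab = "Sepal length", ylab = "Count")
freqpoly(sl$setosa)
freqpoly(sl$virginica, lty = 2)
```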
On Mon, Mar 30, 2009 at 5:29 PM, hadley wickham <h.wickham at gmail.com> wrote:
What do I mean by simplify? There are many topics in an introductory
statistics course that are ingrained in the curriculum but really are
there for the sake of approximation or computational simplification.
How many introductory texts still describe how to approximate a
"difficult" distribution by a "simpler" distribution (hypergeometric
by binomial, binomial by Poisson or Gaussian, etc.)? When you can
calculate the exact probability, why do you want to waste time teaching
an approximation and rules like "when np > 5 ..."? Even a basic
graphical presentation, the histogram, is outmoded.
Even knowing how to look up numbers in a table is an outdated skill!
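The exact-versus-approximate point is easy to demonstrate directly in R; the numbers below are made up to sit just under the usual rule of thumb:

```r
# Exact binomial tail probability vs. its normal approximation
n <- 50; p <- 0.08                      # np = 4, below the "np > 5" rule
exact  <- pbinom(6, size = n, prob = p) # P(X <= 6), computed exactly
approx <- pnorm(6.5, mean = n * p,      # normal approximation with
                sd = sqrt(n * p * (1 - p)))  # continuity correction
c(exact = exact, approx = approx)
```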
Good point about frequency polygons. Density plots do have problems with smearing at the boundaries.
I would love to see these!
The source directory of my slides for Peter's book, "Introductory
Statistics with R", is available as
http://www.stat.wisc.edu/~bates/ISwR.zip
(I'm sorry, Hadley, but I use lattice throughout. I haven't taken the
time to learn ggplot2.)
Those are really nice.
I see you still teach t-tests and the Wilcoxon signed-rank test - is
this just an artefact of following Dalgaard, or do you have a
preference for them over the (computationally expensive but conceptually
simpler) permutation tests?
No problems - I don't think there's much difference in capabilities
between lattice and ggplot2 at this level.
Hadley
Thanks. I'll get the set for the Cohen and Cohen book up after I
clear off some disk space.
The nonparametric tests are there because they are described in
Peter's text. I skip them in my classes.
I tend to use t-tests after examining normal probability plots and,
possibly, considering transformation. I believe they would be more
powerful than permutation tests but that may be incorrect. Can you
describe situations in which you would prefer permutation tests to
t-tests?
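For concreteness, the permutation test in question can be sketched in a few lines of R (the data here are invented):

```r
# Two-sample permutation test for a difference in means (illustrative)
set.seed(1)
x <- c(5.1, 4.9, 6.2, 5.8, 5.5)   # invented group A
y <- c(4.2, 4.8, 5.0, 4.4, 4.6)   # invented group B
observed <- mean(x) - mean(y)
pooled <- c(x, y)
diffs <- replicate(10000, {
  idx <- sample(length(pooled), length(x))
  mean(pooled[idx]) - mean(pooled[-idx])
})
mean(abs(diffs) >= abs(observed))  # two-sided Monte Carlo p-value
```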
The basic argument I'm most familiar with is presented in:
http://repositories.cdlib.org/uclastat/cts/tise/vol1/iss1/art1/
"My thesis is that both the content and the structure of our
introductory curriculum are shaped by old history. What we teach was
developed a little at a time, for reasons that had a lot to do with
the need to use available theory to handle problems that were
essentially computational. Almost one hundred years after Student
published his 1908 paper on the t-test, we are still using 19th
century analytical methods to solve what is essentially a technical
problem - computing a p-value or a 95% margin of error.
Intellectually, we are asking our students to do the equivalent of
working with one of those old 30-pound Burroughs electric calculators
with the rows of little wheels that clicked and spun as they churned
out sums of squares."
Hadley