importing and processing large datasets in R (fwd)

8 messages · Bob, gaurav singh, R. Michael Weylandt +4 more

Bob
#
I am one of the people who lobbied for the creation of this list long
ago.  I am not sure R is a great choice for a first course in
statistics, but I thought that if someone chose to use it, then they
and their students might need all the help they could get to make it
easier for the class.  But right from the beginning, the bulk of the
posts to the list were like this latest one quoted below -- questions
about how to do something with R that has no obvious connection to
pedagogy or to using R in a first course.  This means that those of us
interested in the actual topic of this list get lots of off-topic
messages, while those who post the messages reach only a small
audience that may not be interested in their question.  Some off-topic
posts are answered, some ignored, and some posters get redirected
(even scolded) toward a more appropriate list.  I see only losers in
this process.

So my question is whether this list really serves any useful purpose,
or does it just siphon off queries that should have gone elsewhere?
Those who post those queries would be likely to get an answer, and get
it sooner, if they posted to an appropriate list in the first place.
My own answer is that this list is not useful at the present time.
Possibly in the future more people will be interested in R for an
introductory course and then they might be glad if this list were
still alive, but so far...

So I am wondering what others on the list think.

Here's the official description of this list.

Special Interest Group (SIG) on teaching statistics with R. The
primary purpose of the group is to provide a forum where instructors
using R in their statistics courses can share ideas, teaching
materials, and experiences. One particular focus of the SIG is to
provide helpful support to instructors new to R who are teaching
introductory statistics courses populated with students with little
experience in statistics, statistical software, and command line
interfaces. 

Here is where most posts to this list really should have gone.

R-help

    The 'main' R mailing list, for discussion about problems and
    solutions using R, announcements (not covered by 'R-announce' or
    'R-packages', see above), about the availability of new
    functionality for R and documentation of R, comparison and
    compatibility with S-plus, and for the posting of nice examples
    and benchmarks.

Forwarded message:
------->  First-time AP Stats. teacher?  Help is on the way! See
http://courses.ncssm.edu/math/Stat_Inst/Stats2007/Bob%20Hayden/Relief.html
      _
     | |          Robert W. Hayden
     | |          142 Main Street
    /  |          Apartment 104
   |   |          Jaffrey, New Hampshire 03452  USA
   |   |          email: bob@ the site below
  /    |          website: http://statland.org
 | x   /          phone: (603) 532-7224 (home)
 ''''''
#
Hello Bob,

I'm one of those who follow the list mainly out of curiosity, not
because I have any connection with teaching stats.  My goal was to
keep an eye out for some of the occasional posts where people post
useful links or material that they use in teaching... but I understand
the frustration with the frequent off-topic requests for help learning
R.

I don't know if yet another mailing list would be worth considering...
but for comparison, in the Python world there is a SIG for Education,
very similar to this one... and there is a SIG/mailing list called
'python-tutor', specifically for people asking beginner type
questions, or who maybe don't want to brave the 'main' list and its
denizens just yet because they still need/want some hand-holding.
There is very little off-topic discussion on the Edu list, and
neophytes have a kiddie pool that they feel welcome in.

I know the R community seems to kind of enjoy the occasional
snarkiness on the main list by certain posters, and I've certainly
been around the 'Net long enough to have developed a thick enough skin
myself, but for new users it may be a bit intimidating.  Maybe an
'R-Novice' or 'R-Tutor' list would open up an intermediate ground for
beginners and reduce the 'pollution' of the R-Teaching list?

Monte
#
On Fri, Jan 18, 2013 at 3:32 PM, Monte Milanuk <memilanuk at gmail.com> wrote:
It's a topic that's been discussed many times before on R-help, with
the usual outcome being some feeling that it would lower the quality
of responses that newbies get. I'm not sure it's an argument I buy,
but if you want to raise the topic again, I can certainly promise to
be active (enough) on R-Tutor.

There might also be some value in thinking of an R-Statistics list
(though under a much better name) where statistical questions are fair
game -- it's always seemed odd to me that data munging is a fair topic
on the various R lists, while data analysis isn't.

Just throwing it out there,

MW
#
I'd participate actively in an "R-tutor" mailing list.

--Chris

Christopher W. Ryan, MD, MS
SUNY Upstate Medical University Clinical Campus at Binghamton
425 Robinson Street, Binghamton, NY  13904
cryanatbinghamtondotedu

"Once we recognize that we do not err out of laziness, stupidity, or
evil intent, we can liberate ourselves from the impossible burden of
trying to be permanently right. We can take seriously the proposition
that we could be in error, without deeming ourselves idiotic or
unworthy." [Kathryn Schulz, in Being Wrong: Adventures in the Margin of Error]
R. Michael Weylandt wrote:
#
I think the kinds of confusions that prompted this discussion are 
normal, and may not be something to fret over.  That said, I support the 
idea of another appropriate SIG.  However, r-sig-teaching & r-sig-tutor 
are going to elicit more of these same kinds of confusions.  We should 
use r-sig-forstatisticseducation and r-sig-fornovices, instead.
On 1/18/2013 11:08 AM, Christopher W. Ryan wrote:
#
I'm not sure I would call the list completely unsuitable for your question, but it is borderline.  My main point (also expressed in Bob's reply) is that you might get better answers elsewhere because there may be more people with more expertise in this area on other lists.

As a side note, I think that we are heading into an era where using large data sets in lower-level courses may become more common.  So this may come back around to become very appropriate for this list (although this list may still be shorter on the expertise to answer your questions fully than some other lists).

In any case, good luck solving your large data problems.

As for the usefulness and (intended) scope of the SIG as it currently exists, I'd have to think more about that.  It seems to me that the trend is toward more use of R in undergraduate courses, especially in second courses and in courses for majors and minors, but also in intro courses.  But my view of the world may be distorted by the fact that I have been heavily involved in that for the last few years.

Finally, I'll mention one outcome of the work described in the preceding paragraph is an R package (co-developed with Danny Kaplan and Nick Horton) that we think makes it much easier to teach statistics using R. The package is called mosaic and is available on CRAN.  We are getting closer and closer to something we are willing to call version 1.0.  Perhaps in a separate message I can say a bit more about what is in that package while there may still be time to adjust things for second semester courses that are about to begin or have just begun.

---rjp
On Jan 18, 2013, at 10:20 AM, gaurav singh wrote:

            
#
How about 'r-sig-edu' for Education - should be obvious enough to
faculty, teaching staff, etc., and 'r-sig-novice' (or -tutor) which
would probably 'jump out' as the obvious choice to a new user trying
to choose which list to subscribe and post to...

Monte
On Fri, Jan 18, 2013 at 8:22 AM, Jeff Laux <jefflaux at gmail.com> wrote: