popular R packages
Is this another discussion of what data might be collected and
analyzed, and what could and could not be said if we only had such data?
Has anyone but me produced any actual data? If so, I missed it.
Hadley mentioned the 'fortunes' package. My earlier methodology,
"RSiteSearch('library(fortunes)')", produced 40 hits for 'fortunes',
compared to 169 for 'lme4' and 2 for 'DierckxSpline'.
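For anyone who wants to repeat the comparison, here is a minimal sketch of the queries. RSiteSearch() opens the results in a web browser rather than returning a hit count, so the counts below are simply the figures quoted above, entered by hand. The variable names are illustrative only.

```r
# Packages compared above
pkgs <- c("fortunes", "lme4", "DierckxSpline")

# Query strings of the form used above, e.g. "library(fortunes)"
queries <- sprintf("library(%s)", pkgs)

# RSiteSearch() opens a results page in the browser, so only call it
# from an interactive session.
if (interactive()) {
  for (q in queries) RSiteSearch(q)
}

# Hit counts as reported above (read off the results pages by hand)
hits <- data.frame(package = pkgs, hits = c(40, 169, 2))
hits[order(-hits$hits), ]
```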
With anything like this, it would be wise to approach the problem
from many different perspectives, recognizing that the strengths of one
approach can help improve our understanding of what other analyses say
about the question at hand.
Happy Sunday.
Spencer Graves
(Ted Harding) wrote:
On 08-Mar-09 15:14:03, Duncan Murdoch wrote:
On 08/03/2009 10:49 AM, hadley wickham wrote:
More seriously: I don't think relative numbers of package downloads
can be interpreted in any reasonable way, because the reasons for
downloading a package range widely, from curiosity ("what's
this?"), to fun (think "fortunes"...), to vital need (think lme4
if/when a consensus on denominator DFs can be reached :-)...).
What can you infer in good faith from such a mess?
So when we have messy data with measurement error, we should just
give up? Doesn't sound very statistical! ;)
I think the situation is worse than messy. If a client comes in with
data that doesn't address the question they're interested in, I think
they are better served to be told that, than to be given an answer that
is not actually valid. They should also be told how to design a study
that actually does address their question.
You (and others) have mentioned Google Analytics as a possible way to
address the quality of data; that's helpful. But analyzing bad data
will just give bad conclusions.
Duncan Murdoch
The population of R users (which we would need to sample in order to
obtain good data) is probably more elusive than a fish population in
the ocean -- only partially visible at best, and with an unknown
proportion invisible.

At least in Fisheries research, there are long established capture
techniques (from trawling to netting to electro-fishing to ... ) which
can be deployed, for research purposes, in such a way as to potentially
reach all members of a target population, with at least a moderately
good approximation to random sampling. What have we for R?

Come to think of it, electro-fishing, ...

Suppose R were released with 2 types of cookie embedded in base R.
Each type is randomly configured, when R is first run, to be Active
or Inactive (probability of activation to be decided at the design
stage ... ). Type 1, if active, on a certain date generates an event
which brings it to the notice of R-Core (e.g. by clandestine email or
by inducing a bug report). Type 2 acts similarly on a later date. If
Type 2 acts, it carries with it information as to whether there was a
Type 1 action along with whether, apparently, the Type 1 action
"succeeded".

We then have, in effect, an analogue of the Mark-Recapture technique
of population estimation (along with the usual questions about equal
catchability and so forth).

However, since this sort of thing (which I am not proposing seriously,
only for the sake of argument) is undoubtedly unethical (and would do
R's reputation no good if it came to light), I tentatively conclude
that the population of R users is likely to remain as elusive as ever.

Best wishes to all,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 08-Mar-09                                       Time: 16:11:44
------------------------------ XFMail ------------------------------
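Purely for the sake of argument, the two-cookie scheme above can be simulated to show how a mark-recapture estimate would fall out of it. This is a rough sketch, not a proposal: the population size and activation probabilities are made-up assumptions, both cookie types are assumed to activate independently with equal "catchability" for every installation, and the estimators used are the standard Lincoln-Petersen form and Chapman's bias-corrected variant, neither of which is named in the message above.

```r
# Simulate the two-cookie scheme as a mark-recapture experiment.
# All numbers here are assumptions chosen only for illustration.
set.seed(1)
N  <- 50000   # true number of installations (unknown in practice)
p1 <- 0.05    # assumed activation probability, Type 1 cookie
p2 <- 0.05    # assumed activation probability, Type 2 cookie

type1 <- runif(N) < p1   # installations "marked" on the first date
type2 <- runif(N) < p2   # installations "captured" on the later date

n1 <- sum(type1)           # reports received via Type 1
n2 <- sum(type2)           # reports received via Type 2
m2 <- sum(type1 & type2)   # Type 2 reports that also carry a Type 1 mark

# Lincoln-Petersen estimator and Chapman's bias-corrected variant
lincoln_petersen <- function(n1, n2, m2) n1 * n2 / m2
chapman <- function(n1, n2, m2) (n1 + 1) * (n2 + 1) / (m2 + 1) - 1

N_lp <- lincoln_petersen(n1, n2, m2)
N_ch <- chapman(n1, n2, m2)

cat("marked:", n1, " captured:", n2, " recaptured:", m2, "\n")
cat("Lincoln-Petersen:", round(N_lp), " Chapman:", round(N_ch), "\n")
```

The estimate is only as good as the equal-catchability assumption: if installations that trigger Type 1 are more (or less) likely to trigger Type 2, m2 is biased and so is the estimate.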
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.