popular R packages
On 08-Mar-09 15:14:03, Duncan Murdoch wrote:
On 08/03/2009 10:49 AM, hadley wickham wrote:
More seriously : I don't think relative numbers of package downloads
can be interpreted in any reasonable way, because reasons for
package download have a very wide range from curiosity ("what's
this ?"), fun (think "fortunes"...), to vital need (think lme4
if/when a consensus on denominator DFs can be reached :-)...).
What can you infer in good faith from such a mess ?
So when we have messy data with measurement error, we should just give up? Doesn't sound very statistical! ;)
I think the situation is worse than messy. If a client comes in with data that doesn't address the question they're interested in, I think they are better served to be told that, than to be given an answer that is not actually valid. They should also be told how to design a study that actually does address their question.

You (and others) have mentioned Google Analytics as a possible way to address the quality of data; that's helpful. But analyzing bad data will just give bad conclusions.

Duncan Murdoch
The population of R users (which we would need to sample in order to obtain good data) is probably more elusive than a fish population in the ocean -- only partially visible at best, and with an unknown proportion invisible.

At least in Fisheries research, there are long established capture techniques (from trawling to netting to electro-fishing to ... ) which can be deployed, for research purposes, in such a way as to potentially reach all members of a target population, with at least a moderately good approximation to random sampling. What have we for R?

Come to think of it, electro-fishing, ...

Suppose R were released with 2 types of cookie embedded in base R. Each type is randomly configured, when R is first run, to be Active or Inactive (probability of activation to be decided at the design stage ... ). Type 1, if active, on a certain date generates an event which brings it to the notice of R-Core (e.g. by clandestine email or by inducing a bug report). Type 2 acts similarly on a later date. If Type 2 acts, it carries with it information as to whether there was a Type 1 action along with whether, apparently, the Type 1 action "succeeded".

We then have, in effect, an analogue of the Mark-Recapture technique of population estimation (along with the usual questions about equal catchability and so forth).

However, since this sort of thing (which I am not proposing seriously, only for the sake of argument) is undoubtedly unethical (and would do R's reputation no good if it came to light), I tentatively conclude that the population of R users is likely to remain as elusive as ever.

Best wishes to all,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 08-Mar-09 Time: 16:11:44
------------------------------ XFMail ------------------------------
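
For readers unfamiliar with Mark-Recapture, the arithmetic behind the two-cookie thought experiment can be sketched as follows. This is only an illustration under stated assumptions, not anything proposed in the thread: the function names, the activation probabilities and the population size are all hypothetical, the two cookie types are assumed to activate independently, and every installation is assumed equally "catchable". The estimator used is the standard Lincoln-Petersen one (with Chapman's variant alongside), where n1 is the number of Type 1 reports, n2 the number of Type 2 reports, and m the number of Type 2 reports that also carry a Type 1 action.

```python
import random


def lincoln_petersen(n1, n2, m):
    """Estimate population size from two capture occasions.

    n1: individuals seen on the first occasion (Type 1 reports)
    n2: individuals seen on the second occasion (Type 2 reports)
    m:  individuals seen on both occasions
    """
    if m == 0:
        raise ValueError("no recaptures: the estimate is undefined")
    return n1 * n2 / m


def chapman(n1, n2, m):
    """Chapman's variant of the estimator, defined even when m == 0."""
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


def simulate_cookies(population, p1, p2, seed=0):
    """Simulate the hypothetical two-cookie scheme.

    Each installation independently activates Type 1 with probability p1
    and Type 2 with probability p2 (both values are illustrative assumptions).
    Returns the counts (n1, n2, m).
    """
    rng = random.Random(seed)
    n1 = n2 = m = 0
    for _ in range(population):
        type1 = rng.random() < p1
        type2 = rng.random() < p2
        n1 += type1
        n2 += type2
        m += type1 and type2
    return n1, n2, m


if __name__ == "__main__":
    # Hypothetical "true" population, unknown to the analyst in practice.
    n1, n2, m = simulate_cookies(population=100_000, p1=0.02, p2=0.03)
    print("Type 1 reports:", n1, "Type 2 reports:", n2, "both:", m)
    print("Lincoln-Petersen estimate:", round(lincoln_petersen(n1, n2, m)))
    print("Chapman estimate:", round(chapman(n1, n2, m)))
```

With small activation probabilities the overlap m is small, so the estimate is noisy; and if installations differ in how likely they are to report at all, the equal-catchability assumption fails and the estimate is biased.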