An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-devel/attachments/20131108/45f4ac90/attachment.pl>
Milestone: 5000 packages on CRAN
3 messages · Henrik Bengtsson, William Dunlap, Spencer Graves
"Currently, the CRAN package repository features 5001 available packages." Going from 4000 to 5000 packages took 14.5 months - that's one new package every 10.5 hours. Behind every package there are real people. These user-contributed packages are maintained by ~2900 people [2] - that's 350 new maintainers and many more contributors. More people to thank than ever before - don't forget about them, e.g. cite properly when publishing.
Congratulations! I have often wondered about the natural history of R packages: how often they are created and shared, how long they are used, how many people use them, how long they are maintained, etc. The usage numbers are hard to get, but the "Last modified" dates in the CRAN archives do give some information on how often new packages are shared and how long they are maintained. Here are some summaries of derived from those dates. The code to get the data and calculate (and plot) the summaries follows.
newPkgsByYear
1997-01-01 1998-01-01 1999-01-01 2000-01-01
2 12 56 41
2001-01-01 2002-01-01 2003-01-01 2004-01-01
65 66 101 144
2005-01-01 2006-01-01 2007-01-01 2008-01-01
209 280 329 374
2009-01-01 2010-01-01 2011-01-01 2012-01-01
502 546 702 809
2013-01-01
439
table(nUpdatesSinceSep2011) # number of recent updates (not including original submission)
nUpdatesSinceSep2011 0 1 2 3 4 5 6 7 8 9 2079 963 528 332 238 166 75 79 50 43 10 11 12 13 14 15 16 17 18 19 23 22 13 14 8 9 12 5 4 1 20 21 22 24 26 27 31 32 34 1 3 1 2 1 2 1 1 1 The code I used is: library(XML) getArchiveList <- function(site = "http://cran.r-project.org/src/contrib/Archive/") { retval <- readHTMLTable(site, stringsAsFactors=FALSE)[[1]] retval <- retval[!is.na(retval$Name) & grepl("/$", retval$Name), ] retval$Name <- gsub("/$", "", retval$Name) retval$"Last modified" <- as.Date(retval$"Last modified", format="%d-%b-%Y") retval } getArchiveEntry <- function(Name, site = "http://cran.r-project.org/src/contrib/Archive/") { retval <- readHTMLTable(paste0(site, Name), stringsAsFactors=FALSE)[[1]] retval <- retval[!is.na(retval$Name) & retval$Name != "Parent Directory", ] retval$"Last modified" <- as.Date(retval$"Last modified", format="%d-%b-%Y") retval } al <- getArchiveList() # The next may bog down the CRAN archive server - do not do it often # ae <- lapply(structure(al$Name, names=al$Name), # function(Name)tryCatch(getArchiveEntry(Name), # error=function(e)data.frame(Name=character(), "Last Modified" = as.Date(character())))) initialSubmissionDate <- as.Date(vapply(ae, function(e)min(e[["Last modified"]]), 0), origin=as.Date("1970-01-01")) lastSubmissionDate <- as.Date(vapply(ae, function(e)max(e[["Last modified"]]), 0), origin=as.Date("1970-01-01")) mths <- seq(as.Date("1997-10-01"), as.Date("2014-01-01"), by="months") yrs <- seq(as.Date("1997-01-01"), as.Date("2014-01-01"), by="years") par(ask=TRUE) newPkgsByMonth <- table(cut(initialSubmissionDate, mths)) newPkgsByYear <- table(cut(initialSubmissionDate, yrs)) plot(mths[-1], newPkgsByMonth, log="y", ylab="# New Pkgs", main="New packages by month") # number of additions each month yearsOfMaintainanceActivity <- as.numeric(lastSubmissionDate - initialSubmissionDate, units="days")/365.25 hist(yearsOfMaintainanceActivity, xlab="Years", main="Maintainance Duration") newPkgsByYear table(floor(yearsOfMaintainanceActivity)) nUpdatesSinceSep2011 <- vapply(ae, function(e){ Lm <- e[["Last modified"]] sum(Lm >= as.Date("2011-09-01") & Lm != min(Lm))}, 0L) table(nUpdatesSinceSep2011) # number of recent updates (not including original submission) Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com
-----Original Message----- From: r-devel-bounces at r-project.org [mailto:r-devel-bounces at r-project.org] On Behalf Of Henrik Bengtsson Sent: Friday, November 08, 2013 1:59 PM To: R Development Mailing List Subject: [Rd] Milestone: 5000 packages on CRAN Here we go again... Today (2011-11-08) on The Comprehensive R Archive Network (CRAN) [1]: "Currently, the CRAN package repository features 5001 available packages." Going from 4000 to 5000 packages took 14.5 months - that's one new package every 10.5 hours. Behind every package there are real people. These user-contributed packages are maintained by ~2900 people [2] - that's 350 new maintainers and many more contributors. More people to thank than ever before - don't forget about them, e.g. cite properly when publishing. Milestones: 2013-11-08: 5000 packages [this post] 2012-08-23: 4000 packages [7] 2011-05-12: 3000 packages [6] 2009-10-04: 2000 packages [5] 2007-04-12: 1000 packages [4] 2004-10-01: 500 packages [3,4] 2003-04-01: 250 packages [3,4] [1] http://cran.r-project.org/web/packages/ [2] http://cran.r-project.org/web/checks/check_summary_by_maintainer.html [3] Private data. [4] https://stat.ethz.ch/pipermail/r-devel/2007-April/045359.html [5] https://stat.ethz.ch/pipermail/r-devel/2009-October/055049.html [6] https://stat.ethz.ch/pipermail/r-devel/2011-May/061002.html [7] https://stat.ethz.ch/pipermail/r-devel/2012-August/064675.html /Henrik PS. These data are for CRAN only. There are more packages elsewhere, e.g. R-Forge, Bioconductor, Github etc. [[alternative HTML version deleted]]
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
CRAN size has grown almost exponentially at least since 2001. R history was discussed by John Fox (2009) Aspects of the Social Organization and Trajectory of the R Project, R Journal (http://journal.r-project.org/archive/2009-2/RJournal_2009-2_Fox.pdf). Below please find his data plus 5 additional points I added and R script I used to fit a models. I won't defend the models fit in the script below. However, unless CRAN management changes dramatically in the next 5 years, it seems likely that CRAN will have 10,000 packages some time in 2018. By the way, if you don't already use the sos package routinely, I encourage you to consider it. For me, it's by far the fastest literature search for anything statistical. In a very few minutes, I get an Excel file with a summary by package of the matches to almost any combination of search terms. (Shameless plug by the lead author of the package ;-) Best Wishes, Spencer Graves date packages 2001-06-21 110 2001-12-17 129 2002-06-12 162 2003-05-27 219 2003-11-16 273 2004-06-05 357 2004-10-12 406 2005-06-18 548 2005-12-16 647 2006-05-31 739 2006-12-12 911 2007-04-12 1000 2007-11-16 1300 2008-03-18 1427 2008-10-18 1614 2009-09-17 1952 2012-06-12 3786 2012-11-01 4082 2012-12-14 4210 2013-10-28 4960 2013-11-08 5000 library(gdata) (CRANfile <- dir(pattern='s\\.xls$')) #readLines(CRANfile) str(CRANhist. <- read.xls(CRANfile, stringsAsFactors=FALSE, header=TRUE)) tail(CRANhist., 11) CRANhist <- CRANhist.[1:20, 1:2] (dt. <- as.Date(CRANhist$date)) CRANhist$date <- dt. (day1 <- min(CRANhist$date)) # 2001-06-21 str(ddate <- CRANhist$date-day1) # difftime in days CRANhist$CRANdays <- as.numeric(ddate) (growth <- lm(log(packages)~CRANdays, CRANhist)) CRANhist$pred <- exp(predict(growth)) plot(packages~date, CRANhist, log='y') lines(pred~date, CRANhist, pch='.') fitLogLogis <- nls(log(packages) ~ a+b*CRANdays + log(1+exp(d+b*CRANdays)), CRANhist, start=c(a=4.9, b=0.0009, d=0)) # Error ... singular gradient library(drc) CRANlogLogis <- drm(packages~CRANdays, data=CRANhist, fct=LL.3()) plot(CRANlogLogis, log='y') # very poor through 2005 CRANlogLogis. <- drm(log(packages)~CRANdays, data=CRANhist, fct=LL.3()) plot(CRANlogLogis., log='y') # terrible: far worse than CRANlogLogis CRANlogLogis4 <- drm(packages~CRANdays, data=CRANhist, fct=LL.4()) plot(CRANlogLogis4, log='y') # poor for 2001 but great otherwise CRANlogLogis4. <- drm(log(packages)~CRANdays, data=CRANhist, fct=LL.4()) plot(CRANlogLogis4., log='y') # best I've found so far. abline(h=c(4200, 8400)) sapply(CRANhist, range) pred.dTimes <- seq(0, 6000, 100) CRANpred <- predict(CRANlogLogis4., data.frame(CRANdays=pred.dTimes)) data.frame(Date=as.Date(day1+pred.dTimes), nPkgs=exp(CRANpred)) plot(day1+pred.dTimes, exp(CRANpred), type='l', log='y') points(packages~date, CRANhist) pred.dTimes <- seq(0, 10000, 100) CRANpred <- predict(CRANlogLogis4., data.frame(CRANdays=pred.dTimes)) plot(day1+pred.dTimes, exp(CRANpred), type='l', log='y') points(packages~date, CRANhist) abline(h=c(4200, 8400)) abline(v=as.Date('2012-12-14')) abline(v=as.Date('2017-09-30')) ######################### abline(h=20000) abline(h=70000) pred.dTimes <- seq(0, 1000000, 10000) CRANpred <- predict(CRANlogLogis4., data.frame(CRANdays=pred.dTimes)) plot(day1+pred.dTimes, exp(CRANpred), type='l', log='y') points(packages~date, CRANhist)
On 11/8/2013 4:43 PM, William Dunlap wrote:
"Currently, the CRAN package repository features 5001 available packages." Going from 4000 to 5000 packages took 14.5 months - that's one new package every 10.5 hours. Behind every package there are real people. These user-contributed packages are maintained by ~2900 people [2] - that's 350 new maintainers and many more contributors. More people to thank than ever before - don't forget about them, e.g. cite properly when publishing.
Congratulations! I have often wondered about the natural history of R packages: how often they are created and shared, how long they are used, how many people use them, how long they are maintained, etc. The usage numbers are hard to get, but the "Last modified" dates in the CRAN archives do give some information on how often new packages are shared and how long they are maintained. Here are some summaries of derived from those dates. The code to get the data and calculate (and plot) the summaries follows.
newPkgsByYear
1997-01-01 1998-01-01 1999-01-01 2000-01-01
2 12 56 41
2001-01-01 2002-01-01 2003-01-01 2004-01-01
65 66 101 144
2005-01-01 2006-01-01 2007-01-01 2008-01-01
209 280 329 374
2009-01-01 2010-01-01 2011-01-01 2012-01-01
502 546 702 809
2013-01-01
439
table(nUpdatesSinceSep2011) # number of recent updates (not including original submission)
nUpdatesSinceSep2011
0 1 2 3 4 5 6 7 8 9
2079 963 528 332 238 166 75 79 50 43
10 11 12 13 14 15 16 17 18 19
23 22 13 14 8 9 12 5 4 1
20 21 22 24 26 27 31 32 34
1 3 1 2 1 2 1 1 1
The code I used is:
library(XML)
getArchiveList <- function(site = "http://cran.r-project.org/src/contrib/Archive/") {
retval <- readHTMLTable(site, stringsAsFactors=FALSE)[[1]]
retval <- retval[!is.na(retval$Name) & grepl("/$", retval$Name), ]
retval$Name <- gsub("/$", "", retval$Name)
retval$"Last modified" <- as.Date(retval$"Last modified", format="%d-%b-%Y")
retval
}
getArchiveEntry <- function(Name, site = "http://cran.r-project.org/src/contrib/Archive/") {
retval <- readHTMLTable(paste0(site, Name), stringsAsFactors=FALSE)[[1]]
retval <- retval[!is.na(retval$Name) & retval$Name != "Parent Directory", ]
retval$"Last modified" <- as.Date(retval$"Last modified", format="%d-%b-%Y")
retval
}
al <- getArchiveList()
# The next may bog down the CRAN archive server - do not do it often
# ae <- lapply(structure(al$Name, names=al$Name),
# function(Name)tryCatch(getArchiveEntry(Name),
# error=function(e)data.frame(Name=character(), "Last Modified" = as.Date(character()))))
initialSubmissionDate <- as.Date(vapply(ae, function(e)min(e[["Last modified"]]), 0), origin=as.Date("1970-01-01"))
lastSubmissionDate <- as.Date(vapply(ae, function(e)max(e[["Last modified"]]), 0), origin=as.Date("1970-01-01"))
mths <- seq(as.Date("1997-10-01"), as.Date("2014-01-01"), by="months")
yrs <- seq(as.Date("1997-01-01"), as.Date("2014-01-01"), by="years")
par(ask=TRUE)
newPkgsByMonth <- table(cut(initialSubmissionDate, mths))
newPkgsByYear <- table(cut(initialSubmissionDate, yrs))
plot(mths[-1], newPkgsByMonth, log="y", ylab="# New Pkgs", main="New packages by month") # number of additions each month
yearsOfMaintainanceActivity <- as.numeric(lastSubmissionDate - initialSubmissionDate, units="days")/365.25
hist(yearsOfMaintainanceActivity, xlab="Years", main="Maintainance Duration")
newPkgsByYear
table(floor(yearsOfMaintainanceActivity))
nUpdatesSinceSep2011 <- vapply(ae, function(e){
Lm <- e[["Last modified"]]
sum(Lm >= as.Date("2011-09-01") & Lm != min(Lm))}, 0L)
table(nUpdatesSinceSep2011) # number of recent updates (not including original submission)
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
-----Original Message----- From: r-devel-bounces at r-project.org [mailto:r-devel-bounces at r-project.org] On Behalf Of Henrik Bengtsson Sent: Friday, November 08, 2013 1:59 PM To: R Development Mailing List Subject: [Rd] Milestone: 5000 packages on CRAN Here we go again... Today (2011-11-08) on The Comprehensive R Archive Network (CRAN) [1]: "Currently, the CRAN package repository features 5001 available packages." Going from 4000 to 5000 packages took 14.5 months - that's one new package every 10.5 hours. Behind every package there are real people. These user-contributed packages are maintained by ~2900 people [2] - that's 350 new maintainers and many more contributors. More people to thank than ever before - don't forget about them, e.g. cite properly when publishing. Milestones: 2013-11-08: 5000 packages [this post] 2012-08-23: 4000 packages [7] 2011-05-12: 3000 packages [6] 2009-10-04: 2000 packages [5] 2007-04-12: 1000 packages [4] 2004-10-01: 500 packages [3,4] 2003-04-01: 250 packages [3,4] [1] http://cran.r-project.org/web/packages/ [2] http://cran.r-project.org/web/checks/check_summary_by_maintainer.html [3] Private data. [4] https://stat.ethz.ch/pipermail/r-devel/2007-April/045359.html [5] https://stat.ethz.ch/pipermail/r-devel/2009-October/055049.html [6] https://stat.ethz.ch/pipermail/r-devel/2011-May/061002.html [7] https://stat.ethz.ch/pipermail/r-devel/2012-August/064675.html /Henrik PS. These data are for CRAN only. There are more packages elsewhere, e.g. R-Forge, Bioconductor, Github etc. [[alternative HTML version deleted]]
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Spencer Graves, PE, PhD President and Chief Technology Officer Structure Inspection and Monitoring, Inc. 751 Emerson Ct. San Jos?, CA 95126 ph: 408-655-4567 web: www.structuremonitoring.com