
CRAN mirror size mushrooming; consider archiving some?

4 messages · Paul Johnson, Uwe Ligges, Hadley Wickham, and one other

Paul Johnson:
Hi, everybody

I'm setting up a new CRAN mirror and filled up the disk space the
server allotted me.  I asked for more, then filled that up.  Now the
system administrators want me to buy an $800 Fibre Channel card and a
storage device. I'm going to do that, but it does make me want to
suggest to you that this is a problem.

CRAN is now about 68 GB, and roughly 3/4 of that is in the bin folder,
which holds the compiled packages for Mac OS X and Windows. If the CRAN
administrators moved the packages for R versions before, say, 2.12 to
long-term storage, then mirror management would be a bit more, well,
manageable.

Moving the Windows packages for, say, R 2.0 through 2.10 would save
some space, and possibly establish a useful precedent for the long
term.

Here's the bin/windows folder; note it is expanding exponentially, or nearly so (a quick fit follows the listing):

$ du --max-depth=1 | sort -n
15220     ./ATLAS
103504    ./1.7
122200    ./1.8
167668    ./1.9
204392    ./2.0
298620    ./2.1
364292    ./2.2
438044    ./2.3
595920    ./2.4
698064    ./2.5
1012644   ./2.6
1239876   ./2.7
1487024   ./2.8
1866196   ./2.9
2207708   ./2.10
2340120   ./2.13
2356272   ./2.12
2403176   ./2.11
17921604  .
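
A quick log-linear fit of the sizes above (one step per x.y release;
ATLAS and the grand total excluded) makes the growth rate concrete:

size_kb <- c(103504, 122200, 167668, 204392, 298620, 364292, 438044,
             595920, 698064, 1012644, 1239876, 1487024, 1866196,
             2207708, 2403176, 2356272, 2340120)  # 1.7, 1.8, ..., 2.13
fit <- lm(log(size_kb) ~ seq_along(size_kb))      # exponential growth model
exp(coef(fit)[2])                                 # per-release factor, roughly 1.2x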

1 day later
Uwe Ligges:
On 25.07.2011 19:47, Paul Johnson wrote: [the message above]
> [Paul: the system administrators want me to buy an $800 Fibre Channel
> card and a storage device.]

Why? Just for the mirror? That's nonsense. A six-year-old outdated
desktop machine (say, upgraded to 2 GB of RAM) with a 1 TB hard disc for
$50 should be fine for your first tries. The bottleneck will probably be
your network connection rather than the storage.
> [Paul: move the packages for R before, say, 2.12 to long-term
> storage.]

That is right, but then users of R < 2.11.0 could no longer use
install.packages() and friends. If we want to move stuff around in the
future, we may want to implement support for that in R first. We have
thought about removing old binaries before, but disk capacity grew
roughly as exponentially as the repository did, so we decided to stay
with it as is.
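
The reason is that install.packages() builds the binary repository path
from the running R's own version via contrib.url(), so a directory that
was moved away would simply no longer resolve for old clients. A small
illustration (the path depends on the R that runs it):

contrib.url("http://cran.r-project.org", type = "win.binary")
## e.g. "http://cran.r-project.org/bin/windows/contrib/2.13" under R 2.13;
## an R 2.10 client keeps requesting .../contrib/2.10.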
And note that quite a lot of effort went into the last few release
cycles to reduce the amount of storage used (e.g., by using better
compression).
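
(For instance, R 2.10 added xz support alongside gzip; a rough sketch of
the kind of difference, with numbers that of course vary by content:)

x <- serialize(rep(letters, 1e5), NULL)  # a large, repetitive payload
length(memCompress(x, "gzip"))           # gzip-compressed size in bytes
length(memCompress(x, "xz"))             # xz is typically smaller here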

Best wishes,
Uwe
Hadley Wickham:
Another perspective is that it costs ~$10/month to store 68 GB of data
on Amazon's S3, and then you pay $0.12/GB for download.
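
Back-of-envelope, at the rates above:

size_gb <- 68
size_gb * 0.12  # ~$8.16 in transfer fees per full-mirror download, so a
                # few full copies a month already outweighs the storage cost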

Hadley