
Clustering large data

Hi Hadley,

Here is a more elaborate report of what I did and what went wrong. The
example is not reproducible because the dataset is too large. A smaller
dummy dataset is not an option either, because the code works fine with
smaller datasets. I'm willing to run the code again with a development
version of reshape.

Cheers,

Thierry
Loading required package: plyr
sysname                      release 
                   "Windows"                         "XP" 
                     version                     nodename 
"build 2600, Service Pack 2"                 "LHPA000838" 
                     machine                        login 
                       "x86"           "thierry_onkelinx" 
                        user 
          "thierry_onkelinx"
R version 2.7.2 (2008-08-25) 
i386-pc-mingw32 

locale:
LC_COLLATE=Dutch_Belgium.1252;LC_CTYPE=Dutch_Belgium.1252;LC_MONETARY=Dutch_Belgium.1252;LC_NUMERIC=C;LC_TIME=Dutch_Belgium.1252

attached base packages:
[1] stats     graphics  grDevices datasets  tcltk     utils     methods   base     

other attached packages:
[1] reshape_0.8.1  plyr_0.1       RODBC_1.2-3    svSocket_0.9-5 svIO_0.9-5    
[6] R2HTML_1.59    svMisc_0.9-5   svIDE_0.9-5   

loaded via a namespace (and not attached):
[1] tools_2.7.2
Location, TaxonFK AS Species FROM kmhok_periode2_selectie ORDER BY
KMhokcode, TaxonFK", as.is = TRUE)
[1] 1157024       3
[1] 6354
[1] 1381
= 0))
   user  system elapsed 
   0.11    0.00    0.17
= 0))
   user  system elapsed 
    1.7     0.0     1.7
fill = 0))
   user  system elapsed 
  46.42    0.45   47.02
Error: cannot allocate vector of size 33.5 Mb
Timing stopped at: 322.95 3.43 327.4
user  system elapsed 
   1.10    0.00    1.11 
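The size in the error message can be checked against the dimensions printed above. A minimal sketch in R, assuming (this is not confirmed by the output) that 6354 and 1381 are the numbers of distinct locations and species, and that the wide table stores one 4-byte integer per location x species cell; the variable names are hypothetical:

```r
# Hypothetical check: assumes 6354 = distinct locations and
# 1381 = distinct species, with one 4-byte integer per cell.
n_location <- 6354
n_species  <- 1381
cells <- n_location * n_species   # number of cells in the wide table
bytes <- cells * 4                # 4 bytes per integer cell
mb <- bytes / 2^20                # R reports "Mb" in units of 2^20 bytes
print(round(mb, 1))
```

Under those assumptions the failed 33.5 Mb allocation is exactly one complete location x species table; stored as 8-byte doubles, the same table would need about 66.9 Mb.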
 


----------------------------------------------------------------------------
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature
and Forest
Cel biometrie, methodologie en kwaliteitszorg / Section biometrics,
methodology and quality assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium 
tel. + 32 54/436 185
Thierry.Onkelinx at inbo.be 
www.inbo.be 

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to
say what the experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of
data.
~ John Tukey

-----Original message-----
From: r-sig-ecology-bounces at r-project.org
[mailto:r-sig-ecology-bounces at r-project.org] On behalf of hadley wickham
Sent: Friday 10 October 2008 14:40
To: ONKELINX, Thierry
CC: r-sig-ecology at r-project.org
Subject: Re: [R-sig-eco] Clustering large data
solution
Exactly what error did you get?  Or did it just take a very long time
and then you gave up?  I have an experimental rewrite of the reshape
package that is more memory efficient and much faster (10-20x);
however, it's still some time from being ready for production use.

Hadley