
About performance of R

5 messages · Suman, Bert Gunter, Jeff Newmiller +2 more

Hi there,

Now that R has grown up with a vibrant community, it is the no. 1 statistical package used by scientists, and its graphics capabilities are amazing.
Now it's time to provide native support in "R core" for distributed and parallel computing, for high performance on massive datasets.
And maybe base R functions should be replaced with the best R packages, like data.table, dplyr, and readr, for fast and efficient operations.


Thanks

Sent from my iPad
Did you consider the amount of code your "suggestions" would break?

-- Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
Clifford Stoll
On Wed, May 27, 2015 at 8:00 AM, Suman <suman12029 at yahoo.co.uk> wrote:
a) Base R already includes the "parallel" package. Deciding to use more than one processor for a particular computation is a very high level decision that can require knowledge of computing time cost, importance of other tasks on the system, and interdependence of computation results. It is not a decision that R should automatically make.
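For readers who have not used it, a minimal sketch of the explicit parallelism that the base "parallel" package already provides (the cluster size and the toy computation here are arbitrary choices for illustration):

```r
## Explicit parallelism with the base "parallel" package:
## the user, not R, decides how many workers to start.
library(parallel)

cl  <- makeCluster(2)                        # start two worker processes
res <- parLapply(cl, 1:4, function(x) x^2)   # apply the function in parallel
stopCluster(cl)                              # always release the workers

unlist(res)
```

Note that starting and stopping the cluster is itself a cost, which is one reason this decision is left to the user rather than made automatically.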

b) Most performance issues with R arise due to users choosing inefficient algorithms. Inserting parallelism inside existing algorithms will not fix that.
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
--------------------------------------------------------------------------- 
Sent from my phone. Please excuse my brevity.
On May 27, 2015 8:00:03 AM PDT, Suman <suman12029 at yahoo.co.uk> wrote:
On May 27, 2015, at 8:00 AM, Suman wrote:

Generally email exhortations from iPads are quite ineffective in promoting fundamental advances. 

In the US, cheerleading at sports events is often attempted by small groups of scantily clad women of various ages, usually young, using coordinated dance movements. I wonder if something similar should be attempted by those desirous of more rapid advancement of computer software. On the other hand, I suppose the world has historically been effective in this domain by waving large bundles of cash and stock options, rather than waving unclad female body parts.

Got any cash to wave?
On 27/05/2015 11:00 AM, Suman wrote:
Given your first three sentences, I would say the current development 
strategy for R is successful.  As Bert mentioned, one thing we have 
always tried to do is to make improvements without large disruptions to 
the existing code base.  I think we will continue to do that.

This means we are unlikely to make big, incompatible replacements. But 
there's nothing stopping people from using data.table, dplyr, etc. even 
if they aren't in the core.  In fact, having them outside of core R is 
better: there are only so many core R developers, and if they were 
working on data.table, etc., they wouldn't be working on other things.

Compatible replacements are another question.  There is ongoing work on 
making R faster, and making it easier to take advantage of multiple 
processors.  I believe R 3.2.0 is faster than the R 3.1.x series at many 
tasks, and changes like that are likely to continue.  Plus, there is 
base support for explicit parallel programming in the parallel package, 
as Jeff mentioned.

As to David and his large bundles: those would definitely be appreciated.

Duncan Murdoch