Neat way of using R for pivoting? - R-help

Tue, Sep 20, 2005 9:46 AM #


R does not have pivot tables and I hope that it never does.

My experience with pivot tables is that they encourage poor initial design
followed by non-easily-reproducible post-hoc twiddling.

R encourages proper initial design followed by fixing the core design in
cases where things don't turn out the way you intended.
 
In R I prefer to work with script files and save the file.  If the table
or graph does not turn out the way I intended, then I just edit the script
file and rerun it.  While this may be a little more work than clicking on
a pivot table at first, in the long run I find it saves more time.

Consider the situation where you create a table/graph, then a month later
your boss/client/coworker finds some typos in the original data and needs
the table and/or graph recreated with the corrected data (or maybe a new
dataset that needs a similar graph/table).  With the pivot table you need
to try to remember everything that you clicked on and click on it again.
With the R script file you just fix the data (or load in the new data),
rerun the script, and you're done.
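
As a rough sketch of that workflow (the function name make.table, the
column names and the numbers are all made up for illustration), the steps
that build the table can live in one function, so a correction to the data
only means rerunning it:

```r
# Hypothetical helper: everything needed to rebuild the table is in one place
make.table <- function(df) {
  tapply(df$timeError, df$SNR, mean)
}

# Made-up data; the 21 stands in for a typo that should have been 2.1
orig <- data.frame(SNR = rep(c(4, 6), each = 2),
                   timeError = c(1.3, 2.1, 21, 2.2))
tab.orig <- make.table(orig)

# A month later: correct the data and rerun, no clicks to remember
fixed <- orig
fixed$timeError[3] <- 2.1
tab.fixed <- make.table(fixed)
print(tab.fixed)
```

In practice the data would more likely come from a file and the script
would simply be sourced again after the file is corrected.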

OK, enough of my ranting, on to helping with your problem.

[snip]

"by" is a bit of an overkill for this situation, tapply will probably
work better.

Try this basic script as a starting place:

### start ###
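my.df <- data.frame( SNR=rep( c(4,6,8), each=3),
	timeError = c(1.3,2.1,1.2,2.1,2.2,2.1,3.2,3.7,3.1))

# mean and standard deviation of timeError within each SNR level
tmp.mean <- tapply( my.df$timeError, my.df$SNR, mean)
tmp.sd   <- tapply( my.df$timeError, my.df$SNR, sd)

# take x from the tapply names so it lines up with the group order
# (tapply sorts the groups; unique() keeps the order of appearance)
tmp.x <- as.numeric(names(tmp.mean))

plot( tmp.x, tmp.mean,
	ylim=range(tmp.mean+3*tmp.sd, tmp.mean-3*tmp.sd),
	xlab='SNR', ylab='timeError')

# vertical bars spanning mean +/- 3 sd
segments(tmp.x, tmp.mean-3*tmp.sd, tmp.x, tmp.mean+3*tmp.sd,
	col='green')

### optional: caps on the bars, then redraw the means on top
points(tmp.x, tmp.mean+3*tmp.sd, pch='-',cex=3,col='green')
points(tmp.x, tmp.mean-3*tmp.sd, pch='-',cex=3,col='green')
points(tmp.x, tmp.mean)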

### end script ###

This may be even simpler with a loaded package.  A quick search shows
the following functions (package in parens) that may help:

plotCI(gplots)          Plot Error Bars and Confidence Intervals
errbar(Hmisc)           Plot Error Bars
xYplot(Hmisc)           xyplot and dotplot with Matrix Variables to
                        Plot Error Bars and Bands

plotCI(plotrix)         Plot confidence intervals/error bars

errbar(sfsmisc)         Scatter Plot with Error Bars
plotCI(sfsmisc)         Plot Confidence Intervals / Error Bars
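
Staying in base R, the per-group summaries can also be collected into one
pivot-style table rather than separate vectors.  This is only a sketch
using the same toy data as the script above; the name pivot and the choice
of statistics are mine, not anything built in:

```r
# Same toy data as in the script above
my.df <- data.frame(SNR = rep(c(4, 6, 8), each = 3),
                    timeError = c(1.3, 2.1, 1.2, 2.1, 2.2, 2.1, 3.2, 3.7, 3.1))

# Split timeError by SNR level, summarise each group, and transpose so
# that there is one row per SNR level and one column per statistic
pivot <- t(sapply(split(my.df$timeError, my.df$SNR),
                  function(x) c(mean = mean(x), sd = sd(x), n = length(x))))
print(pivot)
```

The row names of pivot are the SNR levels, so as.numeric(rownames(pivot))
gives the x values for a plot like the one above.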

hope this helps,

Greg Snow, Ph.D.
Statistical Data Center, LDS Hospital
Intermountain Health Care
greg.snow at ihc.com
(801) 408-8111

Gabor Grothendieck

Tue, Sep 20, 2005 10:31 AM #

On 9/20/05, Greg Snow <greg.snow at ihc.com> wrote:

Just one comment here lest we be arguing against a strawman.
While I agree that reproducibility can be a problem with pivot tables
if created interactively (and this applies to just about anything you do
in Excel interactively), it should also be realized that Excel is
completely programmable, like R, using VBA or any language (including R!)
via its COM object interface.

The fact that Excel has both an interactive interface and a script-based
interface whereas R has only a script-based interface puts it ahead, not 
behind, R in at least some respects.

Patrick Burns

Tue, Sep 20, 2005 11:46 AM #

Gabor Grothendieck wrote:

...

Sorry, but I can't resist:  That very much depends on whether
you are doing something that is appropriate to be done
in a spreadsheet.  The set of tasks appropriate for R is
very much larger than the set appropriate for Excel.

http://www.burns-stat.com/pages/Tutor/spreadsheet_addiction.html

Patrick Burns
patrick at burns-stat.com
+44 (0)20 8525 0696
http://www.burns-stat.com
(home of S Poetry and "A Guide for the Unwilling S User")

Gabor Grothendieck

Tue, Sep 20, 2005 12:35 PM #

On 9/20/05, Patrick Burns <pburns at pburns.seanet.com> wrote:

I certainly don't want to be an apologist for Excel but I would
not assess its domain of applicability to be a subset of that of
R.  I agree with most of the points made in the link you cited
but it's mainly concerned with stretching the use of spreadsheets
to situations where R would be better.

At the same time the domain where spreadsheets are appropriate
and preferable is very large and probably exceeds the domain where R
is preferable to Excel due to the fact that financial, accounting
and budgetary work done by every organization is mostly in the domain
of applicability of Excel.  Also I think the link overstates the case,
at least in reference to Excel, since some of the criticisms can
be overcome using Excel's scripting capability.