Grouping and Computing

3 messages · alexander.hener@gmx.de, Kaspar Pflugshaupt, Joerg Maeder

#
Hi group,

To mention it in advance, I am an R newbie, and most likely, my question is
more a mix of smaller, simpler tasks. Anyway, I got mixed up between by,
select, aggregate, lapply etc.
My problem is as follows:

I have read data in and transformed them into a matrix for no special reason
so far. This matrix contains a column with regard to which I would like to
group, i.e. one realisation specifies one group. Neither the number of
occurrences nor the value of these realisations is known in advance, which seems to
be the major problem. For each group separately then, I would like to compute
some aggregation function, namely the sum of a fraction of two columns. These
sums should be kept in form of another vector. 

My two questions are then

- Which object type (matrix, dataframe, list) lends itself to such a
problem?
- Do I have to create different objects for the groups, or can I compute the
vector of sums directly? And how?
 
Thanks in advance

Alexander Hener
#
On 7.2.2002 15:03, alexander.hener at gmx.de wrote:

            
It's probably easiest with a data frame, but you can also use a matrix.

Do it directly, by all means. You can use any of tapply(), by() or
aggregate():
   a           b
1  a  0.23158790
2  a -0.38852120
    [snip]
9  b -1.81645407
10 b -0.44034004

       a        b 
1.282057 1.511260 

INDICES: a
[1] 1.282057
------------------------------------------------------------
INDICES: b
[1] 1.51126

  Group.1        x
1       a 1.282057
2       b 1.511260
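
The calls that produced the output above are not preserved in this thread. As a rough sketch of what such calls might look like, assuming a data frame with a grouping column `a` and a numeric column `b`, and assuming `sum` as the aggregation function (the data, the name `df` and the choice of `sum` are all assumptions, so the numbers will not match the ones shown):

```r
# Hypothetical data: the original values are not recoverable, so random
# numbers stand in for column b.
set.seed(1)
df <- data.frame(a = rep(c("a", "b"), each = 5), b = rnorm(10))

# tapply(): returns a named vector with one value per group
sums_tapply <- tapply(df$b, df$a, sum)

# by(): returns an object of class "by", printed one group at a time
sums_by <- by(df$b, df$a, sum)

# aggregate(): returns a data frame with columns Group.1 and x
sums_agg <- aggregate(df$b, by = list(df$a), FUN = sum)
```

All three compute the same per-group values; they differ only in the shape of the result, so the choice mostly depends on what you want to do with it afterwards.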


See the functions' help texts and examples for further information. For me,
tapply() does all I need.

Cheers

Kaspar Pflugshaupt
#
Hello Alexander,

The function you are looking for is tapply(), and it works with data frames
and matrices.

Here is a small example:
d <- data.frame(group=c(1,4,5,4,5,2,1),value=c(1,54,2,6,87,4,6))
# arguments: the data, the groups, the function
tapply(d$value,d$group,mean)

will produce
   1    2    4    5 
 3.5  4.0 30.0 44.5 

You can also use factors to define the groups. This has the advantage
that you can give them real names:
d <- data.frame(group=c('John','Jack','Joan','Jack','Jack','Joan','John'),value=c(1,54,2,6,87,4,6))
tapply(d$value,d$group,mean)

will produce
Jack Joan John 
49.0  3.0  3.5
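
Applied to the original question (the sum of the ratio of two columns within each group), the same call should work on a matrix directly, without knowing the group values in advance. A minimal sketch, assuming a numeric matrix whose column names `group`, `num` and `den` are made up for illustration:

```r
# Hypothetical matrix: the column names and values are invented.
m <- cbind(group = c(3, 7, 3, 7, 9),
           num   = c(1, 2, 3, 4, 5),
           den   = c(2, 4, 6, 8, 10))

# Row-wise ratio of the two columns, summed within each distinct
# value of the grouping column; tapply() discovers the groups itself.
ratio_sums <- tapply(m[, "num"] / m[, "den"], m[, "group"], sum)
ratio_sums
```

Every ratio here is 0.5, so groups 3 and 7 (two rows each) sum to 1 and group 9 (one row) sums to 0.5. The result is a named vector, and as.vector(ratio_sums) drops the names if a plain vector of sums is wanted.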