
trying to use a function in aggregate

7 messages · Rui Barradas, Jean V Adams, Sally_roman +1 more

#
Hi - I am using R v 2.13.0.  I am trying to use the aggregate function to
calculate the percent at length for each Trip_id and CommonName.  Here is a
small subset of the data.  
   Trip_id          Vessel       CommonName Length Count
1      230        Sunlight    Shad,American     19     1
2      230        Sunlight    Shad,American     20     1
3      230        Sunlight    Shad,American     21     1
4      230        Sunlight    Shad,American     23     1
5      230        Sunlight    Shad,American     26     1
6      230        Sunlight    Shad,American     27     1
7      230        Sunlight    Shad,American     30     2
8      230        Sunlight    Shad,American     33     1
9      230        Sunlight    Shad,American     34     1
10     230        Sunlight    Shad,American     37     1
11     230        Sunlight Herring,Blueback     20     1
12     230        Sunlight Herring,Blueback     21     2
13     230        Sunlight Herring,Blueback     22     5
14     230        Sunlight Herring,Blueback     26     1
15     230        Sunlight          Alewife     17     1
16     230        Sunlight          Alewife     18     1
17     230        Sunlight          Alewife     20     2
18     230        Sunlight          Alewife     21     4
19     230        Sunlight          Alewife     22    16
20     230        Sunlight          Alewife     23    22
21     230        Sunlight          Alewife     24    16
22     230        Sunlight          Alewife     25     4
23     230        Sunlight          Alewife     26     1
24     230        Sunlight          Alewife     27     2
25     230        Sunlight          Alewife     28     2
26     231 Western Venture    Shad,American     23     1
27     231 Western Venture    Shad,American     24     1
28     231 Western Venture    Shad,American     25     1
29     231 Western Venture    Shad,American     28     2
30     231 Western Venture    Shad,American     29     2

My code is:
myfun<-function (x) x/sum(x)
b<-with(data,aggregate(x=list(Percent=Count),by=list(Trip_id=Trip_id,Length=Length,Species=CommonName),
FUN="myfun"))

My issue is that the percent is not being calculated by Trip_id and CommonName.
The result is that each row has a percent of 1, indicating that myfun is not
dividing by the sum of counts within a Trip_id/CommonName group.  Any help
would be appreciated.
Thank you 





--
View this message in context: http://r.789695.n4.nabble.com/trying-ti-use-a-function-in-aggregate-tp4647414.html
Sent from the R help mailing list archive at Nabble.com.
#
Hello,
Try the following.
(I've changed your function a bit, and named the data.frame 'dat', not
'data', which is an R function.)


myfun <- function (x) ifelse(sum(x) == 0, 0, x/sum(x))

aggregate(Count ~ Trip_id + Length + CommonName, data = dat, myfun)

The output shows that each and every group corresponds to a single row
of the original df. The 1's represent 100%: myfun _is_ dividing by
sum(Count) within each Trip_id/Length/CommonName group. If you want just
Trip_id/CommonName, use

aggregate(Count ~ Trip_id + CommonName, data = dat, myfun)


Or use your instruction without 'Length' in the by list:

b <- with(dat, aggregate(x=list(Percent=Count),
by=list(Trip_id=Trip_id, Species=CommonName),
FUN = myfun))
b
   Trip_id          Species    Percent
1     230          Alewife 0.01408451
2     230 Herring,Blueback 0.11111111
3     230    Shad,American 0.09090909
4     231    Shad,American 0.14285714

As you can see, the results are the same, apart from the output column names.
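A caveat about the modified myfun: ifelse() returns a result with the same length as its test, and sum(x) == 0 is a single TRUE/FALSE, so the function returns only x[1]/sum(x) for each group. That is why each group above shows a single Percent (0.01408451 is 1/71, the proportion for the first Alewife row) rather than every proportion in the group. A minimal sketch of a variant that keeps the whole vector by using a scalar if/else; myfun2 and counts are illustrative names, not from the thread:

```r
# Helper as written above: ifelse() recycles to the length of its test,
# and sum(x) == 0 has length 1, so only the first proportion survives.
myfun <- function(x) ifelse(sum(x) == 0, 0, x / sum(x))

# Illustrative variant: a scalar if/else returns the whole vector.
myfun2 <- function(x) {
  total <- sum(x)
  if (total == 0) rep(0, length(x)) else x / total
}

counts <- c(1, 2, 5, 1)  # Herring,Blueback counts for trip 230
myfun(counts)   # a single value: 1/9
myfun2(counts)  # four proportions: 1/9, 2/9, 5/9, 1/9
```

When FUN returns several values per group, aggregate() stores them in a list column rather than one row per length class, so attaching the proportions back to the original rows (for example with merge()) is usually easier to work with.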


Hope this helps,

Rui Barradas

On 25-10-2012 15:19, Sally_roman wrote:
#
Hi,
May be this helps:
dat1<-read.table(text="
 Trip_id          Vessel      CommonName Length Count
1      230        Sunlight    ShadAmerican    19    1
2      230        Sunlight    ShadAmerican    20    1
3      230        Sunlight    ShadAmerican    21    1
4      230        Sunlight    ShadAmerican    23    1
5      230        Sunlight    ShadAmerican    26    1
6      230        Sunlight    ShadAmerican    27    1
7      230        Sunlight    ShadAmerican    30    2
8      230        Sunlight    ShadAmerican    33    1
9      230        Sunlight    ShadAmerican    34    1
10     230        Sunlight    ShadAmerican    37    1
11     230        Sunlight HerringBlueback    20    1
12     230        Sunlight HerringBlueback    21    2
13     230        Sunlight HerringBlueback    22    5
14     230        Sunlight HerringBlueback    26    1
15     230        Sunlight         Alewife    17    1
16     230        Sunlight         Alewife    18    1
17     230        Sunlight         Alewife    20    2
18     230        Sunlight         Alewife    21    4
19     230        Sunlight         Alewife    22   16
20     230        Sunlight         Alewife    23   22
21     230        Sunlight         Alewife    24   16
22     230        Sunlight         Alewife    25    4
23     230        Sunlight         Alewife    26    1
24     230        Sunlight         Alewife    27    2
25     230        Sunlight         Alewife    28    2
26     231 Western_Venture    ShadAmerican    23    1
27     231 Western_Venture    ShadAmerican    24    1
28     231 Western_Venture    ShadAmerican    25    1
29     231 Western_Venture    ShadAmerican    28    2
30     231 Western_Venture    ShadAmerican    29    2
",sep="",header=TRUE,stringsAsFactors=FALSE)

with(dat1,aggregate(Count,by=list(Trip_id=Trip_id,Species=CommonName),function(x) x/sum(x)))
#   Trip_id         Species
#1      230         Alewife
#2      230 HerringBlueback
#3      230    ShadAmerican
#4      231    ShadAmerican
#   x
#1 0.01408451, 0.01408451, 0.02816901, 0.05633803, 0.22535211, 0.30985915, 0.22535211, 0.05633803, 0.01408451, 0.02816901, 0.02816901
#2 0.1111111, 0.2222222, 0.5555556, 0.1111111
#3 0.09090909, 0.09090909, 0.09090909, 0.09090909, 0.09090909, 0.09090909, 0.18181818, 0.09090909, 0.09090909, 0.09090909
#4 0.1428571, 0.1428571, 0.1428571, 0.2857143, 0.2857143
#or 
library(plyr)
res<-ddply(dat1,.(Trip_id=Trip_id,Vessel=Vessel,CommonName=CommonName), summarize, Count/sum(Count))
colnames(res)[4]<-"value"
head(res)
#  Trip_id   Vessel CommonName      value
#1     230 Sunlight    Alewife 0.01408451
#2     230 Sunlight    Alewife 0.01408451
#3     230 Sunlight    Alewife 0.02816901
#4     230 Sunlight    Alewife 0.05633803
#5     230 Sunlight    Alewife 0.22535211
#6     230 Sunlight    Alewife 0.30985915


A.K.






----- Original Message -----
From: Sally_roman <sroman at umassd.edu>
To: r-help at r-project.org
Cc: 
Sent: Thursday, October 25, 2012 10:19 AM
Subject: [R] trying ti use a function in aggregate


______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
#
HI,
In my previous solution, the order got messed up. I should have ordered the rows by these columns first.
Try this:
dat1<-read.table(text="
 Trip_id          Vessel      CommonName Length Count
1      230        Sunlight    ShadAmerican    19    1
2      230        Sunlight    ShadAmerican    20    1
3      230        Sunlight    ShadAmerican    21    1
4      230        Sunlight    ShadAmerican    23    1
5      230        Sunlight    ShadAmerican    26    1
6      230        Sunlight    ShadAmerican    27    1
7      230        Sunlight    ShadAmerican    30    2
8      230        Sunlight    ShadAmerican    33    1
9      230        Sunlight    ShadAmerican    34    1
10     230        Sunlight    ShadAmerican    37    1
11     230        Sunlight HerringBlueback    20    1
12     230        Sunlight HerringBlueback    21    2
13     230        Sunlight HerringBlueback    22    5
14     230        Sunlight HerringBlueback    26    1
15     230        Sunlight         Alewife    17    1
16     230        Sunlight         Alewife    18    1
17     230        Sunlight         Alewife    20    2
18     230        Sunlight         Alewife    21    4
19     230        Sunlight         Alewife    22   16
20     230        Sunlight         Alewife    23   22
21     230        Sunlight         Alewife    24   16
22     230        Sunlight         Alewife    25    4
23     230        Sunlight         Alewife    26    1
24     230        Sunlight         Alewife    27    2
25     230        Sunlight         Alewife    28    2
26     231 Western_Venture    ShadAmerican    23    1
27     231 Western_Venture    ShadAmerican    24    1
28     231 Western_Venture    ShadAmerican    25    1
29     231 Western_Venture    ShadAmerican    28    2
30     231 Western_Venture    ShadAmerican    29    2
",sep="",header=TRUE,stringsAsFactors=FALSE)
dat2<-dat1[order(dat1$Trip_id,dat1$Vessel,dat1$CommonName,dat1$Length,dat1$Count),]
dat3<-dat2
dat3$Prop<-unlist(tapply(dat3$Count,list(dat3$Trip_id,dat3$CommonName),function(x) x/sum(x)))


#Jean's method:

agg <- with(dat2, aggregate(data.frame(Total=Count), data.frame(Trip_id,
CommonName), sum))
# combine the totals with the full data frame
data2 <- merge(dat2, agg)
# then calculate proportions
data2$Prop <- data2$Count/data2$Total
# drop the Total column and restore the column order used in dat3
data3<-data2[,-6]
data4<-data3[,c(1,3,2,4:6)]
# reset row names so the two results can be compared directly
rownames(dat3)<-1:nrow(dat3)
identical(dat3,data4)
#[1] TRUE

head(dat3)
#  Trip_id   Vessel CommonName Length Count       Prop
#1     230 Sunlight    Alewife     17     1 0.01408451
#2     230 Sunlight    Alewife     18     1 0.01408451
#3     230 Sunlight    Alewife     20     2 0.02816901
#4     230 Sunlight    Alewife     21     4 0.05633803
#5     230 Sunlight    Alewife     22    16 0.22535211
#6     230 Sunlight    Alewife     23    22 0.30985915
head(data4)
#  Trip_id   Vessel CommonName Length Count       Prop
#1     230 Sunlight    Alewife     17     1 0.01408451
#2     230 Sunlight    Alewife     18     1 0.01408451
#3     230 Sunlight    Alewife     20     2 0.02816901
#4     230 Sunlight    Alewife     21     4 0.05633803
#5     230 Sunlight    Alewife     22    16 0.22535211
#6     230 Sunlight    Alewife     23    22 0.30985915
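The tapply()/unlist() step above works here because, with this data, the sorted rows of dat2 happen to follow the same order in which unlist() walks the tapply() result (column by column over the Trip_id by CommonName table, skipping empty combinations). With other data, for example a second trip that also caught Alewife, the two orders could differ. A more order-independent alternative, offered as a suggestion rather than as either poster's method, is base R's ave(), which returns one value per row in the original row order. The sketch uses a few rows from the thread, deliberately shuffled, in a stand-in data frame named demo_dat (an illustrative name):

```r
# A few rows from the thread's data, deliberately left unsorted
demo_dat <- data.frame(
  Trip_id    = c(230L, 231L, 230L, 230L, 231L, 230L),
  CommonName = c("Alewife", "ShadAmerican", "HerringBlueback",
                 "Alewife", "ShadAmerican", "HerringBlueback"),
  Length     = c(22L, 23L, 21L, 23L, 28L, 22L),
  Count      = c(16L, 1L, 2L, 22L, 2L, 5L),
  stringsAsFactors = FALSE
)

# ave() splits Count by Trip_id and CommonName, applies FUN to each group,
# and writes the results back into the original row positions
demo_dat$Prop <- ave(demo_dat$Count, demo_dat$Trip_id, demo_dat$CommonName,
                     FUN = function(x) x / sum(x))
demo_dat
```

On the full data, the same call, ave(dat1$Count, dat1$Trip_id, dat1$CommonName, FUN = function(x) x / sum(x)), should give each row the same Prop as the tapply() and merge() versions, without any sorting step.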
A.K.





----- Original Message -----
From: Jean V Adams <jvadams at usgs.gov>
To: Sally_roman <sroman at umassd.edu>
Cc: r-help at r-project.org
Sent: Thursday, October 25, 2012 2:45 PM
Subject: Re: [R] trying ti use a function in aggregate

Sally,

It's great that you provided data and code.  To make it even more
user-friendly for R-help readers, supply your data as R code, using (for
example) the dput() function.
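As an illustration of that suggestion, dput() prints R code that recreates an object exactly, so readers can rebuild the data with a single copy and paste. A minimal sketch using three rows from the data above; the names small_dat, txt and rebuilt are illustrative:

```r
# Three rows taken from the data above
small_dat <- data.frame(
  Trip_id    = c(230L, 230L, 231L),
  CommonName = c("Shad,American", "Alewife", "Shad,American"),
  Length     = c(19L, 22L, 28L),
  Count      = c(1L, 16L, 2L),
  stringsAsFactors = FALSE
)

# dput() writes R source that rebuilds the object; capture it as text
txt <- paste(capture.output(dput(small_dat)), collapse = "\n")
cat(txt, "\n")

# Pasting that text back into R recreates the same data frame
rebuilt <- eval(parse(text = txt))
identical(rebuilt, small_dat)
```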

The reason you were getting all 1s with your code is that you had told it
to aggregate by trip, LENGTH, and species.  But the data are already
summarized by trip, LENGTH, and species, so your myfun() function is
calculating count/count = 1 for each row.  You could drop LENGTH and still
use your myfun() function, but the results aren't pretty ...

with(data, aggregate(data.frame(Total=Count), data.frame(Trip_id, 
CommonName), myfun))
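One way to confirm this diagnosis, sketched here as a suggestion, is to check whether any Trip_id/Length/CommonName combination occurs more than once: if anyDuplicated() returns 0, every group is a single row and x/sum(x) can only be 1, whatever the count. The sketch uses four rows from the data above; rows, grp_cols and by_length are illustrative names:

```r
# Four rows from the data above, with different counts
rows <- data.frame(
  Trip_id    = c(230L, 230L, 230L, 231L),
  Length     = c(22L, 23L, 22L, 28L),
  CommonName = c("Alewife", "Alewife", "Herring,Blueback", "Shad,American"),
  Count      = c(16L, 22L, 5L, 2L),
  stringsAsFactors = FALSE
)

grp_cols <- c("Trip_id", "Length", "CommonName")

# 0 means no Trip_id/Length/CommonName combination repeats
anyDuplicated(rows[grp_cols])

# So grouping by all three columns divides each Count by itself
by_length <- aggregate(Count ~ Trip_id + Length + CommonName, data = rows,
                       FUN = function(x) x / sum(x))
by_length$Count

# Dropping Length lets a group contain several rows
table(paste(rows$Trip_id, rows$CommonName))
```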

Instead, I suggest using the aggregate function to calculate the
total counts, then merging these totals with your original data to calculate
the proportions.

# small subset of data
data <- structure(list(Trip_id = c(230L, 230L, 230L, 230L, 230L, 230L, 
230L, 230L, 230L, 230L, 230L, 230L, 230L, 230L, 230L, 230L, 230L, 
230L, 230L, 230L, 230L, 230L, 230L, 230L, 230L, 231L, 231L, 231L, 
231L, 231L), Vessel = c("Sunlight", "Sunlight", "Sunlight", "Sunlight", 
"Sunlight", "Sunlight", "Sunlight", "Sunlight", "Sunlight", "Sunlight", 
"Sunlight", "Sunlight", "Sunlight", "Sunlight", "Sunlight", "Sunlight", 
"Sunlight", "Sunlight", "Sunlight", "Sunlight", "Sunlight", "Sunlight", 
"Sunlight", "Sunlight", "Sunlight", "Western Venture", "Western Venture", 
"Western Venture", "Western Venture", "Western Venture"), CommonName = 
c("Shad,American", 
"Shad,American", "Shad,American", "Shad,American", "Shad,American", 
"Shad,American", "Shad,American", "Shad,American", "Shad,American", 
"Shad,American", "Herring,Blueback", "Herring,Blueback", 
"Herring,Blueback", 
"Herring,Blueback", "Alewife", "Alewife", "Alewife", "Alewife", 
"Alewife", "Alewife", "Alewife", "Alewife", "Alewife", "Alewife", 
"Alewife", "Shad,American", "Shad,American", "Shad,American", 
"Shad,American", "Shad,American"), Length = c(19L, 20L, 21L, 
23L, 26L, 27L, 30L, 33L, 34L, 37L, 20L, 21L, 22L, 26L, 17L, 18L, 
20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L, 23L, 24L, 25L, 28L, 
29L), Count = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 
5L, 1L, 1L, 1L, 2L, 4L, 16L, 22L, 16L, 4L, 1L, 2L, 2L, 1L, 1L, 
1L, 2L, 2L)), .Names = c("Trip_id", "Vessel", "CommonName", "Length", 
"Count"), row.names = c(NA, -30L), class = "data.frame")

# calculate the total count for each trip and Species
agg <- with(data, aggregate(data.frame(Total=Count), data.frame(Trip_id, 
CommonName), sum))

# combine the totals with the full data frame
data2 <- merge(data, agg)

# then calculate proportions
data2$Prop <- data2$Count/data2$Total

data2
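As a quick check on the result (a suggestion, not part of the original reply), the proportions should add up to 1 within each Trip_id/CommonName group, and multiplying by 100 gives the percent at length asked for in the question. The sketch repeats the aggregate-then-merge steps with the formula interface on two groups from the data above, so it runs on its own; sub_dat, totals, merged and Percent are illustrative names:

```r
# Two groups from the data above: Herring,Blueback on trip 230
# and Shad,American on trip 231
sub_dat <- data.frame(
  Trip_id    = c(rep(230L, 4), rep(231L, 5)),
  CommonName = c(rep("Herring,Blueback", 4), rep("Shad,American", 5)),
  Length     = c(20L, 21L, 22L, 26L, 23L, 24L, 25L, 28L, 29L),
  Count      = c(1L, 2L, 5L, 1L, 1L, 1L, 1L, 2L, 2L),
  stringsAsFactors = FALSE
)

# Group totals, renamed to Total
totals <- aggregate(Count ~ Trip_id + CommonName, data = sub_dat, FUN = sum)
names(totals)[names(totals) == "Count"] <- "Total"

# Merge the totals back onto every row, then convert to percent at length
merged <- merge(sub_dat, totals)
merged$Percent <- 100 * merged$Count / merged$Total

# Sanity check: percentages should add up to 100 within each group
aggregate(Percent ~ Trip_id + CommonName, data = merged, FUN = sum)
```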


Jean



Sally_roman <sroman at umassd.edu> wrote on 10/25/2012 09:19:57 AM:

#
Hi Rui,

I thought the OP was looking for something like this. Maybe I am wrong.
dat2<-dat1[order(dat1$Trip_id,dat1$Vessel,dat1$CommonName,dat1$Length,dat1$Count),]
dat3<-dat2
dat3$Prop<-unlist(tapply(dat3$Count,list(dat3$Trip_id,dat3$CommonName),function(x) x/sum(x)))
head(dat3)
#  Trip_id   Vessel CommonName Length Count       Prop
#1     230 Sunlight    Alewife     17     1 0.01408451
#2     230 Sunlight    Alewife     18     1 0.01408451
#3     230 Sunlight    Alewife     20     2 0.02816901
#4     230 Sunlight    Alewife     21     4 0.05633803
#5     230 Sunlight    Alewife     22    16 0.22535211
#6     230 Sunlight    Alewife     23    22 0.30985915
A.K.




----- Original Message -----
From: Rui Barradas <ruipbarradas at sapo.pt>
To: Sally_roman <sroman at umassd.edu>
Cc: r-help at r-project.org
Sent: Thursday, October 25, 2012 11:59 AM
Subject: Re: [R] trying ti use a function in aggregate
