Skip to content

weighted average grouped by variables

9 messages · Massimo Bressan, Rui Barradas, Thierry Onkelinx +2 more

#
hi all 

I have this dataframe (created as a reproducible example) 

mydf<-structure(list(date_time = structure(c(1508238000, 1508238000, 1508238000, 1508238000, 1508238000, 1508238000, 1508238000), class = c("POSIXct", "POSIXt"), tzone = ""), 
direction = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L), .Label = c("A", "B"), class = "factor"), 
type = structure(c(1L, 2L, 3L, 4L, 1L, 2L, 3L), .Label = c("car", "light_duty", "heavy_duty", "motorcycle"), class = "factor"), 
avg_speed = c(41.1029082774049, 40.3333333333333, 40.3157894736842, 36.0869565217391, 33.4065155807365, 37.6222222222222, 35.5), 
n_vehicles = c(447L, 24L, 19L, 23L, 706L, 45L, 26L)), 
.Names = c("date_time", "direction", "type", "speed", "n_vehicles"), 
row.names = c(NA, -7L), 
class = "data.frame") 

mydf 

and I need to get to this final result 

mydf_final<-structure(list(date_time = structure(c(1508238000, 1508238000, 1508238000, 1508238000), class = c("POSIXct", "POSIXt"), tzone = ""), 
type = structure(c(1L, 2L, 3L, 4L), .Label = c("car", "light_duty", "heavy_duty", "motorcycle"), class = "factor"), 
weighted_avg_speed = c(36.39029, 38.56521, 37.53333, 36.08696), 
n_vehicles = c(1153L,69L,45L,23L)), 
.Names = c("date_time", "type", "weighted_avg_speed", "n_vehicles"), 
row.names = c(NA, -4L), 
class = "data.frame") 

mydf_final 


my question: 
how to compute a weighted mean i.e. "weighted_avg_speed" 
from "speed" (the values whose weighted mean is to be computed) and "n_vehicles" (the weights) 
grouped by "date_time" and "type"? 

to be noted the complication of the case "motorcycle" (not present in both directions) 

any help for that? 

thank you 

max
#
Hello 

an update about my question: I worked out the following solution (with the package "dplyr") 

library(dplyr) 

mydf%>% 
mutate(speed_vehicles=n_vehicles*mydf$speed) %>% 
group_by(date_time,type) %>% 
summarise( 
sum_n_times_speed=sum(speed_vehicles), 
n_vehicles=sum(n_vehicles), 
vel=sum(speed_vehicles)/sum(n_vehicles) 
) 


In fact I was hoping to manage everything in a "one-go": i.e. without the need to create the "intermediate" variable called "speed_vehicles" and with the use of the function weighted.mean() 

any hints for a different approach much appreciated 

thanks 



Da: "Massimo Bressan" <massimo.bressan at arpa.veneto.it> 
A: "r-help" <r-help at r-project.org> 
Inviato: Gioved?, 9 novembre 2017 12:20:52 
Oggetto: weighted average grouped by variables 

hi all 

I have this dataframe (created as a reproducible example) 

mydf<-structure(list(date_time = structure(c(1508238000, 1508238000, 1508238000, 1508238000, 1508238000, 1508238000, 1508238000), class = c("POSIXct", "POSIXt"), tzone = ""), 
direction = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L), .Label = c("A", "B"), class = "factor"), 
type = structure(c(1L, 2L, 3L, 4L, 1L, 2L, 3L), .Label = c("car", "light_duty", "heavy_duty", "motorcycle"), class = "factor"), 
avg_speed = c(41.1029082774049, 40.3333333333333, 40.3157894736842, 36.0869565217391, 33.4065155807365, 37.6222222222222, 35.5), 
n_vehicles = c(447L, 24L, 19L, 23L, 706L, 45L, 26L)), 
.Names = c("date_time", "direction", "type", "speed", "n_vehicles"), 
row.names = c(NA, -7L), 
class = "data.frame") 

mydf 

and I need to get to this final result 

mydf_final<-structure(list(date_time = structure(c(1508238000, 1508238000, 1508238000, 1508238000), class = c("POSIXct", "POSIXt"), tzone = ""), 
type = structure(c(1L, 2L, 3L, 4L), .Label = c("car", "light_duty", "heavy_duty", "motorcycle"), class = "factor"), 
weighted_avg_speed = c(36.39029, 38.56521, 37.53333, 36.08696), 
n_vehicles = c(1153L,69L,45L,23L)), 
.Names = c("date_time", "type", "weighted_avg_speed", "n_vehicles"), 
row.names = c(NA, -4L), 
class = "data.frame") 

mydf_final 


my question: 
how to compute a weighted mean i.e. "weighted_avg_speed" 
from "speed" (the values whose weighted mean is to be computed) and "n_vehicles" (the weights) 
grouped by "date_time" and "type"? 

to be noted the complication of the case "motorcycle" (not present in both directions) 

any help for that? 

thank you 

max
#
Hello,

Using base R only, the following seems to do what you want.

with(mydf, ave(speed, date_time, type, FUN = weighted.mean, w = n_vehicles))


Hope this helps,

Rui Barradas

Em 09-11-2017 13:16, Massimo Bressan escreveu:
#
Sorry, I messed up. Only checked the final result after sending the 
previous mail. The solution is wrong.

Rui Barradas

Em 09-11-2017 13:27, Rui Barradas escreveu:
#
Hi

Thanks for working example.

you could use split/ lapply approach, however it is probably not much better than dplyr method.

sapply(split(mydf, mydf$type), function(speed, n_vehicles) sum(mydf$speed*mydf$n_vehicles)/sum(mydf$n_vehicles))
gives you averages

aggregate(mydf$n_vehicles, list(mydf$type), sum)$x
gives you sums

Cheers
Petr
________________________________
Tento e-mail a jak?koliv k n?mu p?ipojen? dokumenty jsou d?v?rn? a jsou ur?eny pouze jeho adres?t?m.
Jestli?e jste obdr?el(a) tento e-mail omylem, informujte laskav? neprodlen? jeho odes?latele. Obsah tohoto emailu i s p??lohami a jeho kopie vyma?te ze sv?ho syst?mu.
Nejste-li zam??len?m adres?tem tohoto emailu, nejste opr?vn?ni tento email jakkoliv u??vat, roz?i?ovat, kop?rovat ?i zve?ej?ovat.
Odes?latel e-mailu neodpov?d? za eventu?ln? ?kodu zp?sobenou modifikacemi ?i zpo?d?n?m p?enosu e-mailu.

V p??pad?, ?e je tento e-mail sou??st? obchodn?ho jedn?n?:
- vyhrazuje si odes?latel pr?vo ukon?it kdykoliv jedn?n? o uzav?en? smlouvy, a to z jak?hokoliv d?vodu i bez uveden? d?vodu.
- a obsahuje-li nab?dku, je adres?t opr?vn?n nab?dku bezodkladn? p?ijmout; Odes?latel tohoto e-mailu (nab?dky) vylu?uje p?ijet? nab?dky ze strany p??jemce s dodatkem ?i odchylkou.
- trv? odes?latel na tom, ?e p??slu?n? smlouva je uzav?ena teprve v?slovn?m dosa?en?m shody na v?ech jej?ch n?le?itostech.
- odes?latel tohoto emailu informuje, ?e nen? opr?vn?n uzav?rat za spole?nost ??dn? smlouvy s v?jimkou p??pad?, kdy k tomu byl p?semn? zmocn?n nebo p?semn? pov??en a takov? pov??en? nebo pln? moc byly adres?tovi tohoto emailu p??padn? osob?, kterou adres?t zastupuje, p?edlo?eny nebo jejich existence je adres?tovi ?i osob? j?m zastoupen? zn?m?.

This e-mail and any documents attached to it may be confidential and are intended only for its intended recipients.
If you received this e-mail by mistake, please immediately inform its sender. Delete the contents of this e-mail with all attachments and its copies from your system.
If you are not the intended recipient of this e-mail, you are not authorized to use, disseminate, copy or disclose this e-mail in any manner.
The sender of this e-mail shall not be liable for any possible damage caused by modifications of the e-mail or by delay with transfer of the email.

In case that this e-mail forms part of business dealings:
- the sender reserves the right to end negotiations about entering into a contract in any time, for any reason, and without stating any reasoning.
- if the e-mail contains an offer, the recipient is entitled to immediately accept such offer; The sender of this e-mail (offer) excludes any acceptance of the offer on the part of the recipient containing any amendment or variation.
- the sender insists on that the respective contract is concluded only upon an express mutual agreement on all its aspects.
- the sender of this e-mail informs that he/she is not authorized to enter into any contracts on behalf of the company except for cases in which he/she is expressly authorized to do so in writing, and such authorization or power of attorney is submitted to the recipient or the person represented by the recipient, or the existence of such authorization is known to the recipient of the person represented by the recipient.
#
Dear Massimo,

It seems straightforward to use weighted.mean() in a dplyr context

library(dplyr)
mydf %>%
  group_by(date_time, type) %>%
  summarise(vel = weighted.mean(speed, n_vehicles))

Best regards,



ir. Thierry Onkelinx
Statisticus / Statistician

Vlaamse Overheid / Government of Flanders
INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND
FOREST
Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
thierry.onkelinx at inbo.be
Kliniekstraat 25, B-1070 Brussel
www.inbo.be

///////////////////////////////////////////////////////////////////////////////////////////
To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to say
what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey
///////////////////////////////////////////////////////////////////////////////////////////

[image: Van 14 tot en met 19 december 2017 verhuizen we uit onze vestiging
in Brussel naar het Herman Teirlinckgebouw op de site Thurn & Taxis. Vanaf
dan ben je welkom op het nieuwe adres: Havenlaan 88 bus 73, 1000 Brussel.]
<https://overheid.vlaanderen.be/mobiliteitsplan-herman-teirlinckgebouw>
Van 14 tot en met 19 december 2017 verhuizen we uit onze vestiging in
Brussel naar het Herman Teirlinckgebouw op de site Thurn & Taxis.
Vanaf dan ben je welkom op het nieuwe adres: Havenlaan 88 bus 73, 1000
Brussel.

///////////////////////////////////////////////////////////////////////////////////////////
<https://www.inbo.be>

2017-11-09 14:16 GMT+01:00 Massimo Bressan <massimo.bressan at arpa.veneto.it>:

  
  
#
hi thierry 

thanks for your reply 

yes, you are right, your solution is more straightforward 

best 


Da: "Thierry Onkelinx" <thierry.onkelinx at inbo.be> 
A: "Massimo Bressan" <massimo.bressan at arpa.veneto.it> 
Cc: "r-help" <r-help at r-project.org> 
Inviato: Gioved?, 9 novembre 2017 15:17:31 
Oggetto: Re: [R] weighted average grouped by variables 

Dear Massimo, 

It seems straightforward to use weighted.mean() in a dplyr context 

library(dplyr) 
mydf %>% 
group_by(date_time, type) %>% 
summarise(vel = weighted.mean(speed, n_vehicles)) 

Best regards, 



ir. Thierry Onkelinx 
Statisticus / Statistician 

Vlaamse Overheid / Government of Flanders 
INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND FOREST 
Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance 
thierry.onkelinx at inbo.be 
Kliniekstraat 25, B-1070 Brussel 
www.inbo.be 

/////////////////////////////////////////////////////////////////////////////////////////// 
To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher 
The plural of anecdote is not data. ~ Roger Brinner 
The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey 
/////////////////////////////////////////////////////////////////////////////////////////// 


Van 14 tot en met 19 december 2017 verhuizen we uit onze vestiging in Brussel naar het Herman Teirlinckgebouw op de site Thurn & Taxis. 
Vanaf dan ben je welkom op het nieuwe adres: Havenlaan 88 bus 73, 1000 Brussel. 

///////////////////////////////////////////////////////////////////////////////////////////
1 day later
#
The result of this calculation is:

     car light_duty heavy_duty motorcycle 
  36.54109   36.54109   36.54109   36.54109 

But this doesn't give the same result as the dplyr method which is:

            date_time       type      vel
               <dttm>     <fctr>    <dbl>
1 2017-10-17 13:00:00        car 36.39029
2 2017-10-17 13:00:00 light_duty 38.56522
3 2017-10-17 13:00:00 heavy_duty 37.53333
4 2017-10-17 13:00:00 motorcycle 36.08696

The base R way of getting the result should be modified slightly into

sapply(split(mydf, mydf$type), function(Z) sum(Z$speed*Z$n_vehicles)/sum(Z$n_vehicles))

Calculations are done on the elements of the list provided by split.
The result now is:

      car light_duty heavy_duty motorcycle 
  36.39029   38.56522   37.53333   36.08696 

Obviously now the same as the dplyr method.

Berend Hasselman
1 day later
#
Hi Berend

Yes you are correct. My fault, I did not test it before sending.

Cheers
Petr
________________________________
Tento e-mail a jak?koliv k n?mu p?ipojen? dokumenty jsou d?v?rn? a jsou ur?eny pouze jeho adres?t?m.
Jestli?e jste obdr?el(a) tento e-mail omylem, informujte laskav? neprodlen? jeho odes?latele. Obsah tohoto emailu i s p??lohami a jeho kopie vyma?te ze sv?ho syst?mu.
Nejste-li zam??len?m adres?tem tohoto emailu, nejste opr?vn?ni tento email jakkoliv u??vat, roz?i?ovat, kop?rovat ?i zve?ej?ovat.
Odes?latel e-mailu neodpov?d? za eventu?ln? ?kodu zp?sobenou modifikacemi ?i zpo?d?n?m p?enosu e-mailu.

V p??pad?, ?e je tento e-mail sou??st? obchodn?ho jedn?n?:
- vyhrazuje si odes?latel pr?vo ukon?it kdykoliv jedn?n? o uzav?en? smlouvy, a to z jak?hokoliv d?vodu i bez uveden? d?vodu.
- a obsahuje-li nab?dku, je adres?t opr?vn?n nab?dku bezodkladn? p?ijmout; Odes?latel tohoto e-mailu (nab?dky) vylu?uje p?ijet? nab?dky ze strany p??jemce s dodatkem ?i odchylkou.
- trv? odes?latel na tom, ?e p??slu?n? smlouva je uzav?ena teprve v?slovn?m dosa?en?m shody na v?ech jej?ch n?le?itostech.
- odes?latel tohoto emailu informuje, ?e nen? opr?vn?n uzav?rat za spole?nost ??dn? smlouvy s v?jimkou p??pad?, kdy k tomu byl p?semn? zmocn?n nebo p?semn? pov??en a takov? pov??en? nebo pln? moc byly adres?tovi tohoto emailu p??padn? osob?, kterou adres?t zastupuje, p?edlo?eny nebo jejich existence je adres?tovi ?i osob? j?m zastoupen? zn?m?.

This e-mail and any documents attached to it may be confidential and are intended only for its intended recipients.
If you received this e-mail by mistake, please immediately inform its sender. Delete the contents of this e-mail with all attachments and its copies from your system.
If you are not the intended recipient of this e-mail, you are not authorized to use, disseminate, copy or disclose this e-mail in any manner.
The sender of this e-mail shall not be liable for any possible damage caused by modifications of the e-mail or by delay with transfer of the email.

In case that this e-mail forms part of business dealings:
- the sender reserves the right to end negotiations about entering into a contract in any time, for any reason, and without stating any reasoning.
- if the e-mail contains an offer, the recipient is entitled to immediately accept such offer; The sender of this e-mail (offer) excludes any acceptance of the offer on the part of the recipient containing any amendment or variation.
- the sender insists on that the respective contract is concluded only upon an express mutual agreement on all its aspects.
- the sender of this e-mail informs that he/she is not authorized to enter into any contracts on behalf of the company except for cases in which he/she is expressly authorized to do so in writing, and such authorization or power of attorney is submitted to the recipient or the person represented by the recipient, or the existence of such authorization is known to the recipient of the person represented by the recipient.