hi all
I have this dataframe (created as a reproducible example)
mydf<-structure(list(date_time = structure(c(1508238000, 1508238000, 1508238000, 1508238000, 1508238000, 1508238000, 1508238000), class = c("POSIXct", "POSIXt"), tzone = ""),
direction = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L), .Label = c("A", "B"), class = "factor"),
type = structure(c(1L, 2L, 3L, 4L, 1L, 2L, 3L), .Label = c("car", "light_duty", "heavy_duty", "motorcycle"), class = "factor"),
speed = c(41.1029082774049, 40.3333333333333, 40.3157894736842, 36.0869565217391, 33.4065155807365, 37.6222222222222, 35.5),
n_vehicles = c(447L, 24L, 19L, 23L, 706L, 45L, 26L)),
.Names = c("date_time", "direction", "type", "speed", "n_vehicles"),
row.names = c(NA, -7L),
class = "data.frame")
mydf
and I need to get to this final result
mydf_final<-structure(list(date_time = structure(c(1508238000, 1508238000, 1508238000, 1508238000), class = c("POSIXct", "POSIXt"), tzone = ""),
type = structure(c(1L, 2L, 3L, 4L), .Label = c("car", "light_duty", "heavy_duty", "motorcycle"), class = "factor"),
weighted_avg_speed = c(36.39029, 38.56522, 37.53333, 36.08696),
n_vehicles = c(1153L,69L,45L,23L)),
.Names = c("date_time", "type", "weighted_avg_speed", "n_vehicles"),
row.names = c(NA, -4L),
class = "data.frame")
mydf_final
My question:
how do I compute a weighted mean, i.e. "weighted_avg_speed",
from "speed" (the values to be averaged) and "n_vehicles" (the weights),
grouped by "date_time" and "type"?
Note the complication of the "motorcycle" case, which is not present in both directions.
Any help with that?
Thank you,
Max
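For reference, the target values can be checked by hand. The sketch below uses base R only; the numbers are copied from the "car" and "motorcycle" rows of mydf, and the variable names are illustrative assumptions, not part of the original post.

```r
# Weighted mean = sum(value * weight) / sum(weight).
# "car" appears in both directions, so its two rows are combined.
car_speed <- c(41.1029082774049, 33.4065155807365)  # directions A and B
car_n     <- c(447, 706)                            # vehicle counts (weights)

by_hand  <- sum(car_speed * car_n) / sum(car_n)
built_in <- weighted.mean(car_speed, car_n)

# "motorcycle" has a single row (direction A only). A one-element
# weighted mean is the value itself, so no special handling is needed.
moto <- weighted.mean(36.0869565217391, 23)

print(round(c(by_hand = by_hand, built_in = built_in, motorcycle = moto), 5))
```

Both the manual formula and weighted.mean() give 36.39029 for "car", which matches the first row of mydf_final.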
weighted average grouped by variables
9 messages · Massimo Bressan, Rui Barradas, Thierry Onkelinx +2 more
Hello
an update about my question: I worked out the following solution (with the package "dplyr")
library(dplyr)
mydf %>%
  # use the column directly; mydf$speed is not needed inside mutate()
  mutate(speed_vehicles = n_vehicles * speed) %>%
  group_by(date_time, type) %>%
  summarise(
    sum_n_times_speed = sum(speed_vehicles),
    n_vehicles = sum(n_vehicles),
    vel = sum(speed_vehicles) / sum(n_vehicles)
  )
In fact, I was hoping to manage everything in one go, i.e. without creating the intermediate variable "speed_vehicles" and by using the function weighted.mean().
Any hints for a different approach would be much appreciated.
Thanks
(In reply to the message from "Massimo Bressan" <massimo.bressan at arpa.veneto.it> to "r-help" <r-help at r-project.org>, sent Thursday, 9 November 2017, 12:20:52, subject: weighted average grouped by variables.)
Massimo Bressan
ARPAV
Agenzia Regionale per la Prevenzione e Protezione Ambientale del Veneto
Dipartimento Provinciale di Treviso
Via Santa Barbara, 5/a
31100 Treviso, Italy
tel: +39 0422 558545
fax: +39 0422 558516
e-mail: massimo.bressan at arpa.veneto.it
Hello,
Using base R only, the following seems to do what you want.
with(mydf, ave(speed, date_time, type, FUN = weighted.mean, w = n_vehicles))
Hope this helps,
Rui Barradas
(In reply to Massimo Bressan's message of 09-11-2017 13:16.)
Sorry, I messed up. I only checked the final result after sending the previous mail. The solution is wrong.
Rui Barradas
(In reply to Rui Barradas's message of 09-11-2017 13:27.)
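A possible explanation, offered as a sketch rather than as Rui's own diagnosis: ave() is defined as ave(x, ..., FUN = mean) and treats every argument in ... as a grouping variable. The argument w = n_vehicles is therefore used as a third grouping factor instead of being passed to weighted.mean() as weights, each group holds a single row, and the speeds come back unchanged. One workaround is to run ave() over the row indices so that values and weights are subset together. The data frame is rebuilt compactly here; tz = "UTC" is an assumption (the original uses the local time zone).

```r
mydf <- data.frame(
  date_time  = as.POSIXct(rep(1508238000, 7), origin = "1970-01-01", tz = "UTC"),
  direction  = factor(rep(c("A", "B"), times = c(4, 3))),
  type       = factor(c("car", "light_duty", "heavy_duty", "motorcycle",
                        "car", "light_duty", "heavy_duty"),
                      levels = c("car", "light_duty", "heavy_duty", "motorcycle")),
  speed      = c(41.1029082774049, 40.3333333333333, 40.3157894736842,
                 36.0869565217391, 33.4065155807365, 37.6222222222222, 35.5),
  n_vehicles = c(447L, 24L, 19L, 23L, 706L, 45L, 26L)
)

# The original call: `w` is swallowed by `...` and used for grouping,
# so every group is a single row and speed is returned as-is.
unchanged <- with(mydf, ave(speed, date_time, type,
                            FUN = weighted.mean, w = n_vehicles))

# Workaround: group the row indices, then subset speed and weights together.
per_row <- with(mydf, ave(seq_along(speed), date_time, type,
                          FUN = function(i) weighted.mean(speed[i], n_vehicles[i])))

print(cbind(mydf[c("direction", "type")], unchanged, per_row = round(per_row, 5)))
```

Like any ave() call, this returns one value per input row (the group's weighted mean repeated for each row of the group), not one row per group.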
______________________________________________ R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Hi,
Thanks for the working example.
You could use a split/lapply approach; however, it is probably not much better than the dplyr method.
sapply(split(mydf, mydf$type), function(speed, n_vehicles) sum(mydf$speed*mydf$n_vehicles)/sum(mydf$n_vehicles))
gives you averages
aggregate(mydf$n_vehicles, list(mydf$type), sum)$x
gives you sums
Cheers
Petr
Dear Massimo,
It seems straightforward to use weighted.mean() in a dplyr context:
library(dplyr)
mydf %>%
  group_by(date_time, type) %>%
  summarise(vel = weighted.mean(speed, n_vehicles))
Best regards,
ir. Thierry Onkelinx
Statisticus / Statistician
Vlaamse Overheid / Government of Flanders
INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND FOREST
Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
thierry.onkelinx at inbo.be
Kliniekstraat 25, B-1070 Brussel
www.inbo.be
(In reply to Massimo Bressan's message of 2017-11-09 14:16 GMT+01:00.)
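To also reproduce the n_vehicles column of mydf_final, the total can be added in the same summarise() call. This is a sketch and not part of Thierry's reply; it assumes dplyr 1.0.0 or later (for the .groups argument) and rebuilds the data with tz = "UTC" as an assumption. summarise() evaluates its expressions in order, so the weighted mean is computed before n_vehicles is overwritten by its sum; in the reverse order weighted.mean() would receive a weight vector of length one.

```r
library(dplyr)

mydf <- data.frame(
  date_time  = as.POSIXct(rep(1508238000, 7), origin = "1970-01-01", tz = "UTC"),
  direction  = factor(rep(c("A", "B"), times = c(4, 3))),
  type       = factor(c("car", "light_duty", "heavy_duty", "motorcycle",
                        "car", "light_duty", "heavy_duty"),
                      levels = c("car", "light_duty", "heavy_duty", "motorcycle")),
  speed      = c(41.1029082774049, 40.3333333333333, 40.3157894736842,
                 36.0869565217391, 33.4065155807365, 37.6222222222222, 35.5),
  n_vehicles = c(447L, 24L, 19L, 23L, 706L, 45L, 26L)
)

mydf_final <- mydf %>%
  group_by(date_time, type) %>%
  summarise(
    # computed first, while n_vehicles still holds the per-row counts
    weighted_avg_speed = weighted.mean(speed, n_vehicles),
    n_vehicles = sum(n_vehicles),
    .groups = "drop"
  )

print(mydf_final)
```

The result has one row per (date_time, type) pair, including "motorcycle", whose single row is simply carried through.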
Hi Thierry,
thanks for your reply.
Yes, you are right, your solution is more straightforward.
Best
(In reply to Thierry Onkelinx's message of Thursday, 9 November 2017, 15:17:31, subject: Re: [R] weighted average grouped by variables.)
1 day later
On 9 Nov 2017, at 14:58, PIKAL Petr <petr.pikal at precheza.cz> wrote:
Hi
Thanks for working example.
you could use split/ lapply approach, however it is probably not much better than dplyr method.
sapply(split(mydf, mydf$type), function(speed, n_vehicles) sum(mydf$speed*mydf$n_vehicles)/sum(mydf$n_vehicles))
gives you averages

The result of this calculation is:
       car light_duty heavy_duty motorcycle
  36.54109   36.54109   36.54109   36.54109
But this doesn't give the same result as the dplyr method, which is:
  date_time           type            vel
  <dttm>              <fctr>        <dbl>
1 2017-10-17 13:00:00 car        36.39029
2 2017-10-17 13:00:00 light_duty 38.56522
3 2017-10-17 13:00:00 heavy_duty 37.53333
4 2017-10-17 13:00:00 motorcycle 36.08696
The base R way of getting the result should be modified slightly into
sapply(split(mydf, mydf$type), function(Z) sum(Z$speed*Z$n_vehicles)/sum(Z$n_vehicles))
Calculations are done on the elements of the list provided by split.
The result now is:
       car light_duty heavy_duty motorcycle
  36.39029   38.56522   37.53333   36.08696
Obviously now the same as the dplyr method.
Berend Hasselman
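The corrected call groups by type only, which is enough here because the example contains a single date_time. A sketch that groups by both variables and also returns the vehicle totals, using base R only, might look like the following. The helper name summarise_group and tz = "UTC" are assumptions made for illustration.

```r
mydf <- data.frame(
  date_time  = as.POSIXct(rep(1508238000, 7), origin = "1970-01-01", tz = "UTC"),
  direction  = factor(rep(c("A", "B"), times = c(4, 3))),
  type       = factor(c("car", "light_duty", "heavy_duty", "motorcycle",
                        "car", "light_duty", "heavy_duty"),
                      levels = c("car", "light_duty", "heavy_duty", "motorcycle")),
  speed      = c(41.1029082774049, 40.3333333333333, 40.3157894736842,
                 36.0869565217391, 33.4065155807365, 37.6222222222222, 35.5),
  n_vehicles = c(447L, 24L, 19L, 23L, 706L, 45L, 26L)
)

# One summary row per (date_time, type) piece produced by split().
summarise_group <- function(Z) {
  data.frame(
    date_time          = Z$date_time[1],
    type               = Z$type[1],
    weighted_avg_speed = weighted.mean(Z$speed, Z$n_vehicles),
    n_vehicles         = sum(Z$n_vehicles)
  )
}

# drop = TRUE discards date_time/type combinations that have no rows.
pieces <- split(mydf, list(mydf$date_time, mydf$type), drop = TRUE)
mydf_final <- do.call(rbind, lapply(pieces, summarise_group))
rownames(mydf_final) <- NULL

print(mydf_final)
```

As in the sapply() version, each function call only sees the rows of its own piece, so values and weights always stay aligned.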
1 day later
Hi Berend,
Yes, you are correct. My fault, I did not test it before sending.
Cheers
Petr
(In reply to Berend Hasselman's message of Saturday, November 11, 2017 11:26 AM, subject: Re: [R] weighted average grouped by variables.)
On 9 Nov 2017, at 14:58, PIKAL Petr <petr.pikal at precheza.cz> wrote:
Hi
Thanks for working example.
you could use split/lapply approach, however it is probably not much better than dplyr method.
sapply(split(mydf, mydf$type), function(speed, n_vehicles) sum(mydf$speed*mydf$n_vehicles)/sum(mydf$n_vehicles))
gives you averages
The result of this calculation is:
car light_duty heavy_duty motorcycle
36.54109 36.54109 36.54109 36.54109
But this doesn't give the same result as the dplyr method which is:
date_time type vel
<dttm> <fctr> <dbl>
1 2017-10-17 13:00:00 car 36.39029
2 2017-10-17 13:00:00 light_duty 38.56522
3 2017-10-17 13:00:00 heavy_duty 37.53333
4 2017-10-17 13:00:00 motorcycle 36.08696
The base R way of getting the result should be modified slightly into
sapply(split(mydf, mydf$type), function(Z)
sum(Z$speed*Z$n_vehicles)/sum(Z$n_vehicles))
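Calculations are now done on the elements of the list provided by split: Z is the sub-data-frame for one type, whereas the original function ignored its argument and used the whole of mydf for every group, hence the four identical values.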
The result now is:
car light_duty heavy_duty motorcycle
36.39029 38.56522 37.53333 36.08696
Obviously now the same as the dplyr method.
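Regarding the wish to use weighted.mean(): the corrected split/sapply call can presumably be written with the base R function weighted.mean(x, w) in place of the explicit ratio of sums. What follows is only a minimal sketch. It rebuilds the example data from the original post in compact form with data.frame(), and the UTC time zone is an assumption made solely to keep the example self-contained.

```r
# Example data from the original post, rebuilt compactly
# (tz = "UTC" is an assumption; the post used the local time zone)
mydf <- data.frame(
  date_time  = as.POSIXct(rep(1508238000, 7), origin = "1970-01-01", tz = "UTC"),
  direction  = factor(c("A", "A", "A", "A", "B", "B", "B")),
  type       = factor(c("car", "light_duty", "heavy_duty", "motorcycle",
                        "car", "light_duty", "heavy_duty"),
                      levels = c("car", "light_duty", "heavy_duty", "motorcycle")),
  speed      = c(41.1029082774049, 40.3333333333333, 40.3157894736842,
                 36.0869565217391, 33.4065155807365, 37.6222222222222, 35.5),
  n_vehicles = c(447L, 24L, 19L, 23L, 706L, 45L, 26L)
)

# One weighted mean per type; Z is the sub-data-frame for that type,
# speed holds the values and n_vehicles the weights
res <- sapply(split(mydf, mydf$type),
              function(Z) weighted.mean(Z$speed, Z$n_vehicles))
res
```

The same call, weighted.mean(speed, n_vehicles), can also be used inside dplyr's summarise() after group_by(date_time, type), which would avoid the intermediate "speed_vehicles" column.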
Berend Hasselman
aggregate(mydf$n_vehicles, list(mydf$type), sum)$x
gives you sums
Cheers
Petr
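The averages and the sums can probably be obtained in one pass and assembled into a data frame shaped like the requested "mydf_final". The sketch below is one possible base R way, not necessarily the best one. It splits on both "date_time" and "type" (with drop = TRUE, so that a combination missing from the data would be skipped) and builds one row per group. The name res and the UTC time zone are assumptions made for the illustration.

```r
# Example data from the original post, rebuilt compactly
# (tz = "UTC" is an assumption; the post used the local time zone)
mydf <- data.frame(
  date_time  = as.POSIXct(rep(1508238000, 7), origin = "1970-01-01", tz = "UTC"),
  direction  = factor(c("A", "A", "A", "A", "B", "B", "B")),
  type       = factor(c("car", "light_duty", "heavy_duty", "motorcycle",
                        "car", "light_duty", "heavy_duty"),
                      levels = c("car", "light_duty", "heavy_duty", "motorcycle")),
  speed      = c(41.1029082774049, 40.3333333333333, 40.3157894736842,
                 36.0869565217391, 33.4065155807365, 37.6222222222222, 35.5),
  n_vehicles = c(447L, 24L, 19L, 23L, 706L, 45L, 26L)
)

# Split by both grouping variables, then build one summary row per group
groups <- split(mydf, list(mydf$date_time, mydf$type), drop = TRUE)
res <- do.call(rbind, lapply(groups, function(Z) {
  data.frame(
    date_time          = Z$date_time[1],
    type               = Z$type[1],
    weighted_avg_speed = weighted.mean(Z$speed, Z$n_vehicles),
    n_vehicles         = sum(Z$n_vehicles)
  )
}))
rownames(res) <- NULL  # drop the "date_time.type" row names created by split
res
```

The "motorcycle" case needs no special handling here: with a single row in the group, the weighted mean is simply that row's speed.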