[R-meta] Estimate variance from time series data - R-SIG-meta-analysis

Arne Janssen

Mon, Aug 13, 2018 10:23 AM

Dear list members,

I am doing a meta-analysis with data that are often presented as 
repeated measures of population densities, but authors sometimes also 
give overall averages and s.d. or s.e. Because I want to combine these 
data into one analysis, I am interested in the overall effect size of 
the repeated measures, and would therefore like to combine all data of 
a time series into one average and s.d. Each time series is replicated 
several times, yielding data of the following form:
Time    Treatment 1               Treatment 2
        N     Ave     s.d.        N     Ave     s.d.
1       N1    x1,1    sd1,1       N2    x2,1    sd2,1
2       N1    x1,2    sd1,2       N2    x2,2    sd2,2
...

What I want to obtain is one average and s.d. per treatment through time.
The average is straightforward, but I cannot come up with a calculation 
for the s.d.

The formula normally used for calculating the combined variance of two 
series of measurements:

Var = (s1^2*(n1 - 1) + s2^2*(n2 - 1) + n1*(X - x1)^2 + n2*(X - x2)^2) / (n1 + n2 - 1)

does not seem to apply when combining the measurements through time, 
because this increases the number of replicates, which in my opinion, 
should be the number of time series and not the number of observations.
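
For reference, the combined-variance formula above extends to any number of summary rows. The following is a minimal sketch in R, assuming the N, average, and s.d. per time step are available as vectors. The function name combine_summaries and the example numbers are hypothetical and not taken from any study. It only reproduces the s.d. of the pooled observations; it makes no statement about how many of those observations are independent.

```r
# Combine per-time-step summaries (n, mean, sd) into one overall mean and sd.
# Hypothetical helper: generalizes the two-series formula to k rows.
combine_summaries <- function(n, m, s) {
  stopifnot(length(n) == length(m), length(m) == length(s))
  m.total    <- sum(n * m) / sum(n)          # weighted mean
  ss.within  <- sum((n - 1) * s^2)           # within-row sum of squares
  ss.between <- sum(n * (m - m.total)^2)     # between-row sum of squares
  sd.total   <- sqrt((ss.within + ss.between) / (sum(n) - 1))
  list(n = sum(n), mean = m.total, sd = sd.total)
}

# Made-up example: 3 replicate series reported at 5 time steps
n <- rep(3, 5)
m <- c(10, 14, 19, 23, 21)
s <- c(2.0, 2.5, 3.1, 4.0, 3.6)
res <- combine_summaries(n, m, s)
res
```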

I hope I made myself clear, and would be very grateful if you could 
advise me on this matter.

Thanks very much in advance.
Arne Janssen

Wolfgang Viechtbauer

Tue, Aug 14, 2018 1:50 PM

Hi Arne,

It is not entirely clear to me what you are trying to do. Do you want to know the mean and SD when throwing together the N1 measurements from timepoint 1 and the N1 measurements from timepoint 2 from the same group, such that there are 2*N1 measurements in total now for the group? (or 3*N1 if there were three timepoints and so on). Then the same equation could be used as if there are independent subgroups.

For example:

### Suppose we have the mean, SD, and size of several subgroups, but we
### need the mean and SD of the total/combined groups. Code below shows
### what we need to compute to obtain this.

### simulate some data
n.total <- 100
grp <- sample(1:4, size=n.total, replace=TRUE)
y   <- rnorm(n.total, mean=grp, sd=2)

### means and SDs of the subgroups
ni  <- c(by(y, grp, length))
mi  <- c(by(y, grp, mean))
sdi <- c(by(y, grp, sd))

### want to get mean and SD of the total group
mean(y)
sd(y)

### mean = weighted mean (weights = group sizes)
m.total <- sum(ni*mi)/sum(ni)

### SD = sqrt((within-group sum-of-squares plus between-group sum-of-squares) / (n.total - 1))
sd.total <- sqrt((sum((ni-1) * sdi^2) + sum(ni*(mi - m.total)^2)) / (sum(ni) - 1))

### check that we get the right values
m.total
sd.total

This would be the case for independent subgroups. Now let's simulate data for 50 individuals measured twice:

library(MASS)

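### each row of Y is one individual; the columns are timepoints 1 and 2 (correlation 0.8)
Y <- mvrnorm(50, mu=c(0,0), Sigma=matrix(c(1, .8, .8, 1), nrow=2))
### stack the columns so that the two subgroups are the two timepoints
y <- c(Y)
grp <- rep(1:2, each=50)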

### means and SDs of the subgroups
ni  <- c(by(y, grp, length))
mi  <- c(by(y, grp, mean))
sdi <- c(by(y, grp, sd))

### want to get mean and SD of the total group
mean(y)
sd(y)

### mean = weighted mean (weights = group sizes)
m.total <- sum(ni*mi)/sum(ni)

### SD = sqrt((within-group sum-of-squares plus between-group sum-of-squares) / (n.total - 1))
sd.total <- sqrt((sum((ni-1) * sdi^2) + sum(ni*(mi - m.total)^2)) / (sum(ni) - 1))

### check that we get the right values
m.total
sd.total

Still works. However, when it comes to computing the sampling variance for m.total (or some function thereof), one cannot treat these two cases as the same. In the first case, we really have sum(ni) independent measurements, so var(y) / sum(ni) would be the correct sampling variance of m.total, but not so for the second case. You would need to know the correlation between the measurements over time to compute an appropriate sampling variance of m.total in the second case.
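
To make the last point concrete, here is a small sketch under a strong assumption: n individuals are each measured k times, all measurements have the same variance sigma2, and any two measurements on the same individual share a common correlation rho (compound symmetry). Under that assumption the sampling variance of the grand mean is sigma2/(n*k) * (1 + (k-1)*rho). The function name var_mean_cs is hypothetical, and the numbers are the ones used in the simulation above.

```r
# Sampling variance of the grand mean of n individuals measured k times,
# assuming equal variances and a common correlation rho (compound symmetry).
var_mean_cs <- function(sigma2, n, k, rho) {
  sigma2 / (n * k) * (1 + (k - 1) * rho)
}

sigma2 <- 1; n <- 50; k <- 2; rho <- 0.8

naive   <- sigma2 / (n * k)                # treats all n*k values as independent
correct <- var_mean_cs(sigma2, n, k, rho)

# Cross-check via the full covariance matrix of all n*k observations
R     <- matrix(rho, k, k); diag(R) <- 1
Sigma <- kronecker(diag(n), sigma2 * R)    # individuals are independent blocks
w     <- rep(1 / (n * k), n * k)           # weights that produce the grand mean
exact <- drop(t(w) %*% Sigma %*% w)

c(naive = naive, correct = correct, exact = exact)
```

With rho = 0.8 and k = 2, the variance is 1.8 times the naive value, so treating the 100 values as independent would understate the uncertainty.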

Best,
Wolfgang

Arne Janssen

Wed, Aug 15, 2018 5:10 AM

Dear Wolfgang,

Thanks for your quick reply. The question really is what is the sample 
size. Suppose there are 3 time series of 5 data points through time each 
that I want to combine. Given are the average and s.d. of the 3 series 
per time (so 5 averages and s.d.).

I would like to obtain an overall average and s.d. of these 3 time 
series. If we consider that N = 15, I can use the standard method to 
combine the 3 series. If we consider N to be 3, because there are only 3 
time series, I would indeed need to know the correlation among the time 
series to estimate the s.d., but this correlation is unknown. Please advise.

Thanks and best wishes,
Arne

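On 14-Aug-18 22:50, Viechtbauer, Wolfgang (SP) wrote:

Do you want to know the mean and SD when throwing together the N1 measurements from timepoint 1 and the N1 measurements from timepoint 2 from the same group, such that there are 2*N1 measurements in total now for the group?

Reply: This is indeed what I want to do.

sd.total <- sqrt((sum((ni-1) * sdi^2) + sum(ni*(mi - m.total)^2)) / (sum(ni) - 1))

Reply: Here is my doubt: sum(ni) is now larger than the number of 
replicates (4 time series, so 4 replicates, so n should be 4). Am I correct?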

Wolfgang Viechtbauer

Wed, Aug 15, 2018 5:26 AM

Dear Arne,

In this example, there are 15 observations in total. The code I provided shows how to obtain the mean and standard deviation of these 15 observations. However, these 15 observations are not independent and hence any sampling variance you compute for the combined mean (or some function thereof) would need to take the degree of correlation into consideration.
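
One way to see where the "N = 3 or N = 15" question lands is to look at an effective sample size. The sketch below assumes equal variances and a common correlation rho among the k time points within a series, which the reported data cannot confirm. Under that assumption, n series of k points carry as much information about the mean as n*k / (1 + (k-1)*rho) independent observations. The function name n_eff is hypothetical.

```r
# Effective number of independent observations for n series of k time points,
# assuming a common within-series correlation rho (compound symmetry).
n_eff <- function(n, k, rho) n * k / (1 + (k - 1) * rho)

rho <- c(0, 0.25, 0.5, 0.75, 1)
data.frame(rho = rho, n_eff = n_eff(n = 3, k = 5, rho = rho))
```

With rho = 0 this gives 15 and with rho = 1 it gives 3, so the two candidate sample sizes are the two extremes of the same formula.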

Best,
Wolfgang

Arne Janssen

Wed, Aug 15, 2018 5:35 AM

Dear Wolfgang,

Exactly, and there's the problem, because the correlations are never 
reported. So what to do in this case?

Best wishes,
Arne

Wolfgang Viechtbauer

Wed, Aug 15, 2018 5:54 AM

If you do not know the correlations, then you cannot compute the sampling variances correctly. You could 'guesstimate' the correlations and then do sensitivity analyses. I do not know what you actually want to compute based on the combined means and SDs of the two groups -- do you want to compute a mean difference or standardized mean difference or some other effect size measure? One would have to work out the correct equation for the sampling variance that takes the correlations into consideration. That part alone may not be trivial.
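
As an illustration of what such a sensitivity analysis could look like in the simplest case, the sketch below uses a raw difference between time-averaged means. It assumes the same number of replicate series at every time step, a single correlation rho between any two time steps within a treatment, and independent treatments. All numbers are made up, and the function name var_time_avg is hypothetical. It does not cover the standardized mean difference, for which the variance would have to be worked out separately.

```r
# Variance of the time-averaged mean of one treatment, given per-time-step
# SDs, n replicate series, and an assumed common correlation rho.
var_time_avg <- function(s, n, rho) {
  k <- length(s)
  R <- matrix(rho, k, k); diag(R) <- 1
  V <- (s %o% s) * R / n        # covariance matrix of the k time-step means
  sum(V) / k^2                  # variance of their unweighted average
}

# Made-up summary data: 5 time steps, 3 replicate series per treatment
m1 <- c(10, 14, 19, 23, 21); s1 <- c(2.0, 2.5, 3.1, 4.0, 3.6)
m2 <- c(9, 11, 13, 15, 14);  s2 <- c(1.8, 2.2, 2.6, 3.0, 2.9)
n1 <- 3; n2 <- 3

md  <- mean(m1) - mean(m2)      # raw mean difference (does not depend on rho)
rho <- seq(0, 0.9, by = 0.3)    # guessed correlations to try
v   <- sapply(rho, function(r) var_time_avg(s1, n1, r) + var_time_avg(s2, n2, r))

data.frame(rho = rho, md = md, se = sqrt(v))
```

The standard error grows with the assumed correlation, so the conclusions can be checked against the most pessimistic plausible value of rho.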

Best,
Wolfgang

Arne Janssen

Wed, Aug 15, 2018 7:36 AM

Dear Wolfgang,

I would like to calculate the standardized mean difference between time 
series of different treatments, each replicated. Calculating raw mean 
differences is no problem, as you will realize.

Some background: I want to do a meta-analysis of population-dynamical 
data, so time series. I am interested in the effect of one treatment 
compared to a control, as usual. Each treatment and control will consist 
of several replicate time series, of which the averages and s.d. are 
usually given per time step. Because different studies involve different 
numbers of time steps, and because the effect sizes are expected to vary 
with time, I do not want to calculate the effect size per time step, but 
an overall effect size, based on the entire time series.

Hope this clarifies things a bit. In any case, it seems that I need to 
make some assumptions on the correlations between the time series within 
each treatment, which is indeed not trivial.

As an alternative approach, I was thinking of calculating the effect 
size per time step and then averaging over time, but the question then 
remains how to estimate the sampling variance.
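
A rough sketch of this alternative, with the sampling variance of the average computed under an assumed correlation, might look as follows. It uses the usual large-sample variance of a standardized mean difference without a small-sample correction, and it assumes one common correlation r among the per-time-step estimates. That value of r is a guess to be varied, not something the data provide. The function names smd and avg_es and all numbers are hypothetical.

```r
# Standardized mean difference per time step (no small-sample correction)
smd <- function(m1, s1, n1, m2, s2, n2) {
  sp <- sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2))
  d  <- (m1 - m2) / sp
  v  <- (n1 + n2) / (n1 * n2) + d^2 / (2 * (n1 + n2))   # large-sample variance
  list(d = d, v = v)
}

# Average k correlated estimates, assuming a common correlation r among them
avg_es <- function(d, v, r) {
  k <- length(d)
  R <- matrix(r, k, k); diag(R) <- 1
  V <- (sqrt(v) %o% sqrt(v)) * R     # assumed covariance matrix of the estimates
  c(d = mean(d), v = sum(V) / k^2)
}

# Made-up summary data: 5 time steps, 3 replicate series per treatment
m1 <- c(10, 14, 19, 23, 21); s1 <- c(2.0, 2.5, 3.1, 4.0, 3.6)
m2 <- c(9, 11, 13, 15, 14);  s2 <- c(1.8, 2.2, 2.6, 3.0, 2.9)

es <- smd(m1, s1, 3, m2, s2, 3)
t(sapply(c(0.2, 0.5, 0.8), function(r) avg_es(es$d, es$v, r)))
```

The averaged effect size is the same in every row; only its variance changes with the assumed r.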

Thanks and best wishes,
Arne

On 15-Aug-18 14:54, Viechtbauer, Wolfgang (SP) wrote:

If you do not know the correlations, then you cannot compute the sampling variances correctly. You could 'guestimate' the correlations and then do sensitivity analyses. I do not know what you actually want to compute based on the combined means and SDs of the two groups -- do you want to compute a mean difference or standardized mean difference or some other effect size measure? One would have to work out the correct equation for the sampling variance that takes the correlations into consideration. That part alone may not be trivial.

Best,
Wolfgang

-----Original Message-----
From: Arne Janssen [mailto:arne.janssen at uva.nl]
Sent: Wednesday, 15 August, 2018 14:35
To: Viechtbauer, Wolfgang (SP); r-sig-meta-analysis at r-project.org
Subject: Re: [R-meta] Estimate variance from time series data

Dear Wolfgang,

Exactly, and there's the problem, because the correlations are never
reported. So what do do in this case?

Best wishes,
Arne

On 15-Aug-18 14:26, Viechtbauer, Wolfgang (SP) wrote:

Dear Arne,

In this example, there are 15 observations in total. The code I provided shows how to obtain the mean and standard deviation of these 15 observations. However, these 15 observations are not independent and hence any sampling variance you compute for the combined mean (or some function thereof) would need to take the degree of correlation into consideration.

Best,
Wolfgang

-----Original Message-----
From: Arne Janssen [mailto:arne.janssen at uva.nl]
Sent: Wednesday, 15 August, 2018 14:10
To: Viechtbauer, Wolfgang (SP); r-sig-meta-analysis at r-project.org
Subject: Re: [R-meta] Estimate variance from time series data

Dear Wolfgang,

Thanks for your quick reply. The question really is what is the sample
size. Suppose there are 3 time series of 5 data points through time each
that I want to combine. Given are the average and s.d. of the 3 series
per time (so 5 averages and s.d.).

I would like to obtain an overall average  and s.d. of these 3 time
series. If we consider that N = 15, I can use the standard method to
combine the 3 series. If we consider N to be 3, because there are only 3
time series, I would indeed need to know the correlation among the time
series to estimate the s.d., but this correlation is unknown. Please advise.

Thanks and best wishes,
Arne

On 14-Aug-18 22:50, Viechtbauer, Wolfgang (SP) wrote:

Hi Arne,

It is not entirely clear to me what you are trying to do. Do you want to know the mean and SD when throwing together the N1 measurements from timepoint 1 and the N1 measurements from timepoint 2 from the same group, such that there are 2*N1 measurements in total now for the group? (or 3*N1 if there were three timepoints and so on).

Reply: This is indeed what I want to do.

    Then the same equation could be used as if there are independent subgroups.

For example:

### Suppose we have the mean, SD, and size of several subgroups, but we
### need the mean and SD of the total/combined groups. Code below shows
### what we need to compute to obtain this.

### simulate some data
n.total<- 100
grp<- sample(1:4, size=n.total, replace=TRUE)
y<- rnorm(n.total, mean=grp, sd=2)

### means and SDs of the subgroups
ni<- c(by(y, grp, length))
mi<- c(by(y, grp, mean))
sdi<- c(by(y, grp, sd))

### want to get mean and SD of the total group
mean(y)
sd(y)

### mean = weighted mean (weights = group sizes)
m.total<- sum(ni*mi)/sum(ni)

### SD = sqrt((within-group sum-of-squares plus between-group sum-of-squares) / (n.total - 1))
sd.total<- sqrt((sum((ni-1) * sdi^2) + sum(ni*(mi - m.total)^2)) / (sum(ni) - 1))

Here is my doubt: The sum(ni) is now larger than the number of
replicates (4 time series, so 4 replicates, n should be 4), am I correct?

### check that we get the right values
m.total
sd.total

This would be the case for independent subgroups. Now let's simulate data for 50 individuals measured twice:

library(MASS)

Y <- mvrnorm(50, mu=c(0,0), Sigma=matrix(c(1, .8, .8, 1), nrow=2))
y <- c(t(Y))
### y is ordered individual by individual (time 1, time 2, time 1, time 2, ...),
### so the two subgroups are the two timepoints
grp <- rep(1:2, times=50)

### means and SDs of the subgroups
ni <- c(by(y, grp, length))
mi <- c(by(y, grp, mean))
sdi <- c(by(y, grp, sd))

### want to get mean and SD of the total group
mean(y)
sd(y)

### mean = weighted mean (weights = group sizes)
m.total <- sum(ni*mi)/sum(ni)

### SD = sqrt((within-group sum-of-squares plus between-group sum-of-squares) / (n.total - 1))
sd.total <- sqrt((sum((ni-1) * sdi^2) + sum(ni*(mi - m.total)^2)) / (sum(ni) - 1))

### check that we get the right values
m.total
sd.total

Still works. However, when it comes to computing the sampling variance for m.total (or some function thereof), one cannot treat these two cases as the same. In the first case, we really have sum(ni) independent measurements, so var(y) / sum(ni) would be the correct sampling variance of m.total, but not so for the second case. You would need to know the correlation between the measurements over time to compute an appropriate sampling variance of m.total in the second case.
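The difference between the two cases can be made concrete with a small sketch. It is not part of the original code and rests on simplifying assumptions: n independent units, each measured at k time points, the same variance s2 at every time point, and one common correlation rho between any two time points of the same unit. The function name var_pooled_mean is made up for illustration. Under these assumptions the sampling variance of the pooled mean is s2 * (1 + (k - 1) * rho) / (n * k).

```r
### sampling variance of the pooled mean of n units measured k times each
### (assumes equal variance s2 at all time points and a common correlation rho)
var_pooled_mean <- function(s2, n, k, rho) {
   s2 * (1 + (k - 1) * rho) / (n * k)
}

### 3 time series with 5 time points each, as in the example discussed above
v.indep <- var_pooled_mean(s2 = 1, n = 3, k = 5, rho = 0)   # equals s2 / 15
v.mid   <- var_pooled_mean(s2 = 1, n = 3, k = 5, rho = 0.5) # in between
v.full  <- var_pooled_mean(s2 = 1, n = 3, k = 5, rho = 1)   # equals s2 / 3

c(v.indep, v.mid, v.full)
```

With rho = 0 the 15 observations behave like N = 15 independent measurements, with rho = 1 they behave like N = 3, and anything in between depends on the unknown correlation.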

Best,
Wolfgang

-----Original Message-----
From: R-sig-meta-analysis [mailto:r-sig-meta-analysis-bounces at r-project.org] On Behalf Of Arne Janssen
Sent: Monday, 13 August, 2018 19:23
To: r-sig-meta-analysis at r-project.org
Subject: [R-meta] Estimate variance from time series data

Dear list members,

I am doing a meta-analysis with data that are often presented as
repeated measures of population densities, but authors sometimes also
give overall averages and s.d. or s.e. Because I want to combine these
data into one analysis, I am interested in the overall effect size of
the repeated measures, so would like to combine all data of the time
series into one average and s.d. The time series are repeated several
times, yielding data of the following form:
Time    Treatment 1              Treatment 2
        N     Ave     s.d.       N     Ave     s.d.
1       N1    x1,1    sd1,1      N2    x2,1    sd2,1
2       N1    x1,2    sd1,2      N2    x2,2    sd2,2
...
...
...

What I want to obtain is one average and s.d. per treatment through time.
The average is straightforward, but I cannot come up with a calculation
for the s.d.

The formula normally used for calculating the combined variance of two
series of measurements:

Var = (s1^2 (n1 - 1) + s2^2 (n2 - 1) + n1 (X - x1)^2 + n2 (X - x2)^2) /
      (n1 + n2 - 1)

does not seem to apply when combining the measurements through time,
because this increases the number of replicates, which in my opinion,
should be the number of time series and not the number of observations.

I hope I made myself clear, and would be very grateful if you could
advise me on this matter.

Thanks very much in advance.
Arne Janssen

Wolfgang Viechtbauer

Thu, Aug 16, 2018 12:57 AM #

Dear Arne,

Some comments:

"Calculating raw mean differences is no problem, as you will realize."

Do you mean the mean difference based on the 'aggregated' means over time within the two groups? Sure, that is easy to compute. But computing the sampling variance thereof is difficult.

"Because different studies involve different numbers of time steps, and because the effect sizes are expected to vary with time, I do not want to calculate the effect size per time step, but an overall effect size, based on the entire time series."

Could you explain why you do not want to calculate effects per time step?

"As an alternative approach, I was thinking of calculating the effect size per time step and then averaging over time, but the question then remains how to estimate the sampling variance."

I think calculating the effect size per time step is exactly what you should do. However, I would not recommend averaging over time. You say that you expect effect sizes to vary with time, so unless you take extra precautions, averaging over time would incorrectly assume that they do not vary over time.

Instead, I would analyze the effects as they are, using an appropriate mixed-effects model that accounts for the dependency in the estimates. For one, sampling errors of multiple effects over time are correlated. If you do not know the correlations between the measurements over time, you could guestimate them, compute the covariances between the effects within the same study, and then do sensitivity analyses. In addition, the underlying true effects are likely to be correlated, and for time series data, autoregressive structures like AR(1) and continuous-time AR(1) are often appropriate. See help(rma.mv) and take a look at the paragraph starting with: "For meta-analyses of studies reporting outcomes at multiple time points ...".
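As a rough illustration of "compute the covariances between the effects within the same study", here is a base R sketch for raw mean differences in a single study. All numbers are hypothetical, and the sketch assumes that the same replicates are followed over time in each group and that the correlation between two time points decays as phi^|lag| (an AR(1) pattern) with a guesstimated phi that is the same in both groups.

```r
### hypothetical summary data for one study with 4 time points
tp  <- 1:4                      # time points
sd1 <- c(2.0, 2.4, 3.1, 2.7)    # SDs in the treatment group
sd2 <- c(1.8, 2.2, 2.9, 3.0)    # SDs in the control group
n1  <- 5                        # number of replicate series, treatment
n2  <- 5                        # number of replicate series, control
phi <- 0.7                      # guesstimated lag-1 autocorrelation (assumption)

### assumed correlation between measurements at time points s and t
R <- phi^abs(outer(tp, tp, "-"))

### var-cov matrix of the 4 raw mean differences: the diagonal holds the usual
### sampling variances sd1^2/n1 + sd2^2/n2, the off-diagonal the covariances
V <- R * (outer(sd1, sd1) / n1 + outer(sd2, sd2) / n2)
round(V, 3)
```

A matrix of this kind, with one block per study, is the sort of input the V argument of rma.mv is meant for; rerunning the analysis over a range of phi values would then serve as the sensitivity analysis.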

These two papers are also highly relevant:

Ishak, K. J., Platt, R. W., Joseph, L., Hanley, J. A., & Caro, J. J. (2007). Meta-analysis of longitudinal studies. Clinical Trials, 4, 525-539.

Trikalinos, T. A., & Olkin, I. (2012). Meta-analysis of effect sizes reported at multiple time points: A multivariate approach. Clinical Trials, 9, 610-620.

There is also this tutorial-type paper:

Musekiwa, A., Manda, S. O., Mwambi, H. G., & Chen, D. G. (2016). Meta-analysis of effect sizes reported at multiple time points using general linear mixed model. PLOS ONE, 11(10), e0164898.

If you are only interested in the fixed effects, instead of guestimating the correlations (and then computing the covariances), you could start with a working model that assumes that the covariances are zero, and then use cluster-robust inference methods, using robust() from metafor or, even better, the clubSandwich package, which also works nicely together with metafor.

Best,
Wolfgang

-----Original Message-----
From: Arne Janssen [mailto:arne.janssen at uva.nl] 
Sent: Wednesday, 15 August, 2018 16:37
To: Viechtbauer, Wolfgang (SP)
Cc: r-sig-meta-analysis at r-project.org
Subject: Re: [R-meta] Estimate variance from time series data

Dear Wolfgang,

I would like to calculate the standardized mean difference between time 
series of different treatments, each replicated. Calculating raw mean 
differences is no problem, as you will realize.

Some background: I want to do a meta-analysis of population-dynamical 
data, so time series. I am interested in the effect of one treatment 
compared to a control, as usual. Each treatment and control will consist 
of several replicate time series, of which the averages and s.d. are 
usually given per time step. Because different studies involve different 
numbers of time steps, and because the effect sizes are expected to vary 
with time, I do not want to calculate the effect size per time step, but 
an overall effect size, based on the entire time series.

Hope this clarifies things a bit. In any case, it seems that I need to 
make some assumptions on the correlations between the time series within 
each treatment, which is indeed not trivial.

As an alternative approach, I was thinking of calculating the effect 
size per time step and then averaging over time, but the question then 
remains how to estimate the sampling variance.

Thanks and best wishes,
Arne

On 15-Aug-18 14:54, Viechtbauer, Wolfgang (SP) wrote:

If you do not know the correlations, then you cannot compute the sampling variances correctly. You could 'guestimate' the correlations and then do sensitivity analyses. I do not know what you actually want to compute based on the combined means and SDs of the two groups -- do you want to compute a mean difference or standardized mean difference or some other effect size measure? One would have to work out the correct equation for the sampling variance that takes the correlations into consideration. That part alone may not be trivial.
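A sensitivity analysis of this kind could look roughly as follows for the simplest case, a raw mean difference between the time-averaged means of the two groups. This is only a sketch: the SDs and group sizes are made up, the helper var_time_avg is hypothetical, and a single common correlation rho between all pairs of time points is assumed in both groups. It does not cover the standardized mean difference, for which the equation would still have to be worked out.

```r
### sampling variance of the time-averaged mean of one group, given the SDs
### at the k time points, n replicate series, and an assumed correlation rho
var_time_avg <- function(sds, n, rho) {
   k <- length(sds)
   R <- matrix(rho, nrow = k, ncol = k)
   diag(R) <- 1
   sum(R * outer(sds, sds)) / (n * k^2)
}

### hypothetical SDs at 5 time points, 3 replicate series per group
sd1 <- c(2.1, 2.5, 3.0, 2.8, 2.2)
sd2 <- c(1.9, 2.0, 2.6, 3.1, 2.4)
n1 <- 3
n2 <- 3

### sampling variance and SE of the mean difference for a range of rho values
rhos <- c(0, 0.3, 0.6, 0.9)
v <- sapply(rhos, function(r) var_time_avg(sd1, n1, r) + var_time_avg(sd2, n2, r))
data.frame(rho = rhos, var.md = v, se.md = sqrt(v))
```

If the conclusions of the meta-analysis hold across the plausible range of rho, the unknown correlation matters less; if they change, that should be reported.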

Best,
Wolfgang

-----Original Message-----
From: Arne Janssen [mailto:arne.janssen at uva.nl]
Sent: Wednesday, 15 August, 2018 14:35
To: Viechtbauer, Wolfgang (SP); r-sig-meta-analysis at r-project.org
Subject: Re: [R-meta] Estimate variance from time series data

Dear Wolfgang,

Exactly, and there's the problem, because the correlations are never
reported. So what to do in this case?

Best wishes,
Arne

On 15-Aug-18 14:26, Viechtbauer, Wolfgang (SP) wrote:

Dear Arne,

In this example, there are 15 observations in total. The code I provided shows how to obtain the mean and standard deviation of these 15 observations. However, these 15 observations are not independent and hence any sampling variance you compute for the combined mean (or some function thereof) would need to take the degree of correlation into consideration.

Best,
Wolfgang

Arne Janssen

Thu, Aug 16, 2018 3:57 AM #

Dear Wolfgang,

Thanks for all the suggestions, I will have a look at the suggested papers.

The reason that I did not want to use effect sizes per time step is that 
studies with more time steps will then have a larger weight on the 
overall analysis. Whereas I can think of some justification for this, I 
would rather be on the conservative side. Meanwhile, I thought to use 
the correlations of the few studies of which I do have the raw data as 
an indication for the range of correlations to be expected.
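A minimal sketch of that idea, assuming the raw data of one treatment in one study are available as a matrix with one row per replicate time series and one column per time step. The matrix below is simulated as a stand-in, so the object names and the value phi = 0.7 are illustrative only.

```r
### simulated stand-in for raw data: 6 replicate series, 5 time steps
set.seed(123)
n.rep  <- 6
n.time <- 5
phi    <- 0.7   # autocorrelation used only to generate the example data
raw <- matrix(NA_real_, nrow = n.rep, ncol = n.time)
raw[, 1] <- rnorm(n.rep)
for (j in 2:n.time) {
   raw[, j] <- phi * raw[, j - 1] + rnorm(n.rep, sd = sqrt(1 - phi^2))
}

### correlations between time steps, estimated across the replicate series
R.hat <- cor(raw)

### correlations between adjacent time steps (first off-diagonal)
lag1 <- R.hat[cbind(1:(n.time - 1), 2:n.time)]
round(range(lag1), 2)
```

With only a handful of replicate series per study, such estimates are very noisy, so they are better used to bracket a plausible range than as fixed values.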

Thanks again for the quick replies.

Cheers,
Arne

Wolfgang Viechtbauer

Wed, Aug 22, 2018 2:31 AM #

Dear Arne,

I have heard this, or similar sentiments, before. However, if one uses an appropriate model that accounts for the dependencies among the estimates, then studies with more time steps will not automatically receive more weight. They will if the estimates are essentially independent, which is appropriate. On the other hand, if estimates are highly dependent, then this will lead to an automatic downweighting of estimates from the same study. The model in essence takes care of that for you. Also note that just looking at the weights is usually insufficient in more complex models. One really needs to look at the whole weight matrix.
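The downweighting can be illustrated with a deliberately simplified sketch: a common-effect model without random effects, k estimates from one study with the same sampling variance v, and a common correlation rho between their sampling errors. Under these assumptions, the total weight the study contributes to the pooled estimate is the sum of all elements of the inverse of its var-cov block. The function name total_weight is made up for illustration.

```r
### total weight of a study with k equally precise, equicorrelated estimates
total_weight <- function(k, v, rho) {
   R <- matrix(rho, nrow = k, ncol = k)
   diag(R) <- 1
   V <- v * R          # var-cov block of the k estimates
   sum(solve(V))       # sum of all elements of the weight matrix
}

w.single <- total_weight(k = 1, v = 0.1, rho = 0)     # one estimate: 1/v
w.indep  <- total_weight(k = 5, v = 0.1, rho = 0)     # five independent estimates: 5/v
w.dep    <- total_weight(k = 5, v = 0.1, rho = 0.9)   # five highly dependent estimates

c(w.single, w.indep, w.dep)
```

With rho = 0.9, the five estimates together carry only slightly more weight than a single estimate, which is the automatic downweighting described above. Summing the whole inverse matrix, rather than reading off its diagonal, is also why the individual weights alone are not sufficient.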

Best,
Wolfgang
