[R-meta] Post-hoc weighted analysis based on number of observations - R-SIG-meta-analysis

Cesar Terrer Moreno

Mon, Jan 22, 2018 9:51 AM #

I have a gridded dataset representing the standard error (SE) of an effect. This SE was calculated through a meta-analysis and subsequent predictive model applied on a grid:

library(metafor)  # rma(), predict()
library(raster)   # rasterFromXYZ()

ECMmeta <- rma(es, var, data = ecm.df, control = list(stepadj = 0.5),
               mods = ~ 1 + MAP + MAT*CO2dif, knha = TRUE)
options(na.action = "na.pass")  # pass NAs through rather than dropping rows
ECMpred <- predict(ECMmeta,
                   newmods = cbind(s.df$precipitation, s.df$temperature, CO2inc, s.df$temperature*CO2inc))
# x and y are the coordinates of each cell
ECMrelSE <- rasterFromXYZ(ECMpred[, c("x", "y", "se")], crs = "+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0")


I would like to add a further level of uncertainty to SE based on the number of measurements (observations) per type of ecosystem in the dataset. The idea is that ecosystems that are poorly represented by experiments in the dataset should have a higher SE than ecosystems with plenty of measurements in the dataset.

I thought I could, for example, calculate an ecosystem-based weight as:

weight = n/sum(n)

That is, number of observations in a particular ecosystem divided by the total of observations. 

The next step would be to apply a weighting approach to each pixel. The first approach I've come up with is simply to multiply SE by the inverse of the weight:

SEw=SE*(1/weight)

But the values are extremely high.
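
To see why the inverse-weight multiplier gives such large values, here is a minimal sketch with made-up observation counts (the ecosystem names, counts and the SE of 0.05 are hypothetical, not taken from the dataset). Because the weights sum to 1, 1/weight is never below 1 and grows quickly for rare ecosystems:

```r
# Hypothetical observation counts per ecosystem type (illustration only)
n <- c(grassland = 120, temperate_forest = 60, boreal_forest = 15, tropical_forest = 5)

# Share of observations in each ecosystem; the shares sum to 1
weight <- n / sum(n)

# Multiplier implied by SEw = SE * (1 / weight)
multiplier <- 1 / weight

# Applied to an arbitrary, made-up SE of 0.05
SE  <- 0.05
SEw <- SE * multiplier

print(round(weight, 3))
print(round(multiplier, 2))
print(round(SEw, 3))
```

With these made-up counts, even the best-sampled ecosystem has its SE inflated by a factor of about 1.7, and the rarest one by a factor of 40.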

An approach like this would be more of a post-hoc patch. I am sure something like this can be done within the meta-analysis at the beginning. Alternatively, a better post-hoc approach or ideas to investigate further would be welcome. Is there any recommendation or commonly used basic approach for adding further uncertainty to areas with low representativeness?

Thanks

Viechtbauer Wolfgang (STAT)

Wed, Jan 24, 2018 2:56 PM #

Dear Cesar,

Let me try to understand the essence of your question/issue and abstract it a bit from the specifics of your data. So, if I understand things correctly, you have data from various places on Earth. Let's pretend those places are on a 2d surface, so something like this (where * indicates a place where you have data):

+------------------------+
|     *                  |
|  *                     |
|     *                  |
|                     *  |
|                 *  *   |
|                        |
+------------------------+

You have fitted a model that relates an outcome to some predictor variables based on the data for these places. Now you actually have the values of the predictor variables for *all* places on that surface and you have computed the corresponding predicted values. But there are locations for which there were no data to begin with (e.g., upper right and lower left) and hence you want the SEs of the predicted values to reflect this lack of information in those areas and you are wondering how to do that. Does that capture the essence of your question?

Best,
Wolfgang

Cesar Terrer Moreno

Wed, Jan 24, 2018 10:56 PM #

Dear Wolfgang,

Thanks so much for your reply. You have captured the essence of the question perfectly. 

I have successfully scaled the meta-analysis-derived SE, so I have basically produced a global map of the SE of the effect:

SE <- predict(meta,
              newmods = cbind(s.df$precipitation, s.df$temperature, CO2inc, s.df$temperature*CO2inc))$se


However, as you said, some locations, in this case ecosystems (e.g. tropical forests), are poorly represented in the dataset. Therefore, a proper assessment of the uncertainties of the approach should account for the uncertainty associated with the sampling effort (or lack thereof) in some regions. Reviewers will check this for sure.

It turns out that ecosystem type, per se, is not a good predictor, so including it in the meta-regression probably does not make much sense (or maybe it does). I was thus thinking more of a post-hoc solution, not necessarily in a meta-analytic context, so maybe this distribution list is not the right place to ask this question. The idea is to increase SE in pixels dominated by ecosystems that are poorly sampled. The final quantification of uncertainties would thus be an aggregation of the SEs and some sort of multiplier that adds uncertainty in a particular pixel as a function of the representativeness of the type of ecosystem in that pixel.

For example:

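# ecm.df is the meta-analysis dataset; weight is the share of observations per ecosystem type
weights <- ecm.df %>% group_by(ecosystem_type) %>% summarise(n = n()) %>% mutate(weight = n/sum(n))

SEw <- max(SE, na.rm=TRUE) - max(SE, na.rm=TRUE)*weight

SEsum <- SE + SEw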

SEsum would thus be the sum of SE and another level of error driven by the sample size of the type of ecosystem, and constrained to fall within the range of observed SE from the dataset.
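
As a minimal sketch of how this could be applied per pixel, the example below uses a made-up pixel table and made-up observation counts (all names and numbers are hypothetical). Each pixel looks up the weight of its ecosystem type before the extra term is added:

```r
# Hypothetical per-pixel table: predicted SE and ecosystem type of each pixel
pix <- data.frame(
  ecosystem_type = c("grassland", "grassland", "tropical_forest", "boreal_forest"),
  SE             = c(0.04, 0.06, 0.05, NA),
  stringsAsFactors = FALSE
)

# Hypothetical observation counts per ecosystem type in the dataset
n <- c(grassland = 120, boreal_forest = 15, tropical_forest = 5)
weight <- n / sum(n)

# Look up the weight of each pixel's ecosystem type by name
pix$weight <- unname(weight[pix$ecosystem_type])

# Extra term: close to max(SE) for rare ecosystems, smaller for common ones
maxSE     <- max(pix$SE, na.rm = TRUE)
pix$SEw   <- maxSE - maxSE * pix$weight
pix$SEsum <- pix$SE + pix$SEw

print(pix)
```

Note that SEw itself stays between 0 and max(SE), so SEsum can reach up to twice the largest SE in the map. Pixels with a missing SE keep a missing SEsum.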

But I think this approach is not very elegant. Any other ideas?
Thanks
César

Viechtbauer Wolfgang (STAT)

Thu, Jan 25, 2018 2:08 AM #

I will need to mull over this for a bit, but I think this falls under 'spatial uncertainty' (a term worth googling in the meantime).

Best,
Wolfgang

Cesar Terrer Moreno

Thu, Jan 25, 2018 4:32 AM #

Thanks Wolfgang. I'm reading some material on spatial uncertainty and it's indeed interesting, though complex, so I am a bit lost at the moment. Please take your time to think about it. I can send you my code and data so far.

Cheers,
Cesar

Cesar Terrer Moreno

Tue, Jan 30, 2018 9:43 AM #

Dear Wolfgang,

I have the feeling that spatial uncertainty would help define uncertainty based on the geographical distance between the locations of the individual studies in the dataset.

However, in this case I think a simpler approach could suffice. For this particular matter, we could assume that a good representation of the different "behaviours" of the system can be achieved by intensively sampling all types of biomes on Earth (e.g. grasslands, tropical forests, temperate forests, boreal forests), so biomes would be the unit of variability among studies.

In this case, "biome" is not a significant predictor, but this could just be the result of the low sample size in some biomes (or not). In any case, we have to somehow account for the low sample size in some biomes, allowing us to report the effect size in poorly sampled biomes, but with a very large uncertainty. This distinction between geographical uncertainty and biome representation is important: with biome as the driver of uncertainty, we can assume that, e.g., uncertainty in a grassland in China should be low despite no sampling in Chinese grasslands, simply because there are many other grassland studies in Europe, Australia and the US. However, uncertainty in a tropical forest in Brazil should be large because there are very few tropical forests in the dataset, even if there are many grassland studies in Brazil in the dataset. This is the type of biome-driven uncertainty we need.

Having said that, I don't know how to account for this biome-driven uncertainty.

I have tried to include "Biome" as a random effect in the model:

meta <- rma.mv(es, var, data=df, method="ML", random= ~1|Biome, mods= ~ 1 + precipitation + temperature)

As I have data for temperature, precipitation, and biome type for virtually all points on Earth, I have upscaled this effect and standard error (SE) globally, creating a gridded map of the effect and SE:

pred <- predict(meta, newmods = cbind(s.df$precipitation, s.df$temperature), 
random= ~1|s.df$Biome)

SEraster <- rasterFromXYZ(pred[,c("x", "y", "se")],crs="+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0") # x and y are the coordinates in each cell

However, the resulting raster of the SE of the effect is quite similar to the raster obtained with the model without the random effect, thus with low SE even in biomes that are poorly sampled (e.g. tropical forests). Why? How can I create a model where SEs are higher in regions with low biome representation?
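
One possible contributing factor, offered as a hedged illustration rather than a diagnosis of the model above: the SE of a predicted mean reflects how precisely the regression coefficients are estimated at the given predictor values, not how many studies share the pixel's biome label. The sketch below uses ordinary `lm()` on simulated data as a simple stand-in for the meta-regression (this is an assumption for illustration; the data, column names and biome labels are all made up). Two new pixels with the same precipitation and temperature get the same SE whatever their biome, and the SE only grows for a pixel far outside the sampled climate range:

```r
set.seed(1)

# Simulated stand-in data (all values and names are made up)
d <- data.frame(
  precipitation = runif(60, 300, 2000),
  temperature   = runif(60, -5, 25)
)
d$es <- 0.1 + 0.0001 * d$precipitation + 0.01 * d$temperature + rnorm(60, sd = 0.1)

# Ordinary regression used only as a simple analogue of the meta-regression
fit <- lm(es ~ precipitation + temperature, data = d)

# Three new pixels: the first two share predictor values but (conceptually)
# belong to different biomes; the third lies far outside the sampled climate
new <- data.frame(
  biome         = c("grassland", "tropical_forest", "tropical_forest"),
  precipitation = c(1000, 1000, 4000),
  temperature   = c(15, 15, 28),
  stringsAsFactors = FALSE
)

# SE of the predicted mean; the biome column is not used by the model
new$se <- predict(fit, newdata = new, se.fit = TRUE)$se.fit
print(new)
```

In this analogue, the only way the SE increases is by moving away from the sampled range of the predictors, which may be part of why the two SE maps look so similar.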

Thanks
