Dear Aki,

Before I can suggest a modeling approach, I need to better understand your dependent variable. You say that you have 'proportional data', but then also mention 'means' and 'SDs'. So it seems to me that you do not have proportions per se (that is, you do not have a single count out of a total number of trials in each study -- which we could indeed model using a binomial GLMM with a logit link). Maybe you have studies where each participant completed a number of trials, so that there is a proportion per participant, and what is reported is the mean proportion and the SD among the proportions. But now I am just guessing.

In either case, your glmer() syntax doesn't make sense. For a binomial GLMM, the 'weights' argument is used to give the number of trials when the response is the proportion of successes, but you are using 1/vi as weights.

Best,
Wolfgang

-----Original Message-----
From: R-sig-meta-analysis [mailto:r-sig-meta-analysis-bounces at r-project.org] On Behalf Of Akifumi Yanagisawa
Sent: Wednesday, 20 December, 2017 19:57
To: r-sig-meta-analysis at r-project.org
Subject: [R-meta] Question regarding Generalized Linear Mixed-effects Model for Meta-analysis

Dear all,

I am having some difficulty dealing with proportional data; the dependent variable is the learning gain from an activity, with means and SDs converted into proportions. The learning gains are nested within articles; each article examined the learning gains from different types of activities and measured the learning gain at different timings (i.e., immediate post-test and delayed post-test). The main thing I would like to do is to get the estimated learning gain percentage and its confidence interval for each activity. Using the rma.mv() function, I noticed that the estimates sometimes go over 100%, so I thought I should use a generalized linear mixed-effects model.

On the metafor website (http://www.metafor-project.org/doku.php/todo), I found that the rma.glmm() function does not support multilevel models so far, and it is suggested to use the lme4 package. I have been trying to figure out how to do this by myself, but I am not sure if I am doing it right. I would appreciate it if you could check whether my approach is appropriate and answer some of my questions.

(1) The approach I tried was: (a) calculate the variances from the means, SDs, and numbers of participants using the escalc() function, and (b) then fit 'results <- glmer(learning_gain ~ ACTIVITY * TEST_TYPE * TEST_TIMING + (1|article_number/participant_group) + (1|TEST_TIMING:participant_group), weights = 1/vi, family = binomial(link = logit))'. I use the sjPlot package for plotting and the emmeans package to get estimated learning gain percentages. Does this sound like the proper approach? Are there other options I should add?

(2) Is it possible for me to get I^2 and H^2 values? I would like to know the proportion of variance explained by each moderator.

(3) Is there any way I can conduct (a) a test for residual heterogeneity and (b) a test of moderators? If so, which R package would you recommend? I noticed that the anova() function does not provide p-values for the test, and the lmerTest package does not work with the glmer() function, either.

Any suggestions and comments would be greatly appreciated. Thank you for your help.

Aki
[R-meta] Question regarding Generalized Linear Mixed-effects Model for Meta-analysis
11 messages · Michael Dewey, James Pustejovsky, Viechtbauer Wolfgang (STAT), Akifumi Yanagisawa
Thank you for your reply, Wolfgang. Your guess is right: I do not have a single count out of a total number of trials in each study. What I have is the mean proportion and the SD among the proportions. I am sad to hear that I cannot use the binomial distribution in glmer() in this case, and that the 'weights' argument cannot be used as usual weights. Do you have any ideas on how to deal with this type of data? Thank you very much. Best regards, Aki
On Jan 2, 2018, at 4:03 AM, Viechtbauer Wolfgang (SP) <wolfgang.viechtbauer at maastrichtuniversity.nl<mailto:wolfgang.viechtbauer at maastrichtuniversity.nl>> wrote:
Dear Aki

In that case, why not just use the mean and its sampling variance in the usual way? This may lead to impossible predictions, as there will be no way of specifying that the means are bounded above and below, but it may be the best you can do with what the studies have published.

Michael
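Michael's suggestion -- treating each reported mean proportion as the effect size, with the squared standard error of the mean as its sampling variance -- could be sketched with metafor roughly as follows. This is a hypothetical sketch: the column names (p, sd, n, article_number, participant_group, ACTIVITY) are assumptions chosen to match the structure Aki describes, not an actual dataset.

```r
# Hypothetical sketch of Michael's suggestion: meta-analyze the mean
# proportions directly with rma.mv(). All column names are assumed.
library(metafor)

dat$yi <- dat$p             # mean proportion per participant group
dat$vi <- dat$sd^2 / dat$n  # sampling variance of a mean: s^2 / n

res <- rma.mv(yi, vi,
              mods   = ~ ACTIVITY,
              random = ~ 1 | article_number / participant_group,
              data   = dat)

# Predicted mean proportions per activity; note these are NOT constrained
# to the 0-1 range, which is exactly the drawback Michael mentions.
predict(res)
```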
On 02/01/2018 20:48, Akifumi Yanagisawa wrote:
_______________________________________________ R-sig-meta-analysis mailing list R-sig-meta-analysis at r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-meta-analysis
Two additions:

1) Estimation of the sampling variance of a mean proportion is a bit more complex. Assume that in a given study there are n subjects, each of whom completes t trials. So, for each subject, there is a proportion, p_i = x_i/t, where x_i denotes the number of 'successes' on the t trials. Let p = sum p_i / n denote the mean proportion and s^2 the variance of the proportions. Then the sampling variance of p can be estimated with:

v = (p*(1-p) - s^2) / (n*t)

So, when meta-analyzing values of p from multiple studies, the sampling variances should be computed in this way.

2) Instead of meta-analyzing values of p directly (which indeed might lead to predicted values outside of the 0-1 range), we can meta-analyze ln(p/(1-p)) values, which are unbounded, and back-transformed values will always be in the 0-1 range. The sampling variance of ln(p/(1-p)) can be estimated with:

v = 1/(p*(1-p))^2 * (p*(1-p) - s^2) / (n*t)

Best,
Wolfgang

-----Original Message-----
From: Michael Dewey [mailto:lists at dewey.myzen.co.uk]
Sent: Wednesday, 03 January, 2018 10:59
To: Akifumi Yanagisawa; Viechtbauer Wolfgang (SP)
Cc: r-sig-meta-analysis at r-project.org
Subject: Re: [R-meta] Question regarding Generalized Linear Mixed-effects Model for Meta-analysis
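A quick numerical illustration of the two formulas above, with made-up values (note that Wolfgang revises the first formula later in the thread):

```r
# Made-up example values: n subjects, t trials each, mean proportion p,
# and s2 = variance of the per-subject proportions. Purely illustrative.
n  <- 30
t  <- 20
p  <- 0.65
s2 <- 0.04

# 1) Sampling variance of the mean proportion, as given above:
v_p <- (p * (1 - p) - s2) / (n * t)

# 2) Logit-transformed effect size and its sampling variance:
yi <- log(p / (1 - p))
vi <- 1 / (p * (1 - p))^2 * (p * (1 - p) - s2) / (n * t)

# Back-transforming a logit always lands in (0,1); here it returns p:
plogis(yi)
```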
Wolfgang, Please forgive me for following up with questions that are pure statistical geekery. Do you have a reference for the formula you gave on estimating the sampling variance of a mean proportion? I haven't seen it before and was curious to know its development. Also, is there a problem with simply using s^2 / n? This is the unbiased variance estimator under simple random sampling, and so I would have thought that it would work adequately here. Best, James On Wed, Jan 3, 2018 at 4:18 AM, Viechtbauer Wolfgang (SP) <
wolfgang.viechtbauer at maastrichtuniversity.nl> wrote:
Hi James,

I tried to be clever and derived it myself. But now that I have had a bit more time to think about this, I don't think it is applicable for these purposes. The equation gives an estimate of the sampling variance of p if we were to repeatedly observe the performance of the same n individuals; that is, under repeated observations, their p_i values would differ, but it assumes that the underlying true probabilities stay the same across repeated observations. But the more appropriate sampling variance would be for repeated observations of n new individuals, whose true probabilities would change across repeated observations. The latter type of sampling variance is indeed just estimated by s^2 / n.

So, Aki, please ignore my previous mail. Well, except that you can still analyze ln(p/(1-p)). The sampling variance of ln(p/(1-p)) would then be estimated with:

v = 1/(p*(1-p))^2 * s^2 / n

Best,
Wolfgang

-----Original Message-----
From: James Pustejovsky [mailto:jepusto at gmail.com]
Sent: Wednesday, 03 January, 2018 15:12
To: Viechtbauer Wolfgang (SP)
Cc: Michael Dewey; Akifumi Yanagisawa; r-sig-meta-analysis at r-project.org
Subject: Re: [R-meta] Question regarding Generalized Linear Mixed-effects Model for Meta-analysis
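Putting the corrected advice together, the whole analysis could be sketched in metafor along these lines. Again, the column names (p, sd, n, article_number, participant_group, ACTIVITY) are hypothetical placeholders matching the structure Aki describes.

```r
# Hypothetical sketch of the corrected approach: meta-analyze the
# logit-transformed mean proportions, with the delta-method sampling
# variance based on s^2/n, then back-transform. Column names are assumed.
library(metafor)

dat$yi <- log(dat$p / (1 - dat$p))  # logit of the mean proportion
dat$vi <- 1 / (dat$p * (1 - dat$p))^2 * dat$sd^2 / dat$n  # delta-method variance

res <- rma.mv(yi, vi,
              mods   = ~ ACTIVITY,
              random = ~ 1 | article_number / participant_group,
              data   = dat)

# Back-transform the predicted logits to proportions; these are now
# guaranteed to fall within the 0-1 range.
predict(res, transf = transf.ilogit)
```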
1 day later
Thank you for your comments, Wolfgang, Michael, and James. Thank you very much for suggesting ln(p/(1-p)) for the response variable, Wolfgang. It is really nice to hear that this keeps the back-transformed estimates within the 0-1 range. I will try this approach with my data! I would like to learn more about it, so if you know of any research articles or statistics textbooks that explain this approach, could you let me know? Thank you very much. Best regards, Aki
On Jan 3, 2018, at 9:59 AM, Viechtbauer Wolfgang (SP) <wolfgang.viechtbauer at maastrichtuniversity.nl<mailto:wolfgang.viechtbauer at maastrichtuniversity.nl>> wrote:
Hi James,

I tried to be clever and derived it myself. But now that I have had a bit more time to think about this, I don't think it is applicable for these purposes. The equation gives an estimate of the sampling variance of p if we would repeatedly observe the performance of the same n individuals; that is, under repeated observations, their p_i values would differ, but it assumes that the underlying true probabilities stay the same across repeated observations. But the more appropriate sampling variance would be for repeated observations of n new individuals, whose true probabilities would change across repeated observations. The latter type of sampling variance is indeed just estimated by s^2 / n.

So, Aki, please ignore my previous mail. Well, except that you can still analyze ln(p/(1-p)). And the sampling variance of ln(p/(1-p)) would then be estimated with v = 1/(p*(1-p))^2 * s^2 / n.

Best, Wolfgang

-----Original Message-----
From: James Pustejovsky [mailto:jepusto at gmail.com]
Sent: Wednesday, 03 January, 2018 15:12
To: Viechtbauer Wolfgang (SP)
Cc: Michael Dewey; Akifumi Yanagisawa; r-sig-meta-analysis at r-project.org
Subject: Re: [R-meta] Question regarding Generalized Linear Mixed-effects Model for Meta-analysis

Wolfgang,

Please forgive me for following up with questions that are pure statistical geekery. Do you have a reference for the formula you gave on estimating the sampling variance of a mean proportion? I haven't seen it before and was curious to know its development. Also, is there a problem with simply using s^2 / n? This is the unbiased variance estimator under simple random sampling, and so I would have thought that it would work adequately here.

Best, James
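The corrected estimator Wolfgang describes, v = 1/(p*(1-p))^2 * s^2 / n, can be written out in a few lines of R (the summary statistics below are purely illustrative):

```r
# Illustrative summary statistics for one study: mean proportion p,
# SD of the per-participant proportions s, and number of participants n
p <- 0.62
s <- 0.15
n <- 40

# effect size: the logit of the mean proportion
yi <- log(p / (1 - p))

# sampling variance of the logit, via the delta method
vi <- 1 / (p * (1 - p))^2 * s^2 / n

round(c(yi = yi, vi = vi), 4)
```

The resulting yi and vi values could then go into a standard (multilevel) meta-analytic model such as metafor's rma.mv(), with the results back-transformed through the inverse logit.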
On Wed, Jan 3, 2018 at 4:18 AM, Viechtbauer Wolfgang (SP) <wolfgang.viechtbauer at maastrichtuniversity.nl> wrote:
Two additions:

1) Estimation of the sampling variance of a mean proportion is a bit more complex. Assume that in a given study there are n subjects, each of whom completes t trials. So, for each subject, there is a proportion p_i = x_i/t, where x_i denotes the number of 'successes' on the t trials. Let p = sum p_i / n denote the mean proportion and s^2 the variance of the proportions. Then the sampling variance of p can be estimated with: v = (p*(1-p) - s^2) / (n*t). So, when meta-analyzing values of p from multiple studies, the sampling variances should be computed in this way.

2) Instead of meta-analyzing values of p directly (which indeed might lead to predicted values outside the 0-1 range), we can meta-analyze ln(p/(1-p)) values, which are unbounded, and back-transformed values will always be in the 0-1 range. The sampling variance of ln(p/(1-p)) can be estimated with: v = 1/(p*(1-p))^2 * (p*(1-p) - s^2) / (n*t).

Best, Wolfgang

-----Original Message-----
From: Michael Dewey [mailto:lists at dewey.myzen.co.uk]
Sent: Wednesday, 03 January, 2018 10:59
To: Akifumi Yanagisawa; Viechtbauer Wolfgang (SP)
Cc: r-sig-meta-analysis at r-project.org
Subject: Re: [R-meta] Question regarding Generalized Linear Mixed-effects Model for Meta-analysis

Dear Aki

In that case why not just use the mean and its sampling variance in the usual way? This may lead to impossible predictions, as there will be no way of specifying that the means are bounded above and below, but it may be the best you can do with what they have published.

Michael
On 02/01/2018 20:48, Akifumi Yanagisawa wrote:
Thank you for your reply, Wolfgang. Your guess is right. I do not have a single count out of a total number of trials in each study. What I have is the mean proportion and the SD among the proportions. I am sad to hear that I cannot use the binomial distribution in glmer() in this case, and that the weights argument cannot be used to supply the usual inverse-variance weights. Do you have any ideas on how to deal with this type of data? Thank you very much. Best regards, Aki
On Jan 2, 2018, at 4:03 AM, Viechtbauer Wolfgang (SP) <wolfgang.viechtbauer at maastrichtuniversity.nl> wrote:
2 days later
To be precise, ln(p/(1-p)) doesn't limit the range of the response variable; it actually maps p (which is restricted to 0 to 1) to -Inf to +Inf. It is then via the back-transformation that the final estimate or predicted values become restricted to the 0 to 1 range. As for articles/books: just search for 'logit transformation'.

Best, Wolfgang

-----Original Message-----
From: Akifumi Yanagisawa [mailto:ayanagis at uwo.ca]
Sent: Friday, 05 January, 2018 15:54
To: Viechtbauer Wolfgang (SP)
Cc: James Pustejovsky; Michael Dewey; r-sig-meta-analysis at r-project.org
Subject: Re: [R-meta] Question regarding Generalized Linear Mixed-effects Model for Meta-analysis
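In R, the logit mapping Wolfgang describes and its inverse are available directly as qlogis() and plogis(); a quick sketch:

```r
p <- c(0.05, 0.50, 0.95)

# logit: maps proportions in (0, 1) onto the whole real line
y <- qlogis(p)        # identical to log(p / (1 - p))

# back-transformation (inverse logit): always returns values in (0, 1)
p_back <- plogis(y)   # recovers p

# even extreme values on the logit scale back-transform to values
# inside (0, 1), so pooled estimates cannot fall outside the 0-1 range
plogis(c(-10, 0, 10))
```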
Thank you very much for clarifying my understanding, Wolfgang. If you don't mind me asking one more question: is there any publication I can cite for how to calculate the sampling variance in this case, 1/(p*(1-p))^2 * s^2 / n? I looked up 'logit transformation' and found the usual sampling variance for the logit transformation in meta-analysis, 1/(n*p*(1-p)); however, I could not find 1/(p*(1-p))^2 * s^2 / n by myself. Thank you so much, Aki
1/(n*p*(1-p)) applies when you have a single proportion based on a binomial distribution, but this isn't what you have. In your case, you have p ~ N(P, sigma^2 / n) (asymptotically), and then I just use the delta method (https://en.wikipedia.org/wiki/Delta_method) to get ln(p/(1-p)) ~ N(ln(P/(1-P)), 1/(P*(1-P))^2 * sigma^2 / n). Then substitute p for P and s^2 for sigma^2.

1/(n*p*(1-p)) is derived in the same way. For a 'binomial proportion', p ~ N(P, P*(1-P)/n) asymptotically. Then ln(p/(1-p)) ~ N(ln(P/(1-P)), 1/(P*(1-P))^2 * P*(1-P)/n), which simplifies to 1/(n*P*(1-P)), and then again substitute p for P.

Best, Wolfgang

-----Original Message-----
From: Akifumi Yanagisawa [mailto:ayanagis at uwo.ca]
Sent: Sunday, 07 January, 2018 23:41
To: Viechtbauer Wolfgang (SP)
Cc: r-sig-meta-analysis at r-project.org
Subject: Re: [R-meta] Question regarding Generalized Linear Mixed-effects Model for Meta-analysis
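The delta-method approximation can also be checked numerically. This small simulation (entirely illustrative, not from the thread) compares the empirical variance of ln(p/(1-p)) across many simulated studies against 1/(P*(1-P))^2 * sigma^2 / n:

```r
set.seed(1)

n     <- 50       # participants per simulated study
P     <- 0.60     # true mean proportion
sigma <- 0.12     # true SD of the per-participant proportions
nrep  <- 20000    # number of simulated studies

logits <- replicate(nrep, {
  # per-participant proportions, clamped into (0, 1) for simplicity
  p_i <- pmin(pmax(rnorm(n, P, sigma), 0.01), 0.99)
  p   <- mean(p_i)
  log(p / (1 - p))
})

# the two values should agree closely
c(empirical = var(logits),
  delta     = 1 / (P * (1 - P))^2 * sigma^2 / n)
```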
Thank you so much for explaining the calculation to me, Wolfgang. This is not something I could have come up with by myself. I will keep studying statistics so that I can understand the formula more clearly and deeply. I cannot thank you enough for all the support. I have learned so much through asking questions and reading the responses on this mailing list. Thank you again, and I hope you have a great day. Best regards, Aki
On Jan 7, 2018, at 6:03 PM, Viechtbauer Wolfgang (SP) <wolfgang.viechtbauer at maastrichtuniversity.nl> wrote: