Hi All, I am trying to work out how best to analyse a set of foraging ecology data with >10 behaviour categories (DVs) and 3 IVs (season, sex, age). The time an animal spent engaged in each behaviour was recorded and then divided by the total time it spent in sight of the observer, so my data are proportions. As is typical, not all animals engaged in all behaviours, so my dataset contains a large number of zeros and is severely over-dispersed. I initially analysed each DV using the glm function (family = quasibinomial), followed by anova. The intention was then to use a false discovery rate (FDR) adjustment to account for the large number of analyses. However, it was pointed out to me that a multivariate approach might be better, so I have been trying to figure out (a) whether it is possible to run a quasi-binomial multivariate analysis of proportion data and (b) how to go about it. The R documentation page for VGAM's quasi-binomial family function, quasibinomialff ( http://artax.karlin.mff.cuni.cz/r-help/library/VGAM/html/quasibinomialff.html ), states that if multivariate response = TRUE the response matrix should be binary. This seems a pretty straightforward indictment of my idea to run this analysis on my proportion data, but I am wondering why: is this just not possible, or is there a particular package that could help? If anyone could provide me with an answer or some much-needed guidance on this topic I would be very grateful. Thanks, Amanda
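To make "severely over-dispersed" concrete: for a quasibinomial glm, R's summary reports a dispersion parameter equal to the sum of squared Pearson residuals divided by the residual degrees of freedom, and values well above 1 indicate over-dispersion. The sketch below computes that quantity for an intercept-only model in Python. It assumes each bout's proportion is weighted by the seconds the animal was in sight (i.e. passed via glm's weights argument); the function name and the toy numbers are hypothetical, not from the study.

```python
def pearson_dispersion(proportions, seconds_in_sight):
    """Pearson chi-square / residual df for an intercept-only binomial model.

    proportions:      observed proportion of time in one behaviour, per bout
    seconds_in_sight: total seconds in sight per bout, used as binomial
                      weights (an assumption about how the model was fitted)
    """
    if len(proportions) != len(seconds_in_sight) or len(proportions) < 2:
        raise ValueError("need two or more bouts with matching lengths")
    # Weighted maximum-likelihood estimate of the common proportion.
    total = sum(seconds_in_sight)
    p_hat = sum(y * n for y, n in zip(proportions, seconds_in_sight)) / total
    if p_hat <= 0.0 or p_hat >= 1.0:
        raise ValueError("behaviour never (or always) observed; dispersion undefined")
    # Sum of squared Pearson residuals: (y - p)^2 / Var(y), with Var(y) = p(1-p)/n.
    chi2 = sum((y - p_hat) ** 2 / (p_hat * (1.0 - p_hat) / n)
               for y, n in zip(proportions, seconds_in_sight))
    # Intercept-only model: one estimated parameter, so df = bouts - 1.
    return chi2 / (len(proportions) - 1)


# Toy numbers (hypothetical): many zeros plus a few long bouts of one behaviour.
print(pearson_dispersion([0.0, 0.0, 0.0, 0.6, 0.9], [120, 300, 90, 240, 600]))
```

With many zeros alongside a few long, behaviour-heavy bouts, the statistic comes out far above 1, which is the pattern the quasibinomial family tries to absorb.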
Multivariate quasi-binomial analysis of proportion data?
4 messages · Amanda Greer, Bob O'Hara, Philippi, Tom +1 more
On 08/02/15 12:27, Amanda Greer wrote: [original question snipped]
Ignoring the zeroes problem for the moment, I think (quasi-)binomial distributions are a distraction: binomials are based on counts of things (see Petr Keil's post: http://www.petrkeil.com/?p=603). If you're looking at proportions of time, then it might be better to think in terms of gamma distributions: if the times spent in each activity are independent gamma variables with a common scale, the proportion of time spent doing one thing follows a beta distribution, and the proportions across several activities (as you have here) follow a Dirichlet distribution. Once you have to worry about the zeroes, you need to do something more; for example, see this paper: <http://onlinelibrary.wiley.com/doi/10.1111/2041-210X.12122/abstract> Bob
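Bob's gamma-to-Dirichlet point can be checked by simulation. The sketch below (Python, standard library only) draws independent gamma "times" with a shared scale for three hypothetical behaviours, divides each by their total, and checks that the average share of the first behaviour is close to a1/(a1 + a2 + a3), the mean of the Beta(a1, a2 + a3) marginal of a Dirichlet(a1, a2, a3). The shape values and function names are arbitrary illustrations, and the sketch deliberately ignores the zeroes problem.

```python
import random


def dirichlet_from_gammas(shapes, scale=1.0, rng=random):
    """Draw one composition by normalising independent gamma 'times'.

    With a common scale, the normalised vector is Dirichlet(shapes);
    the scale cancels, so recording in seconds or minutes does not matter here.
    """
    times = [rng.gammavariate(a, scale) for a in shapes]
    total = sum(times)
    return [t / total for t in times]


def mean_first_share(shapes, n_draws=20000, seed=1, scale=1.0):
    """Monte Carlo mean of the first proportion (its Beta marginal)."""
    rng = random.Random(seed)
    draws = [dirichlet_from_gammas(shapes, scale, rng)[0] for _ in range(n_draws)]
    return sum(draws) / n_draws


shapes = [2.0, 3.0, 5.0]           # hypothetical shape parameters
print(mean_first_share(shapes))    # should be near 2 / (2 + 3 + 5) = 0.2
```

Real bouts with a behaviour at exactly zero fall outside this model, since every gamma draw is positive; that is where the zero-handling approach in the linked paper would come in.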
_______________________________________________ R-sig-ecology mailing list R-sig-ecology at r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Bob O'Hara Biodiversity and Climate Research Centre Senckenberganlage 25 D-60325 Frankfurt am Main, Germany Tel: +49 69 7542 1863 Mobile: +49 1515 888 5440 WWW: http://www.bik-f.de/root/index.php?page_id=219 Blog: http://blogs.nature.com/boboh Journal of Negative Results - EEB: www.jnr-eeb.org
Amanda-- I'm not sure I would be convinced by your analyses, as I don't think your statistical model corresponds to your sampling or data-generating process. But I'd need to know more about the response design (data collection) to make any suggestions. For a binomial or quasi-binomial model, you aren't analyzing the ratio of time spent in a behavior (DV) to total time observed; presumably you're using the number of minutes or seconds as counts? If so, note that you get very different answers depending on the units, because the binomial response treats each point observation (each second or minute) as an independent trial. Depending on the animal and the behaviors, in my experience not even 1-minute or 10-minute observations are independent. How long is an individual animal observed in a given bout (period of consecutive recording)? Are individuals monitored for more than one bout? How many behaviors does an animal perform (on average) in one observation bout? How many times does it switch behavior in a bout? Even if it only performs behaviors A and B, if it is doing A when you start observing, switches to B at some point, and is still doing B when you stop recording, that is very different from it switching back and forth A B A B A B A B A B in a single bout. If you have lots of switching by individual animals within individual bouts, then there may be a reasonable mixed-model binomial-based approach, treating individual animals as random subjects. If not, there are some approaches to proportional data that might better approximate your data and its components of variation. But I've already stuck my neck out far enough guessing about how you might have collected your data, so I'll stop here unless you provide more information. I hope that this helps... Tom
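Tom's point about units can be seen from the plain binomial variance: treating each recorded unit as an independent trial gives a standard error of sqrt(p(1 - p)/n) for a proportion p over n units, so the same bout looks sqrt(60) times more precise when scored in seconds than in minutes. The Python sketch below shows this, plus a hypothetical helper that counts behaviour switches in a coded sequence, to illustrate that two bouts with identical proportions can differ greatly in switching. The 5-minute bout, the 40% figure, and the sequence codes are made up for illustration, and the sketch covers only the plain binomial case, not any quasi-binomial dispersion adjustment.

```python
import math


def binomial_se(proportion, n_units):
    """Standard error of a proportion if each unit is an independent trial."""
    return math.sqrt(proportion * (1.0 - proportion) / n_units)


def count_switches(sequence):
    """Number of changes between consecutive behaviour codes in one bout."""
    return sum(1 for prev, cur in zip(sequence, sequence[1:]) if prev != cur)


# A hypothetical 5-minute bout with 40% of the time in behaviour A.
se_seconds = binomial_se(0.4, 300)   # scored as 300 one-second trials
se_minutes = binomial_se(0.4, 5)     # scored as 5 one-minute trials
print(se_seconds, se_minutes, se_minutes / se_seconds)  # ratio is sqrt(60)

# Both hypothetical bouts spend half their time on A and half on B.
print(count_switches("AAAAABBBBB"))  # one switch
print(count_switches("ABABABABAB"))  # nine switches
```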
On Sun, Feb 8, 2015 at 3:27 AM, Amanda Greer <manda.greer at gmail.com> wrote: [original question snipped]
Thank you, Bob and Tom, for your help; I was unaware of the distributions you referred to, Bob. Some more information: We filmed ~80 foraging bouts of varying length (all > 1 min), with behaviour categories such as eating flowers, eating roots, eating seeds, walking, digging, etc. Each bout describes the behaviour of one focal individual. Individuals switched between behaviours a lot during each bout: a sequence such as eat, walk to next plant, eat, walk to next, eat, preen, eat, walk, eat within a short space of time (< 1 min) would be typical, although I don't have the data on the number of switches to hand. All data were recorded in seconds. The same individual was occasionally recorded in a second bout, but only if more than 15 minutes had elapsed since the end of the previous bout. All of our DVs are foraging or searching behaviours; because these are not the only behaviours the animals engaged in, they do not necessarily sum to 100% of the recorded bout. We are interested in the effects of season, sex and age on each DV. Our original analyses were seasonal (one-way ANOVAs) and age by sex (3 x 2 factorial ANOVAs). We ran all age-by-sex ANOVAs exclusively on bouts recorded in summer, as this was the only season with an even spread of age and sex categories. We used the Benjamini-Hochberg procedure to adjust p values. Any further advice you have would be greatly appreciated; please let me know if I can provide any more info. Thanks, Amanda -- View this message in context: http://r-sig-ecology.471788.n2.nabble.com/Multivariate-quasi-bionomial-analysis-of-proportion-data-tp7579294p7579300.html Sent from the r-sig-ecology mailing list archive at Nabble.com.
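For reference, the Benjamini-Hochberg adjustment mentioned above is what R's p.adjust(p, method = "BH") returns: sort the m p-values, multiply the one ranked i by m/i, then take cumulative minima from the largest rank downwards (capped at 1) so the adjusted values stay monotone. A minimal Python sketch, with a hypothetical function name and made-up p-values:

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # starting at 1 also caps adjusted values at 1
    # Walk from the largest p-value (rank m) down to the smallest (rank 1).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values from four separate per-behaviour tests.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
```

Note that the adjustment controls the false discovery rate across whichever family of tests you feed it, so the choice of family (e.g. all seasonal tests versus all age-by-sex tests) is a decision to make explicitly.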