Dear list,

I am new to this list and I hope it is OK to post here even though I have already posted this question on Cross Validated.

I am trying to predict the daily amount of waste per person produced in the fishery sector. We surveyed fishing boats at the end of their fishing trips, and the variables I have are trip duration (days), number of fishers, waste category, waste weight (g), and boat ID. For each fishing trip I calculated grams of waste per person per day, i.e. daily waste per capita.

To predict daily waste per capita, I am using a Gaussian mixed-effects model with log(waste per capita) as the response variable (I transformed it because it was not normally distributed, and I'm not sure it's correct to do so). The explanatory variable is waste category, and boat ID is a random effect. I use the predict function to estimate daily waste per capita for each category and then back-transform it with exp(...).

My question is: is it correct to transform daily waste per capita to fit a Gaussian model?

Thanks so much for your advice!
Alessandra
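The per-trip quantity described in the question, grams of waste per person per day, is the waste weight divided by (number of fishers × trip duration). A minimal Python sketch, assuming one record per trip and waste category; the function and argument names are hypothetical and not taken from the original analysis:

```python
def daily_waste_per_capita(waste_g: float, n_fishers: int, trip_days: float) -> float:
    """Grams of waste per person per day for a single trip (hypothetical helper)."""
    if n_fishers <= 0 or trip_days <= 0:
        # A trip needs at least one fisher and a positive duration.
        raise ValueError("n_fishers and trip_days must be positive")
    return waste_g / (n_fishers * trip_days)


# Example: 12 kg of waste, 4 fishers, a 3-day trip.
print(daily_waste_per_capita(12000.0, 4, 3.0))  # 1000.0
```

Note that a trip with no recorded waste in a category gives 0 g/person/day, and log(0) is undefined, so zero values would need separate handling before any log transformation.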
predicting waste per capita - is a gaussian model correct?
5 messages · Alessandra Bielli, Jeff Newmiller, John C Frain, Abby Spurdle
It could possibly be alright, except that:
a) you included no reference to your other post;
b) you posted here using HTML format, which can severely corrupt what we see on this plain-text-only mailing list;
c) your question is off topic, as it is about statistics (theory) rather than R (a syntax and semantics for implementing theory).
So, no, not OK this time.
On May 9, 2020 5:40:42 PM PDT, Alessandra Bielli <bielli.alessandra at gmail.com> wrote:
[...]
______________________________________________ R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Sent from my phone. Please excuse my brevity.
On Sun, 10 May 2020 at 02:00, Alessandra Bielli <bielli.alessandra at gmail.com> wrote:
[...]
There is no requirement that the dependent variable in a "regression" type estimation follows a Gaussian distribution. You need a model of the process and then use an estimation technique to estimate your model. If effects in your model are additive, do not use a log transformation. If effects are multiplicative, then use a log transformation. The choice should be determined by the mechanics of the problem and not by the statistics. If you do use a log transformation, then trying to reverse the process using an exponential transformation will be biased. The extent of that bias depends on your problem, and it would not be possible to estimate the significance of the bias without a much greater knowledge of the process and data. I would suggest that you consult a competent statistician.

John C Frain
3 Aranleigh Park
Rathfarnham
Dublin 14
Ireland
www.tcd.ie/Economics/staff/frainj/home.html
mailto:frainj at tcd.ie
mailto:frainj at gmail.com
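The additive/multiplicative distinction and the back-transformation bias mentioned in this reply can be written out briefly. This sketch assumes (beyond what the thread states) that, given the covariates, the response is lognormal, i.e. normal with mean μ and variance σ² on the log scale:

```latex
% Multiplicative effects become additive on the log scale:
\[
  y = a \, b \, \varepsilon
  \quad\Longrightarrow\quad
  \log y = \log a + \log b + \log \varepsilon
\]
% If \log y \sim \mathcal{N}(\mu, \sigma^2), i.e. y is lognormal:
\[
  \operatorname{median}(y) = e^{\mu},
  \qquad
  \operatorname{E}[y] = e^{\mu + \sigma^2/2} > e^{\mu} \quad \text{for } \sigma^2 > 0
\]
```

Under this assumption, exp() of a prediction on the log scale estimates the median rather than the mean, and understates the mean by the factor e^{σ²/2}. With a normal random intercept for boat, a population-level mean (averaging over boats) would use σ² = σ²_boat + σ²_residual.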
Well, this is 100% off-topic... And I wasn't planning to answer the OP's question. However, I disagree with your answer.
> There is no requirement that the dependent variable in a "regression" type estimation follows a Gaussian distribution.
False. It depends on what type of '"regression" type estimation' one uses, among other things.
> You need a model of the process and then use an estimation technique to estimate your model. If effects in your model are additive, do not use a log transformation. If effects are multiplicative, then use a log transformation.
The main question is: does the model satisfy the *assumptions*?
> The choice should be determined by the mechanics of the problem and not by the statistics.
While a mechanistic understanding is definitely valuable... If the criterion for a good model versus a bad model were whether the model was consistent with mechanistic theory/understanding, then nearly every statistical model I've seen would be a bad model. I would say a good model is one that is useful...
> If you do use a log transformation, then trying to reverse the process using an exponential transformation will be biased. The extent of that bias depends on your problem, and it would not be possible to estimate the significance of the bias without a much greater knowledge of the process and data.
I've never heard of this before... But I do note that back-transformation is not trivial, and I'm not an expert on back-transformations.
> I would suggest that you consult a competent statistician.
I agree on that part...
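A numerical illustration of the back-transformation issue debated above. It is only a sketch: it uses Python rather than R, ignores the random boat effect, and the values are made-up numbers for a single waste category, not data from the thread. It compares the naive exp(mean of logs), the lognormal correction exp(μ + s²/2), and Duan's smearing estimator, which multiplies the back-transformed prediction by the average of exp(residual):

```python
import math
import statistics

# Hypothetical daily waste per capita (g/person/day) for ONE waste category;
# made-up numbers for illustration only.
values = [20.0, 35.0, 50.0, 80.0, 150.0, 400.0]

logs = [math.log(v) for v in values]
mean_log = statistics.fmean(logs)  # fitted value of an intercept-only model on the log scale

# Naive back-transformation: exp(mean of logs) is the geometric mean,
# which estimates the median (not the mean) under a lognormal assumption.
naive = math.exp(mean_log)

# Target on the original scale: the arithmetic mean.
arith = statistics.fmean(values)

# Lognormal correction exp(mu + s^2/2), plugging in the sample variance of the logs.
s2 = statistics.variance(logs)
lognormal = math.exp(mean_log + s2 / 2)

# Duan's smearing estimator: exp(prediction) times the mean of exp(residuals).
residuals = [x - mean_log for x in logs]
smearing = naive * statistics.fmean([math.exp(r) for r in residuals])

print(f"naive exp(mean log): {naive:.1f}")
print(f"lognormal-corrected: {lognormal:.1f}")
print(f"smearing estimate:   {smearing:.1f}")
print(f"arithmetic mean:     {arith:.1f}")
```

With a single group, the smearing estimate reproduces the arithmetic mean exactly, while the naive back-transform gives the geometric mean, which is never larger than the arithmetic mean. In a fitted mixed model the residuals would come from the model itself, so the correction is approximate.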
Dear all,

First of all, apologies for the off-topic question and for not respecting the other points. Second, thanks for your advice and opinions; I will definitely consult a statistician.

Regards,
Alessandra
On Sun, May 10, 2020 at 4:57 PM Abby Spurdle <spurdle.a at gmail.com> wrote:
[...]