How to ignore data
Oh dear, oh dear!!! Another arrogant statistician/scientist. One asks for help and instead one gets an earful!!! So much for the much-vaunted helpful R community. But thanks anyway, I guess you were trying...
Steve
On 2010/12/13 08:17 PM, Bert Gunter wrote:
Inline below. -- Bert
On Mon, Dec 13, 2010 at 9:42 AM, Steve Sidney <sbsidney at mweb.co.za> wrote:
Thanks for the questions. 1) The data represent micro-organism counts, and a count of zero in this case is highly unlikely given the information we have, including the results from the other participants.
?? Censoring or an experimental failure? Big difference.
2) The data is submitted in duplicate and then a standardised sum and difference is established and is used to calculate a Z-score which is used as a measure of performance.
Z scores are usually inappropriate for count data, which are discrete and tend to be skewed.
Given both 1) and 2) it is necessary to exclude a raw count of zero (since the log of 0 is meaningless) and a count of one (since the log of 1 of course is zero).
False. The correct statement is: "Because I do not know the statistical methodology necessary to handle such discrete data with 0 counts, I exclude them." You are confusing your ignorance of statistical methodology with the need for spurious ad hoc treatments. 0 counts can and should be handled by appropriate statistical methods (e.g. possibly zero-inflated Poisson models via glm() or otherwise).
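As a rough illustration of the kind of model mentioned above, here is a minimal sketch of a plain Poisson regression with glm(). The data frame `d`, the `lab` factor, and the counts are all made up for illustration; nothing here comes from the original data. Note that base glm() fits an ordinary Poisson model; a zero-inflated variant would need an add-on package (for example zeroinfl() in pscl), which is not shown here.

```r
# Hypothetical counts from two labs, zeros included (made-up values).
d <- data.frame(
  lab   = factor(rep(c("A", "B"), each = 6)),
  count = c(0, 3, 5, 2, 0, 4, 12, 9, 0, 15, 11, 8)
)

# Poisson regression with a log link. The link is applied to the mean,
# not to the raw data, so zero counts need no special handling.
fit <- glm(count ~ lab, family = poisson(link = "log"), data = d)

# Fitted mean count per lab, back on the original count scale.
new_labs <- data.frame(lab = factor(c("A", "B"), levels = levels(d$lab)))
fitted_means <- predict(fit, newdata = new_labs, type = "response")
fitted_means
```

Whether a Poisson model (zero-inflated or not) actually suits the proficiency-testing setup described in this thread is a separate question that the sketch does not settle.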
I guess one can think of these values as outliers and that is what I am trying to exclude.
This is a wholly unscientific statement, I'm afraid.
There is ample evidence that such an approach is acceptable.
What evidence, pray tell? -- a prior culture of inappropriate analyses, perhaps? I do not wish to engage in a debate about this, but, again, all I can say is that the above statement is not scientific. If I were consulting with you, I would say "Please show me your 'evidence.'" But, of course, I am not, and won't.
None of this is to say that you aren't correct in all respects. It is just that you have raised all my usual warning flags, so that I am somewhat skeptical. But that's MY problem.
This is the last I will say on the matter, so feel free to get in the final word, as I will not respond. And I wish you success in your efforts.
-- Bert
Thanks for the interest
Steve
On 2010/12/13 06:47 PM, Stavros Macrakis wrote:
If you need to take the log of the values for your calculation, then
what does it mean that you have 0 values in the input?
And why do you need to exclude the 1 values?
Are you sure that a) you are doing the correct kind of analysis and b)
the analysis is correct if you exclude 0 and 1?
-s
On Mon, Dec 13, 2010 at 10:38, Steve Sidney <sbsidney at mweb.co.za> wrote:
Dear list
I have quite a small data set in which I need to have the following values ignored: not used when performing an analysis, but they need to be included later in the report that I write. Can anyone help with a suggestion as to how this can be accomplished?
Values to be ignored: 0 (zero) and 1. This is in addition to NA (null).
The reason is that I need to use the log10 of the values when performing the calculation. Currently I hand-massage the data set of about 100 values, of which fewer than 5 to 10 are in this category. The NA values are NOT the problem.
What I was hoping was that I did not have to use a series of if and ifelse statements. Perhaps there is a more elegant solution. Any ideas would be welcomed.
Regards
Steve
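For the narrow programming question asked here, one possible approach is a logical mask rather than a chain of if/ifelse statements. The sketch below is only an illustration: the vector `counts` and its values are hypothetical, as are the names `keep`, `log_counts`, and `report`. It also leaves aside the question raised elsewhere in the thread of whether excluding these values is statistically sound.

```r
# Hypothetical counts (made-up values), including 0, 1 and NA.
counts <- c(120, 0, 35, 1, NA, 980, 47, 0, 15)

# TRUE where a value should take part in the log10 calculation:
# not NA, and strictly greater than 1 (so 0 and 1 are dropped).
keep <- !is.na(counts) & counts > 1

# log10 only for the kept values; excluded positions stay NA so the
# result keeps the original length and alignment.
log_counts <- rep(NA_real_, length(counts))
log_counts[keep] <- log10(counts[keep])

# The original vector is untouched, so the report can still list every value
# alongside a flag showing whether it was used.
report <- data.frame(count = counts, used = keep, log10_count = log_counts)

# Downstream summaries skip the excluded values via na.rm = TRUE.
mean_log <- mean(log_counts, na.rm = TRUE)
```

Because the raw data are never modified, the same `keep` vector can be reused to subset other columns or to mark the excluded rows in the final report.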
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.