Changing the binning of collected data
Lorenzo Isella wrote:
Dear All,
Apologies if this is too simple for this list.
Let us assume that you have an instrument measuring particle distributions.
The output is a set of counts {n_i} corresponding to a set of average
sizes {d_i}.
The set of {d_i} ranges from d_i_min to d_i_max either linearly or
logarithmically.
There is no access to further detailed information about the
distribution of the measured sizes, but at least you know enough to
plot n(d_i) (number of counts as a function of particle size).
If you can fit the {n_i} to a known distribution (e.g. normal or
lognormal), then you can choose a new set of average sizes, {D_i} and
plot the corresponding n_i(D_i).
But what if the initial observations {n_i} do not follow a known
distribution and you still want to calculate n(D_i)?
Off the top of my head, I think that whatever I do must conserve the
original total number of observations N=\sum_i{n_i}, but this does not
constrain the problem very much.
Any suggestion is welcome.
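One distribution-free way to approach the question above is to treat each original bin as a flat histogram bar and share its counts among the new bins in proportion to their overlap. The sketch below is only an illustration under stated assumptions: the bin edges are not given by the instrument, so they are guessed as midpoints between adjacent sizes (for logarithmically spaced sizes, geometric midpoints would be the more natural guess), and the counts are assumed to be spread uniformly within each original bin. The function names `edges_from_centers` and `rebin` and the sample numbers are hypothetical.

```python
def edges_from_centers(centers):
    """Guess bin edges as midpoints between adjacent centers (an assumption)."""
    inner = [(a + b) / 2.0 for a, b in zip(centers, centers[1:])]
    first = centers[0] - (inner[0] - centers[0])     # mirror the first half-width
    last = centers[-1] + (centers[-1] - inner[-1])   # mirror the last half-width
    return [first] + inner + [last]


def rebin(old_edges, counts, new_edges):
    """Share each old bin's counts among new bins in proportion to overlap.

    Assumes the counts are uniformly spread inside each original bin.
    """
    new_counts = [0.0] * (len(new_edges) - 1)
    for i, n in enumerate(counts):
        lo, hi = old_edges[i], old_edges[i + 1]
        for j in range(len(new_counts)):
            overlap = min(hi, new_edges[j + 1]) - max(lo, new_edges[j])
            if overlap > 0:
                new_counts[j] += n * overlap / (hi - lo)
    return new_counts


# Made-up example data: four linearly spaced sizes and their counts.
d = [1.0, 2.0, 3.0, 4.0]
n = [10, 30, 40, 20]

old_edges = edges_from_centers(d)       # [0.5, 1.5, 2.5, 3.5, 4.5]
new_edges = [0.5, 2.5, 4.5]             # two wider bins over the same range
new_n = rebin(old_edges, n, new_edges)  # [40.0, 60.0]
```

The total N is conserved only when the new edges span the whole original range; counts in any part of an old bin that falls outside the new edges are dropped. The new average sizes {D_i} can then be taken as the centers of the new bins.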
Hi Lorenzo,
You should probably be aware that both the position and spacing of category boundaries can have a large effect on parameter location tests carried out on the categorized data. See:
Wainer, H., Geseroli, M. & Verdi, M. (2006) Finding what is not there through the unfortunate binning of results: The Mendel effect. Chance, 19(1): 49-52.
Lemon, J. On the perils of categorizing responses. Tutorials in Quantitative Methods for Psychology, 5(1): 35-39.
Jim
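The sensitivity to bin boundaries mentioned in this reply can be seen in a toy calculation. The sketch below is not taken from either cited paper; it simply estimates a mean from binned data (each value replaced by the midpoint of its bin) for two bin placements of the same width. The data values and the helper name `grouped_mean` are made up for illustration.

```python
def grouped_mean(values, start, width):
    """Mean estimated from binned data: each value counts as its bin midpoint."""
    total = 0.0
    for v in values:
        k = int((v - start) // width)        # index of the bin containing v
        total += start + (k + 0.5) * width   # midpoint of that bin
    return total / len(values)


values = [1.1, 1.2, 1.3, 2.8, 2.9, 3.1]      # made-up illustrative data

mean_a = grouped_mean(values, start=0.0, width=2.0)  # bins [0,2), [2,4): gives 2.0
mean_b = grouped_mean(values, start=1.0, width=2.0)  # bins [1,3), [3,5): about 2.33
```

Shifting the boundaries by one unit moves the estimated location by about a third of a unit here, even though the underlying values are unchanged.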