Skip to content

Distribution plus background fitting

2 messages · Richard Longland, David Winsemius

#
Hi All,

I apologise if this question has been answered before, but my background is
a little different from most people using R, and the language we use seems
to be different! I am trying to analyse some nuclear physics data, which
consists of an ensemble of "energy" readings in a detector that, when
binned, form a number of Gaussian shaped peaks superimposed on a varying
background (see attached pdf image).

In my field, the common practice is to bin these readings into a histogram,
and then use a variety of chi-squared fitting techniques to fit a linear
background and Gaussian directly to the binned counts to obtain a position
and area under a peak. I understand that this is poor practice, especially
given my rather low counting rate, so I have been attempting to fit the raw
data with the fitdistrplus package (linked to from
http://www.r-bloggers.com/fitting-distributions-with-r).

I can fit an isolated Gaussian peak successfully using these methods (the
central peak in my attached image). However, I cannot find a suitable
distribution to account for the background in the peaks to the left of this
(which I assume to vary linearly under each peak for now). Is this possible
with usual distribution fitting methods, or should I stick to fitting the
binned data (possibly including some interval censoring corrections)?

Thank in advance for the help,
Richard Longland
-------------- next part --------------
A non-text attachment was scrubbed...
Name: SamplePeak.pdf
Type: application/pdf
Size: 6852 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20130311/e2b51343/attachment.pdf>
#
On Mar 11, 2013, at 3:40 AM, Richard Longland wrote:

            
When you say "each peak" it suggests either that you know where peaks should be on the basis of theory or your eyeball-brain interaction is producing a filtered assessment of where your wetware peak detectors are registering a probable peak. Yu are implicitly creating some sort of distribution for the major peaks and then looking within the shoulders of those peaks for secondary peaks. (My detectors suggest perhaps as many 8 peaks but human beings are notorious for finding patterns in noise.) 

You have a mixture distribution and one possibility is that these are mixtures of normal distributions (aka Gaussians) with different mixing proportions. I do not think the fitdistrplus functions are designed for that task. When I go looking for packages and functions to do a task I use the sos package and the `findFn` function therein:

install.packages("sos")
library(sos)
findFn("mixture of normals")

The function I see in that list of 93 candidates that seems most likely to address my understanding of the problem is:

http://finzi.psych.upenn.edu/R/library/mixtools/html/normalmixEM.html

But it also looks as though there are other candidates. I would not think that binning of the data should be needed. Leaving it in its original state should allow the proper estimation of the data features by algorithms, unless requested by package authors in their vignettes or help pages.