smooth.spline error while fitting bacterial growth curves with grofit

4 messages · Jeffrey David Johnson, Bert Gunter

#
I'm trying to use the grofit package to compare growth rates between
bacterial cultures, but I've come across a couple of glitches/things I
don't understand. I'm not sure if they're related to the package or to a
problem with my growth data, which is messy. Some strains don't follow
a proper logarithmic growth curve because they died or didn't grow over
the course of the experiment. I could remove those, but that will get
more time-consuming once I have more cultures going.

I've attached the 'time' matrix and 'data' data frame. This code should
fit the growth curves, but when I run it I get an error related to
`smooth.spline`:

require(grofit)
mytime <- as.matrix(read.table('time.txt'))
mydata <- read.csv('data.csv')
dimnames(mytime) <- NULL
fits <- gcFit(mytime, mydata, grofit.control(
  interactive=FALSE, # don't ask if the graphs look OK
  nboot.gc=1000,     # number of bootstraps
  fit.opt="s"        # just do splines, no models
))

= 1. growth curve =================================
----------------------------------------------------
= 2. growth curve =================================
----------------------------------------------------
= 3. growth curve =================================
----------------------------------------------------
Error in smooth.spline(time, data, spar = control$smooth.gc) : 
  'tol' must be strictly positive and finite
Error in gcFitSpline(time.cur, data.cur, gcID, control.change) : 
  object 'y.spl' not found
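
For reference, here is a minimal base-R sketch of one way `smooth.spline`
can produce that 'tol' message on its own, without grofit. The default is
`tol = 1e-6 * IQR(x)`, so if more than half of the x values are tied the
IQR is 0 and the check fails. The toy `x` and `y` below are made up, and
whether this is what actually happens inside grofit's bootstrap resampling
is an assumption, not something confirmed here.

```r
# Toy x with five distinct values but most of them tied, so that the
# 25% and 75% quantiles coincide and IQR(x) is exactly 0.
x <- c(1, 2, rep(3, 10), 4, 5)
y <- seq_along(x) / 10

iqr_x <- IQR(x)  # smooth.spline's default is tol = 1e-6 * IQR(x)

# Capture the error message instead of stopping the script.
msg <- tryCatch({
  smooth.spline(x, y, spar = 0.5)
  NA_character_  # only reached if no error was raised
}, error = function(e) conditionMessage(e))

print(iqr_x)
print(msg)
```

Checking `IQR()` of the time values that reach `smooth.spline` for the
failing curve would be one way to confirm or rule this out.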

That error usually occurs at some point, though I've run through all 17
successfully a couple times. The documentation says:
I tried setting different values (0.1, 0.5, 0.9, 1, 10) and they all
cause the same error. If instead I use the `gcBootSpline` function
directly, it gives a different error about the number of bootstraps
being 0, when they clearly aren't:

fits <- gcBootSpline(mytime, mydata, grofit.control(nboot.gc=1000))

Error in gcBootSpline(mytime, mydata, grofit.control(nboot.gc =
1000)) : Number of bootstrap samples is zero! See grofit.control()
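
A possible explanation, sketched in base R: if `gcBootSpline` takes its
arguments in the order (time, data, gcID, control), as I believe the
grofit manual lists them, then a control object passed third and unnamed
is matched to `gcID`, and `control` silently keeps its default. The
function `boot_stub` below is a hypothetical stand-in, not the real
grofit code, and the default of zero bootstraps is likewise an assumption.

```r
# Hypothetical stand-in with the argument order assumed for gcBootSpline:
# (time, data, gcID, control). It is NOT the real grofit function.
boot_stub <- function(time, data, gcID = "undefined",
                      control = list(nboot.gc = 0)) {
  control$nboot.gc  # report how many bootstraps the function would see
}

ctrl <- list(nboot.gc = 1000)  # stand-in for grofit.control(nboot.gc = 1000)

# The third positional argument is matched to gcID, so control stays default.
n_positional <- boot_stub(1:5, 1:5, ctrl)

# Naming the argument routes it to control as intended.
n_named <- boot_stub(1:5, 1:5, control = ctrl)

print(c(positional = n_positional, named = n_named))
```

If that assumption holds, writing `control = grofit.control(nboot.gc = 1000)`
explicitly in the real call would be the thing to try.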

Am I using these right? Is there something about the data that would
make it un-fittable?
Jeff
#
1. Very likely, you have insufficient data in some of your growth
curves to do the fits using GCV (generalized cross-validation). If you
remove the curves where the bacteria didn't grow, things should work.
Alternatively, there may well be ways of expressing the model that
would allow pooling across cultures that didn't grow. (Sounds like a
mixtures problem, actually: you are mixing cultures that grow with
those that don't and need to determine the mixing proportion and the
growth parameters of those that grew).

2. HOWEVER, IF you remove the curves, you may very well be getting the
wrong (biased) results -- i.e. your results will be irreproducible
garbage, as you will only be taking data from cultures that grew well.
I would **strongly** suggest you work with a local statistical expert
to help you deal with these issues. I do not think you should trust
remote advice from the internet on such complex data (including mine!)

Cheers,
Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
Clifford Stoll




On Sun, May 17, 2015 at 10:42 AM, Jeffrey David Johnson
<jefdaj at berkeley.edu> wrote:
#
Thanks, I think you're right. I removed the strains whose final OD was
below 0.2 since all the ones that clearly grew are above that, and
grofit produces fewer errors on the remaining 6. The error still happens
occasionally, but if I stick to 1000 bootstraps instead of 10000 it's
not often. Of course I won't rely on these numbers! I'll try again once
my current timecourse is done with 6 replicates per strain, and if
everything is still messy rethink the experimental design.
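
The filtering step described here can be sketched in base R roughly as
follows. The toy data frame, its column names, and the values are all
invented for illustration; the layout (three identifier columns followed
by one column per time point, with a time matrix holding one row per
curve) is my understanding of what grofit expects and should be treated
as an assumption.

```r
# Toy data: three identifier columns, then one OD column per time point.
# All values are made up.
toy <- data.frame(
  strain = c("A", "B", "C", "D"),
  info   = "",
  conc   = 0,
  t0 = c(0.05, 0.05, 0.05, 0.05),
  t1 = c(0.20, 0.06, 0.15, 0.04),
  t2 = c(0.80, 0.07, 0.60, 0.05),
  stringsAsFactors = FALSE
)
# Matching time matrix: one row per curve, one column per time point.
toy_time <- matrix(rep(c(0, 1, 2), each = 4), nrow = 4)

od_cutoff <- 0.2                  # threshold used above
final_od  <- toy[[ncol(toy)]]     # last column = last time point
grew      <- !is.na(final_od) & final_od >= od_cutoff

# Subset data and time with the same logical index so rows stay aligned.
kept      <- toy[grew, , drop = FALSE]
kept_time <- toy_time[grew, , drop = FALSE]
dropped   <- toy$strain[!grew]

print(kept$strain)
print(dropped)
```

Keeping the names of the dropped strains makes it easy to report how many
cultures were excluded alongside any fitted parameters.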

... Which brings up another question. Would it be better to estimate
growth parameters (mu, lambda, etc.) for each replicate and then take
the mean and standard deviation of those, or to average the growth data
first and calculate one set of parameters per strain? (Sorry if that's
very basic statistics)
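
To make the two options concrete, here is a small base-R sketch on
invented data: three replicates of the same logistic curve with different
lag shifts, and a crude growth-rate estimate (the steepest finite-difference
slope) standing in for grofit's mu. The curves, the shifts, and the
`max_slope` helper are all hypothetical. The sketch only shows that the two
routes can give different numbers, not which one is appropriate.

```r
# Toy replicates: the same logistic curve with different lag shifts.
# Parameters are invented purely for illustration.
time   <- seq(0, 24, by = 0.5)
shifts <- c(6, 9, 12)
reps   <- sapply(shifts, function(s) 1 / (1 + exp(-(time - s))))

# Crude growth-rate estimate: steepest finite-difference slope of a curve.
max_slope <- function(y, tm) max(diff(y) / diff(tm))

# Option 1: one estimate per replicate, then summarise.
mu_each <- apply(reps, 2, max_slope, tm = time)
mu_mean <- mean(mu_each)
mu_sd   <- sd(mu_each)

# Option 2: average the curves first, then estimate once.
mu_pooled <- max_slope(rowMeans(reps), time)

print(c(per_replicate_mean = mu_mean, per_replicate_sd = mu_sd,
        averaged_first = mu_pooled))
```

With this slope-based estimate, averaging first can never give a larger
value than the mean of the per-replicate estimates, and it gives a smaller
one whenever the replicates reach their steepest point at different times.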
Jeff

On Sun, 17 May 2015 11:42:27 -0700
Bert Gunter <gunter.berton at gene.com> wrote:

            
#
Your question is OFFTOPIC for this list. Post on a statistics list
like stats.stackexchange.com.

But both your proposals are wrong, though depending on your data and
purpose, they may be adequate. I suggest you consult with a local
statistician on the use of mixed effects models for repeated
measures/growth curves, or post a question on those topics there.

Cheers,
Bert





On Sun, May 17, 2015 at 7:08 PM, Jeffrey David Johnson
<jefdaj at berkeley.edu> wrote: