
[Fwd: Re: Coefficients of Logistic Regression from bootstrap - how to get them?]

2 messages · Michal Figurski, Frank E Harrell Jr

Thank you Frank and all for your advice.

Here I attach the raw data from Pawinski's paper. I have obtained
permission from the corresponding author to post it here for everyone.
The only conditions of use are that the authors retain ownership of the
data, and that any publication resulting from these data must be managed by them.

The dataset is organized as follows: patient number / MMF dose in [g] /
day of study (since the start of drug administration) / MPA concentrations
[mg/L] in plasma at the following time points: 0, 0.5 ... 12 hours / and the
value of AUC(0-12h) calculated using all time points.
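For readers reconstructing the last column: a full-profile AUC(0-12h) is conventionally obtained with the linear trapezoidal rule over all sampled time points. A minimal sketch, assuming a hypothetical sampling schedule and invented concentrations (these numbers are illustrative only, not taken from Dataset.csv):

```python
import numpy as np

# Hypothetical sampling times [h] and one invented MPA profile [mg/L]
# (illustrative values only -- not from the posted dataset).
times = np.array([0.0, 0.5, 1.0, 2.0, 4.0, 6.0, 8.0, 12.0])
conc  = np.array([1.2, 8.5, 6.1, 4.0, 2.8, 2.1, 1.7, 1.1])

# Linear trapezoidal rule: sum of 0.5 * (c_i + c_{i+1}) * (t_{i+1} - t_i)
auc_0_12 = float(np.sum(0.5 * (conc[:-1] + conc[1:]) * np.diff(times)))
print(auc_0_12)
```

With a real row of the dataset, the same rule applied to all time points should reproduce the AUC(0-12h) column up to rounding.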

The goal of the analysis, as you can read in the paper, was to
estimate the value of AUC using at most 3 time points within 2 hours
post dose - that is, using only 3 of the 4 time points 0, 0.5, 1, and 2
hours, always including the "0" time point.
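A limited-sampling strategy of this kind amounts to regressing the full-profile AUC on a few early concentrations. A sketch with simulated data (the coefficients, noise level, and predictor names are invented stand-ins, not the Pawinski model):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated stand-ins for three early concentrations (e.g. C0, C0.5, C1)
# and a "full" AUC generated from them -- all numbers invented.
n = 50
X = rng.uniform(0.5, 10.0, size=(n, 3))
true_beta = np.array([1.5, 2.0, 3.0])
auc = 5.0 + X @ true_beta + rng.normal(0.0, 1.0, size=n)

# Ordinary least squares with an intercept: AUC ~ b0 + b1*C0 + b2*C0.5 + b3*C1
A = np.column_stack([np.ones(n), X])
beta_hat, *_ = np.linalg.lstsq(A, auc, rcond=None)
print(beta_hat)  # intercept followed by three slopes
```

The fitted equation can then be applied to new patients sampled only at the early time points; validating those predictions is what the rest of the thread is about.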

In my analysis of a similar problem I was also concerned that the
data come from several visits of a single patient. I examined the
effect of "PT" with repeated "day" using a mixed-effects model, and
these effects turned out to be non-significant. Do you think that is
enough justification to treat the dataset as if it came from 50 separate
patients?
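One way to probe this clustering concern empirically is to compare the variability of an estimate under an ordinary bootstrap (resampling observations) versus a cluster bootstrap (resampling whole patients, keeping their visits together). A sketch on simulated data with a deliberately strong patient effect (all numbers invented):

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data: 10 patients x 5 visits, with a patient-level random
# effect so observations within a patient are correlated (invented).
n_pat, n_vis = 10, 5
patient_effect = rng.normal(0.0, 2.0, size=n_pat)
y = patient_effect[:, None] + rng.normal(0.0, 1.0, size=(n_pat, n_vis))

def ordinary_boot_se(y, n_boot=1000):
    # Resample individual observations, ignoring the patient structure.
    flat = y.ravel()
    means = [rng.choice(flat, size=flat.size, replace=True).mean()
             for _ in range(n_boot)]
    return float(np.std(means))

def cluster_boot_se(y, n_boot=1000):
    # Resample whole patients; each draw keeps a patient's visits together.
    means = [y[rng.integers(0, y.shape[0], size=y.shape[0])].mean()
             for _ in range(n_boot)]
    return float(np.std(means))

print(ordinary_boot_se(y), cluster_boot_se(y))
```

With positive intra-patient correlation the cluster-bootstrap standard error comes out noticeably larger than the ordinary one; if the two are close, treating visits as independent observations is less worrying.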

Also, as for estimating bias, variance, etc., Pawinski used CI and
Sy/x. In my analysis I additionally used RMSE values. Please excuse
another naive question, but: do you think that is sufficient information
to compare models and account for bias?
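For concreteness, the RMSE mentioned above is just the root mean squared difference between predicted and reference AUC values; reporting the mean error alongside it separates bias from spread. A sketch with made-up numbers:

```python
import numpy as np

# Hypothetical reference AUC(0-12) values and model predictions (invented).
auc_ref  = np.array([30.1, 45.2, 28.7, 52.3, 38.9])
auc_pred = np.array([28.5, 47.0, 30.2, 49.8, 40.1])

# Root mean squared error: overall size of the prediction errors.
rmse = float(np.sqrt(np.mean((auc_pred - auc_ref) ** 2)))
# Mean error: systematic over- or under-prediction (bias).
bias = float(np.mean(auc_pred - auc_ref))
print(rmse, bias)
```

RMSE alone mixes bias and variance; the pair (RMSE, mean error) is more informative when comparing limited-sampling models.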

Regarding the "multiple stepwise regression" - according to the cited
SPSS manual, there are 5 options to select from. I don't think they used
the 'stepwise selection' option, because their models were already
pre-defined. The variables were pre-selected based on knowledge of the
pharmacokinetics of this drug and other factors. I think I understand
this part pretty well.

I see Frank's point about recalibration in Fig. 2 - although the
expectation was set that the prediction be within 15% of the original
value. In my opinion that is *very strict* - I actually used 20% in my
work, because of the very high variability and imprecision in the
results themselves. These are real biological data, and you have to
account for errors such as analytical errors (HPLC method) and timing
errors when you look at them. In other words, if you take two blood
samples at each time point from a particular patient and run them, you
will certainly get two distinct (although similar) profiles. You will
see even more difference if you run one set of samples on one day and
the other set on a second day.

Therefore the value of AUC(0-12) itself, to which we compare the
predicted AUC, is not 'holy' - some variability here is inherent.

Nevertheless, I see that Fig. 2 may be incorrect from an orthodox
statistical perspective. I used the same plots in my own work - it's too
late for that now. How should I properly estimate the Rsq, then?

I greatly appreciate your time and advice in this matter.

--
Michal J. Figurski
Frank E Harrell Jr wrote:
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: Dataset.csv
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20080724/f8ce0b2b/attachment.pl>
Michal Figurski wrote:
I don't think that is the way to assess it; rather, the intra-subject
correlation should be estimated. Or compare the variances from the
cluster bootstrap and the ordinary bootstrap.
RMSE is usually a good approach.
Validation Rsq is 1 - (sum of squared errors) / (total sum of squares).

Frank
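The validation Rsq formula Frank gives translates directly into code. A sketch, reusing invented observed/predicted AUC values (not real data):

```python
import numpy as np

auc_ref  = np.array([30.1, 45.2, 28.7, 52.3, 38.9])   # observed (invented)
auc_pred = np.array([28.5, 47.0, 30.2, 49.8, 40.1])   # predicted (invented)

sse = float(np.sum((auc_ref - auc_pred) ** 2))         # sum of squared errors
sst = float(np.sum((auc_ref - auc_ref.mean()) ** 2))   # total sum of squares
rsq_val = 1.0 - sse / sst
print(rsq_val)
```

Unlike the Rsq of a regression fitted and evaluated on the same data, this version can be computed on held-out (or bootstrap-validated) predictions, and can even go negative when the model predicts worse than the mean.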