please help generate a square correlation matrix

It sounds as though you have the null hypothesis "x records independent
Bernoulli trials with the same (unknown) success probability p_x, y
records independent Bernoulli trials with the same (unknown) success
probability p_y, and x and y are independent of each other" and the
alternative hypothesis "x and y succeed together less often than they
would under the null hypothesis".  The obvious way to test that is to
estimate \hat{p_x} = sum(x)/length(x) and \hat{p_y} = sum(y)/length(y),
and then compute the lower tail Pr(number of times x and y both succeed
<= sum(x*y) | p_x = \hat{p_x} \& p_y = \hat{p_y}).  Under the null, each
product x[i]*y[i] is an independent Bernoulli trial with success
probability p_x*p_y, so the joint count is Binomial(length(x), p_x*p_y).
Plugging in the estimates, and unless I am completely off my head with
sleepiness, this is just

pbinom(sum(x*y), length(x), mean(x)*mean(y))

So I don't quite see why you wanted correlations.
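For concreteness, here is a minimal sketch of that one-liner wrapped as a function, assuming x and y are 0/1 vectors of equal length with one entry per patient. The name `cooccurrence_pvalue` and the simulated vectors are hypothetical, for illustration only; they are not your data.

```r
# Hypothetical helper: lower-tail p-value for "x and y are both present less
# often than independence predicts", plugging in the estimated rates.
cooccurrence_pvalue <- function(x, y) {
  stopifnot(length(x) == length(y),
            all(x %in% c(0, 1)), all(y %in% c(0, 1)))
  n <- length(x)
  q <- mean(x) * mean(y)       # estimated Pr(x[i] = 1 and y[i] = 1) under H0
  pbinom(sum(x * y), n, q)     # Pr(joint count <= observed joint count)
}

# Simulated (made-up) data: two mutations generated independently
set.seed(1)
x <- rbinom(200, 1, 0.3)
y <- rbinom(200, 1, 0.2)
p <- cooccurrence_pvalue(x, y)
print(p)
```

A small value suggests x and y co-occur less often than independence would predict.  Because p_x and p_y are estimated from the same data, the p-value is approximate.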

Since you say that "WE measured ...", the various Signor-Lipps-like
scenarios I was thinking of probably don't apply.  There are other
threats to validity:
- the presence of two (or more) mutations may be hard for your
equipment to detect
- patients with multiple mutations may die sooner, so they may be less
likely to be captured for your study
- cell division rates decrease with age
(https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6789572/), so mutations
whose likelihood depends on *rate* might tend to occur earlier in
life, while mutations that depend on *accumulated* error might tend to
occur later in life.  Then "x occurs SOME time in a patient's life" and
"y occurs SOME time in a patient's life" might be independent, while
"x and y occur at the SAME time in a patient's life" might be unlikely.
It would be interesting to check whether the frequency of each
mutation is independent of patient age, because if it is not, you
might want to stratify the pbinom test by age.  Exposure to
environmental mutagens is also likely to vary with age.
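One possible way to stratify the pbinom test by age, sketched under stated assumptions: patients are grouped into age bands (the band labels below are hypothetical), p_x and p_y are estimated separately within each band, and the total joint count is compared with the distribution of a sum of independent per-band binomials (a Poisson-binomial distribution).  This is an illustrative approach, not a packaged R function; the helper names `convolve_pmf` and `stratified_cooccurrence_pvalue` are hypothetical, and the data are simulated.

```r
# Convolve two pmfs on 0, 1, 2, ...: element i of a vector is Pr(value = i - 1).
convolve_pmf <- function(a, b) {
  out <- numeric(length(a) + length(b) - 1)
  for (i in seq_along(a)) {
    idx <- i:(i + length(b) - 1)
    out[idx] <- out[idx] + a[i] * b
  }
  out
}

# Hypothetical sketch of an age-stratified version of the pbinom test.
# Within stratum k the joint count is taken to be Binomial(n_k, q_k) with
# q_k = mean(x_k) * mean(y_k); under H0 the total joint count is a sum of
# independent binomials, whose pmf is built up by repeated convolution.
stratified_cooccurrence_pvalue <- function(x, y, stratum) {
  stopifnot(length(x) == length(y), length(x) == length(stratum),
            !anyNA(stratum))
  total_pmf <- 1                                  # pmf of an empty sum
  for (rows in split(seq_along(x), stratum, drop = TRUE)) {
    n_k <- length(rows)
    q_k <- mean(x[rows]) * mean(y[rows])
    total_pmf <- convolve_pmf(total_pmf, dbinom(0:n_k, n_k, q_k))
  }
  observed <- sum(x * y)
  sum(total_pmf[seq_len(observed + 1)])           # Pr(total <= observed)
}

# Simulated (made-up) data with hypothetical age bands
set.seed(2)
age_band <- sample(c("<50", "50-69", "70+"), 300, replace = TRUE)
x <- rbinom(300, 1, 0.3)
y <- rbinom(300, 1, 0.2)
p_strat <- stratified_cooccurrence_pvalue(x, y, age_band)
print(p_strat)
```

With a single stratum this reduces to the unstratified pbinom call.  To check whether each mutation's frequency depends on age in the first place, a logistic regression such as `summary(glm(x ~ age_band, family = binomial))` is one standard option.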

Looking at supermarket data in the past primed me to expect rates to
vary with age.  Sunscreen and cough mixture are negatively associated
(:-).
On Sun, 28 Jul 2024 at 12:40, Yuan Chun Ding <ycding at coh.org> wrote: