Prev 387628 / 398502 Next

Off topic --- underdispersed (pseudo) binomial data.

Abby Spurdle

Sat, Mar 27, 2021 4:29 PM

Further to yesterday's posts:
I think the "n" value would be the maximum possible number of jumps,
not the number of students.
In theory, the minimum possible number is zero, so the distributions
are more binomial-like than they look.

Also, there was a mistake in my comments.
The jump is from non-A-grade to A-grade, not non-pass to pass.

On Sat, Mar 27, 2021 at 10:00 PM Abby Spurdle <spurdle.a at gmail.com> wrote:

Sorry.
I just realized, after posting, that the "n" value in the dispersion
calculation isn't correct.
I'll have to revisit the simulation, tomorrow.

On Sat, Mar 27, 2021 at 9:11 PM Abby Spurdle <spurdle.a at gmail.com> wrote:

Hi Rolf,

Let's say we have a course called Corgiology 101, with a single moderated exam.
And let's say the moderators transform initial exam scores, such that
there are fixed percentages of pass rates and A grades.

Rather than count the number of passes, we can count the number of "jumps".
That is, the number of people that pass the corgiology exam after
moderation, that would not have passed without moderation.

I've created a function to test for underdispersion, based on your expression.
(I hope I got it right).

Then I've gone on to create simulations, using both constant and
nonconstant class sizes.
The nonconstant simulations apply an (approx) discrete scaling
transformation, referred to previously.

We can see from the examples that there are a lot of these jumps.
And more importantly, they appear to be underdispersed.

----code----
PASS.SCORE <- 0.5
A.SCORE <- 0.8

#target parameters
PASS.RATE <- 0.8
A.RATE <- 0.2
#unmoderated parameters
UNMOD.MEAN.SCORE <- 0.65
UNMOD.SD.SCORE <- 0.075

NCLASSES <- 2000
NSTUD.CONST <- 200
NSTUD.NONCONST.LIMS <- c (50, 800)

sim.njump <- function (nstud, mean0=UNMOD.MEAN.SCORE, sd0=UNMOD.SD.SCORE,
    pass.score=PASS.SCORE, a.score=A.SCORE,
    pass.rate=PASS.RATE, a.rate=A.RATE)
{   x <- rnorm (nstud, mean0, sd0)
    q <- quantile (x, 1 - c (pass.rate, a.rate), names=FALSE)
    dq <- diff (q)
    q <- (a.score - pass.score) / dq * q
    y <- pass.score - q [1] + (a.score - pass.score) / dq * x
    sum (x < a.score & y >= a.score)
}

sim.nclasses <- function (nclasses, nstud, nstud.std)
{   nstud <- rep_len (nstud, nclasses)
    njump <- integer (nclasses)
    for (i in 1:nclasses)
        njump [i] <- sim.njump (nstud [i])
    if (missing (nstud.std) )
        njump
    else
        round (nstud.std / nstud * njump)
}

is.under <- function (x, n)
    var (x) < mean (x) * (1 - mean (x) / n)

njump.hom <- sim.nclasses (NCLASSES, NSTUD.CONST)
nstud <- round (runif (NCLASSES, NSTUD.NONCONST.LIMS [1],
NSTUD.NONCONST.LIMS [2]) )
njump.het <- sim.nclasses (NCLASSES, nstud, NSTUD.CONST)

under.hom <- is.under (njump.hom, NSTUD.CONST)
under.het <- is.under (njump.het, NSTUD.CONST)
main.hom <- paste0 ("const class size (under=", under.hom, ")")
main.het <- paste0 ("diff class sizes (under=", under.het, ")")

p0 <- par (mfrow = c (2, 1) )
hist (njump.hom, main=main.hom)
hist (njump.het, main=main.het)
par (p0)
----code----

best,
B.


On Thu, Mar 25, 2021 at 2:33 PM Rolf Turner <r.turner at auckland.ac.nz> wrote:


I would like a real-life example of a data set which one might think to
model by a binomial distribution, but which is substantially
underdispersed. I.e. a sample X = {X_1, X_2, ..., X_N} where each X_i
is an integer between 0 and n (n known a priori) such that var(X) <<
mean(X)*(1 - mean(X)/n).

Does anyone know of any such examples?  Do any exist?  I've done
a perfunctory web search, and had a look at "A Handbook of Small
Data Sets" by Hand, Daly, Lunn, et al., and drawn a blank.

I've seen on the web some references to underdispersed "pseudo-Poisson"
data, but not to underdispersed "pseudo-binomial" data.  And of course
there's lots of *over* dispersed stuff.  But that's not what I want.

I can *simulate* data sets of the sor that I am looking for (so far the
only ideas I've had for doing this are pretty simplistic and
artificial) but I'd like to get my hands on a *real* example, if
possible.

Grateful for any pointers/suggestions.

cheers,

Rolf Turner

--
Honorary Research Fellow
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Thread (7 messages)

Rolf Turner Off topic --- underdispersed (pseudo) binomial data. Mar 24 Abby Spurdle Off topic --- underdispersed (pseudo) binomial data. Mar 25 Rolf Turner Off topic --- underdispersed (pseudo) binomial data. Mar 25 Duncan Murdoch Off topic --- underdispersed (pseudo) binomial data. Mar 26 Abby Spurdle Off topic --- underdispersed (pseudo) binomial data. Mar 27 Abby Spurdle Off topic --- underdispersed (pseudo) binomial data. Mar 27 Abby Spurdle Off topic --- underdispersed (pseudo) binomial data. Mar 27