Splitting a data column randomly into 3 groups
Hi Bert and All: good morning
I promise this will be the last time I write about this topic.
I came up with this R function (please see below), certainly with your help.
It works for all sample sizes when splitting into three groups. I have also provided three simple
examples.
with many thanks
abou
################## Here it is ###############
Random.Sample.IDs <- function(N, n, ngroups) {
  #### N = population size, n = sample size, ngroups = number of groups
  #### The if/else branches below are written for ngroups = 3 only
  stopifnot(ngroups == 3, n >= ngroups, n <= N)
  population.IDs <- seq(1, N, by = 1)
  sample.IDs <- sample(population.IDs, n)
  ##### to print sample.IDs in a column format
  ##### --------------------------------------------------
  sample.IDs.in.column <- data.frame(sample.IDs)
  print(sample.IDs.in.column)
  remainder.n <- n %% ngroups   # number of leftover IDs (0, 1, or 2)
  m <- n %/% ngroups            # base group size
  s <- sample(1:n, n)           # random permutation of the sample positions
  if (remainder.n == 0) {         # sizes m, m, m
    group1.IDs <- sample.IDs[s[1:m]]
    group2.IDs <- sample.IDs[s[(m+1):(2*m)]]
    group3.IDs <- sample.IDs[s[(m*2+1):(3*m)]]
  } else if (remainder.n == 1) {  # sizes m+1, m, m
    group1.IDs <- sample.IDs[s[1:(m+1)]]
    group2.IDs <- sample.IDs[s[(m+2):(2*m+1)]]
    group3.IDs <- sample.IDs[s[(m*2+2):(3*m+1)]]
  } else if (remainder.n == 2) {  # sizes m+1, m+1, m
    group1.IDs <- sample.IDs[s[1:(m+1)]]
    group2.IDs <- sample.IDs[s[(m+2):(2*m+2)]]
    group3.IDs <- sample.IDs[s[(m*2+3):(3*m+2)]]
  }
  ##### pad the shorter groups with NA so they can be bound as columns
  nn <- max(length(group1.IDs), length(group2.IDs), length(group3.IDs))
  length(group1.IDs) <- nn
  length(group2.IDs) <- nn
  length(group3.IDs) <- nn
  groups.IDs <- cbind(group1.IDs, group2.IDs, group3.IDs)
  groups.IDs
}
##### Examples
##### --------
Random.Sample.IDs(100, 12, 3)  #### group sizes are equal (n1 = n2 = n3 = 4)
Random.Sample.IDs(100, 13, 3)  #### group sizes are NOT equal (n1 = 5, n2 = 4, n3 = 4)
Random.Sample.IDs(100, 17, 3)  #### group sizes are NOT equal (n1 = 6, n2 = 6, n3 = 5)
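For readers who need more than three groups, here is a minimal sketch of a generalization. It is not part of the original post: the name random_sample_groups() is hypothetical, and the group sizes follow the k / k+1 rule Bert describes further down in this thread (r = n %% ngroups groups of size k + 1, the rest of size k = n %/% ngroups). Because sample() already returns the drawn IDs in random order, consecutive blocks of that vector form a random partition, so no second permutation is needed.

```r
# Hypothetical generalization of Random.Sample.IDs() to any number of groups.
# Assumes 1 <= ngroups <= n <= N; group sizes follow the k / k+1 rule.
random_sample_groups <- function(N, n, ngroups) {
  stopifnot(n <= N, ngroups >= 1, ngroups <= n)
  sample.IDs <- sample(seq_len(N), n)             # random sample, already in random order
  k <- n %/% ngroups                              # base group size
  r <- n %% ngroups                               # number of groups that get one extra ID
  sizes <- c(rep(k + 1, r), rep(k, ngroups - r))  # r groups of k+1, the rest of size k
  labels <- rep(seq_len(ngroups), times = sizes)  # group label for each sample position
  groups <- split(sample.IDs, labels)             # consecutive blocks -> groups
  # Pad with NA so the groups can be shown side by side, as in the original
  nn <- max(sizes)
  out <- do.call(cbind, lapply(groups, function(g) { length(g) <- nn; g }))
  colnames(out) <- paste0("group", seq_len(ngroups), ".IDs")
  out
}

random_sample_groups(100, 17, 3)  # three columns holding 6, 6, and 5 IDs
random_sample_groups(50, 10, 4)   # four columns holding 3, 3, 2, and 2 IDs
```

Using do.call(cbind, ...) rather than sapply() keeps the result a matrix even when every group has a single ID.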
______________________
*AbouEl-Makarim Aboueissa, PhD*
*Professor, Statistics and Data Science*
*Graduate Coordinator*
*Department of Mathematics and Statistics*
*University of Southern Maine*
On Sun, Sep 5, 2021 at 6:50 PM Bert Gunter <bgunter.4567 at gmail.com> wrote:
In case anyone is still interested in my query, note that if there are n total items to be split into g groups as evenly as possible, and we define "as evenly as possible" as at most two different group sizes that differ by 1, then:

if n = k*g + r, where 0 <= r < g, then n = k*(g - r) + (k + 1)*r,

i.e. g - r groups of size k and r groups of size k + 1.

So using R's modular arithmetic operators, which are handy to know about, we have: r = n %% g and k = n %/% g.

(And note that you should disregard my previous stupid remark about numerical analysis.)

Cheers,
Bert

On Sat, Sep 4, 2021 at 3:34 PM Bert Gunter <bgunter.4567 at gmail.com> wrote:
I have a more general problem for you.
Given n items and 2 <= g << n, how do you divide the n items into g
groups that are as "equal as possible"?
First, operationally define "as equal as possible."
Second, define the algorithm to carry out the definition. Hint: note
that the m[i], i = 1, ..., g, must sum to n, where m[i] is the number of
items in the ith group.
Third, write R code for the algorithm. Exercise for the reader.
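A minimal sketch of the sizing step in R, using the rule from Bert's follow-up above (n = k*(g - r) + (k + 1)*r with k = n %/% g and r = n %% g). The function name group_sizes() is hypothetical. The sizes it returns always sum to n and differ by at most 1, which satisfies the hint that the m[i] must sum to n. The usage lines use the three sample sizes from Abou's examples further down.

```r
# Group sizes for splitting n items into g groups "as equally as possible",
# following the rule n = k*(g - r) + (k + 1)*r.
group_sizes <- function(n, g) {
  k <- n %/% g                     # integer quotient
  r <- n %% g                      # remainder, 0 <= r < g
  c(rep(k + 1, r), rep(k, g - r))  # r groups of size k+1, g-r groups of size k
}

group_sizes(204, 3)  # 68 68 68
group_sizes(112, 3)  # 38 37 37
group_sizes(284, 3)  # 95 95 94
```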
I may be wrong, but I think numerical analysts might also have a
little fun here.
Randomization, of course, is trivial.
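Randomization can indeed be done in one line. One sketch (an assumed approach, not taken from this thread): recycle the labels 1, ..., g to length n with rep_len(), which automatically gives the first r = n %% g labels one extra use, then shuffle the labels with sample() and split() the items by them. The helper name assign_groups() is hypothetical.

```r
# Random assignment of items to g groups of as-equal-as-possible size:
# recycle the labels 1..g to the number of items, then shuffle them.
assign_groups <- function(items, g) {
  labels <- rep_len(seq_len(g), length(items))  # groups 1..r get one extra label
  split(items, sample(labels))                  # random permutation of the labels
}

set.seed(42)
grp <- assign_groups(LETTERS[1:11], 3)
lengths(grp)  # sizes 4, 4, 3 (which letters land where depends on the seed)
```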
Cheers,
Bert
Bert Gunter
"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)
On Sat, Sep 4, 2021 at 2:13 PM AbouEl-Makarim Aboueissa
<abouelmakarim1962 at gmail.com> wrote:
Dear Thomas:
Thank you very much for your input in this matter. The core part of this R code (please see below) was written by *Richard O'Keefe*. I had three examples with different sample sizes.

*The first sample, of size n1 = 204*, is divided randomly into three groups of size 68. *No problems with this one*.

*The second sample, of size n2 = 112*, should be divided randomly into three groups of sizes 37, 37, and 38. BUT this R code generates three groups of equal size (37, 37, and 37). *How do I fix the code to make sure that the output will be three groups of sizes 37, 37, and 38?*

*The third sample, of size n3 = 284*, should be divided randomly into three groups of sizes 94, 95, and 95. BUT this R code generates three groups of equal size (94, 94, and 94). *Again, how do I fix the code to make sure that the output will be three groups of sizes 94, 95, and 95?*

With many thanks
abou

########### ------------------------ #############
N1 <- 485
population1.IDs <- seq(1, N1, by = 1)
#### population1.IDs
n1 <- 204   ##### in this case the size of each of the three groups = 68
sample1.IDs <- sample(population1.IDs, n1)
#### sample1.IDs
#### n1 <- length(sample1.IDs)
m1 <- n1 %/% 3
s1 <- sample(1:n1, n1)
group1.IDs <- sample1.IDs[s1[1:m1]]
group2.IDs <- sample1.IDs[s1[(m1+1):(2*m1)]]
group3.IDs <- sample1.IDs[s1[(m1*2+1):(3*m1)]]
groups.IDs <- cbind(group1.IDs, group2.IDs, group3.IDs)
groups.IDs
####### --------------------------
N2 <- 266
population2.IDs <- seq(1, N2, by = 1)
#### population2.IDs
n2 <- 112   ##### in this case the sizes of the three groups are (37, 37, and 38)
            ##### BUT this code generates three groups of equal size (37, 37, and 37)
sample2.IDs <- sample(population2.IDs, n2)
#### sample2.IDs
#### n2 <- length(sample2.IDs)
m2 <- n2 %/% 3
s2 <- sample(1:n2, n2)
group1.IDs <- sample2.IDs[s2[1:m2]]
group2.IDs <- sample2.IDs[s2[(m2+1):(2*m2)]]
group3.IDs <- sample2.IDs[s2[(m2*2+1):(3*m2)]]
groups.IDs <- cbind(group1.IDs, group2.IDs, group3.IDs)
groups.IDs
####### --------------------------
N3 <- 674
population3.IDs <- seq(1, N3, by = 1)
#### population3.IDs
n3 <- 284   ##### in this case the sizes of the three groups are (94, 95, and 95)
            ##### BUT this code generates three groups of equal size (94, 94, and 94)
sample3.IDs <- sample(population3.IDs, n3)
#### sample3.IDs
#### n3 <- length(sample3.IDs)
m3 <- n3 %/% 3
s3 <- sample(1:n3, n3)
group1.IDs <- sample3.IDs[s3[1:m3]]
group2.IDs <- sample3.IDs[s3[(m3+1):(2*m3)]]
group3.IDs <- sample3.IDs[s3[(m3*2+1):(3*m3)]]
groups.IDs <- cbind(group1.IDs, group2.IDs, group3.IDs)
groups.IDs
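Why the IDs go missing: the third group always ends at position 3*m, so when n %% 3 is 1 or 2 the last one or two shuffled IDs are never assigned. Simply extending the last group to n would give 37, 37, 38 for n2 = 112, but 94, 94, 96 for n3 = 284, so the remainder has to be spread over the groups. A minimal sketch that keeps the posted start:end indexing but derives the boundaries from the group sizes with cumsum(); the function name split_by_positions() is hypothetical, and the larger groups are placed last to match the sizes requested above.

```r
# Keeps the posted index-slicing idea, but computes each group's start and
# end positions from the group sizes instead of assuming blocks of m.
split_by_positions <- function(sample.IDs, ngroups = 3) {
  n <- length(sample.IDs)
  stopifnot(ngroups >= 1, ngroups <= n)          # every group gets at least one ID
  k <- n %/% ngroups
  r <- n %% ngroups
  sizes <- c(rep(k, ngroups - r), rep(k + 1, r)) # larger groups last
  ends <- cumsum(sizes)                          # last position of each group
  starts <- ends - sizes + 1                     # first position of each group
  s <- sample(seq_len(n), n)                     # random permutation, as in the original
  lapply(seq_len(ngroups), function(i) sample.IDs[s[starts[i]:ends[i]]])
}

set.seed(2021)
sample2.IDs <- sample(seq_len(266), 112)
lengths(split_by_positions(sample2.IDs))  # 37 37 38
sample3.IDs <- sample(seq_len(674), 284)
lengths(split_by_positions(sample3.IDs))  # 94 95 95
```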
On Sat, Sep 4, 2021 at 11:54 AM Thomas Subia <tgs77m at yahoo.com> wrote:
Abou,

I've been following your question on how to split a data column randomly into 3 groups using R. My method may not be amenable to a large set of data, but it is surely worth considering since it makes sense intuitively.

mydata <- LETTERS[1:11]
mydata
[1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K"

# Let's choose a random sample of size 4 from mydata
random_grp1 <- sample(mydata, 4)
random_grp1
[1] "J" "H" "D" "A"

Now my next random selection of data is defined by

data_wo_random <- setdiff(mydata, random_grp1)
# this makes sense because I need to choose random data from a set which
# is defined by the difference of the sets mydata and random_grp1
data_wo_random
[1] "B" "C" "E" "F" "G" "I" "K"

This is great! So now I can randomly select data of any size from this set. Repeating this process can easily generate subgroups of your original dataset of any size you want. Surely this method could be improved so that this could be done automatically. Nevertheless, this is an intuitive method which I believe is easier to understand than some of the other methods posted.

Hope this helps!
Thomas Subia
Statistician
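Thomas's suggestion that the method could be done automatically can be sketched as a loop: draw each group from what is left, then remove it with setdiff(). The function name setdiff_groups() is hypothetical. Because setdiff() treats its arguments as sets, the sketch assumes the items are unique. It indexes with sample.int() rather than calling sample() on the remaining items directly, because sample(x, size) on a single numeric value x >= 1 samples from 1:x instead of returning x.

```r
# Automating the repeated sample()/setdiff() idea.
# Assumes the items are unique, since setdiff() works on sets.
setdiff_groups <- function(items, sizes) {
  stopifnot(!anyDuplicated(items), sum(sizes) <= length(items))
  remaining <- items
  groups <- vector("list", length(sizes))
  for (i in seq_along(sizes)) {
    # draw this group's items from what is still unassigned
    groups[[i]] <- remaining[sample.int(length(remaining), sizes[i])]
    # remove the drawn items before the next draw
    remaining <- setdiff(remaining, groups[[i]])
  }
  groups
}

set.seed(1)
setdiff_groups(LETTERS[1:11], c(4, 4, 3))
```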
______________________________________________ R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.