How do you test for "consecutivity"?

9 messages · Anthony28, Marc Schwartz, Doran, Harold +4 more

#
I need to use R to model a large number of experiments (say, 1000). Each
experiment involves the random selection of 5 numbers (without replacement)
from a pool of numbers ranging between 1 and 30.

What I need to know is what *proportion* of those experiments contains two
or more numbers that are consecutive. So, for instance, an experiment that
yielded the numbers 2, 28, 31, 4, 27 would be considered a "consecutive =
true" experiment since 28 and 27 are two consecutive numbers, even though
they are not side-by-side.

I am quite new to R, so really am puzzled as to how to go about this. I've
tried sorting each experiment, and then subtracting adjacent pairs of
numbers to see if the difference is plus or minus 1. I'm also unsure about
whether to use an array to store all the data first.

Any assistance would be much appreciated.
#
Vec <- c(2, 28, 31, 4, 27)

 > Vec
[1]  2 28 31  4 27

# Sort the vector
 > sort(Vec)
[1]  2  4 27 28 31

# Get differences between sequential elements
 > diff(sort(Vec))
[1]  2 23  1  3

# Are any differences == 1?
 > any(diff(sort(Vec)) == 1)
[1] TRUE

See ?sort, ?diff and ?any for more information

On your last question, if the data are all numeric and each experiment 
contains 30 elements from which you select five, then you can store the 
data in an N x 30 matrix, where N is the number of source data sets. The 
result could be stored in an N x 5 matrix.

You can then run your test of sequential members as follows, presuming 
'Res' contains the N x 5 result matrix:

   prop.table(table(apply(Res, 1, function(x) any(diff(sort(x)) == 1))))

The output will be the proportion TRUE/FALSE of rows that have 
sequential elements.
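
A minimal sketch of the whole simulation along these lines, assuming 'Res' is filled by drawing each row with sample(); the seed and the names 'n_exp' and 'has_consec' are illustrative assumptions, not part of the original post:

```r
set.seed(1)    # assumption: fixed seed so the run is repeatable
n_exp <- 1000  # number of simulated experiments

# One experiment per row: 5 numbers drawn without replacement from 1:30.
# replicate() returns a 5 x n_exp matrix, so transpose it to n_exp x 5.
Res <- t(replicate(n_exp, sample(1:30, 5)))

# TRUE for each row that contains at least one pair of consecutive numbers.
has_consec <- apply(Res, 1, function(x) any(diff(sort(x)) == 1))

prop.table(table(has_consec))  # proportions of FALSE and TRUE rows
mean(has_consec)               # the TRUE proportion on its own
```

Because the draws are random, the proportion changes from run to run unless the seed is fixed.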

HTH,

Marc Schwartz
#
How about this


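n_exp <- 10                 # number of experiments; use 1000 for the real run
result <- logical(n_exp)
for (i in seq_len(n_exp)) {
  x <- sort(sample(1:30, 5, replace = FALSE))  # draw 5 of 1:30, then sort
  result[i] <- any(diff(x) == 1)               # TRUE if two neighbours differ by 1
}
mean(result)                # proportion of experiments with a consecutive pair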
#
This will work:

my.list <- c(2, 28, 31, 4, 27)
sort(my.list)
diff(sort(my.list))
any(diff(sort(my.list)) == 1)


The middle two lines are only to illustrate what's going on.

Best wishes!


Charles Annis, P.E.

Charles.Annis at StatisticalEngineering.com
phone: 561-352-9699
eFax:  614-455-3265
http://www.StatisticalEngineering.com
 

#
On Tue, 29 Apr 2008, Anthony28 wrote:

            
Are the numbers 1:30 equiprobable??

If so, you can find the probability by direct enumeration.
    0     1     2     3     4
65780 59800 15600  1300    26

    FALSE      TRUE
0.4615946 0.5384054
If the numbers are not equiprobable, you will need to weight the values of 
tab[2,] according to the probability of each column of mat.
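
The code behind the enumeration is not shown in this thread. One way it could be sketched for the equiprobable case is below; the names 'combos', 'n_pairs', 'pair_counts' and 'p_consecutive' are assumptions and are not the 'mat' and 'tab' objects referred to above:

```r
# Enumerate every possible draw of 5 numbers from 1:30, one draw per column.
# combn() returns each combination already sorted in increasing order.
combos <- combn(30, 5)

# diff() on a matrix works down each column, so this counts the adjacent
# pairs (differences of exactly 1) within each draw.
n_pairs <- colSums(diff(combos) == 1)

# Distribution of the number of adjacent pairs over all choose(30, 5) draws.
pair_counts <- table(n_pairs)

# Exact probability that a draw contains at least one adjacent pair,
# valid only when all draws are equally likely.
p_consecutive <- mean(n_pairs >= 1)
```

Since every one of the 142506 combinations is visited once, the result is exact rather than a simulation estimate.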

HTH,

Chuck
Charles C. Berry                            (858) 534-2098
                                             Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu	            UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
#
Hey Anthony,
There must be many ways to do this.  This is one of them:

# First, define a function to calculate the proportion of consecutive
# numbers in a vector.

prop.diff=function(x){
d=diff(sort(x))
prop=(sum(d==1)+1)/length(x)
return(prop)}

# Note that I am counting both numbers in a consecutive pair.  For
# example, the vector c(1,2,6,9,10) will contain 4 consecutive numbers.  I
# think this is what you wanted to do, right?

# Next, generate a matrix with 1000 columns (one for each experiment) and
# 5 rows (the five numbers in each experiment).  Note the use of the
# 'replicate' function to generate multiple sets of random numbers.

selection=replicate(1000,sort(sample(1:30,5)))

# Third, use the apply function to apply the function we defined above to
# each column of the matrix.

diffs=apply(selection,2,prop.diff)

# This will give you a vector with the 1000 proportions of consecutive
# numbers.

Julian
#
Hey Anthony,
My previous function may not work in all cases.  Say one of the 
experiments yields these numbers:

1,2,3,6,7

Would you say that the proportion of consecutive numbers is 100%?  If 
so, this will work:

prop.diff=function(x){
d=diff(sort(x))
prop=sum((c(0,d==1)+c(d==1,0))>0)
prop=prop/length(x)
return(prop)}

This function first identifies which numbers in your original vector are 
part of a sequence of consecutive numbers.
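
To see which numbers get flagged, here is a small sketch of the same idea written with logical vectors. The helper name 'in_run' is hypothetical and only for illustration:

```r
# Hypothetical helper (not from the thread): flag each number that has a
# neighbour exactly 1 away once the draw is sorted.
in_run <- function(x) {
  s <- sort(x)
  adjacent <- diff(s) == 1
  # A number is in a run if the gap before it or the gap after it equals 1.
  flagged <- c(FALSE, adjacent) | c(adjacent, FALSE)
  setNames(flagged, s)
}

in_run(c(1, 2, 3, 6, 7))         # every number is flagged, so the proportion is 1
in_run(c(1, 2, 6, 9, 10))        # 6 is not flagged, so the proportion is 4/5
mean(in_run(c(1, 2, 6, 9, 10)))  # proportion of flagged numbers
```

Taking mean() of the flags gives the same proportion as dividing the count by length(x).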

Julian
#
I'd just like to thank all you guys for stepping in so promptly with help. I
haven't had a chance to implement any of your code yet, but just by
looking over what you've suggested, I think I have enough to guide me. So
thanks once again!
#
Charles C. Berry:
Or by a simple formula:

    * Probabilities of Consecutive Integers in Lotto
    * Author(s): Stanley P. Gudder and James N. Hagler
    * Source: Mathematics Magazine, Vol. 74, No. 3 (Jun., 2001), pp. 216-222
    * Publisher: Mathematical Association of America
    * Stable URL: http://www.jstor.org/stable/2690723
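
For equiprobable draws, a standard counting argument gives a closed form: the number of ways to choose k numbers from 1:n with no two consecutive is choose(n - k + 1, k). The sketch below uses that result; it may not be the exact form given in the cited paper, and the variable names are assumptions:

```r
n <- 30  # size of the pool
k <- 5   # numbers drawn per experiment

# Probability that no two of the k drawn numbers are consecutive.
p_none <- choose(n - k + 1, k) / choose(n, k)

# Probability that at least two drawn numbers are consecutive.
p_consecutive <- 1 - p_none
p_consecutive  # about 0.5384
```

This agrees with the TRUE proportion from the direct enumeration earlier in the thread.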