Skip to content
Prev 397702 / 398500 Next

Drawing a sample based on certain condition

1) select only denied individuals from the original data. This is S.
2) There is a fixed sample size of exactly t
3) There is a fixed target sum T such that sum(t values from S) = T

You can reduce the problem. All large values where the max(S) + (t-1 smallest values) > T can be eliminated from S. Likewise if (t-1 max S) + min(S) < T then the smallest values can be eliminated.

You can estimate the number of options by asking how many ways can I draw t objects from the size of S with replacement. This is the size of S raised to the t power. If Joe has a score of 6 and Kim has a score of 6 these are different outcomes. If not, then S can be reduced to the number of unique values.

Could you randomly sample S, filter for the sum=T and then use the random sample? Do you need every case when there are millions of cases (or more)?

Is a long execution time important, or what do you consider a long execution time? Are you planning on looking at all t and T within a range?

Tim

-----Original Message-----
From: R-help <r-help-bounces at r-project.org> On Behalf Of Bert Gunter
Sent: Monday, April 14, 2025 4:16 PM
To: Duncan Murdoch <murdoch.duncan at gmail.com>
Cc: r-help at r-project.org
Subject: Re: [R] Drawing a sample based on certain condition

[External Email]

Just a comment:

You wish to draw subsets of size n with or without replacement -- and I suspect without replacement is simpler than with -- from a set of positive integers that sum to a fixed value T. This sounds related to the so-called subset sum problem in computational complexity theory: Given a set S of positive integers and positive total, T, is there *any* subset of S (i.e. a sample without replacement) whose sum is T? This is known to be NP-complete. So this means that listing all such subsets must be NP-complete. I don't know whether specifying that, as you have, the size of the subsets must be a fixed number (obviously?) simplifies the problem sufficiently to make it computationally tractable; computational wizards or suitable web searches might tell you this. But for these reasons,  your query about how to sample the set of all such subsets -- both those drawn with or without replacement -- seems computationally difficult to me.


Cheers,
Bert



"An educated person is one who can entertain new ideas, entertain others, and entertain herself."



On Mon, Apr 14, 2025 at 8:37?AM Duncan Murdoch <murdoch.duncan at gmail.com>
wrote:
______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide https://www.r-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Message-ID: <CH3PR22MB4514037DAF67F40068A0C128CFB32@CH3PR22MB4514.namprd22.prod.outlook.com>
In-Reply-To: <CAGxFJbRZKsyTqVA9Ra6H_cBCB1=3Yj48Yjaz4-ed2bjescDOFg@mail.gmail.com>