Splitting Area under curve into equal portions
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Hi Milton,
Not quite, that would be an equal number of data points in each colour group.
What I want is an unequal number of points in each group such that:
sum(work[group.members]) is approximately the same for each group of data points.
In the mean time, I came up with the following, and took a leaf out of your book
with the colouring for example:
<code>
n <- 2002
work <- vector()
for(x in 1:(n-2)) {
work[x] <- ((n-1-x)*(n-x))/2
}
plot(work)
tasks <- vector('list')
tasks_per_slave <- 1
work_per_task <- sum(work) / (n_slaves * tasks_per_slave)
# Now define ranges of x of equal "work"
block_start <- 1
for(x in (1:(length(work)))) {
if(x == length(work)) {
# this will be the last block
tasks[[length(tasks)+1]] <- list(x=block_start:length(work))
break
}
work_in_block_to_x <- sum(work[block_start:(x)])
if(work_in_block_to_x > work_per_task) {
# use this value of x as the chunk end
tasks[[length(tasks)+1]] <- list(x=block_start:x)
# move the block_start position
block_start <- x+1
}
}
colours <- vector()
for(i in 1:length(tasks)) {
colours <- append(colours,rep(i,length(tasks[[i]]$x)))
}
plot(work, col=colours)
</code>
Essentially, the area under the line for each of the coloured groups (i.e. the
total work associated with those values of x) should be approximately equal and
I believe the above code achieves this. Just found the cumsum() function. You
could look at it this way:
<code>
plot(cumsum(work), col=colours)
</code>
The coloured groupings coincide with splitting the cumulative total (y-axis)
into 4 approximately equal bits.
There must be a nicer way to do this!
Nathan
milton ruser wrote:
Hi Nathan,
I am not sure that I understood what you need, and
also I know that it is not a elegant solution, but may
do the job.
n <- 1991
work <- vector()
for(x in 1:n) {
work[x] <- sum(1:(n-x+1))
}
plot(work)
number.groups <- 5
last.i<-0
number.groups.list<-NULL
for (i in 1:(number.groups-1))
{
number.groups.list<-c(number.groups.list, rep(i,
round(length(work)/number.groups,0)))
}
number.groups.list<-c(number.groups.list, rep(number.groups,
(length(work)-length(number.groups.list)) ))
aggregate(work, list(number.groups.list), sum)
plot(work, col=number.groups.list)
Regards a lot,
miltinho
brazil
On Wed, Mar 25, 2009 at 9:48 PM, Nathan S. Watson-Haigh
<nathan.watson-haigh at csiro.au> wrote:
I have some data generated as follows:
<code>
n <- 2000
work <- vector()
for(x in 1:n) {
work[x] <- sum(1:(n-x+1))
}
plot(work)
</code>
What I want to do
-----------------
I want to split work into a number of unequal chunks such that the
sum of the
values in each chunk is approximately equal.
The numbers in "work" are proportional to the amount of work to be
performed for
each value of x by a function I've written. i.e. For each value of
x, there are
work[x] * y calculations to be performed (where y is a constant).
I've written a parallel version of my function where I simply assign
z number of
x values to each slave. This is not ideal, since a slave that gets
the 1:z
smallest values of x will take longer to compute than the (n-z+1):n
set of x
values. For example, if I have 4 slaves available:
slave 1 processes x in 1:500
slave 2 processes x in 501:1000
slave 3 processes x in 1001:1500
slave 4 processes x in 1501:2000
This means the total work performed by each slave is:
slave 1 sum(work[1:500]) = 771708500
slave 2 sum(work[501:1000]) = 396458500
slave 3 sum(work[1001:1500]) = 146208500
slave 4 sum(work[1501:2000]) = 20958500
Manually plitting work into chunks where the sum of the values for
the chunks is
approximately equal, I get the following:
sum(work[1:184])
[1] 335533384
sum(work[185:415])
[1] 334897871
sum(work[416:745])
[1] 334672085
sum(work[746:2000])
[1] 330230660 I need to be able to do this automatically for any value of n and I think I should be able to do this by calculating the area under the curve and slicing it into equally sized regions, but don't really know how to get there from what I've said above! Cheers, Nathan
______________________________________________ R-help at r-project.org <mailto:R-help at r-project.org> mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html <http://www.r-project.org/posting-guide.html> and provide commented, minimal, self-contained, reproducible code. - -- - -------------------------------------------------------- Dr. Nathan S. Watson-Haigh OCE Post Doctoral Fellow CSIRO Livestock Industries Queensland Bioscience Precinct St Lucia, QLD 4067 Australia Tel: +61 (0)7 3214 2922 Fax: +61 (0)7 3214 2900 Web: http://www.csiro.au/people/Nathan.Watson-Haigh.html - -------------------------------------------------------- -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (MingW32) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iEYEARECAAYFAknLGacACgkQ9gTv6QYzVL5zsgCfU4sJwZtLVDsky9IgXn5JbvHy COgAnihLhkuJm5vpgVpfcJGA2lP524in =CjBV -----END PGP SIGNATURE-----