Skip to content

Splitting Area under curve into equal portions

4 messages · Nathan S. Watson-Haigh, milton ruser, Daniel Nordlund

#
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I have some data generated as follows:

<code>
n <- 2000
work <- vector()
for(x in 1:n) {
  work[x] <- sum(1:(n-x+1))
}
plot(work)
</code>

What I want to do
- -----------------
I want to split work into a number of unequal chunks such that the sum of the
values in each chunk is approximately equal.



The numbers in "work" are proportional to the amount of work to be performed for
each value of x by a function I've written. i.e. For each value of x, there are
work[x] * y calculations to be performed (where y is a constant).

I've written a parallel version of my function where I simply assign z number of
x values to each slave. This is not ideal, since a slave that gets the 1:z
smallest values of x will take longer to compute than the (n-z+1):n set of x
values. For example, if I have 4 slaves available:

slave 1 processes x in 1:500
slave 2 processes x in 501:1000
slave 3 processes x in 1001:1500
slave 4 processes x in 1501:2000

This means the total work performed by each slave is:

slave 1 sum(work[1:500])     = 771708500
slave 2 sum(work[501:1000])  = 396458500
slave 3 sum(work[1001:1500]) = 146208500
slave 4 sum(work[1501:2000]) = 20958500

Manually plitting work into chunks where the sum of the values for the chunks is
approximately equal, I get the following:

sum(work[1:184])
[1] 335533384
[1] 334897871
[1] 334672085
[1] 330230660

I need to be able to do this automatically for any value of n and I think I
should be able to do this by calculating the area under the curve and slicing it
into equally sized regions, but don't really know how to get there from what
I've said above!

Cheers,
Nathan

- --
- --------------------------------------------------------
Dr. Nathan S. Watson-Haigh
OCE Post Doctoral Fellow
CSIRO Livestock Industries
Queensland Bioscience Precinct
St Lucia, QLD 4067
Australia

Tel: +61 (0)7 3214 2922
Fax: +61 (0)7 3214 2900
Web: http://www.csiro.au/people/Nathan.Watson-Haigh.html
- --------------------------------------------------------

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAknK3vkACgkQ9gTv6QYzVL68TACeI0gXqUXRr+W64iZaGe7olvov
b9IAnjVENA6rn0r5QFv+Pu/poWjydEC7
=dgnE
-----END PGP SIGNATURE-----
#
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi Milton,

Not quite, that would be an equal number of data points in each colour group.
What I want is an unequal number of points in each group such that:
sum(work[group.members]) is approximately the same for each group of data points.

In the mean time, I came up with the following, and took a leaf out of your book
with the colouring for example:

<code>
n <- 2002
work <- vector()
for(x in 1:(n-2)) {
  work[x] <- ((n-1-x)*(n-x))/2
}
plot(work)

tasks <- vector('list')
tasks_per_slave <- 1
work_per_task <- sum(work) / (n_slaves * tasks_per_slave)

# Now define ranges of x of equal "work"
block_start <- 1
for(x in (1:(length(work)))) {
  if(x == length(work)) {
    # this will be the last block
    tasks[[length(tasks)+1]] <- list(x=block_start:length(work))
    break
  }
  work_in_block_to_x <- sum(work[block_start:(x)])

  if(work_in_block_to_x > work_per_task) {
    # use this value of x as the chunk end
    tasks[[length(tasks)+1]] <- list(x=block_start:x)

    # move the block_start position
    block_start <- x+1
  }
}

colours <- vector()
for(i in 1:length(tasks)) {
  colours <- append(colours,rep(i,length(tasks[[i]]$x)))
}

plot(work, col=colours)
</code>

Essentially, the area under the line for each of the coloured groups (i.e. the
total work associated with those values of x) should be approximately equal and
I believe the above code achieves this. Just found the cumsum() function. You
could look at it this way:

<code>
plot(cumsum(work), col=colours)
</code>

The coloured groupings coincide with splitting the cumulative total (y-axis)
into 4 approximately equal bits.

There must be a nicer way to do this!
Nathan
milton ruser wrote:
______________________________________________
R-help at r-project.org <mailto:R-help at r-project.org> mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
<http://www.r-project.org/posting-guide.html>
and provide commented, minimal, self-contained, reproducible code.




- --
- --------------------------------------------------------
Dr. Nathan S. Watson-Haigh
OCE Post Doctoral Fellow
CSIRO Livestock Industries
Queensland Bioscience Precinct
St Lucia, QLD 4067
Australia

Tel: +61 (0)7 3214 2922
Fax: +61 (0)7 3214 2900
Web: http://www.csiro.au/people/Nathan.Watson-Haigh.html
- --------------------------------------------------------

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAknLGacACgkQ9gTv6QYzVL5zsgCfU4sJwZtLVDsky9IgXn5JbvHy
COgAnihLhkuJm5vpgVpfcJGA2lP524in
=CjBV
-----END PGP SIGNATURE-----
#
Nathan,

Someone will probably come up with a more elegant way, but does this help?
slice() will partition work into n groups where the sum in each group is
approximately the same.  slice() returns the index of the last element of
work[] for each group (except the last group).  The first group can be
indexed by 1:p[1]. The second by (p[1]+1):p[2] ... And the n-th group by
p[n-1]:N, where N <- length(work).

slice <- function(v, n){
  subtot <- floor(sum(v)/n)
  cumtot <- cumsum(v)
  p <- rep(0,n-1)
  for(i in 1:(n-1)) p[i] <- max(which(cumtot < (subtot*i)))
  p
  }

#to break work into ten groups
slice(work,10)


Hope this is helpful,

Dan

Daniel Nordlund
Bothell, WA USA