
how to calculate centroid (or centre of gravity) of a population (count data)

5 messages · Marcelino de la Cruz, Rolf Turner, Tiago Marques +1 more

#
Dear all

I am working with count data and I want to assess whether the centre of
gravity of the population (centroid or mean latitude?) has changed over
time, indicating some ongoing redistribution or shift. To simplify, let's
say that I have ca. 2000 sites censused in two consecutive years (same
sites censused both years - all sites) and the abundance (count) of the
species registered.

I first thought about doing a kernelUD (package adehabitatHR) but
apparently this only takes into account the location of the sites to
calculate the kernel and then the centroids. Thus, since I have the exact
same sites in both years, the centroids for year 1 and year 2 are the same.
In my case, what I would like to do is to calculate that centroid but
taking into account the counts, because a site that had 3 individuals in
both years can't have the same weight as a site that hosted 3000
individuals when calculating the centroids.

So, what I would like to have is the centroid (or centre of gravity) of the
counts, not of the sites surveyed (which is what adehabitatHR does, as far
as I understood).

Do you have any suggestions which package other than adehabitatXX to use
for this purpose? Or can this be done with adehabitat?

Thank you very much for your help.

Diego
#
Hi Diego,

it seems to me that what you want to compute are weighted centroids.

Some advice is given here (it is for polygons, but you can get the idea):

https://stat.ethz.ch/pipermail/r-sig-geo/2016-February/024107.html

Cheers,

Marcelino



On 20/04/2016 at 8:45, Diego Pavon wrote:
#
Maybe I'm being naive, or missing the point, or something.  But I would
presume that your data are something like:

x_1, x_2, ..., x_n # x-coordinates of the locations --- say, stored as x
y_1, y_2, ..., y_n # y-coordinates of the locations --- say, stored as y
k_1, k_2, ..., k_n # counts at the given locations --- say, stored as k

This is interpreted as meaning that (x_i,y_i) is the centroid of the 
i-th region, in which count k_i was obtained.

If so, can you not just calculate:

xc <- sum(k*x)/sum(k)
yc <- sum(k*y)/sum(k)

???

What am I not understanding?
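
As a minimal sketch of the calculation above, the following uses made-up toy values for x, y and k (they are not Diego's data). It also checks the result against base R's weighted.mean(), which computes the same quantity.

```r
# Hypothetical toy data: three sites with very different counts
x <- c(0, 10, 20)    # x-coordinates of the sites
y <- c(0, 0, 10)     # y-coordinates of the sites
k <- c(3, 3000, 30)  # counts at the sites

# Count-weighted centroid, as in the formula above
xc <- sum(k * x) / sum(k)
yc <- sum(k * y) / sum(k)

# The same values via weighted.mean() from the stats package
xc_wm <- weighted.mean(x, w = k)
yc_wm <- weighted.mean(y, w = k)

# Unweighted centroid of the sites, for comparison
x_sites <- mean(x)
y_sites <- mean(y)
```

With these toy values the weighted centroid sits close to the site holding 3000 individuals, whereas the unweighted centroid of the sites does not.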

cheers,

Rolf Turner
#
Hi Diego,

A perhaps dumb yet straightforward way of doing it is to replicate each 
point as many times as its corresponding count. This will get you 
the right unbiased centroid, essentially a weighted average as 
Marcelino suggests, assuming that the weight you use is proportional to 
the count. It's like saying that each bird contributes once to the 
centroid location, instead of each point contributing once.
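
A small sketch of the replication idea, using made-up toy values and assuming the counts are non-negative integers (rep() needs that). It suggests that the plain mean of the replicated coordinates matches the count-weighted mean.

```r
# Hypothetical toy data
x <- c(0, 10, 20)
y <- c(0, 0, 10)
k <- c(3, 3000, 30)  # counts; must be non-negative integers for rep()

# Replicate each site once per individual counted there
x_rep <- rep(x, times = k)
y_rep <- rep(y, times = k)

# Plain means of the replicated coordinates
xc_rep <- mean(x_rep)
yc_rep <- mean(y_rep)

# Count-weighted means of the original coordinates
xc_w <- sum(k * x) / sum(k)
yc_w <- sum(k * y) / sum(k)

# The two approaches agree up to floating-point error
same_centroid <- isTRUE(all.equal(c(xc_rep, yc_rep), c(xc_w, yc_w)))
```

The weighted form avoids building a vector with one entry per individual, which may matter when counts are large.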

Note that just by using your suggested kernelUD you don't really get the 
centroid of the points (I think; I have never used it myself). I 
suspect you simply get a kernel estimate of the bivariate distribution 
of the points in space. Going from that to an actual centroid is 
possible but not necessarily straightforward.

The answer to your specific question is much simpler than that: in 2D, 
simply calculate the weighted means of the X and Y coordinates, using 
the count at each point as the weight (that would be an alternative to 
what I suggested above), and there is your centroid.

Note that this gets you the right mean centroid, but not necessarily the 
right variance for that centroid estimate. Also, you might want to think 
about whether that is the right weight. But those are all questions beyond 
your original question ;)

cheers

Tiago

At 09:36 on 20/04/2016, Marcelino de la Cruz wrote:
#
Thanks Marcelino, Rolf and Tiago for your quick responses.

I think that the method suggested by Rolf and Tiago, which is also found in
the link provided by Marcelino, works fine with my data. Yes, Rolf, it was
as simple as that :)

So, I multiplied the X_i and Y_i coordinates of each of the 2000 sites by
the count at each site (C_i), summed the products, and then divided by the
total count (sum of all counts across all sites).
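
For a two-year comparison, the same calculation could be sketched as follows. The data frame, its column names and the helper weighted_centroid() are hypothetical, and the coordinates are assumed to be projected (e.g. metres) rather than longitude/latitude.

```r
# Hypothetical toy data: the same four sites censused in two years
sites <- data.frame(
  x        = c(0, 100, 0, 100),
  y        = c(0, 0, 100, 100),
  count_y1 = c(3000, 3, 10, 10),
  count_y2 = c(10, 3, 10, 3000)
)

# Count-weighted centroid: sum(count * coordinate) / sum(count)
weighted_centroid <- function(x, y, count) {
  c(x = sum(count * x) / sum(count),
    y = sum(count * y) / sum(count))
}

c1 <- with(sites, weighted_centroid(x, y, count_y1))
c2 <- with(sites, weighted_centroid(x, y, count_y2))

# Displacement of the centroid between the two years
shift <- c2 - c1
shift_dist <- sqrt(sum(shift^2))  # Euclidean length, in coordinate units

# The unweighted centroid of the sites is the same in both years
site_centroid <- c(x = mean(sites$x), y = mean(sites$y))
```

Here the site locations are identical in both years, so only the counts move the centroid.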
Tiago had, however, interesting points that I will need to address as well.

Thank you very much for your help.

Best

Diego



2016-04-20 13:19 GMT+03:00 Tiago Marques <tiagoandremarques at gmail.com>: