Compute the Gini coefficient

3 messages · Marine Regis, Erich Neuwirth, Achim Zeileis

#
Hello,

I would like to build a Lorenz curve and calculate a Gini coefficient in order to find how many parasites the top 20% most infected hosts support.

Here is my data set:

Number of parasites per host:
parasites = c(0,1,2,3,4,5,6,7,8,9,10)

Number of hosts associated with each number of parasites given above:
hosts = c(18,20,28,19,16,10,3,1,0,0,0)

To represent the Lorenz curve:
I manually calculated the cumulative percentage of parasites and hosts:

cumul_parasites <- cumsum(parasites)/max(cumsum(parasites))
cumul_hosts <- cumsum(hosts)/max(cumsum(hosts))
plot(cumul_hosts, cumul_parasites, type= "l")
Thank you very much for your help.
Have a nice day
Marine
#
Your values in hosts are frequencies. So you need to calculate

cumul_hosts = cumsum(hosts)/sum(hosts)
cumul_parasites = cumsum(hosts*parasites)/sum(hosts*parasites)

The Lorenz curve starts at (0,0), so to draw it, you need to extend these vectors

cumul_hosts = c(0,cumul_hosts)
cumul_parasites = c(0,cumul_parasites)

plot(cumul_hosts,cumul_parasites,type="l")


The Gini coefficient can be calculated as
library(reldist)
gini(parasites,hosts)
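
As a cross-check that needs no package, the Gini coefficient can also be read off the Lorenz curve itself: it is one minus twice the area under the curve. A minimal base-R sketch, assuming the parasites and hosts vectors from the original post (the names lorenz_p, lorenz_L, area and gini_trapezoid are made up for this illustration):

```r
# Data from the original post (parasites must be in increasing order)
parasites <- c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
hosts     <- c(18, 20, 28, 19, 16, 10, 3, 1, 0, 0, 0)

# Lorenz curve points, starting at (0, 0)
lorenz_p <- c(0, cumsum(hosts) / sum(hosts))
lorenz_L <- c(0, cumsum(hosts * parasites) / sum(hosts * parasites))

# Area under the piecewise-linear curve (trapezoid rule)
area <- sum(diff(lorenz_p) * (head(lorenz_L, -1) + tail(lorenz_L, -1)) / 2)

# Gini = 1 - 2 * area under the Lorenz curve
gini_trapezoid <- 1 - 2 * area
print(gini_trapezoid)
```

This is the plain (uncorrected) definition, so it should agree with a package result only if the package uses the same definition.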


If you want to check, you can "recreate" the original data (number of parasites for each host) with

num_parasites = rep(parasites,hosts)

and
gini(num_parasites)

will also give you the Gini coefficient you want.
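
With the per-host data recreated this way, the original question (what share of parasites the 20% most infected hosts carry) can be answered directly. A rough base-R sketch, assuming "top 20%" means the round(0.2 * n) hosts with the highest counts; the names x, n_top, top_share and gini_by_rank are only illustrative:

```r
# Data from the original post
parasites <- c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
hosts     <- c(18, 20, 28, 19, 16, 10, 3, 1, 0, 0, 0)

# One entry per host: its parasite count
num_parasites <- rep(parasites, hosts)
n <- length(num_parasites)

# Gini from the sorted values (uncorrected sample formula)
x <- sort(num_parasites)
gini_by_rank <- sum((2 * seq_len(n) - n - 1) * x) / (n * sum(x))

# Share of all parasites carried by the 20% most infected hosts
# (assumption: "top 20%" = round(0.2 * n) hosts)
n_top <- round(0.2 * n)
top_share <- sum(tail(x, n_top)) / sum(x)

print(c(gini = gini_by_rank, top20_share = top_share))
```

Ties among hosts with the same count do not affect top_share, because tied hosts carry the same number of parasites.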
#
On Wed, 30 Mar 2016, Erich Neuwirth wrote:

            
That's what I thought as well, but Marine explicitly said that the 'hosts' 
are _not_ weights. Hence I was confused about what this would actually mean.

Using the "ineq" package you can also do
plot(Lc(parasites, hosts))
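
To look at the numbers behind that plot, a small sketch along these lines may help. It assumes the ineq package is installed and that hosts really are frequencies (passed as the n argument of Lc); the code is skipped if the package is missing:

```r
# Data from the original post
parasites <- c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
hosts     <- c(18, 20, 28, 19, 16, 10, 3, 1, 0, 0, 0)

if (requireNamespace("ineq", quietly = TRUE)) {
  # n = frequency of each value in parasites
  lc <- ineq::Lc(parasites, n = hosts)

  # lc$p: cumulative share of hosts; lc$L: cumulative share of parasites
  print(cbind(p = lc$p, L = lc$L))

  # Gini coefficient on the expanded per-host data
  print(ineq::Gini(rep(parasites, hosts)))
}
```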