Gender balance in R

11 messages · Sarah Goslee, Rainer M Krug, Scott Kostyshak +4 more

#
Hi there,

I can't help noticing that the gender balance among R developers and 
ordinary members is extremely skewed (as it is with open source software 
in general).

Have a look at http://www.r-project.org/foundation/memberlist.html - at 
most a handful of women are listed among the 'supporting members', and 
none at all among the 29 'ordinary members'.

On the other hand I personally know many happy R users of both genders.

My questions are thus: Should R developers (and users) be worried that 
the 'other half' is excluded? If so, how could female R users/developers 
be persuaded to become more visible (e.g. added as supporting or 
ordinary members)?

Thanks,

Maarten
#
I took a look at apparent gender among list participants a few years ago:
https://stat.ethz.ch/pipermail/r-help/2011-June/280272.html

Same general thing: very few regular participants on the list were
women. I don't see any sign that that has changed in the last three
years. The bar to participation in the R-help list is much, much lower
than that to become a developer.

It would be interesting to look at the stats for CRAN packages as well.

The very low percentage of regular female participants is one of the
things that keeps me active on this list: to demonstrate that it's not
only men who use R and participate in the community.

(If you decide to do the stats for 2014, be aware that I've been out
on medical leave for the past two months, so the numbers are even
lower than usual.)

Sarah

On Mon, Nov 24, 2014 at 10:10 AM, Maarten Blaauw
<maarten.blaauw at qub.ac.uk> wrote:
#
Sarah Goslee <sarah.goslee at gmail.com> writes:
Apart from that, your input is very valuable and your answers very
hands-on helpful - and this is why I am glad that you are on the list -
and not because you are female.

Looking at R developers / CRAN package developers / list posts gender ratios might be
interesting, but I don't think it tells you anything: If there is a
skewed ratio in any of these, the question is if this is the gender
ratio in the user base and, more importantly, in the pool of potential
users.

I have no idea about the gender ratios in potential users, but I would
guess that some disciplines already have a skewed gender ratio, which is
then reflected in R.

The gender ratio in R should reflect the gender ratio of the potential
users, as this is the pool the R users / developers are coming from.

As long as nobody is excluded because of their gender, background, hair
or eye color, OS usage, or whatever ridiculous excuse one could find, I
think R will thrive.
Don't get me wrong - nothing against promoting R to new user groups.

But anyway - interesting question.

I taught True Basic for several years, and I definitely did not
see a gender bias in the students' programming abilities - the difference
was in many cases that males thought they could do it, and females thought
they could not do it because it involves maths... But I was able to
prove quite a few wrong.

Cheers,

Rainer

  
    
#
Thanks for the responses so far.

 > The gender ratio in R should reflect the gender ratio of the potential
 > users, as this is the pool the R users / developers are coming from.

I agree with this, but then again I don't think R really has 0% female 
users/developers as the R member list suggests. I'd rather expect to see 
10-50% women (my quick guess of gender balance in STEM areas, depending 
on where on the ladder and in which country one samples). Perhaps the R 
community should be assessing if there's some additional bias applied 
during the selection of supporting or ordinary members?

Cheers,

Maarten
On 25/11/14 09:15, Rainer M Krug wrote:

  
    
#
On Mon, Nov 24, 2014 at 12:34 PM, Sarah Goslee <sarah.goslee at gmail.com> wrote:
I plotted the gender of posters on r-help over time. The plot is here:
https://twitter.com/scottkosty/status/449933971644633088

The code to reproduce that plot is here:
https://github.com/scottkosty/genderAnalysis
The R file there will call devtools::install_github to install a
package from Github used for guessing the gender based on the first
name (https://github.com/scottkosty/gender).
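
To illustrate the general idea of guessing gender from a first name, here is a minimal sketch. It is not how the package above is implemented; the lookup table and the function name guess_gender are hypothetical and only show the principle, including the fact that names missing from the table come back as NA.

```r
# Hypothetical lookup table; a real one would be built from name statistics.
name_gender <- c(mary = "F", john = "M", anna = "F", peter = "M")

# Guess gender from the first word of a full name; unknown names give NA.
guess_gender <- function(full_name) {
  first <- tolower(sub(" .*$", "", trimws(full_name)))
  unname(name_gender[first])
}

guess_gender(c("Mary Smith", "john doe", "Zxq Unknown"))
```

Names that are rare or ambiguous in the table stay NA, which is why an "unknown" category shows up in analyses like this one.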

Note also that Gabriela de Queiroz, who is the founder of R-ladies,
posted it on that tweet, and that David Smith showed interest in
discussing the topic. So there is definitely demand for some data
analysis and discussion on the topic.
Thank you for that!

Scott


--
Scott Kostyshak
Economics PhD Candidate
Princeton University
#
Nice graph, Scott, thanks!

Based on your code I plotted not the absolute numbers but the ratios, 
which show slowly increasing relative participation of female Rhelpers 
over time (red = women, blue=men, black=unknown). After a c. 5% female 
contribution in 1998, this has grown to about 15% now. At this rate 
we'll reach parity around AD 2080.

My code:

# Install the name-based gender guessing package from GitHub if needed.
if (!require(gender)) {
  library(devtools)
  install_github("scottkosty/gender")
  library(gender)
}
# rHelpNames is assumed to be available once the gender package is loaded.
rHelp <- rHelpNames
# Posters whose gender could not be guessed are counted as "unknown".
rHelp[is.na(rHelp$gender), "gender"] <- "unknown"

# Sort the years so that the lines are drawn in chronological order.
yr <- sort(unique(rHelp$year))

# Per-year counts of posters by guessed gender.
helpers <- list(dates=yr, M=rep(0, length(yr)), F=rep(0, length(yr)),
  unkn=rep(0, length(yr)))

for(i in 1:nrow(rHelp))
  {
   j <- which(yr == rHelp$year[i])
   g <- rHelp$gender[i]
   if(g == "M")
    helpers$M[j] <- helpers$M[j]+1 else
     if(g == "F")
      helpers$F[j] <- helpers$F[j]+1 else
       if(g == "unknown")
        helpers$unkn[j] <- helpers$unkn[j]+1
  }

# Plot the yearly proportions: blue = men, red = women, black = unknown.
total <- helpers$M + helpers$F + helpers$unkn
plot(yr, helpers$M / total, type="l",
  col=4, ylim=c(0,1), ylab="proportions", yaxs="i")
lines(yr, helpers$F / total, col=2)
lines(yr, helpers$unkn / total)
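
The same per-year tally can also be sketched without an explicit loop, using base R's table() and prop.table(). The data frame below is made-up toy data standing in for rHelpNames; only the column names year and gender are taken from the code above, everything else is an assumption for illustration.

```r
# Toy stand-in for rHelpNames (hypothetical values, for illustration only).
toy <- data.frame(
  year   = c(1998, 1998, 1998, 1998, 2014, 2014, 2014, 2014),
  gender = c("M", "M", "M", NA, "M", "F", "M", NA),
  stringsAsFactors = FALSE
)
toy$gender[is.na(toy$gender)] <- "unknown"

# Fix the levels so every year has all three columns, even when a count is 0.
toy$gender <- factor(toy$gender, levels = c("M", "F", "unknown"))

counts <- table(toy$year, toy$gender)               # rows: years, columns: gender
props  <- unclass(prop.table(counts, margin = 1))   # each row sums to 1

# One line per gender: blue = men, red = women, black = unknown.
yr <- as.numeric(rownames(props))
matplot(yr, props, type = "l", lty = 1, col = c(4, 2, 1),
        ylim = c(0, 1), ylab = "proportions", yaxs = "i")
```

Fixing the factor levels up front keeps a column for each category even in years where nobody falls into it, so the three lines always line up.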

Cheers,

Maarten
On 25/11/14 12:11, Scott Kostyshak wrote:

  
    
#
On 24 Nov 2014, at 18:34 , Sarah Goslee <sarah.goslee at gmail.com> wrote:

            
...and very welcome back!!! (I did notice the chronicles on your blog).

Re. the gender issue, it is certainly not that women aren't welcome, it's more that they aren't there. There are various potential reasons that come to mind, but it easily ends up in speculation and stereotyping. 

It is a bit of an embarrassment and people are discussing what to do about it, but some of the countermeasures have a tendency to backfire, so we need to be a little careful. 

- Peter D.

  
    
#
I just saw this comment and I agree with Peter. I have occasion to ask
questions and get help on the R forum, but I am not a programmer and use
programs as I need them. I suppose I must comment more often. :)
On 11/25/14, 11:28 AM, "peter dalgaard" <pdalgd at gmail.com> wrote:

            
#
On 11/25/2014 04:11 AM, Scott Kostyshak wrote:
It would be great to include in your package the script that scraped author 
names from R-help archives (I guess that's what you did?). Presumably it easily 
applies to other mailing lists hosted at the same location (R-devel, further 
along the ladder from user to developer, and Bioconductor / Bioc-devel, in a 
different domain and perhaps confounded with a different 'feel' to the list). 
Also the R community is definitely international, so finding more versatile 
gender-assignment approaches seems important.

It might be interesting to ask about participation in mailing list forums versus 
other venues, and in particular the recent Bioconductor transition from mailing list to 
'StackOverflow' style support forum (https://support.bioconductor.org) -- on the 
one hand the 'gamification' elements might seem to only entrench male 
participation, while on the other we have already seen increased (quantifiable) 
and broader (subjective) participation from the Bioconductor community. I'd be 
happy to make support site usage data available, and am interested in 
collaborating in an academically well-founded analysis of this data; any 
interested parties please feel free to contact me off-list.

Martin Morgan
Bioconductor

  
    
#
On Tue, Nov 25, 2014 at 8:24 AM, Maarten Blaauw
<maarten.blaauw at qub.ac.uk> wrote:
Interesting forecasts Maarten! Let's hope for a trend break to make them wrong.
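
For what it's worth, the forecast can be sanity-checked with a straight-line extrapolation. This is a rough sketch that uses only the rounded figures quoted in the thread (about 5% in 1998, about 15% in 2014) and assumes the linear trend simply continues; it is not a fit to the actual data.

```r
# Rounded figures quoted in the thread; not fitted to the real data.
y0 <- 1998; p0 <- 0.05
y1 <- 2014; p1 <- 0.15

slope <- (p1 - p0) / (y1 - y0)          # change in proportion per year
parity_year <- y1 + (0.5 - p1) / slope  # year at which the line reaches 50%
parity_year
```

With these rounded inputs the line reaches 50% around 2070, in the same ballpark as the AD 2080 quoted earlier; the exact year is quite sensitive to the two endpoints chosen.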

Scott


--
Scott Kostyshak
Economics PhD Candidate
Princeton University
#
On Tue, Nov 25, 2014 at 1:15 PM, Martin Morgan <mtmorgan at fredhutch.org> wrote:
I just put the script up on https://github.com/scottkosty/genderAnalysis
I don't have much time at the moment to generalize it, but a pull
request is always welcome. Alternatively, anyone is welcome (at least
as far as I'm concerned) to take the script and modify it for any
purpose.
I would be interested in collaborating on such a project in the future also.

Scott


--
Scott Kostyshak
Economics PhD Candidate
Princeton University