Hierfstat: g-statistics permutation
Hi,

Using some of the permutation functions in hierfstat (test.g; test.between; gstat.randtest), I've encountered the same issue raised in this recent post:
https://stat.ethz.ch/pipermail/r-sig-genetics/2016-August/000099.html

In my case (6 pops x 8 individuals; ~8000 loci), testing for significant differentiation among pops with a small number of loci (e.g. 20) appears normal, but as the number of loci increases beyond 100, the permuted g-statistics mostly converge on a single value or 0. This appears to be related to missing data, as demonstrated below on a small dataset.

library(hierfstat)

# Simulate 100 loci
dat.sim <- sim.genot(size=8, nbal=2, nbloc=100, nbpop=6, N=1000, mig=0.001, mut=0.0001, f=0)

# Perform permutation test with no missing data
g <- test.g(dat.sim[,-1], level = dat.sim$Pop, nperm = 100)

# Are all 100 permuted g-statistics unique? Yes.
length(unique(g$g.star))

# Add one missing genotype per locus. Total missing data = about 2% (1 of 48 genotypes per locus)
dat.sim[,-1] <- apply(dat.sim[,-1], 2, function(x){x[sample(1:48,1)] <- NA; x})

# Repeat the permutation test on the data with missing genotypes
g <- test.g(dat.sim[,-1], level = dat.sim$Pop, nperm = 100)

# How many permuted g-statistics are unique? Generally fewer than 20.
length(unique(g$g.star))

Is there any way to use these functions to calculate p-values when missing data are present? The functions work fine when I remove loci with missing data, so it's not the end of the world.

Thanks for any advice.

Regards,
Dan
Dr. Daniel J. Schmidt
Research Fellow, Australian Rivers Institute
Griffith University
170 Kessels Road, Nathan
Brisbane QLD 4111
Australia
d.schmidt at griffith.edu.au
Office: +61 7 37354165
http://www.rivers.edu.au
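
A minimal sketch of the workaround mentioned above: dropping every locus that has a missing genotype before running the permutation test. The toy data frame (built in base R so the filtering step runs without hierfstat) and the names dat, complete and dat.complete are illustrative assumptions. The two-digit genotype coding is assumed to match the format that sim.genot produces. The test.g call is only attempted if hierfstat is installed.

```r
# Toy stand-in for the simulated data (assumption): 6 pops x 8 individuals,
# 100 biallelic loci, genotypes coded as two-digit numbers.
set.seed(1)
n.ind <- 48
n.loc <- 100
geno <- matrix(sample(c(11, 12, 22), n.ind * n.loc, replace = TRUE),
               nrow = n.ind)
dat <- data.frame(Pop = rep(1:6, each = 8), geno)

# Introduce one missing genotype in each of the first 30 loci only.
for (j in 1:30) dat[sample(n.ind, 1), j + 1] <- NA

# Flag loci (columns) with no missing genotypes and keep only those,
# always retaining the first (Pop) column.
complete <- colSums(is.na(dat[, -1])) == 0
dat.complete <- dat[, c(TRUE, complete)]
print(sum(complete))  # number of loci retained

# Run the permutation test on complete loci only, if hierfstat is available.
if (requireNamespace("hierfstat", quietly = TRUE)) {
  g <- hierfstat::test.g(dat.complete[, -1], level = dat.complete$Pop,
                         nperm = 100)
  print(length(unique(g$g.star)))
}
```

This discards information from every locus with even a single missing genotype, so it is a stopgap rather than a fix for how the permutation functions handle NA values.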