Deleting columns where the frequency of values are too disparate - R-help

Josh B · 2009-01-19T04:06:04Z


Mon, Jan 19, 2009 4:13 AM


Most of the problem is just organising the data into a sensible form.

# read in data
data <- readLines(tc <- textConnection("1AGTAAC
2AGGACC
3ACGGCC
4ACGGCC
5AGGGAC
6TGGGCC
7TCGGCC
8TCGGAC
9TGGGCC
10TCGGCC
11AGGGAC
12ACGGCC
13AGGGCC
14AGGGAC
15ACGGCC
16TCGGCC
17TGGGAC
18TGGGCC
19TGGGCC
20TCGGAC")); close(tc)

# strip the leading sample number, leaving just the sequence
loci <- sub("^[[:digit:]]+", "", data)

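# you may also want the sample IDs (grep would only return the positions
# of the matching elements, not the numbers themselves)
ID <- as.integer(sub("^([[:digit:]]+).*$", "\\1", data))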

# find out how many of each base occurs at each locus
freqs <- list()
n <- length(ID)
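for(i in 1:6)
{
   # fixing the levels means a base that never occurs is still counted, as 0
   locus.i <- factor(substring(loci, i, i), levels=c("A","C","G","T"))
   assign(paste("locus", i, sep=""), locus.i)   # keeps locus1 ... locus6 available
   freqs[[i]] <- summary(locus.i)
}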
freqs
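
If you don't need the separate locus1, locus2, ... variables, the same counts can probably be had without the loop. This is only a rough sketch: the four sequences are a made-up subset for illustration, and the names bases and freq.matrix are mine.

```r
# a small illustrative subset of sequences (not the full data set)
loci <- c("AGTAAC", "AGGACC", "ACGGCC", "TCGGCC")

# split each sequence into single bases: one row per sample, one column per locus
bases <- do.call(rbind, strsplit(loci, ""))

# count each base at each locus; fixed levels keep the zero counts
freq.matrix <- apply(bases, 2, function(x)
   table(factor(x, levels=c("A","C","G","T"))))
freq.matrix
```

The result is a matrix with one row per base and one column per locus, rather than a list. This assumes every sequence has the same length.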

# flag loci where a single base accounts for more than 90% of cases
# (use >= rather than > if exactly 90% should count as well)
loci.to.remove <- sapply(freqs, function(x) any(x>0.9*n))
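
That only tells you which loci to drop. One way to actually drop them is to work on a matrix of single bases and paste the remaining columns back together. The following is a minimal sketch on made-up sequences (not the data above); bases, kept and loci.kept are names I have invented for the example.

```r
# made-up example: five samples, four loci
loci <- c("AGTC", "AGGC", "ACGC", "TCGC", "TGGC")
n <- length(loci)

# one row per sample, one column per locus
bases <- do.call(rbind, strsplit(loci, ""))

# TRUE where the most common base accounts for more than 90% of samples
loci.to.remove <- apply(bases, 2, function(x) max(table(x)) > 0.9 * n)

# drop the flagged columns; drop=FALSE keeps a matrix even if one column is left
kept <- bases[, !loci.to.remove, drop=FALSE]

# rebuild one string per sample from the remaining bases
loci.kept <- apply(kept, 1, paste, collapse="")
loci.kept
```

Here only the last locus is flagged, since every sample has a C there, so each rebuilt sequence is three bases long.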

Regards,
Richie.

Mathematical Sciences Unit
HSL

