Recoding variables based on reference values in data frame

Hello,

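I'm not sure I understood the question, but try the following. It recodes each genotype as the number of minor alleles (0, 1 or 2), using the Maj_Allele and Min_Allele rows as the reference, and sets "nc" and "_" to NA.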


Kgeno <- read.table(text = "
SNP_ID SNP1 SNP2 SNP3 SNP4
Maj_Allele C G  C  A
Min_Allele T A T G
ID1 CC     GG     CT     AA
ID2 CC     GG     CC AA
ID3 CC     GG    nc    AA
ID4 _ _ _ _
ID5 CC     GG     CC     AA
ID6 CC     GG     CC     AA
ID7 CC     GG     CT     AA
ID8 _ _ _ _
ID9 CT     GG     CC AG
ID10 CC     GG     CC     AA
ID11 CC     GG     CT     AA
ID12 _ _ _ _
ID13 CC     GG     CC     AA
", header = TRUE, stringsAsFactors = FALSE)

Kgeno  # inspect the data as read

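# x is one SNP column: x[1] is the major allele, x[2] the minor allele,
# and the remaining entries are the genotypes.
fun <- function(x){
	x[x %in% c("nc", "_")] <- NA  # no-calls and blanks become missing
	MM <- paste0(x[1], x[1])  # Major Major
	Mm <- paste0(x[1], x[2])  # Major minor
	mm <- paste0(x[2], x[2])  # minor minor
	x[x == MM] <- 0
	x[x == Mm] <- 1
	x[x == mm] <- 2
	x  # still a character vector, so the codes are "0", "1", "2"
}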

Kgeno[, -1] <- sapply(Kgeno[, -1], fun)
Kgeno
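
If the heterozygotes can also be written minor-first (e.g. "TC" rather than "CT"), or if you want numeric codes instead of character ones, a variant along these lines may help. This is only a sketch: the name recode_snp and the small geno data frame are made up for illustration, and it assumes the first two rows hold the major and minor alleles as in the data above.

```r
# Hypothetical helper: recode one SNP column as the number of minor alleles.
# Assumes x[1] is the major allele, x[2] the minor allele, and the remaining
# entries are genotypes; anything unrecognised (e.g. "nc", "_") stays NA.
recode_snp <- function(x) {
  major <- x[1]
  minor <- x[2]
  geno  <- x[-(1:2)]
  out <- rep(NA_integer_, length(geno))
  out[geno %in% paste0(major, major)] <- 0L
  out[geno %in% c(paste0(major, minor), paste0(minor, major))] <- 1L
  out[geno %in% paste0(minor, minor)] <- 2L
  out
}

# Invented example data, same layout as Kgeno but much smaller.
geno <- data.frame(
  SNP_ID = c("Maj_Allele", "Min_Allele", "ID1", "ID2", "ID3", "ID4"),
  SNP1   = c("C", "T", "CC", "TC", "TT", "_"),
  SNP2   = c("G", "A", "GG", "GA", "nc", "AA"),
  stringsAsFactors = FALSE
)

# Drop the two allele rows and keep the sample IDs next to the counts.
coded <- data.frame(
  SNP_ID = geno$SNP_ID[-(1:2)],
  lapply(geno[, -1], recode_snp),
  stringsAsFactors = FALSE
)
coded
```

Unlike fun above, this drops the Maj_Allele and Min_Allele rows from the result, so each SNP column can be an integer vector.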


Also, the best way to post data is with dput(); see ?dput.

dput(head(Kgeno[, 1:5], 30))  # post the output of this
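
To show why dput() is convenient, here is a small sketch of the round trip. The data frame small is invented for illustration; the point is that the text dput() prints can be pasted into a message and evaluated by the reader to rebuild the same object.

```r
# Invented example data.
small <- data.frame(
  SNP_ID = c("Maj_Allele", "Min_Allele", "ID1"),
  SNP1   = c("C", "T", "CC"),
  stringsAsFactors = FALSE
)

# dput() writes an R expression that recreates the object;
# capture.output() collects that text instead of only printing it.
txt <- capture.output(dput(small))
cat(txt, sep = "\n")

# Evaluating the text gives back a data frame with the same contents.
rebuilt <- eval(parse(text = txt))
all.equal(small, rebuilt)
```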


Hope this helps,

Rui Barradas

On 02-07-2013 21:46, kathleen askland wrote: