Compare data in two rows and replace objects in data frame
On Mon, Aug 4, 2014 at 4:53 AM, raz <barvazduck at gmail.com> wrote:
Dear all,
I have a data frame of 144 x 20000 values.
I need to take every value in the first row and compare it to the value in the second row,
and do the same for rows 3-4, 5-6, and so on.
The output should be one line for each pair of rows.
The comparison is:
if row1==1 and row2==1 <-'HT'
if row1==1 and row2==0 <-'A'
if row1==0 and row2==1 <-'B'
if row1==1 and row2=='-' <-'Aht'
if row1=='-' and row2==1 <-'Bht'
for example:
if the data is:
CloneID genotype 2001 genotype 2002 genotype 2003
2471250 1 1 1
2471250 0 0 0
2433062 0 0 0
2433062 1 1 1
100021605 1 1 0
100021605 1 0 1
100005599 1 1 0
100005599 1 1 1
100002798 1 1 0
100002798 1 1 1
then the output should be:
CloneID genotype 2001 genotype 2002 genotype 2003
2471250 A A A
2433062 B B B
100021605 HT A B
100005599 HT HT B
100002798 HT HT B
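The rules and the example above can be checked with a short sketch. This is only an illustration, assuming the values are stored as character strings and using shortened, made-up column names (g2001, g2002, g2003) in place of "genotype 2001" and so on. The rule table is a named character vector keyed by the pair "row1|row2".

```r
# Rule table from the post: names are "<row1>|<row2>", values are the output codes.
rules <- c("1|1" = "HT", "1|0" = "A", "0|1" = "B", "1|-" = "Aht", "-|1" = "Bht")

# The example data, typed in as character so that "-" could also appear.
# g2001..g2003 are illustrative stand-ins for the real column names.
ex <- data.frame(
  CloneID = c(2471250, 2471250, 2433062, 2433062, 100021605, 100021605,
              100005599, 100005599, 100002798, 100002798),
  g2001 = c("1", "0", "0", "1", "1", "1", "1", "1", "1", "1"),
  g2002 = c("1", "0", "0", "1", "1", "0", "1", "1", "1", "1"),
  g2003 = c("1", "0", "0", "1", "0", "1", "0", "1", "0", "1"),
  stringsAsFactors = FALSE
)

odd  <- seq(1, nrow(ex), by = 2)   # first row of each pair
even <- odd + 1                    # second row of each pair

out <- data.frame(CloneID = ex$CloneID[odd], stringsAsFactors = FALSE)
for (col in c("g2001", "g2002", "g2003")) {
  # Build one key per pair, then look all keys up in the rule table at once.
  key <- paste(ex[[col]][odd], ex[[col]][even], sep = "|")
  out[[col]] <- unname(rules[key])   # pairs not covered by a rule become NA
}
print(out)
```

The loop here runs once per column, not once per cell, so each lookup handles all row pairs of a column in a single vectorized step.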
I tried this for the whole data, but it's so slow:
AX <- data.frame(lapply(AX, as.character), stringsAsFactors=FALSE)
for (i in seq(1,nrow(AX),by=2)){
  for (j in 6:144){
    if (AX[i,j]==1 & AX[i+1,j]==0){
      AX[i,j]<-'A'
    }
    if (AX[i,j]==0 & AX[i+1,j]==1){
      AX[i,j]<-'B'
    }
    if (AX[i,j]==1 & AX[i+1,j]==1){
      AX[i,j]<-'HT'
    }
    if (AX[i,j]==1 & AX[i+1,j]=="-"){
      AX[i,j]<-'Aht'
    }
    if (AX[i,j]=="-" & AX[i+1,j]==1){
      AX[i,j]<-'Bht'
    }
  }
}
AX1<-AX[!duplicated(AX[,3]),]
AX2<-AX[duplicated(AX[,3]),]
Thanks for any help,
Raz
I don't know if you've received a solution yet. Below is my generic solution. I don't know how fast it will be, but it does _NOT_ do any looping; it uses a few vectorized logical comparisons instead. The result is in the variable new_data. The variables data_odd and data_even are temporaries which can be removed. Or you can wrap the code up in a function which returns new_data, and they will simply "go away" when the function ends.

#
# Read in the data
data <- read.csv(file="data.csv",header=TRUE,stringsAsFactors=FALSE);
#
# The criteria
#if row1==1 and row2==1 <-'HT'
#if row1==1 and row2==0 <-'A'
#if row1==0 and row2==1 <-'B'
#if row1==1 and row2=='-' <-'Aht'
#if row1=='-' and row2==1 <-'Bht'
#
# The following assumes that data is properly ordered!
data$rowNumber <- seq_len(nrow(data));
data_odd <- data[data$rowNumber %% 2 == 1,];
data_even <- data[data$rowNumber %% 2 == 0,];
#
# You really need to make sure that
# the CloneID values are correct in data_odd
# and data_even. Something like:
stopifnot(data_odd$CloneID == data_even$CloneID);
CloneIDs <- data_even[,1]; # Get the list of CloneIDs
#data_even[,1] <- NULL; # Remove CloneIDs from even data
#data_odd[,1] <- NULL; # And also from odd data
#
# Initialize new_data - make everything NA so
# it will stick out later!
new_data <- data_even;
new_data[,colnames(data_even)] <- NA;
#
new_data[data_odd == 1 & data_even == 1] <- 'HT';
new_data[data_odd == 1 & data_even == 0] <- 'A';
new_data[data_odd == 0 & data_even == 1] <- 'B';
new_data[data_odd == 1 & data_even == '-'] <- 'Aht';
new_data[data_odd == '-' & data_even == 1] <- 'Bht';
new_data$CloneID <- CloneIDs;
new_data$rowNumber <- NULL;
#
#stopifnot( !is.na(new_data)); # Make sure no NAs left
There is nothing more pleasant than traveling and meeting new people!
Genghis Khan

Maranatha! <><
John McKown
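The reply's suggestion of wrapping the code in a function could look roughly like the sketch below. It is not from the thread: the name collapse_pairs and its id_col argument are hypothetical, and it assumes the rows are already ordered in pairs and that every column other than the ID column holds genotype values. It works on character matrices, so "1", "0" and "-" are compared as strings.

```r
# Hypothetical helper (not from the thread): collapse consecutive row pairs
# into one row of codes. id_col names the ID column; all other columns are
# treated as genotype data.
collapse_pairs <- function(data, id_col = "CloneID") {
  stopifnot(nrow(data) %% 2 == 0)
  odd_rows  <- seq(1, nrow(data), by = 2)
  even_rows <- odd_rows + 1
  # Each pair must share an ID, mirroring the stopifnot() check in the reply.
  stopifnot(all(data[[id_col]][odd_rows] == data[[id_col]][even_rows]))

  geno <- setdiff(names(data), id_col)
  # Convert every genotype column to character; the result is a character matrix.
  vals   <- vapply(data[geno], as.character, character(nrow(data)))
  first  <- vals[odd_rows,  , drop = FALSE]
  second <- vals[even_rows, , drop = FALSE]

  # Start from NA so that pairs not covered by any rule stand out.
  res <- matrix(NA_character_, nrow = nrow(first), ncol = ncol(first),
                dimnames = list(NULL, geno))
  res[first == "1" & second == "1"] <- "HT"
  res[first == "1" & second == "0"] <- "A"
  res[first == "0" & second == "1"] <- "B"
  res[first == "1" & second == "-"] <- "Aht"
  res[first == "-" & second == "1"] <- "Bht"

  out <- data.frame(data[odd_rows, id_col, drop = FALSE], res,
                    stringsAsFactors = FALSE, check.names = FALSE)
  rownames(out) <- NULL
  out
}

# Small made-up input that exercises the "-" rules as well.
demo <- data.frame(CloneID = c(10, 10, 20, 20),
                   g1 = c("1", "0", "1", "-"),
                   g2 = c("0", "1", "-", "1"),
                   stringsAsFactors = FALSE)
result <- collapse_pairs(demo)
print(result)
```

Because the temporaries live inside the function, only the collapsed data frame is left in the workspace after the call.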