
How to pick colums from a ragged array?

Hi,

According to the OP: "So the function should only exclude an ID, having identified a first (or last) DATE duplicate, the DGs for these two dates are different."
Rui:
Running your modified function (using dte <- tapply(x[,2], x[,1], FUN = function(x) duplicated(fun(x, 2),fromLast = TRUE))), I get:

id.d$INCLUDE <- !(rm1 | rm2)
head(id.d)
#     ID     DATE DG INCLUDE
#1    58 20060821  1    TRUE
#2    58 20061207  2    TRUE
#3    58 20080102  1    TRUE
#4    58 20090904  1    TRUE
#5   167 20040205  4   FALSE
#6   167 20040205  4   FALSE

For ID 167, the DGs are the same, so I am not sure whether it should be excluded or not.
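One way to check this condition directly is to count the distinct DG values within each repeated (ID, DATE) pair. The following is only a minimal sketch on made-up data: `toy`, `dup_date`, `n_dg` and `DIFF_DG` are hypothetical names, and `toy` just mimics a few of the rows printed in this thread, it is not the OP's id.d.

```r
# Toy data (hypothetical; it only mimics a few rows shown in this thread)
toy <- data.frame(
  ID   = c(58, 58, 167, 167, 1019, 1019, 1019),
  DATE = c(20060821, 20061207, 20040205, 20040205, 19870508, 19870508, 19880330),
  DG   = c(1, 2, 4, 4, 1, 2, 1)
)

# TRUE for every row whose (ID, DATE) pair occurs more than once
dup_date <- duplicated(toy[, c("ID", "DATE")]) |
  duplicated(toy[, c("ID", "DATE")], fromLast = TRUE)

# Number of distinct DG values within each (ID, DATE) group, repeated per row
n_dg <- ave(toy$DG, toy$ID, toy$DATE, FUN = function(x) length(unique(x)))

# Repeated date AND differing DGs
toy$DIFF_DG <- dup_date & n_dg > 1
toy
```

In this sketch the two rows for ID 167 are not flagged, because their DGs agree, while the two 19870508 rows for ID 1019 are. Whether that is the rule the OP wants is still an open question.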


My modified solution is similar but I am excluding 167 and 814.


fun1 <- function(dat) {
  # Per ID: is the first (res1first) or last (res1last) DATE repeated within that ID?
  res1first <- data.frame(flag = tapply(dat[, 2], dat[, 1], FUN = function(x) head(duplicated(x) | duplicated(x, fromLast = TRUE), 1)))
  res1last <- data.frame(flag = tapply(dat[, 2], dat[, 1], FUN = function(x) tail(duplicated(x) | duplicated(x, fromLast = TRUE), 1)))
  # Rows with a repeated (ID, DATE) pair, restricted to the IDs flagged above
  res2first <- dat[dat[, 1] %in% names(res1first[res1first$flag == TRUE, ]) & (duplicated(dat[, 1:2]) | duplicated(dat[, 1:2], fromLast = TRUE)), ]
  res2last <- dat[dat[, 1] %in% names(res1last[res1last$flag == TRUE, ]) & (duplicated(dat[, 1:2]) | duplicated(dat[, 1:2], fromLast = TRUE)), ]
  # Drop IDs that have rows identical in every column (i.e. the DGs are the same too)
  res3first <- res2first[!res2first$ID %in% res2first[duplicated(res2first) | duplicated(res2first, fromLast = TRUE), ]$ID, ]
  res3last <- res2last[!res2last$ID %in% res2last[duplicated(res2last) | duplicated(res2last, fromLast = TRUE), ]$ID, ]
  # Flag one row per remaining ID: the first row of res3first, the last row of res3last
  res3firstsubset <- do.call(rbind, lapply(split(res3first, res3first$ID), head, 1))
  res3firstsubset$INCLUDE <- FALSE
  res3lastsubset <- do.call(rbind, lapply(split(res3last, res3last$ID), tail, 1))
  res3lastsubset$INCLUDE <- FALSE
  # Merge the flags back onto the full data; rows without a flag are included
  res4 <- merge(dat, merge(res3first, merge(res3firstsubset, merge(res3lastsubset, res3last, all = TRUE), all = TRUE), all = TRUE), all = TRUE)
  res4$INCLUDE[is.na(res4$INCLUDE)] <- TRUE
  res4
}

tail(fun1(id.d))
#     ID     DATE DG INCLUDE
#35  910 20080521  4    TRUE
#36  910 20091224  2    TRUE
#37  999 20050503  2    TRUE
#38 1019 19870508  1    TRUE
#39 1019 19870508  2   FALSE
#40 1019 19880330  1    TRUE

A.K.

----- Original Message -----
From: Rui Barradas <ruipbarradas at sapo.pt>
To: arun <smartpink111 at yahoo.com>
Cc: R help <r-help at r-project.org>; Stuart Leask <Stuart.Leask at nottingham.ac.uk>
Sent: Wednesday, October 24, 2012 2:50 PM
Subject: Re: [r] How to pick colums from a ragged array?

Hello,

Inline.
On 24-10-2012 19:05, arun wrote:
Why? Look at the last ID, 1019. Its last row must be included, because that 
date doesn't repeat. And one of the first rows must also be included; 
otherwise we would be excluding that date completely. Or at least this is 
how I understand the problem.
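To make the 1019 case concrete, here is a small sketch of the per-ID check for whether the first or the last DATE is repeated. It is run on made-up data: `toy`, `is_dup`, `first_dup` and `last_dup` are hypothetical names, and the rows only mimic a few of those printed in this thread, not the full id.d.

```r
# Toy data (hypothetical; it only mimics a few rows shown in this thread)
toy <- data.frame(
  ID   = c(58, 58, 167, 167, 1019, 1019, 1019),
  DATE = c(20060821, 20061207, 20040205, 20040205, 19870508, 19870508, 19880330)
)

# TRUE for every element that occurs more than once in x
is_dup <- function(x) duplicated(x) | duplicated(x, fromLast = TRUE)

# Per ID: is the first date repeated? Is the last date repeated?
first_dup <- tapply(toy$DATE, toy$ID, function(x) head(is_dup(x), 1))
last_dup  <- tapply(toy$DATE, toy$ID, function(x) tail(is_dup(x), 1))

first_dup
last_dup
```

For ID 1019 this gives TRUE for the first date and FALSE for the last one, which is the situation described above: the last row stands alone, and only the two rows sharing the first date are candidates for exclusion.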

Rui Barradas