How to pick colums from a ragged array?

Close, but it's evaluating on 'first date' AND 'last date' together. I'll be considering groups defined by 'first diagnosis' and groups defined by 'last diagnosis' completely separately, so I need one run that considers only the first date (to produce e.g. INCLUDE.FIRST), then a separate run that considers only the last date (to produce e.g. INCLUDE.LAST).

-----Original Message-----
From: arun [mailto:smartpink111 at yahoo.com]
Sent: 25 October 2012 12:32
To: Stuart Leask
Cc: R help; Petr PIKAL
Subject: Re: [r] How to pick colums from a ragged array?

Hi Stuart,

So, I guess my result (below) serves the purpose!
A.K.

----- Original Message -----
From: Stuart Leask <Stuart.Leask at nottingham.ac.uk>
To: arun <smartpink111 at yahoo.com>
Cc:
Sent: Thursday, October 25, 2012 3:13 AM
Subject: RE: [r] How to pick colums from a ragged array?

Even confusing myself now, serves me right for replying late at night!

** If DGs are the same, then the first (or last) diagnosis is unambiguous even if date is duplicated - so I can use the data.**

Consider we want INCLUDE.FIRST to look at first dates.
- IDs with duplicate dates: 167, 323, 814, 841, 910, 1019
- AND the duplicate is the first date: 167, 841, 1019
- AND the duplicate has different DGs: 841, 1019
=> give all rows of 841 and 1019 FALSE (all other rows TRUE).

Now consider we want INCLUDE.LAST to look at last dates.
- IDs with duplicate dates: 167, 323, 814, 841, 910, 1019
- AND the duplicate is the last date: 167, 323, 814
- AND the duplicate has different DGs: 323
=> give all rows of 323 FALSE (all other rows TRUE).

Of course, I'm happy to run a function twice, either one with a 'first/last' switch, or one that assumes initial order of sort by DATE determines whether you end up with first or last date duplicates.
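
For illustration only, a function with a 'first/last' switch might look roughly like the sketch below. The name include_flag and its `which` argument are made up here, not taken from any code posted in this thread, and the data frame is a subset of the rows shown in the results table (IDs 167, 323, 814 and 1019). It assumes the columns are called ID, DATE and DG and that DATE sorts correctly as a number.

```r
# Hypothetical helper (not from the thread): returns TRUE/FALSE per row.
# An ID is excluded when its first (or last) DATE carries more than one
# distinct DG; a duplicated date with identical DGs is treated as unambiguous.
include_flag <- function(dat, which = c("first", "last")) {
  which <- match.arg(which)
  ok <- sapply(split(dat, dat$ID), function(g) {
    # Pick the boundary date for this ID
    target <- if (which == "first") min(g$DATE) else max(g$DATE)
    # All diagnoses recorded on that boundary date
    dg <- g$DG[g$DATE == target]
    length(unique(dg)) == 1
  })
  # Expand the per-ID result back to one value per row
  unname(ok[as.character(dat$ID)])
}

# Subset of the rows from the results table in this thread
id.d <- data.frame(
  ID   = c(167, 167, 323, 323, 323, 814, 814, 814, 1019, 1019, 1019),
  DATE = c(20040205, 20040205, 20051111, 20080521, 20080521,
           20020814, 20071205, 20071205, 19870508, 19870508, 19880330),
  DG   = c(4, 4, 3, 2, 3, 2, 2, 2, 1, 2, 1)
)

id.d$INCLUDE.FIRST <- include_flag(id.d, "first")
id.d$INCLUDE.LAST  <- include_flag(id.d, "last")
id.d
```

On this subset, the 'first' run should mark only ID 1019 as FALSE and the 'last' run only ID 323, which agrees with the two cases worked through above.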

-----Original Message-----
From: arun [mailto:smartpink111 at yahoo.com]
Sent: 24 October 2012 22:59
To: Stuart Leask
Subject: Re: [r] How to pick colums from a ragged array?

Hi Stuart,
So, 167 should be FALSE even though DG is the same, because it comes under earliest/first date, but TRUE for 814 because it comes under latest/last date.  167 comes under both cases.
Let me try to make sense of that:

I am just pasting my earlier solution and its results again to see whether we are on the same page:
res1<- data.frame(flag=tapply(id.d[,2],id.d[,1],FUN=function(x) head(duplicated(x)|duplicated(x,fromLast=TRUE),1)|tail(duplicated(x)|duplicated(x,fromLast=TRUE),1)))
res2<-id.d[id.d[,1]%in%names(res1[res1$flag==TRUE,])&(duplicated(id.d[,1:2])|duplicated(id.d[,1:2],fromLast=TRUE)),]
res3<-res2[!res2$ID%in% res2[duplicated(res2)|duplicated(res2,fromLast=TRUE),]$ID,]
id.d1<-id.d
bad<-id.d1[id.d1$ID%in%res3$ID,]
bad$INCLUDE<-FALSE
res4<-merge(id.d1,bad,all=TRUE)
res4$INCLUDE[is.na(res4$INCLUDE)]<-TRUE
res4
     ID     DATE DG INCLUDE
1    58 20060821  1    TRUE
2    58 20061207  2    TRUE
3    58 20080102  1    TRUE
4    58 20090904  1    TRUE
5   167 20040205  4    TRUE
6   167 20040205  4    TRUE
7   323 20051111  3   FALSE
8   323 20060111  2   FALSE
9   323 20071119  3   FALSE
10  323 20080107  2   FALSE
11  323 20080407  1   FALSE
12  323 20080521  2   FALSE
13  323 20080521  3   FALSE
14  547 20041005  2    TRUE
15  794 20070905  1    TRUE
16  814 20020814  2    TRUE
17  814 20021125  2    TRUE
18  814 20040429  2    TRUE
19  814 20040429  2    TRUE
20  814 20071205  2    TRUE
21  814 20071205  2    TRUE
22  841 20050421  1   FALSE
23  841 20050421  2   FALSE
24  841 20060428  1   FALSE
25  841 20060602  1   FALSE
26  841 20060816  1   FALSE
27  841 20061025  1   FALSE
28  841 20061129  1   FALSE
29  841 20070112  1   FALSE
30  841 20070514  4   FALSE
31  910 19870508  3    TRUE
32  910 20040205  3    TRUE
33  910 20040205  3    TRUE
34  910 20080521  3    TRUE
35  910 20080521  4    TRUE
36  910 20091224  2    TRUE
37  999 20050503  2    TRUE
38 1019 19870508  1   FALSE
39 1019 19870508  2   FALSE
40 1019 19880330  1   FALSE
A.K.
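
A note on the idiom used repeatedly in the code above: duplicated(x) alone only flags the second and later copies of a value, so it is combined with duplicated(x, fromLast=TRUE) to flag every member of a duplicated set. head(..., 1) and tail(..., 1) then ask whether the first or the last element belongs to such a set. A small sketch, using the dates of ID 814 from the table above (the variable names are only for illustration):

```r
# Dates for ID 814, as listed in the results table
x <- c(20020814, 20021125, 20040429, 20040429, 20071205, 20071205)

# Flags only the later copy of each duplicated value
later_copies <- duplicated(x)

# Flags only the earlier copy of each duplicated value
earlier_copies <- duplicated(x, fromLast = TRUE)

# Flags every element that has a duplicate anywhere in x
all_dups <- later_copies | earlier_copies
all_dups
# FALSE FALSE  TRUE  TRUE  TRUE  TRUE

first_is_dup <- head(all_dups, 1)  # FALSE: the first date is unique
last_is_dup  <- tail(all_dups, 1)  # TRUE: the last date is duplicated
```

This assumes the dates within each ID are already sorted, as they are in id.d; otherwise head() and tail() would not pick out the earliest and latest dates.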

----- Original Message -----
From: Stuart Leask <Stuart.Leask at nottingham.ac.uk>
To: arun <smartpink111 at yahoo.com>; Rui Barradas <ruipbarradas at sapo.pt>
Cc: R help <r-help at r-project.org>
Sent: Wednesday, October 24, 2012 5:40 PM
Subject: RE: [r] How to pick colums from a ragged array?

I mis-typed, missing an if. I think you've got it, but let me try again:

"The function should:
-  put FALSE in a column for every instance of an ID IF ( that ID has a first (or last) DATE duplicated ) AND IF (the DGs for the duplicated dates are different)."

So for the earliest/first date function, INCLUDE should be TRUE, apart from FALSE for _all_ the instances of IDs 167, 841 and 1019.
For the latest/last date function, INCLUDE should be TRUE, apart from FALSE for all the instances of ID 323.

Stuart

-----Original Message-----
From: arun [mailto:smartpink111 at yahoo.com]
Sent: 24 October 2012 21:30
To: Rui Barradas
Cc: R help; Stuart Leask
Subject: Re: [r] How to pick colums from a ragged array?

Hi,

According to the OP "So the function should only exclude an ID, having identified a first (or last) DATE duplicate, the DGs for these two dates are different."
Rui:
By running your modified function (using dte <- tapply(x[,2], x[,1], FUN = function(x) duplicated(fun(x, 2),fromLast = TRUE))),

id.d$INCLUDE <- !(rm1 | rm2)
head(id.d)
#     ID     DATE DG INCLUDE
#1    58 20060821  1    TRUE
#2    58 20061207  2    TRUE
#3    58 20080102  1    TRUE
#4    58 20090904  1    TRUE
#5   167 20040205  4   FALSE
#6   167 20040205  4   FALSE

For #167, DGs are same.  Not sure whether to exclude it or not.

My modified solution is similar but I am excluding 167 and 814.

fun1<-function(dat){
res1first<- data.frame(flag=tapply(dat[,2],dat[,1],FUN=function(x) head(duplicated(x)|duplicated(x,fromLast=TRUE),1)))
res1last<- data.frame(flag=tapply(dat[,2],dat[,1],FUN=function(x) tail(duplicated(x)|duplicated(x,fromLast=TRUE),1)))
res2first<-dat[dat[,1]%in%names(res1first[res1first$flag==TRUE,])&(duplicated(dat[,1:2])|duplicated(dat[,1:2],fromLast=TRUE)),]
res2last<-dat[dat[,1]%in%names(res1last[res1last$flag==TRUE,])&(duplicated(dat[,1:2])|duplicated(dat[,1:2],fromLast=TRUE)),]
res3first<-res2first[!res2first$ID%in% res2first[duplicated(res2first)|duplicated(res2first,fromLast=TRUE),]$ID,]
res3last<-res2last[!res2last$ID%in% res2last[duplicated(res2last)|duplicated(res2last,fromLast=TRUE),]$ID,]
res3firstsubset<-do.call(rbind,lapply(split(res3first,res3first$ID),head,1))
res3firstsubset$INCLUDE<-FALSE
res3lastsubset<-do.call(rbind,lapply(split(res3last,res3last$ID),tail,1))
res3lastsubset$INCLUDE<-FALSE
res4<-merge(dat,merge(res3first,merge(res3firstsubset,merge(res3lastsubset,res3last,all=TRUE),all=TRUE),all=TRUE),all=TRUE)
res4$INCLUDE[is.na(res4$INCLUDE)]<-TRUE
res4
}

tail(fun1(id.d))
#     ID     DATE DG INCLUDE
#35  910 20080521  4    TRUE
#36  910 20091224  2    TRUE
#37  999 20050503  2    TRUE
#38 1019 19870508  1    TRUE
#39 1019 19870508  2   FALSE
#40 1019 19880330  1    TRUE

A.K.

----- Original Message -----
From: Rui Barradas <ruipbarradas at sapo.pt>
To: arun <smartpink111 at yahoo.com>
Cc: R help <r-help at r-project.org>; Stuart Leask <Stuart.Leask at nottingham.ac.uk>
Sent: Wednesday, October 24, 2012 2:50 PM
Subject: Re: [r] How to pick colums from a ragged array?

Hello,

Inline.
On 24-10-2012 19:05, arun wrote:
