
How to pick columns from a ragged array?

Close, but it's evaluating on 'first date' AND 'last date' together. I'll be considering groups defined by 'first diagnosis' and groups defined by 'last diagnosis' completely separately, so I need one run that considers only the first date (to produce e.g. INCLUDE.FIRST), then a separate run that considers only the last date (to produce e.g. INCLUDE.LAST).

-----Original Message-----
From: arun [mailto:smartpink111 at yahoo.com]
Sent: 25 October 2012 12:32
To: Stuart Leask
Cc: R help; Petr PIKAL
Subject: Re: [r] How to pick colums from a ragged array?

Hi Stuart,

So, I guess my result (below) serves the purpose!
A.K.




----- Original Message -----
From: Stuart Leask <Stuart.Leask at nottingham.ac.uk>
To: arun <smartpink111 at yahoo.com>
Cc:
Sent: Thursday, October 25, 2012 3:13 AM
Subject: RE: [r] How to pick colums from a ragged array?

Even confusing myself now, serves me right for replying late at night!

** If DGs are the same, then the first (or last) diagnosis is unambiguous even if date is duplicated - so I can use the data.**

Consider we want INCLUDE.FIRST to look at first dates.
Duplicate dates: 167, 323, 814, 841, 910, 1019
AND this dup is the first date: 167, 841, 1019
AND this dup has different DGs: 841, 1019
= give all rows of 841 and 1019 FALSE.
(All other rows TRUE)

Now consider we want INCLUDE.LAST to look at last dates.
Duplicate dates: 167, 323, 814, 841, 910, 1019
AND this dup is the last date: 167, 323, 814
AND this dup has different DGs: 323
= give all rows of 323 FALSE.
(All other rows TRUE)

Of course, I'm happy to run a function twice, either one with a 'first/last' switch, or one that assumes initial order of sort by DATE determines whether you end up with first or last date duplicates.
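
A minimal sketch of the 'first/last switch' option in base R. The function name `flag_include` and its `end` argument are hypothetical, not something from this thread, and the sample rows are only a subset of the `id.d` data quoted further down. It returns FALSE for every row of an ID whose first (or last) date carries more than one distinct DG:

```r
# Hypothetical helper (not from the thread): one run per end of the record.
flag_include <- function(dat, end = c("first", "last")) {
  end <- match.arg(end)
  pick <- if (end == "first") min else max
  # Per ID: TRUE when the first/last DATE has more than one distinct DG.
  bad <- sapply(split(dat, dat$ID), function(g) {
    length(unique(g$DG[g$DATE == pick(g$DATE)])) > 1
  })
  # FALSE for all rows of a flagged ID, TRUE otherwise.
  !(as.character(dat$ID) %in% names(bad)[bad])
}

# Subset of the rows quoted in this thread (IDs 167, 323, 841, 1019).
id.d <- data.frame(
  ID   = c(167, 167, 323, 323, 323, 841, 841, 841, 1019, 1019, 1019),
  DATE = c(20040205, 20040205, 20080407, 20080521, 20080521,
           20050421, 20050421, 20070514, 19870508, 19870508, 19880330),
  DG   = c(4, 4, 1, 2, 3, 1, 2, 4, 1, 2, 1)
)
id.d$INCLUDE.FIRST <- flag_include(id.d, "first")
id.d$INCLUDE.LAST  <- flag_include(id.d, "last")
```

On these rows INCLUDE.FIRST is FALSE only for IDs 841 and 1019, and INCLUDE.LAST only for ID 323, which matches the two lists above. Using min/max rather than row order means the result does not depend on how the data happen to be sorted.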



-----Original Message-----
From: arun [mailto:smartpink111 at yahoo.com]
Sent: 24 October 2012 22:59
To: Stuart Leask
Subject: Re: [r] How to pick colums from a ragged array?

Hi Stuart,
So, 167 should be FALSE even though the DG is the same, because it comes under earliest/first date, but TRUE for 814 because it comes under latest/last date.  167 comes under both cases.
Let me try to make sense of that:

I am just pasting my earlier solution and its results again to see whether we are on the same page:
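# Per ID: is the first or the last DATE part of a duplicated set?
res1 <- data.frame(flag = tapply(id.d[, 2], id.d[, 1], FUN = function(x) {
  dup <- duplicated(x) | duplicated(x, fromLast = TRUE)
  head(dup, 1) | tail(dup, 1)
}))
# Rows of the flagged IDs whose ID/DATE pair is duplicated
res2 <- id.d[id.d[, 1] %in% names(res1[res1$flag == TRUE, ]) &
               (duplicated(id.d[, 1:2]) | duplicated(id.d[, 1:2], fromLast = TRUE)), ]
# Drop IDs that have fully identical rows (same ID, DATE and DG)
res3 <- res2[!res2$ID %in% res2[duplicated(res2) | duplicated(res2, fromLast = TRUE), ]$ID, ]
id.d1 <- id.d
bad <- id.d1[id.d1$ID %in% res3$ID, ]
bad$INCLUDE <- FALSE
# Merge back; rows not marked FALSE become TRUE
res4 <- merge(id.d1, bad, all = TRUE)
res4$INCLUDE[is.na(res4$INCLUDE)] <- TRUE
res4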
     ID     DATE DG INCLUDE
1    58 20060821  1    TRUE
2    58 20061207  2    TRUE
3    58 20080102  1    TRUE
4    58 20090904  1    TRUE
5   167 20040205  4    TRUE
6   167 20040205  4    TRUE
7   323 20051111  3   FALSE
8   323 20060111  2   FALSE
9   323 20071119  3   FALSE
10  323 20080107  2   FALSE
11  323 20080407  1   FALSE
12  323 20080521  2   FALSE
13  323 20080521  3   FALSE
14  547 20041005  2    TRUE
15  794 20070905  1    TRUE
16  814 20020814  2    TRUE
17  814 20021125  2    TRUE
18  814 20040429  2    TRUE
19  814 20040429  2    TRUE
20  814 20071205  2    TRUE
21  814 20071205  2    TRUE
22  841 20050421  1   FALSE
23  841 20050421  2   FALSE
24  841 20060428  1   FALSE
25  841 20060602  1   FALSE
26  841 20060816  1   FALSE
27  841 20061025  1   FALSE
28  841 20061129  1   FALSE
29  841 20070112  1   FALSE
30  841 20070514  4   FALSE
31  910 19870508  3    TRUE
32  910 20040205  3    TRUE
33  910 20040205  3    TRUE
34  910 20080521  3    TRUE
35  910 20080521  4    TRUE
36  910 20091224  2    TRUE
37  999 20050503  2    TRUE
38 1019 19870508  1   FALSE
39 1019 19870508  2   FALSE
40 1019 19880330  1   FALSE
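
For reference, a small sketch of the idiom the code above relies on: `duplicated(x) | duplicated(x, fromLast = TRUE)` marks every member of a duplicated set, and `head(..., 1)` / `tail(..., 1)` then ask whether the first or the last element belongs to one. The vector is the DATE column for ID 323 from the output above; the variable names are illustrative, and the dates are assumed to be sorted within the ID.

```r
# DATE values for ID 323, already in ascending order.
dates <- c(20051111, 20060111, 20071119, 20080107, 20080407, 20080521, 20080521)

# TRUE for every element that has a twin, not just for the second copy.
in_dup <- duplicated(dates) | duplicated(dates, fromLast = TRUE)

first_is_dup <- head(in_dup, 1)  # FALSE: the first date occurs once
last_is_dup  <- tail(in_dup, 1)  # TRUE: the last date occurs twice
```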
A.K.





----- Original Message -----
From: Stuart Leask <Stuart.Leask at nottingham.ac.uk>
To: arun <smartpink111 at yahoo.com>; Rui Barradas <ruipbarradas at sapo.pt>
Cc: R help <r-help at r-project.org>
Sent: Wednesday, October 24, 2012 5:40 PM
Subject: RE: [r] How to pick colums from a ragged array?

I mis-typed, missing an if. I think you've got it, but let me try again:

"The function should:
-  put FALSE in a column for every instance of an ID IF ( that ID has a first (or last) DATE duplicated ) AND IF (the DGs for the duplicated dates are different)."

So for the earliest/first date function, INCLUDE should be TRUE, apart from FALSE for _all_ the instances of IDs 167, 841 and 1019.
For the latest/last date function, INCLUDE should be TRUE, apart from FALSE for all the instances of ID 323.
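
The "every instance of an ID" part can be sketched in base R with `ave()`, which spreads a per-row flag across all rows sharing the same ID. The data frame `toy` and the column `row.bad` are assumptions made up for illustration only:

```r
# Toy rows: only one row of ID 1019 was flagged by some earlier step.
toy <- data.frame(ID      = c(1019, 1019, 1019, 58, 58),
                  row.bad = c(FALSE, TRUE, FALSE, FALSE, FALSE))

# any() per ID, recycled over that ID's rows; negate to get INCLUDE.
toy$INCLUDE <- !as.logical(ave(toy$row.bad, toy$ID, FUN = any))
```

Here all three rows of ID 1019 end up with INCLUDE FALSE and both rows of ID 58 with TRUE.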

Stuart

-----Original Message-----
From: arun [mailto:smartpink111 at yahoo.com]
Sent: 24 October 2012 21:30
To: Rui Barradas
Cc: R help; Stuart Leask
Subject: Re: [r] How to pick colums from a ragged array?

Hi,

According to the OP "So the function should only exclude an ID, having identified a first (or last) DATE duplicate, the DGs for these two dates are different."
Rui:
By running your modified function (using dte <- tapply(x[,2], x[,1], FUN = function(x) duplicated(fun(x, 2),fromLast = TRUE))),

id.d$INCLUDE <- !(rm1 | rm2)
head(id.d)
#     ID     DATE DG INCLUDE
#1    58 20060821  1    TRUE
#2    58 20061207  2    TRUE
#3    58 20080102  1    TRUE
#4    58 20090904  1    TRUE
#5   167 20040205  4   FALSE
#6   167 20040205  4   FALSE

For #167, DGs are same.  Not sure whether to exclude it or not.


My modified solution is similar but I am excluding 167 and 814.


fun1 <- function(dat) {
  # Per ID: is the first (resp. last) DATE part of a duplicated set?
  res1first <- data.frame(flag = tapply(dat[, 2], dat[, 1], FUN = function(x)
    head(duplicated(x) | duplicated(x, fromLast = TRUE), 1)))
  res1last <- data.frame(flag = tapply(dat[, 2], dat[, 1], FUN = function(x)
    tail(duplicated(x) | duplicated(x, fromLast = TRUE), 1)))
  # Rows with a duplicated ID/DATE pair, restricted to the flagged IDs
  dupdate <- duplicated(dat[, 1:2]) | duplicated(dat[, 1:2], fromLast = TRUE)
  res2first <- dat[dat[, 1] %in% names(res1first[res1first$flag == TRUE, ]) & dupdate, ]
  res2last <- dat[dat[, 1] %in% names(res1last[res1last$flag == TRUE, ]) & dupdate, ]
  # Drop IDs that have fully identical rows (same ID, DATE and DG)
  res3first <- res2first[!res2first$ID %in%
    res2first[duplicated(res2first) | duplicated(res2first, fromLast = TRUE), ]$ID, ]
  res3last <- res2last[!res2last$ID %in%
    res2last[duplicated(res2last) | duplicated(res2last, fromLast = TRUE), ]$ID, ]
  # Keep one row per remaining ID and mark it FALSE
  res3firstsubset <- do.call(rbind, lapply(split(res3first, res3first$ID), head, 1))
  res3firstsubset$INCLUDE <- FALSE
  res3lastsubset <- do.call(rbind, lapply(split(res3last, res3last$ID), tail, 1))
  res3lastsubset$INCLUDE <- FALSE
  # Merge everything back; rows without a flag become TRUE
  res4 <- merge(dat, merge(res3first, merge(res3firstsubset,
    merge(res3lastsubset, res3last, all = TRUE), all = TRUE), all = TRUE), all = TRUE)
  res4$INCLUDE[is.na(res4$INCLUDE)] <- TRUE
  res4
}

tail(fun1(id.d))
#     ID     DATE DG INCLUDE
#35  910 20080521  4    TRUE
#36  910 20091224  2    TRUE
#37  999 20050503  2    TRUE
#38 1019 19870508  1    TRUE
#39 1019 19870508  2   FALSE
#40 1019 19880330  1    TRUE

A.K.

----- Original Message -----
From: Rui Barradas <ruipbarradas at sapo.pt>
To: arun <smartpink111 at yahoo.com>
Cc: R help <r-help at r-project.org>; Stuart Leask <Stuart.Leask at nottingham.ac.uk>
Sent: Wednesday, October 24, 2012 2:50 PM
Subject: Re: [r] How to pick colums from a ragged array?

Hello,

Inline.
On 24-10-2012 19:05, arun wrote:
Why? Look at the last ID, 1019. The last of all must be included, the date doesn't repeat. And one of the first must also be included, if not we would be completely excluding that date. Or at least this is how I'm understanding the problem.

Rui Barradas