
How to pick columns from a ragged array?

Thanks Rui - your initial, very elegant suggestion, has spurred me on!

1. As you noticed, my example data had no examples of duplicate first dates (DOH!) 
I have corrected this, and added a test - an ID that has a duplicate which is not the earliest DATE, but is the same DATE as an earliest/duplicate for another ID.

2. Your suggestion gave me all the duplicates:

how.many <- ave(id.d[, 1], id.d[, 1], id.d[, 2], FUN = length)
nd.b <- id.d[how.many > 1, ]

3. I can then simply make a table of earliest DATEs by ID, and then see which DATEs in this table are shared:

earliest <- tapply(DATE, ID, min)
rownames(earliest[earliest %in% nd.b])

This seems to work - and it does seem to exclude IDs which have a duplicate date which is the same as a minimum date for another ID.
I'm trying to work out why!
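
One possible explanation, sketched on made-up data (the IDs, dates and the names by.date.only, earliest.row and dup.at.earliest below are hypothetical, not from this thread): earliest %in% nd.b compares each ID's earliest DATE against every value in nd.b, so the match is on the date alone, not on the ID/DATE pair. In the test data, ID 910 drops out simply because its own earliest date (19870409) is not duplicated anywhere. The same check could, however, flag an ID whose unduplicated earliest date happens to equal another ID's duplicated date. Matching on the pair should avoid that:

```r
# Made-up data: ID 3 has no duplicates, but its earliest date equals
# the duplicated (non-earliest) date of ID 2.
ID   <- c(1, 1, 1, 2, 2, 2, 3, 3)
DATE <- c(20010101, 20010101, 20020202,   # ID 1: duplicate at its earliest date
          20000101, 20030303, 20030303,   # ID 2: duplicate, but not at its earliest date
          20030303, 20040404)             # ID 3: no duplicate at all

# Number of rows sharing each (ID, DATE) pair
how.many <- ave(seq_along(ID), ID, DATE, FUN = length)

# Earliest DATE per ID, as a named table
earliest <- tapply(DATE, ID, min)

# Date-only match, as in step 3: also picks up ID 3
by.date.only <- names(earliest)[earliest %in% DATE[how.many > 1]]

# Pair match: the row must be duplicated AND sit at its own ID's earliest date
earliest.row <- ave(DATE, ID, FUN = min)   # earliest DATE repeated on every row of the ID
dup.at.earliest <- unique(ID[how.many > 1 & DATE == earliest.row])

by.date.only      # "1" "3"
dup.at.earliest   # 1
```

On data where no ID's earliest date coincides with another ID's duplicated date, the two versions should agree.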


Many, many thanks for the gift of that function. I will compare the two approaches (and assume that mine is flawed!).


Stuart


************************************************

ID <- c(58,58,58,58,167,167,323,323,323,323,323,323,323
,547,794,814,814,814,814,814,814,841,841,841,841,841
,841,841,841,841,910,910,910,910,910,910,999,1019,1019
,1019)

DATE <- 
 c(20060821,20061207,20080102,20090904,20040205,20040205,20051111
 ,20060111,20071119,20080107,20080407,20080521,20080711,20041005
 ,20070905,20020814,20021125,20040429,20040429,20071205,20080227
 ,20050421,20050421,20060428,20060602,20060816,20061025,20061129
 ,20070112,20070514, 19870409,19870508,19870508, 20091120,20091210
 ,20091224,20050503,19870508,19870508,19880330)

id.d <- cbind(ID, DATE)

how.many <- ave(id.d[, 1], id.d[, 1], id.d[, 2], FUN = length)
nd.b <- id.d[how.many > 1, ]

earliest <- tapply(DATE, ID, min)        # table of earliest DATEs
rownames(earliest[earliest %in% nd.b])   # IDs of duplicates at the earliest date for that individual. I think...
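
As a cross-check on the ave() count, the same rows can probably be found with duplicated(), scanning from both ends so that every copy of a repeated ID/DATE pair is flagged. A minimal sketch on made-up data (the values and the name dup are hypothetical):

```r
# Made-up data: ID 1 repeats a date; IDs 2 and 3 share a date with each
# other but have no repeated (ID, DATE) pair of their own.
ID   <- c(1, 1, 1, 2, 2, 3)
DATE <- c(20010101, 20010101, 20020202, 20030303, 20040404, 20030303)
id.d <- cbind(ID, DATE)

# Count of rows per (ID, DATE) pair, as in the script above
how.many <- ave(id.d[, 1], id.d[, 1], id.d[, 2], FUN = length)

# duplicated() on a matrix compares whole rows; the forward scan misses the
# first copy and the backward scan misses the last, so the two are combined
dup <- as.vector(duplicated(id.d) | duplicated(id.d, fromLast = TRUE))

all(dup == (how.many > 1))   # TRUE
id.d[dup, ]                  # the two rows for ID 1 on 20010101
```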




******************************************************************



-----Original Message-----
From: Rui Barradas [mailto:ruipbarradas at sapo.pt] 
Sent: 23 October 2012 12:21
To: Stuart Leask
Cc: r-help at r-project.org
Subject: Re: [R] [r] How to pick colums from a ragged array?

Hello,

Thinking again, if you just want the first/last in each ID that repeats the DATE, the following function does the job. Since there were no such cases in your data example, I've added 3 rows to the dataset.

ID <- c(58,58,58,58,167,167,323,323,323,323,323,323,323
,547,794,814,814,814,814,814,814,841,841,841,841,841
,841,841,841,841,910,910,910,910,910,910,910,910,999,1019,1019
,1019,1019)

DATE <- c(20060821,20061207,20080102,20090904,20040205,20040323,20051111
,20060111,20071119,20080107,20080407,20080521,20080711,20041005
,20070905,20020814,20021125,20040429,20040429,20071205,20080227
,20050421,20060130,20060428,20060602,20060816,20061025,20061129
,20070112,20070514,20091105,20091105,20091117,20091119,20091120,20091210
,20091224,20091224,20050503,19870508,19880223,19880330,19880330)

id.d <- cbind(ID, DATE)


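getRepeat <- function(x, first = TRUE){
     # head() picks the first row per ID, tail() the last
     fun <- if(first) head else tail
     # one data frame per ID
     sp <- split(data.frame(x), x[,1])
     # first (or last) DATE within each ID
     first.date <- tapply(x[,2], x[,1], FUN = fun, 1)
     # per ID, flag the rows whose DATE equals that first (or last) DATE
     lst <- lapply(seq_along(sp), function(j) sp[[j]][,2] == first.date[[j]])
     # number of flagged rows per ID
     n <- unlist(lapply(lst, sum))
     # keep only the IDs where that DATE occurs more than once
     sp1 <- sp[n > 1]
     i1 <- lst[n > 1]
     lapply(seq_along(sp1), function(j) sp1[[j]][i1[[j]], ])
}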

getRepeat(id.d)  # defaults to first = TRUE
getRepeat(id.d, first = FALSE)  # to get the last ones
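
Note that getRepeat() takes the first (or last) row of each ID via head()/tail(), so it appears to rely on the rows being in DATE order within each ID, as they are in the example data. If that ordering is not guaranteed, sorting first is one option. A minimal sketch on made-up, deliberately unsorted data (toy, toy.sorted and the values are hypothetical):

```r
# Made-up data, not in date order within ID
toy <- cbind(ID   = c(2, 1, 2, 1, 2),
             DATE = c(20050505, 20020202, 20000101, 20010101, 20000101))

# Earliest DATE per ID, independent of row order
earliest <- tapply(toy[, "DATE"], toy[, "ID"], min)

# First row per ID on the unsorted data: not the earliest date
first.unsorted <- tapply(toy[, "DATE"], toy[, "ID"], head, 1)

# Sort by ID, then by DATE within ID
toy.sorted <- toy[order(toy[, "ID"], toy[, "DATE"]), ]

# First row per ID on the sorted data: now matches the earliest date
first.sorted <- tapply(toy.sorted[, "DATE"], toy.sorted[, "ID"], head, 1)

all(first.unsorted == earliest)   # FALSE
all(first.sorted == earliest)     # TRUE
```

Passing toy.sorted rather than toy to getRepeat() would then make "first" mean "earliest".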


Hope this helps,

Rui Barradas


On 23-10-2012 10:59, Rui Barradas wrote: