How to pick columns from a ragged array?

3 messages · arun, root, Rui Barradas

#
Hi,
Also one more thing:
This should get the dates which are duplicated. In my first reply, I was looking for the duplicated rows. Sorry for that!

id.d<-data.frame(ID,DATE)

new1<-id.d[duplicated(id.d$DATE)|duplicated(id.d$DATE,fromLast=TRUE),]


new2<-new1[order(new1$ID,new1$DATE),]
tapply(new2$ID,new2$DATE,head,1)
#19870508 20040205 20040429 20050421
#     910      167      814      841

But the result is still not what you wanted: 19870508 is the earliest date for both 910 and 1019, and because the rows are sorted by ID, 910 is picked for that date.
new1[order(new1$ID,new1$DATE),]
#     ID     DATE
#5   167 20040205
#6   167 20040205
#18  814 20040429
#19  814 20040429
#22  841 20050421
#23  841 20050421
#31  910 19870508
#32  910 20040205
#33  910 20040205
#38 1019 19870508
#39 1019 19870508
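A possible way around this is to compare each row with the earliest date of its own ID, instead of asking whether a date is duplicated anywhere in the DATE column. The following is a minimal sketch. It assumes Stuart's refined ID and DATE vectors (quoted further down) and assumes dates are stored as yyyymmdd numbers, so min() picks the earliest one. The names is_first, n_first and hits are illustrative and do not come from the thread.

```r
# Stuart's refined test data (copied from his message in this thread)
ID <- c(58,58,58,58,167,167,323,323,323,323,323,323,323,
        547,794,814,814,814,814,814,814,841,841,841,841,841,
        841,841,841,841,910,910,910,910,910,910,999,1019,1019,
        1019)
DATE <- c(20060821,20061207,20080102,20090904,20040205,20040205,20051111,
          20060111,20071119,20080107,20080407,20080521,20080711,20041005,
          20070905,20020814,20021125,20040429,20040429,20071205,20080227,
          20050421,20050421,20060428,20060602,20060816,20061025,20061129,
          20070112,20070514,19870508,20040205,20040205,20091120,20091210,
          20091224,20050503,19870508,19870508,19880330)
id.d <- data.frame(ID, DATE)

# TRUE for rows whose DATE equals the earliest DATE of the same ID
is_first <- id.d$DATE == ave(id.d$DATE, id.d$ID, FUN = min)
# how many rows of the same ID fall on that earliest date
n_first <- ave(as.numeric(is_first), id.d$ID, FUN = sum)

# keep only rows where an ID's earliest date occurs more than once
hits <- id.d[is_first & n_first > 1, ]
hits
unique(hits$ID)   # expected IDs: 167, 841 and 1019 (814 and 910 drop out)
```

Because every comparison is done within an ID, 910 is not selected. Its duplicated date (20040205) is not its earliest date. It also does not matter that 910's earliest date is the same as 1019's.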

A.K.

----- Original Message -----
From: Stuart Leask <Stuart.Leask at nottingham.ac.uk>
To: arun <smartpink111 at yahoo.com>
Cc: Petr PIKAL <petr.pikal at precheza.cz>
Sent: Tuesday, October 23, 2012 9:15 AM
Subject: RE: [R] [r] How to pick colums from a ragged array?

Sorry Arun, but when I run it I get an error:
+ ,547,794,814,814,814,814,814,814,841,841,841,841,841
+ ,841,841,841,841,910,910,910,910,910,910,999,1019,1019
+ ,1019)
+  c(20060821,20061207,20080102,20090904,20040205,20040205,20051111
+  ,20060111,20071119,20080107,20080407,20080521,20080711,20041005
+  ,20070905,20020814,20021125,20040429,20040429,20071205,20080227
+  ,20050421,20050421,20060428,20060602,20060816,20061025,20061129
+  ,20070112,20070514, 19870508,20040205,20040205, 20091120,20091210
+  ,20091224,20050503,19870508,19870508,19880330)
Error in new1$DATE : $ operator is invalid for atomic vectors
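The error can be reproduced on a toy example. This sketch assumes that id.d was built with cbind(), which Rui points out further down the thread. For two numeric vectors, cbind() returns a numeric matrix, not a data frame, and `$` is not defined for a matrix. The tiny vectors below are made up for illustration.

```r
# toy data, made up for illustration
ID <- c(167, 167, 910)
DATE <- c(20040205, 20040205, 19870508)

m <- cbind(ID, DATE)   # numeric matrix with column names "ID" and "DATE"
class(m)               # "matrix" "array" (R >= 4.0.0)

# `$` works on lists and data frames only, so m$DATE raises an error
res <- tryCatch(m$DATE, error = function(e) conditionMessage(e))
res                    # "$ operator is invalid for atomic vectors"

# Option 1: keep the data in a data frame, so `$` works
df <- data.frame(ID, DATE)   # or as.data.frame(m)
df$DATE

# Option 2: keep the matrix and index the columns by name
m[, "DATE"]
```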




-----Original Message-----
From: arun [mailto:smartpink111 at yahoo.com] 
Sent: 23 October 2012 14:05
To: Stuart Leask
Cc: R help; Petr PIKAL
Subject: Re: [R] [r] How to pick colums from a ragged array?

Hi,
I was not following the thread.
Maybe this is what you are looking for:
new1<-id.d[duplicated(id.d)|duplicated(id.d,fromLast=TRUE),]


tapply(new1$ID,new1$DATE,head,1)
#19870508 20040205 20040429 20050421
#    1019      167      814      841
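The idiom duplicated(x) | duplicated(x, fromLast = TRUE) marks every copy of a repeated value, including the first one. Plain duplicated(x) marks only the second and later copies. The sketch below uses made-up data to compare duplicated rows (the same ID and DATE both repeated, as in this reply) with duplicated dates alone (as in Arun's follow-up at the start of this message).

```r
# made-up data: one date shared by two different IDs
d <- data.frame(ID   = c(910, 910, 1019, 1019),
                DATE = c(19870508, 20040205, 19870508, 19870508))

duplicated(d$DATE)                    # FALSE FALSE TRUE TRUE: later copies only
duplicated(d$DATE, fromLast = TRUE)   # TRUE FALSE TRUE FALSE: earlier copies only

# every row whose DATE occurs more than once anywhere in the column
all_date_dups <- duplicated(d$DATE) | duplicated(d$DATE, fromLast = TRUE)
# every row whose (ID, DATE) pair occurs more than once
all_row_dups  <- duplicated(d) | duplicated(d, fromLast = TRUE)

d[all_date_dups, ]   # rows 1, 3, 4: 910's date also matches 1019's
d[all_row_dups, ]    # rows 3, 4 only
```

Neither version considers whether the duplicated date is the earliest date of its ID. This is why both replies still need further filtering.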
A.K.




----- Original Message -----
From: Stuart Leask <Stuart.Leask at nottingham.ac.uk>
To: PIKAL Petr <petr.pikal at precheza.cz>; "r-help at r-project.org" <r-help at r-project.org>
Cc: 
Sent: Tuesday, October 23, 2012 8:28 AM
Subject: Re: [R] [r] How to pick colums from a ragged array?

Hi there.

Not sure I follow what you are doing.

I want a list of all the IDs that have duplicate DATE entries, only when the DATE is the earliest (or last) date for that ID.

I have refined my test dataset, to include some tests (e.g. 910 has the same dup as 1019, but for 910 it's not the earliest date):


ID <- c(58,58,58,58,167,167,323,323,323,323,323,323,323
,547,794,814,814,814,814,814,814,841,841,841,841,841
,841,841,841,841,910,910,910,910,910,910,999,1019,1019
,1019)

DATE <-
c(20060821,20061207,20080102,20090904,20040205,20040205,20051111
,20060111,20071119,20080107,20080407,20080521,20080711,20041005
,20070905,20020814,20021125,20040429,20040429,20071205,20080227
,20050421,20050421,20060428,20060602,20060816,20061025,20061129
,20070112,20070514, 19870508,20040205,20040205, 20091120,20091210
,20091224,20050503,19870508,19870508,19880330)

Correct output:
"167"  "841"  "1019"

Stuart
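Stuart's specification is to list the IDs whose earliest date, or alternatively whose latest date, occurs more than once. This can be checked per ID without first sorting the whole data frame. The sketch below assumes that dates are yyyymmdd numbers, so that min() and max() pick the earliest and latest dates. The function name dup_end_ids and its `end` argument are hypothetical and are not part of the thread.

```r
ID <- c(58,58,58,58,167,167,323,323,323,323,323,323,323,
        547,794,814,814,814,814,814,814,841,841,841,841,841,
        841,841,841,841,910,910,910,910,910,910,999,1019,1019,
        1019)
DATE <- c(20060821,20061207,20080102,20090904,20040205,20040205,20051111,
          20060111,20071119,20080107,20080407,20080521,20080711,20041005,
          20070905,20020814,20021125,20040429,20040429,20071205,20080227,
          20050421,20050421,20060428,20060602,20060816,20061025,20061129,
          20070112,20070514,19870508,20040205,20040205,20091120,20091210,
          20091224,20050503,19870508,19870508,19880330)

# Hypothetical helper: IDs whose earliest (end = "first") or latest
# (end = "last") DATE appears more than once for that ID.
dup_end_ids <- function(id, date, end = c("first", "last")) {
  end <- match.arg(end)
  pick <- if (end == "first") min else max
  # for each ID, count how many of its dates equal the chosen end date
  n_end <- tapply(date, id, function(d) sum(d == pick(d)))
  names(n_end)[n_end > 1]
}

dup_end_ids(ID, DATE, "first")   # should match the correct output above
dup_end_ids(ID, DATE, "last")
```

The helper returns character IDs (the group names from tapply()), which is the same form as the correct output above. Wrap the result in as.numeric() if numbers are needed.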

-----Original Message-----
From: PIKAL Petr [mailto:petr.pikal at precheza.cz]
Sent: 23 October 2012 13:15
To: Stuart Leask; r-help at r-project.org
Subject: RE: [r] How to pick colums from a ragged array?

Hi

Rui's answer brought me to a more elaborate solution, which still needs the data frame to be ordered by date:

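fff<-function(data, first=TRUE, remove=FALSE) {
# data: a data frame (not a matrix), column 1 = ID, column 2 = DATE, dates ordered within each ID

# TRUE when the first two dates of a group are equal
testfirst <- function(x) nrow(x) > 1 && x[1,2]==x[2,2]
# TRUE when the last two dates of a group are equal (nrow(), because length() of a data frame counts columns)
testlast <- function(x) nrow(x) > 1 && x[nrow(x),2]==x[nrow(x)-1,2]

if(first) sel <- as.numeric(names(which(sapply(split(data, data[,1]), testfirst)))) else sel <- as.numeric(names(which(sapply(split(data, data[,1]), testlast))))

# %in% rather than ==/!= so that more than one selected ID is handled correctly
if (remove) data[!data[,1] %in% sel,] else data[data[,1] %in% sel,] }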
    ID     DATE
31 910 20091105
32 910 20091105
33 910 20091117
34 910 20091119
35 910 20091120
36 910 20091210
37 910 20091224
38 910 20091224
     ID     DATE
1    58 20060821
2    58 20061207
3    58 20080102
4    58 20090904
5   167 20040205
6   167 20040323
7   323 20051111
8   323 20060111
9   323 20071119
10  323 20080107
11  323 20080407
12  323 20080521
13  323 20080711
14  547 20041005
15  794 20070905
16  814 20020814
17  814 20021125
18  814 20040429
19  814 20040429
20  814 20071205
21  814 20080227
22  841 20050421
23  841 20060130
24  841 20060428
25  841 20060602
26  841 20060816
27  841 20061025
28  841 20061129
29  841 20070112
30  841 20070514
39  999 20050503
40 1019 19870508
41 1019 19880223
42 1019 19880330
43 1019 19880330
Regards
Petr
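Two base-R behaviours matter for per-group helpers such as testlast and for the final subsetting step in fff. First, length() of a data frame counts columns, not rows. Second, comparing a column with a multi-element selection using == or != recycles the selection instead of testing membership. The small sketch below uses made-up data, and the objects g, ids and sel are illustrative.

```r
# made-up group: one ID with three dates, the last two equal
g <- data.frame(ID = c(910, 910, 910), DATE = c(20091105, 20091120, 20091120))

length(g)   # 2: number of columns, not rows
nrow(g)     # 3: number of rows

# comparing the last two rows needs nrow(), not length()
g[nrow(g), 2] == g[nrow(g) - 1, 2]       # TRUE  (rows 3 and 2)
g[length(g), 2] == g[length(g) - 1, 2]   # FALSE (rows 2 and 1)

# selecting rows for several IDs at once
ids <- c(167, 167, 841, 1019)
sel <- c(167, 1019)
ids == sel     # recycles sel: TRUE FALSE FALSE TRUE (the second 167 is missed)
ids %in% sel   # membership:   TRUE TRUE FALSE TRUE
```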
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

This message and any attachment are intended solely for the addressee and may contain confidential information. If you have received this message in error, please send it back to me, and immediately delete it. Please do not use, copy or disclose the information contained in this message or in any attachment. Any views or opinions expressed by the author of this email do not necessarily reflect the views of the University of Nottingham.

This message has been checked for viruses but the contents of an attachment
may still contain software viruses which could damage your computer system:
you are advised to perform your own checks. Email communications with the
University of Nottingham may be monitored as permitted by UK legislation.
#
I too had a parsimonious solution that was also fooled by IDs with a duplicate date that wasn't their first date but matched another ID's duplicated first date.

For this data:

ID <- c(58,58,58,58,167,167,323,323,323,323,323,323,323
,547,794,814,814,814,814,814,814,841,841,841,841,841
,841,841,841,841,910,910,910,910,910,910,999,1019,1019
,1019)

DATE <-
c(20060821,20061207,20080102,20090904,20040205,20040205,20051111
,20060111,20071119,20080107,20080407,20080521,20080711,20041005
,20070905,20020814,20021125,20040429,20040429,20071205,20080227
,20050421,20050421,20060428,20060602,20060816,20061025,20061129
,20070112,20070514, 19870508,20040205,20040205, 20091120,20091210
,20091224,20050503,19870508,19870508,19880330)

id.d <- cbind(ID, DATE)

the right answer is:

167, 841 and 1019: correct.
814 and 910: incorrect. Although they have duplicate dates, those are not their first dates.

-----Original Message-----
From: arun [mailto:smartpink111 at yahoo.com]
Sent: 23 October 2012 14:29
To: Stuart Leask
Cc: R help
Subject: Re: [R] [r] How to pick colums from a ragged array?

#
Hello,

Inline.
On 23-10-2012 14:53, Stuart Leask wrote:
The error comes from the fact that id.d is a matrix, while Arun is using one
of the list or data.frame ways of accessing the elements ($). Try new1[,
"ID"] and new1[, "DATE"] instead.
Anyway, I believe the solution will give the first row of every duplicated date, not
only the duplicates that fall on the first date of each ID.

Rui Barradas
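Both of Rui's points can be checked together. The sketch below assumes id.d is the cbind() matrix from Stuart's message. If columns are indexed by name with [, "ID"] and [, "DATE"], Arun's 14:05 code runs on the matrix. The result, however, still takes the first ID for every duplicated date. It therefore includes 814, whose duplicated date is not its earliest date. The object name first_per_date is illustrative.

```r
ID <- c(58,58,58,58,167,167,323,323,323,323,323,323,323,
        547,794,814,814,814,814,814,814,841,841,841,841,841,
        841,841,841,841,910,910,910,910,910,910,999,1019,1019,
        1019)
DATE <- c(20060821,20061207,20080102,20090904,20040205,20040205,20051111,
          20060111,20071119,20080107,20080407,20080521,20080711,20041005,
          20070905,20020814,20021125,20040429,20040429,20071205,20080227,
          20050421,20050421,20060428,20060602,20060816,20061025,20061129,
          20070112,20070514,19870508,20040205,20040205,20091120,20091210,
          20091224,20050503,19870508,19870508,19880330)
id.d <- cbind(ID, DATE)   # a matrix, as in Stuart's code

# duplicated() on a matrix compares whole rows (MARGIN = 1 by default)
new1 <- id.d[duplicated(id.d) | duplicated(id.d, fromLast = TRUE), ]

# matrix-style column indexing instead of new1$ID / new1$DATE
first_per_date <- tapply(new1[, "ID"], new1[, "DATE"], head, 1)
first_per_date   # one ID per duplicated date: 1019, 167, 814, 841
```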