This question likely has a 1 line answer, I'm just not seeing it. (2, 3, or 10 lines is fine too.)
For a vector I can do group <- match(x, unique(x)) to get a vector that labels each element of x. What is an equivalent if x is a data frame?
The result does not have to be fast: the data set will have < 100 elements. Since this is inside the survival package, and that package is on the 'recommended' list, I can't depend on any package outside the recommended list.
Terry T.
help matching rows of a data frame
7 messages · Terry Therneau, Eric Berger, Jeff Newmiller +4 more
Hi Terry,
I take your question to mean how to label distinct rows of a data frame. If
that is not your question please clarify.
I found the row.match() function in the package prodlim that can be used to
solve this.
However since your request requires no additional dependencies I borrowed
the relevant code from the row.match function.
Here is some obfuscated code to provide your answer in one line, per your
request (less obfuscated code is just below that).
Assuming your data frame is called 'df':
df[, ncol(df) + 1] <- match(do.call("paste", c(df[, , drop = FALSE], sep = "\\r")),
                            do.call("paste", c(unique(df)[, , drop = FALSE], sep = "\\r")))
The last column of df now contains the 'label', i.e. the index of the
matching row in unique(df) (the distinct rows, in order of first appearance).
Somewhat less obfuscated
getLabels <- function(df) {
  match(do.call("paste", c(df[, , drop = FALSE], sep = "\\r")),
        do.call("paste", c(unique(df)[, , drop = FALSE], sep = "\\r")))
}
myDataFrame$label <- getLabels(myDataFrame)
HTH,
Eric
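As a quick check of the paste-and-match idea above, here is a minimal sketch run on the small example data frame that appears later in this thread. The variable name `x` and the choice of "\r" as separator are assumptions for illustration (any string that cannot occur in the data should do), and getLabels is redefined so the block runs on its own. The result indexes the rows of unique(x), so the fifth row is labelled 4 rather than 5.

```r
# Small example data frame (same values as used later in this thread)
x <- data.frame(X0 = c("A", "B", "C", "C", "D", "A"),
                X1 = c(1, 2, 1, 1, 3, 1))

# Paste each row into one string, then match against the distinct rows.
# "\r" is an assumed separator that should not occur in the data.
getLabels <- function(df) {
  key  <- do.call("paste", c(df, sep = "\r"))
  ukey <- do.call("paste", c(unique(df), sep = "\r"))
  match(key, ukey)
}

group <- getLabels(x)
group
# [1] 1 2 3 3 4 1
```

This mirrors match(x, unique(x)) for a vector: rows 3 and 4 share a label, as do rows 1 and 6.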
______________________________________________ R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
"Label" is not a clear term for data frames, but most data frames have rownames. If dta is a data frame, not a tibble,
rownames( dta )[ !duplicated( dta ) ]
Or you could use row indexes directly:
which( !duplicated( dta ) )
Sent from my phone. Please excuse my brevity.
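To make the duplicated() suggestion concrete, here is a small sketch on the example data frame used elsewhere in this thread (the values in `dta` are an assumption for illustration). It returns one entry per distinct row, not one label per row, so it plays the role of unique(x) rather than of match(x, unique(x)).

```r
# Example data, assumed for illustration
dta <- data.frame(X0 = c("A", "B", "C", "C", "D", "A"),
                  X1 = c(1, 2, 1, 1, 3, 1))

# Row indexes of the first occurrence of each distinct row
first <- which(!duplicated(dta))
first
# [1] 1 2 3 5

# The same rows identified by row name
first_names <- rownames(dta)[!duplicated(dta)]
first_names
# [1] "1" "2" "3" "5"
```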
Hi!
2017-09-18 07:13 -0500, Therneau, Terry M., Ph.D. wrote:
This question likely has a 1 line answer, I'm just not seeing it. (2, 3, or 10 lines is fine too.) For a vector I can do group <- match(x, unique(x)) to get a vector that labels each element of x.
Actually, you get a vector of indices matching 'unique(x)', not a labelled vector.
x<-c("A","B","C","A","C","D")
group<-match(x, unique(x))
group
[1] 1 2 3 1 3 4
What is an equivalent if x is a data frame?
So you will generate an index where duplicated rows have the row index of the first occurrence, right? This could work:
x<-data.frame("X0"=c("A","B","C","C","D","A"), "X1"=c(1,2,1,1,3,1))
group<-rownames(x)
for (i in 1:(nrow(x)-1)) {
  for (j in (i+1):nrow(x)) {
    if (sum(as.numeric(x[i,]==x[j,]))==ncol(x)) {
      group[j]<-group[i] }
  }
}
group
[1] "1" "2" "3" "3" "5" "1"
HTH,
Kimmo
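For comparison, the same first-occurrence index can be sketched without loops by pasting each row into a key and matching the keys against themselves. This is an alternative under stated assumptions, not the code from the message above: the "\r" separator is assumed not to occur in the data, and the result is an integer vector rather than character row names.

```r
x <- data.frame(X0 = c("A", "B", "C", "C", "D", "A"),
                X1 = c(1, 2, 1, 1, 3, 1))

# One string per row; "\r" is an assumed separator
key <- do.call(paste, c(x, sep = "\r"))

# match() returns the position of the FIRST match, so matching the
# keys against themselves gives the first row equal to each row
group <- match(key, key)
group
# [1] 1 2 3 3 5 1
```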
You could use merge() with an ID column pasted onto the table of names, as in
tbl <- data.frame(FirstName=c("Abe","Abe","Bob","Chuck","Chuck"),
Surname=c("Xavier","Yates","Yates","Yates","Zapf"), Id=paste0("P",101:105))
tbl
  FirstName Surname   Id
1       Abe  Xavier P101
2       Abe   Yates P102
3       Bob   Yates P103
4     Chuck   Yates P104
5     Chuck    Zapf P105
merge(data.frame(FirstName=c("Abe","Chuck","Dave"),
Surname=rep("Yates",3)), tbl, all.x=TRUE)
  FirstName Surname   Id
1       Abe   Yates P102
2     Chuck   Yates P104
3      Dave   Yates <NA>
Bill Dunlap
TIBCO Software
wdunlap tibco.com
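Applied to the original question, the merge() idea might look like the sketch below: number the distinct rows, then merge that lookup table back onto the data. The helper column names `group` and `row` are hypothetical, and the extra `row` column is there only because merge() does not promise to keep the input row order.

```r
x <- data.frame(X0 = c("A", "B", "C", "C", "D", "A"),
                X1 = c(1, 2, 1, 1, 3, 1))

# Lookup table: one line per distinct row, numbered in order of appearance
u <- unique(x)
u$group <- seq_len(nrow(u))

# Remember the original order, merge on the shared columns (X0, X1),
# then restore that order
x$row <- seq_len(nrow(x))
m <- merge(x, u)
m <- m[order(m$row), ]
m$group
# [1] 1 2 3 3 4 1
```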
On Sep 18, 2017, at 5:13 AM, Therneau, Terry M., Ph.D. <therneau at mayo.edu> wrote:
This question likely has a 1 line answer, I'm just not seeing it. (2, 3, or 10 lines is fine too.) For a vector I can do group <- match(x, unique(x)) to get a vector that labels each element of x. What is an equivalent if x is a data frame?
In the past I've used apply with paste to generate "group" identifiers:
x<-data.frame("X0"=c("A","B","C","C","D","A"), "X1"=c(1,2,1,1,3,1))
apply(x, 1, paste, collapse=".")
[1] "A.1" "B.2" "C.1" "C.1" "D.3" "A.1"
The result does not have to be fast: the data set will have < 100 elements. Since this is inside the survival package, and that package is on the 'recommended' list, I can't depend on any package outside the recommended list.
David Winsemius
Alameda, CA, USA
'Any technology distinguishable from magic is insufficiently advanced.' -Gehm's Corollary to Clarke's Third Law
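The string identifiers above can be turned into integer group labels with the same match()/unique() idiom from the original question. A minimal sketch follows; as far as I understand, apply() first converts the data frame with as.matrix(), which may format numeric columns to a common width, so the exact key strings can differ from a plain paste() for other data.

```r
x <- data.frame(X0 = c("A", "B", "C", "C", "D", "A"),
                X1 = c(1, 2, 1, 1, 3, 1))

# One string identifier per row, as in the message above
key <- apply(x, 1, paste, collapse = ".")

# Integer labels in order of first appearance
group <- match(key, unique(key))
group
# [1] 1 2 3 3 4 1
```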
Yes. My understanding is that you want the identifier to have the same number of rows as the data frame. A slight variant of David's solution would then be:
do.call(paste0,x)
-- Bert
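One caveat with pasting columns together without a separator is that different rows can collapse to the same string. The sketch below uses a made-up two-row data frame (the values are assumptions chosen to show the collision) and compares paste0 with a paste that uses a separator assumed not to occur in the data.

```r
# Made-up data: two different rows
d <- data.frame(a = c("A1", "A"), b = c("1", "11"),
                stringsAsFactors = FALSE)

# Without a separator both rows give the same identifier
k0 <- do.call(paste0, d)
k0
# [1] "A11" "A11"

# With a separator ("\r" assumed absent from the data) they stay distinct
k1 <- do.call(paste, c(d, sep = "\r"))
k1[1] == k1[2]
# [1] FALSE
```

For fewer than 100 rows of well-behaved data this is unlikely to matter, but a separator costs nothing.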