Keep rows in a dataset if one value in a column is duplicated

4 messages · GradStudentDD, Rui Barradas, Simon Knapp

#
Hi,

I have a data set of observations by either one person or a pair of people.
I want to keep only the pair observations, and was using the code below
until it gave me the error "$ operator is invalid for atomic vectors". I am
just beginning to learn R, so I apologize if the code is really rough.

Basically I want to keep all the rows in the data set for which the value of
"Pairiddups" is TRUE. How do I do it? And how do I get past the error?

Thank you so much,
Diana

PairID<-c(Health2$pairid)

duplicated(PairID, incomparables=TRUE, fromLast=TRUE)

PairIDdup=duplicated(PairID)
cbind(PairID, PairIDdup)
PairID[which(PairIDdup)]

PairIDDuplicates<-PairID%in%PairID[which(PairIDdup)]
PairIDs<-cbind(PairID, PairIDDuplicates)

colnames(PairIDs)<-c("Pairid","Pairiddups")

Health2PairsOnly<-PairIDs[ which(PairIDs$Pairiddups=='TRUE'), ]



--
View this message in context: http://r.789695.n4.nabble.com/Keep-rows-in-a-dataset-if-one-value-in-a-column-is-duplicated-tp4644420.html
Sent from the R help mailing list archive at Nabble.com.
#
Hello,

Referring to columns with $ can be troublesome, since it does not work on
every kind of object. Try indexing by column name instead:

PairIDs[, "Pairiddups"]


Hope this helps,

Rui Barradas

On 27-09-2012 20:46, GradStudentDD wrote:
#
Hello, again.

There was another error in the line in question. TRUE does not need 
quotes. In fact, with quotes you're comparing to a character string, not 
to a logical value.
The other tip still holds: index the column by name rather than with $. 
The complete, corrected line is below.

Health2PairsOnly <- PairIDs[ which(PairIDs[, "Pairiddups"] == TRUE), ]
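To see why the quotes matter, here is a minimal sketch using a small made-up matrix. The id values are hypothetical and only stand in for the real Health2 data; the object names follow the original post.

```r
# Hypothetical stand-in for the PairIDs matrix built with cbind()
PairID <- c(1, 2, 2, 3, 3, 4)
PairIDDuplicates <- PairID %in% PairID[duplicated(PairID)]
PairIDs <- cbind(Pairid = PairID, Pairiddups = PairIDDuplicates)

# Comparing with the string "TRUE": the numeric column is converted to
# character first, so "1" is compared with "TRUE" and nothing matches
quoted <- PairIDs[PairIDs[, "Pairiddups"] == "TRUE", , drop = FALSE]

# Comparing with the logical TRUE: TRUE is converted to 1, so the rows
# flagged as duplicated match
unquoted <- PairIDs[PairIDs[, "Pairiddups"] == TRUE, , drop = FALSE]

nrow(quoted)    # no rows selected
nrow(unquoted)  # the four rows with a repeated id
```

The `drop = FALSE` argument is only there so that the result stays a matrix even when zero or one row is selected.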

Hope this helps,

Rui Barradas
On 27-09-2012 20:46, GradStudentDD wrote:
#
#By using cbind in:
PairIDs<-cbind(PairID, PairIDDuplicates)

#you create a numeric matrix (the logical
#vector PairIDDuplicates gets converted
#to numeric - note that your second column
#contains 1s and 0s, not TRUEs and FALSEs).
#Matrices cannot be subset using $
#(they are basically a vector with
#a dimension attribute), hence your error.
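
A small sketch of that coercion, using made-up values rather than the real Health2 data, may make the point concrete. The ids below are hypothetical, and `tryCatch()` is used only so that the error message is captured instead of stopping the script.

```r
PairID <- c(1, 2, 2, 3)                          # hypothetical ids
PairIDDuplicates <- c(FALSE, TRUE, TRUE, FALSE)

m <- cbind(PairID, PairIDDuplicates)
is.matrix(m)   # TRUE: cbind() on two vectors gives a matrix
typeof(m)      # "double": the logical column was coerced to numeric

# Using $ on the matrix raises the error from the original post
msg <- tryCatch(m$PairIDDuplicates, error = function(e) conditionMessage(e))
msg
```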

#Two ways you could have avoided your error are:
# 1) changing the cbind to data.frame
PairIDs <- data.frame(PairID, PairIDDuplicates)
names(PairIDs) <- c("Pairid","Pairiddups")
Health2PairsOnly <- PairIDs[PairIDs$Pairiddups,]

# 2) keeping the matrix and indexing by column name, like:
PairIDs<-cbind(PairID, PairIDDuplicates)
colnames(PairIDs) <- c("Pairid","Pairiddups")
Health2PairsOnly <- PairIDs[PairIDs[,'Pairiddups']==1,]

#With data.frame (option 1) you can save a line of code
#by naming the columns directly:
PairIDs <- data.frame(Pairid=PairID, Pairiddups=PairIDDuplicates)



#Note that there is a fair bit of redundancy throughout
#your code. A neater way of subsetting your original
#data, for instance, would be:
PairIDdup <- unique(PairID[duplicated(PairID)])
Health2[PairID %in% PairIDdup,]
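
For anyone who wants to try the approach above without the real data, here is a self-contained sketch. The data frame is a hypothetical stand-in for Health2: only the `pairid` column name comes from the original post, and the `score` column and all values are invented for illustration.

```r
# Hypothetical stand-in for Health2
Health2 <- data.frame(
  pairid = c(10, 11, 11, 12, 13, 13),
  score  = c(5, 7, 6, 4, 8, 9)
)

# Ids that occur more than once, i.e. observations made by a pair
pair_ids <- unique(Health2$pairid[duplicated(Health2$pairid)])

# Keep every row whose id is one of those, including the first occurrence
Health2PairsOnly <- Health2[Health2$pairid %in% pair_ids, ]
Health2PairsOnly
```

Note that `duplicated()` on its own flags only the second and later occurrences of a value, which is why the `%in%` step is needed to pick up the first row of each pair as well.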



Have Fun!
Simon Knapp
On Fri, Sep 28, 2012 at 5:46 AM, GradStudentDD <dd7kc at virginia.edu> wrote: