
Keep rows in a dataset if one value in a column is duplicated

#By using cbind in:
PairIDs<-cbind(PairID, PairIDDuplicates)

#you create a numeric matrix (the logical
#vector PairIDDuplicates gets converted
#to numeric - note that your second column
#contains 1s and 0s, not TRUEs and FALSEs).
#Matrices cannot be subset using $ (they are
#basically a vector with a dimension
#attribute) - hence your error.
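
#A minimal sketch of the coercion described above. The values
#in PairID are made up for illustration (they are not from the
#original post); only the object names follow the thread.

```r
# Toy data: hypothetical values, not the poster's real IDs
PairID <- c(1, 2, 2, 3, 4, 4)
PairIDDuplicates <- duplicated(PairID)   # logical vector

m <- cbind(PairID, PairIDDuplicates)

is.matrix(m)              # cbind of two vectors gives a matrix
typeof(m)                 # the logical column was coerced to numeric
m[, "PairIDDuplicates"]   # 0s and 1s rather than FALSE/TRUE

# Using $ on a matrix raises an error; capture its message instead of stopping
res <- tryCatch(m$PairIDDuplicates, error = function(e) conditionMessage(e))
res
```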

#Two ways you could have avoided your error are:
# 1) changing the cbind to data.frame
PairIDs <- data.frame(PairID, PairIDDuplicates)
names(PairIDs) <- c("Pairid","Pairiddups")
Health2PairsOnly <- PairIDs[PairIDs$Pairiddups,]

# 2) keeping the matrix and indexing by column name, like:
PairIDs<-cbind(PairID, PairIDDuplicates)
colnames(PairIDs) <- c("Pairid","Pairiddups")
Health2PairsOnly <- PairIDs[PairIDs[,'Pairiddups']==1,]

#In the former (option 1) you can save a line of code by
#naming the columns in the data.frame call itself:
PairIDs <- data.frame(Pairid=PairID, Pairiddups=PairIDDuplicates)



#Note that there is a fair bit of redundancy throughout
#your code. A neater way of subsetting your original
#data, for instance, would be:
PairIDdup <- unique(PairID[duplicated(PairID)])
Health2[PairID %in% PairIDdup,]
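
#To see the subsetting above at work, here is a small sketch.
#The Health2 data frame and its Score column are hypothetical
#stand-ins for your data. The second form is an alternative
#(not something from your code): it flags every value that
#repeats, scanning from both ends with duplicated().

```r
# Hypothetical stand-in for Health2; column names and values are assumptions
Health2 <- data.frame(PairID = c(1, 2, 2, 3, 4, 4),
                      Score  = c(10, 20, 21, 30, 40, 41))
PairID <- Health2$PairID

# Approach from the reply: collect the IDs that occur more than once,
# then keep every row whose ID is in that set
PairIDdup <- unique(PairID[duplicated(PairID)])
viaIn <- Health2[PairID %in% PairIDdup, ]

# Alternative: a row is kept if its ID is repeated either later or earlier
viaBoth <- Health2[duplicated(PairID) | duplicated(PairID, fromLast = TRUE), ]

# Both approaches keep the same rows
identical(viaIn, viaBoth)
```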



Have Fun!
Simon Knapp
On Fri, Sep 28, 2012 at 5:46 AM, GradStudentDD <dd7kc at virginia.edu> wrote: