
identify duplicate from more than one column

Hi Carlos,

Here is one option:

## read in your data
dat <- read.table(textConnection("
obs     unit            home       z    sex     age
1       015029  18             1        1       053
2       015029  18             1        2       049
3       015029  01             1        1       038
4       015029  01             1        2       033
5       015029  02             1        1       036
6       015029  02             1        2       033
7       015029  03             1        1       023
8       015029  03             1        2       019
9       015029  04             1        2       045
10      015029  05             1        2       047"),
  header = TRUE, stringsAsFactors = FALSE)
closeAllConnections()

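## create a unique ID for matching unit and home
## (sep = '' works for this sample; use a separator such as '_' if
## concatenated unit/home values could collide in the full data)
dat$mID <- with(dat, paste(unit, home, sep = ''))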

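## somewhat messy way of creating a couple number
## groups are numbered in order of first appearance; for each mID,
## if there is more than 1 row and more than 1 sex, every row in the
## group gets that number as its couple id, otherwise every row gets 0
## unsplit() puts the per-group results back in the original row order
i <- 0L
grp <- factor(dat$mID, levels = unique(dat$mID))
dat$couple <- unsplit(lapply(split(dat$sex, grp), function(x) {
  i <<- i + 1L
  if (length(x) > 1 && length(unique(x)) > 1) {
    rep(i, length(x))
  } else rep(0L, length(x))
}), grp)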

## view results
dat
   obs  unit home z sex age     mID couple
1    1 15029   18 1   1  53 1502918      1
2    2 15029   18 1   2  49 1502918      1
3    3 15029    1 1   1  38  150291      2
4    4 15029    1 1   2  33  150291      2
5    5 15029    2 1   1  36  150292      3
6    6 15029    2 1   2  33  150292      3
7    7 15029    3 1   1  23  150293      4
8    8 15029    3 1   2  19  150293      4
9    9 15029    4 1   2  45  150294      0
10  10 15029    5 1   2  47  150295      0

See these functions for more details:

?ave # where I got my idea
?split
?lapply
?`<<-`
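
For comparison, here is a minimal sketch of the ave() route, assuming the same sample data. It groups on unit and home directly instead of a pasted key, and it avoids `<<-`. The names `both`, `key`, and `couple2` are only illustrative, and like the approach above it numbers every unit/home group, so the couple ids may have gaps.

```r
## same sample data, repeated so the sketch runs on its own
dat <- read.table(text = "
obs unit home z sex age
1 015029 18 1 1 053
2 015029 18 1 2 049
3 015029 01 1 1 038
4 015029 01 1 2 033
5 015029 02 1 1 036
6 015029 02 1 2 033
7 015029 03 1 1 023
8 015029 03 1 2 019
9 015029 04 1 2 045
10 015029 05 1 2 047", header = TRUE)

## 1 for every row in a unit/home group that has more than 1 row
## and more than 1 sex, 0 otherwise; ave() keeps the original row order
both <- ave(dat$sex, dat$unit, dat$home, FUN = function(x) {
  rep(as.integer(length(x) > 1 && length(unique(x)) > 1), length(x))
})

## number the groups in order of first appearance, then zero out non-couples
key <- paste(dat$unit, dat$home, sep = "_")
dat$couple2 <- ifelse(both == 1L, match(key, unique(key)), 0L)

dat$couple2
## [1] 1 1 2 2 3 3 4 4 0 0
```

Because ave() returns its result aligned with the input rows, no reordering step is needed afterwards.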

Cheers,

Josh
On Sat, Nov 12, 2011 at 8:16 PM, jour4life <jour4life at gmail.com> wrote: