Skip to content
Prev 385138 / 398502 Next

Mathematical working procedure of duplicated() function in r

Rui pointed out that you can examine the source yourself.  FAQ 7.40
has a link to an article with detail on finding and examining the
source code.

A general algorithm for checking for duplicates follows (I have not
examined to R source code to see if they use something more clever).

Create an empty object (I will call it seen).  This could be a simple
vector, but for efficiency it is better to use an object type that has
fast lookup, e.g. binary tree, associative array/hash/dictionary, etc.

Create an empty vector of logicals the same length as x (I will call it result).

loop from 1 to the length of x (or from the length to 1 if
fromLast=TRUE), on each iteration
 check to see if the value of x[i] is in seen
   If it is: set result[i] to TRUE
   If it is not: add the current value to seen and set result[i] to false

After the loop finishes, throw away seen and reclaim the memory, then
return result.

Since it looks like you are using this on a matrix or data frame,
there is probably a preprocessing step that combines all the values on
each row into a single character string.
On Tue, Aug 4, 2020 at 6:45 AM K Purna Prakash <prakash.nani at gmail.com> wrote: