Function to "lump" factors together?

On Oct 17, 2011, at 9:45 PM, David Wolfskill wrote:

            
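Here's a loopless way to lump random letters into groups, with an
"other" value (the tenth label, which "nomatch" points at). There are
probably better ways, but my efforts with match and switch came to
naught. "pmatch" returns an integer vector of positions, which is then
used to index the vector of group labels.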

 > x <- sample(letters[1:10], 50, replace = TRUE)
 > c("abc","abc","abc","def","def","def","ghi","ghi","ghi","j")[
 +   pmatch(x, letters[1:10], duplicates.ok = TRUE, nomatch = 10)]
  [1] "ghi" "ghi" "ghi" "ghi" "ghi" "def" "def" "ghi" "def" "abc" "abc" "j"   "def" "def" "ghi"
 [16] "abc" "j"   "def" "ghi" "abc" "ghi" "abc" "abc" "abc" "abc" "abc" "abc" "ghi" "def" "abc"
 [31] "ghi" "def" "ghi" "def" "abc" "ghi" "ghi" "j"   "abc" "def" "abc" "ghi" "abc" "def" "def"
 [46] "def" "j"   "ghi" "def" "def"
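
In the run above every sampled letter is one of a-j, so "nomatch" never fires. A minimal sketch of the same indexing trick with values that really do fall outside the known set might look like this; the names known, labels and lumped, and the label "other", are made up for illustration:

```r
# Known levels and one label per level; the last slot is the catch-all.
known  <- letters[1:9]
labels <- c(rep(c("abc", "def", "ghi"), each = 3), "other")

# "z" and "q" are not among the known levels.
x <- c("a", "e", "i", "z", "q", "c")

# pmatch() gives the position of each element of x in known; anything
# unmatched gets the index of the catch-all slot via nomatch.
idx    <- pmatch(x, known, duplicates.ok = TRUE, nomatch = length(labels))
lumped <- labels[idx]
lumped
```

Because pmatch does partial matching, this assumes no element of x is an unintended prefix of a known level.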

Classifying 5 million letters in about a second:

 > x <- sample(letters[1:10], 5000000, replace = TRUE)
 > system.time( v <-
 +   c("abc","abc","abc","def","def","def","ghi","ghi","ghi","j")[
 +     pmatch(x, letters[1:10], duplicates.ok = TRUE, nomatch = 10)] )
    user  system elapsed
   0.858   0.208   1.062

The same strategy (indexing to return a set membership) can be used  
with findInterval.
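
As a rough sketch of the findInterval variant for numeric data (the cut points and labels below are invented for the example, not taken from the original question):

```r
# Hypothetical cut points; findInterval() requires them sorted increasingly.
breaks <- c(0, 10, 20)

# One label per interval, plus a leading slot for values below breaks[1].
labels <- c("below", "low", "mid", "high")

x <- c(-2, 3, 10, 19.5, 25)

# findInterval() returns, for each x, how many break points are <= x
# (0 when x is below the first break), so add 1 to get a valid index.
idx    <- findInterval(x, breaks) + 1
binned <- labels[idx]
binned
```

The "+ 1" matters: an index of 0 would silently drop that element from the result rather than label it.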