Very slow subsetting by name

Martin Morgan · 2010-07-15T18:16:14Z

On 07/15/2010 08:38 AM, Martin Morgan wrote:
> On 07/15/2010 01:12 AM, Hervé Pagès wrote:
>> Hi,
>>
>> I'm subsetting a named vector using character indices.
>> My vector of indices (or keys) is 10x longer than the vector
>> I'm subsetting. All my keys are distinct and only 10% of them
>> are valid (i.e. match a name of the vector being subsetted).
>> It is surprisingly slow:
>>
>> x1 <- [...]
>> names(x1) <- [...]
>> keys <- [...]
>> [...]


The patch below passes make check and gives this timing:

user  system elapsed
  0.092   0.000   0.093

[1] TRUE

but uses some additional memory.

Martin

Index: src/main/subscript.c
===================================================================
--- src/main/subscript.c	(revision 52526)
+++ src/main/subscript.c	(working copy)
@@ -535,15 +535,17 @@
     }


+    SEXP sindx = PROTECT(match(s, s, 0)); /* first match */
     for (i = 0; i < ns; i++) {
 	sub = INTEGER(indx)[i];
 	if (sub == 0) {
-	    for (j = 0 ; j < i ; j++)
-		if (NonNullStringMatch(STRING_ELT(s, i), STRING_ELT(s, j))) {
-		    sub = INTEGER(indx)[j];
-		    SET_VECTOR_ELT(indexnames, i, STRING_ELT(s, j));
-		    break;
-		}
+            j = INTEGER(sindx)[i] - 1;
+            if (NA_STRING != STRING_ELT(s, j) &&
+                R_NilValue != STRING_ELT(s, j))
+            {
+                sub = INTEGER(indx)[j];
+                SET_VECTOR_ELT(indexnames, i, STRING_ELT(s, j));
+            }
 	}
 	if (sub == 0) {
 	    if (!canstretch) {
@@ -561,7 +563,7 @@
 	setAttrib(indx, R_UseNamesSymbol, indexnames);
     if (canstretch)
 	*stretch = extra;
-    UNPROTECT(4);
+    UNPROTECT(5);
     return indx;
 }
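
The idea behind the patch can be illustrated outside of R. The removed loop compares each key against every earlier key, which is quadratic in the number of keys; the replacement asks match(s, s, 0) for the first occurrence of each key in a single pass. The following is only a minimal standalone sketch of that idea, not R's implementation: the function names, the FNV-1a hash and the open-addressing table are all assumptions made for illustration, and indices are 0-based here whereas match() returns 1-based positions. The temporary table is the kind of extra memory the message refers to.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Quadratic scan, analogous to the loop the patch removes: for each key,
   walk over all earlier keys looking for an equal string. */
void first_match_quadratic(const char **keys, size_t n, size_t *first)
{
    assert(n == 0 || (keys != NULL && first != NULL));
    for (size_t i = 0; i < n; i++) {
        first[i] = i;                      /* default: key is its own first occurrence */
        for (size_t j = 0; j < i; j++) {
            if (strcmp(keys[i], keys[j]) == 0) {
                first[i] = j;
                break;
            }
        }
    }
}

/* FNV-1a 64-bit string hash (an arbitrary choice for this sketch). */
uint64_t hash_string(const char *s)
{
    uint64_t h = 0xcbf29ce484222325ULL;
    for (const unsigned char *p = (const unsigned char *) s; *p; p++) {
        h ^= *p;
        h *= 0x100000001b3ULL;
    }
    return h;
}

/* Hashed variant: one pass over the keys, each looked up in a temporary
   open-addressing table that stores the index of the first occurrence.
   Returns 0 on success, -1 if the table cannot be allocated. */
int first_match_hashed(const char **keys, size_t n, size_t *first)
{
    assert(n == 0 || (keys != NULL && first != NULL));

    size_t cap = 16;
    while (cap < 2 * n)                    /* power of two, load factor <= 0.5 */
        cap *= 2;

    size_t *slots = malloc(cap * sizeof *slots);
    if (slots == NULL)
        return -1;
    for (size_t k = 0; k < cap; k++)
        slots[k] = SIZE_MAX;               /* SIZE_MAX marks an empty slot */

    for (size_t i = 0; i < n; i++) {
        size_t pos = (size_t) hash_string(keys[i]) & (cap - 1);
        /* Linear probing: stop at an empty slot or at an equal key. */
        while (slots[pos] != SIZE_MAX && strcmp(keys[slots[pos]], keys[i]) != 0)
            pos = (pos + 1) & (cap - 1);
        if (slots[pos] == SIZE_MAX)
            slots[pos] = i;                /* first time this key is seen */
        first[i] = slots[pos];
    }

    free(slots);
    return 0;
}
```

Both functions fill first[] with the same values; the hashed one trades a table of roughly 2n entries for an expected linear running time, which is the same trade-off as faster subsetting at the cost of some additional memory.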

Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793
