non-consing count

7 messages · Sam Steingold, Ista Zahn, jim holtman +3 more

#
Hi,
to count vector elements with some property, the standard idiom seems to
be length(which):
--8<---------------cut here---------------start------------->8---
x <- c(1,1,0,0,0)
count.0 <- length(which(x == 0))
--8<---------------cut here---------------end--------------->8---
However, this approach allocates and discards two vectors: a logical
vector of the same length as x, and an integer vector built inside which().
Is there a cheaper alternative?
Thanks!
#
Hi Sam,

Here is one alternative, which is at least faster:

system.time(count.0 <- length(which(x == 0)))
system.time(count.1 <- sum(x == 0))
all.equal(count.0, count.1)
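With only five elements both timings are effectively zero, so the difference is hard to see. A minimal sketch of the same comparison on a larger input follows; the vector size, the seed and the repeat count are assumptions made for illustration, and the actual timings depend on the machine and the R version.

```r
# Assumption: one million elements, large enough for the timings to register.
set.seed(1)
x <- sample(c(0, 1), 1e6, replace = TRUE)

# Repeat each idiom 100 times so the elapsed time is measurable.
system.time(for (i in 1:100) count.0 <- length(which(x == 0)))
system.time(for (i in 1:100) count.1 <- sum(x == 0))

# Both idioms should agree on the count.
all.equal(count.0, count.1)
```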

Best,
Ista
On Fri, Jan 4, 2013 at 10:30 AM, Sam Steingold <sds at gnu.org> wrote:
#
What is the concern if it works? You can also do

sum(x==0)

Is performance a concern? How often are you going to do it, and which
other parts of your script take longer? Why are you concerned
about allocating and discarding two vectors?
On Fri, Jan 4, 2013 at 10:30 AM, Sam Steingold <sds at gnu.org> wrote:
#
My 2 cents:

AFAIK both which and length are implemented in compiled C code:

http://cran.r-project.org/doc/manuals/r-release/R-ints.html#g_t_002eInternal-vs-_002ePrimitive

so they must be quite efficient, i.e. they are .Primitive and .Internal.
Combining them with the pattern test in a single C routine would probably
be more memory efficient for counting matches, but would that make sense?
In general, if you look for a pattern in a vector you also need to know
where it occurs, hence the which operation, at least for debugging and
testing purposes.
On 4 January 2013 16:30, Sam Steingold <sds at gnu.org> wrote:
#
On 4 January 2013 16:53, jim holtman <jholtman at gmail.com> wrote:
I think Sam's question was about additional memory introduced by which.

For example:
56 bytes
104 bytes
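The figures above look like object.size() output, but the commands that produced them are not shown. As a rough sketch only, this is one way the two temporaries created by length(which(x == 0)) could be measured. The names is.zero, idx and big are made up for illustration, and the reported sizes vary with the R version and platform.

```r
x <- c(1, 1, 0, 0, 0)

is.zero <- x == 0        # temporary 1: logical vector, same length as x
idx <- which(is.zero)    # temporary 2: integer vector of matching positions

print(object.size(is.zero))
print(object.size(idx))

# For a long vector the temporaries grow with length(x).
big <- numeric(1e6)      # assumption: an all-zero vector of one million elements
print(object.size(big == 0), units = "Mb")
print(object.size(which(big == 0)), units = "Mb")
```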


If you have a very large vector, a time series for example, this would
make a lot of difference. I am not sure how 'sum' handles this internally,
but as I said earlier, a special function in C that counts occurrences as
it goes might be faster than either the length-which pair or sum, since it
could get the result in one pass, maybe like x %count% 0. One could
implement a recursive function to do this at the R interpreter level, but
I am not sure about the recursion depth and memory requirements.
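For what it is worth, the x %count% 0 idea can be sketched in plain R without any C code. Both names below, %count% and count_chunked, are hypothetical and are not part of base R or of any package. The operator is just a wrapper around sum, so it still allocates the full logical vector. The chunked version uses a loop rather than recursion and only keeps temporaries of at most chunk elements alive at a time, at the price of more interpreter overhead.

```r
# Hypothetical operator, named after the x %count% 0 suggestion above.
`%count%` <- function(x, value) sum(x == value, na.rm = TRUE)

# Hypothetical helper: count in blocks so that each temporary vector
# holds at most `chunk` elements (the default chunk size is an assumption).
count_chunked <- function(x, value, chunk = 1e5) {
  n <- length(x)
  total <- 0
  start <- 1
  while (start <= n) {
    end <- min(start + chunk - 1, n)
    total <- total + sum(x[start:end] == value, na.rm = TRUE)
    start <- end + 1
  }
  total
}

x <- c(1, 1, 0, 0, 0)
x %count% 0                      # 3
count_chunked(x, 0, chunk = 2)   # 3
```

Whether the chunked loop actually beats a single sum(x == 0) depends on the vector length and the chunk size, so it is worth timing before adopting it.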
#
On Jan 4, 2013, at 7:30 AM, Sam Steingold wrote:
     Hi,
     to count vector elements with some property, the standard idiom seems to
     be length(which):
     --8<---------------cut here---------------start------------->8---
     x <- c(1,1,0,0,0)
     count.0 <- length(which(x == 0))
     --8<---------------cut here---------------end--------------->8---
     however, this approach allocates and discards 2 vectors: a logical
     vector of length=length(x) and an integer vector in which.
     is there a cheaper alternative?

I don't know if it is "cheaper", but the way I "learned to count" was:

sum(x==8, na.rm=TRUE)

--
David Winsemius
Alameda, CA, USA
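The na.rm=TRUE argument matters as soon as the vector contains missing values: which() silently drops NA, whereas sum() propagates it unless told otherwise. A small sketch, using a made-up vector:

```r
x <- c(1, NA, 0, 0)

sum(x == 0)                 # NA, because NA == 0 is NA
sum(x == 0, na.rm = TRUE)   # 2
length(which(x == 0))       # 2, which() treats NA as FALSE
```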