
loop for a large database

On Sun, Feb 26, 2012 at 04:13:49AM -0800, mari681 wrote:
Hi.

As David pointed out, you probably want to compute 

  sum (MyTable== myvector[i])

and not sum (MyTable== i).

Also, I would expect the results to be stored somewhere, for example

  numOccur <- integer(length(myvector))
  for (i in seq_along(myvector)) numOccur[i] <- sum(MyTable == myvector[i])
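A self-contained sketch of that loop, run on a small made-up data frame. The contents of MyTable and myvector here are hypothetical stand-ins for the real data. The vapply() line is an equivalent alternative to the explicit loop that preallocates the result for you.

```r
# Hypothetical small stand-ins for the real data
MyTable <- data.frame(a = c("x", "y", "z", "x"),
                      b = c("y", "y", "w", "x"),
                      stringsAsFactors = FALSE)
myvector <- c("x", "y", "q")

# Explicit loop: compare every cell with myvector[i], not with i itself
numOccur <- integer(length(myvector))
for (i in seq_along(myvector)) {
  numOccur[i] <- sum(MyTable == myvector[i])
}

# Equivalent form; vapply() checks that each result is a single integer
numOccur2 <- vapply(myvector, function(s) sum(MyTable == s), integer(1),
                    USE.NAMES = FALSE)
print(numOccur)   # 3 3 0
```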

What do you see on the crashing computer? I would expect it to run for
a long time, but not to crash.

Try to run your code on a smaller part of the data to test efficiency
of different approaches.
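One way to do this is to time an approach on the first n rows for a few increasing n and see how the time grows before running it on the whole table. This is only a sketch: the synthetic data and the helper count_occurrences() are hypothetical, and the real timings will depend on the machine and the data.

```r
# Hypothetical synthetic data standing in for the real table
set.seed(1)
words <- paste0("w", 1:500)
MyTable <- data.frame(a = sample(words, 20000, replace = TRUE),
                      b = sample(words, 20000, replace = TRUE),
                      stringsAsFactors = FALSE)
myvector <- words[1:200]

# Hypothetical helper: one full scan of the table per search string
count_occurrences <- function(tbl, strings) {
  vapply(strings, function(s) sum(tbl == s), integer(1), USE.NAMES = FALSE)
}

# Time the helper on growing subsets; roughly linear growth suggests
# the full run can be extrapolated from these numbers
sizes <- c(1000, 2000, 4000)
elapsed <- sapply(sizes, function(n)
  system.time(count_occurrences(MyTable[seq_len(n), ], myvector))[["elapsed"]])
print(data.frame(rows = sizes, seconds = elapsed))
```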

How many different strings are in your data? If there are many
repeated strings, then it may be better to first compute their
frequency table, then look up the strings from "myvector" in this
table and sum the frequencies.
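A sketch of the frequency-table idea, again on hypothetical stand-in data. The table is built once over all columns, so the data is scanned once instead of once per search string. Strings from myvector that never occur are not among the table's names, so match() gives NA for them and they are set to 0.

```r
# Hypothetical stand-ins for the real data
MyTable <- data.frame(a = c("x", "y", "z", "x"),
                      b = c("y", "y", "w", "x"),
                      stringsAsFactors = FALSE)
myvector <- c("x", "y", "q")

# Count every distinct string once, over all columns
freq <- table(unlist(MyTable, use.names = FALSE))

# Look up the search strings among the table's names
idx <- match(myvector, names(freq))
numOccur <- as.integer(freq[idx])
numOccur[is.na(numOccur)] <- 0L   # strings absent from the data
print(numOccur)   # 3 3 0
```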

Does your data frame consist of character vectors or of factors?
This can be checked with class(MyTable[[1]]).
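A sketch that checks the class of every column at once and, as one possible follow-up (an assumption, not something required above), converts factor columns to character. Comparing a factor column with a string using == already works, but when factor and character columns are mixed, unlist() can turn the factor columns into their integer codes, which would spoil the frequency-table approach. The data frame below is hypothetical.

```r
# Hypothetical data frame with one factor and one character column
MyTable <- data.frame(a = factor(c("x", "y")),
                      b = c("y", "z"),
                      stringsAsFactors = FALSE)
print(class(MyTable[[1]]))        # "factor"
print(sapply(MyTable, class))     # class of every column

# Convert any factor columns to character so all columns behave alike
is_fac <- vapply(MyTable, is.factor, logical(1))
MyTable[is_fac] <- lapply(MyTable[is_fac], as.character)
print(sapply(MyTable, class))
```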

Petr Savicky.