On 4 Jun 2017, at 06:35, Bert Gunter <bgunter.4567 at gmail.com> wrote:
I'll go just a bit "fer-er." It appears the anomaly -- I hesitate to
call it a bug -- is in the C code for duplicated.default():
> duplicated(letters[1:10],nmax=10)
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
> duplicated(letters[1:10],nmax=9)
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
> duplicated(letters[1:10],nmax=8) ## for all nmax <9
Error in duplicated.default(letters[1:10], nmax = 8) : hash table is full
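Rather than trying values by hand, the threshold can be located empirically. This is only a minimal sketch: nmax_ok() is a hypothetical helper, not part of base R, and all it does is wrap duplicated() in tryCatch(). The result may well differ between R versions, so no output is shown.

```r
# Hypothetical helper (not in base R): TRUE if duplicated() accepts this
# nmax for x, FALSE if it signals an error such as "hash table is full".
nmax_ok <- function(x, nmax) {
  tryCatch({
    duplicated(x, nmax = nmax)
    TRUE
  }, error = function(e) FALSE)
}

x <- letters[1:10]        # 10 distinct values, as in the calls above
candidates <- 2:12        # nmax = 1 is documented as silently ignored, so skip it
ok <- vapply(candidates, function(k) nmax_ok(x, k), logical(1))

# One row per candidate nmax, showing whether the call succeeded
data.frame(nmax = candidates, ok = ok)
```

The smallest nmax with ok == TRUE, compared against length(unique(x)), shows how far below the true number of unique values nmax can go before the error appears.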
Cleverer folks than I must now explain (and possibly correct me).
Cheers,
Bert
Bert Gunter
"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
On Sat, Jun 3, 2017 at 9:11 PM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
Well, you won't like this, but it is kind of wimpily (is that a word?)
documented:
If you check the code of factor(), you will see that nmax appears as
an argument in a call to unique(). ?unique says for nmax, "... see
duplicated". And ?duplicated says:
"If nmax is set too small there is liable to be an error: nmax = 1 is
silently ignored."
So sometimes, when nmax is too small, you get the hash table error
message; and sometimes the nmax argument apparently just gets
ignored:
> identical(factor(letters,nmax = 25), factor(letters,nmax=26))
[1] TRUE
and that, to paraphrase what Rodgers and Hammerstein said about Kansas City,
is about "as fer as I can go."
(http://lyricsplayground.com/alpha/songs/e/everythingsuptodateinkansascity.shtml)
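Since factor() passes nmax on to unique(), the two calls can be compared side by side for the same nmax values. A minimal sketch, assuming nothing beyond base R; try_call() is a hypothetical helper name, and the messages printed may vary with the R version.

```r
# Hypothetical helper (not in base R): run f(...) and return "ok" on
# success, or the error message if the call signals an error.
try_call <- function(f, ...) {
  tryCatch({
    f(...)
    "ok"
  }, error = function(e) conditionMessage(e))
}

# letters has 26 distinct values; try nmax just below and at that count
for (k in 24:26) {
  cat("nmax =", k,
      "| unique:", try_call(unique, letters, nmax = k),
      "| factor:", try_call(factor, letters, nmax = k), "\n")
}
```

If both columns fail at the same nmax, that suggests factor() is merely relaying whatever unique() does with the argument.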
Cheers,
Bert
On Sat, Jun 3, 2017 at 6:14 PM, Ramnik Bansal <ramnik.bansal at gmail.com> wrote:
I have been trying to understand how the argument 'nmax' works in the
'factor' function. The R documentation states: "Since factors typically
have quite a small number of levels, for large vectors x it is helpful
to supply nmax as an upper bound on the number of unique values."
In the code below, what is the reason for the error when the value of
nmax is 24? Why did the same error not occur with nmax = 25, and how
come there are 26 levels when nmax = 25?
> factor(x = letters, nmax = 26)
[1] a b c d e f g h i j k l m n o p q r s t u v w x y z
Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z
> factor(x = letters, nmax = 25)
[1] a b c d e f g h i j k l m n o p q r s t u v w x y z
Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z
> factor(x = letters, nmax = 24)
Error in unique.default(x, nmax = nmax) : hash table is full