problems with coercing a factor to be numeric
11 messages · Francesco Sarracino, Dimitris Rizopoulos, David Winsemius +3 more
Check R FAQ 7.10: How do I convert factors to numeric? I hope it helps. Best, Dimitris
On 1/23/2013 10:33 AM, Francesco Sarracino wrote:
Dear R listers,
I am trying to compute the mean of a dummy variable that is encoded as a
factor. However, even though the levels of my factor are 0 and 1, when I
compute the mean (after coercing the factor to be numeric), R changes 0
into 1 and 1 into 2, thus altering my expected result.
Please, consider the following working example:
pp <- rep(0:1, 10)
pp <- factor(pp, levels=(0:1), labels=c("no","yes"))
mean(pp) # this won't work because the argument is not numeric or logical
mean(as.integer(pp)) # this computes the average on the range 1-2, not 0-1:
# the result is 1.5, not the expected 0.5
What am I doing wrong?
Thanks in advance for your kind support,
f.
Dimitris Rizopoulos Assistant Professor Department of Biostatistics Erasmus University Medical Center Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands Tel: +31/(0)10/7043478 Fax: +31/(0)10/7043014 Web: http://www.erasmusmc.nl/biostatistiek/
check also
pp <- rep(0:1, 10)
pp <- factor(pp, levels=(0:1), labels=c("no","yes"))
unclass(pp)
unclass(pp) - 1
Best,
Dimitris
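For readers following along, here is a minimal sketch of what the unclass() suggestion does, using the pp from the original post. The names codes and zero_one are only illustrative. The point is that the stored codes start at 1, so subtracting 1 recovers the 0/1 coding:

```r
# Rebuild the factor from the original post.
pp <- factor(rep(0:1, 10), levels = 0:1, labels = c("no", "yes"))

# unclass() drops the "factor" class and exposes the stored integer codes,
# which start at 1 (the "levels" attribute is still attached).
codes <- unclass(pp)

# Subtracting 1 shifts the codes back onto the 0/1 scale.
zero_one <- as.integer(pp) - 1L

mean(as.integer(pp))  # mean of the 1/2 codes: 1.5
mean(zero_one)        # mean on the 0/1 scale: 0.5
```

This relies on "no" being the first level and "yes" the second; with a different level order the shift would give a different coding.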
On 1/23/2013 10:48 AM, Francesco Sarracino wrote:
Dear Dimitris,
thanks for your quick reply. I've tried the solutions proposed in 7.10
How do I convert factors to numeric?
as.numeric(as.character(pp))
and
as.numeric(levels(pp))[as.integer(pp)]
However, whatever I do, I get "Warning message: NAs introduced by coercion"
and the output is a vector of NA.
Any ideas?
f.
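A short sketch of why the FAQ 7.10 recipes return NA here, assuming the same pp as above: the recipes convert the level labels, and "no"/"yes" are not numbers. The second factor, qq, is a hypothetical variant created without labels, where the recipes do work:

```r
pp <- factor(rep(0:1, 10), levels = 0:1, labels = c("no", "yes"))
levels(pp)  # the labels are "no" and "yes", not "0" and "1"

# as.character() returns the labels; "no"/"yes" cannot be parsed as numbers,
# so as.numeric() yields NA (with the coercion warning, suppressed here).
na_result <- suppressWarnings(as.numeric(as.character(pp)))
all(is.na(na_result))  # TRUE

# Hypothetical variant: the same data without labels keeps "0"/"1" as levels,
# and both FAQ 7.10 recipes then recover the original values.
qq <- factor(rep(0:1, 10), levels = 0:1)
m1 <- mean(as.numeric(as.character(qq)))
m2 <- mean(as.numeric(levels(qq))[as.integer(qq)])
```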
--
Francesco Sarracino, Ph.D.
https://sites.google.com/site/fsarracino/
On Jan 23, 2013, at 1:58 AM, Francesco Sarracino wrote:
Thanks, this works! But I am surprised that R behaves so strangely and that there is no way to control it. BTW, as.integer(pp)-1 also works! Still, it doesn't look like a first-best solution to me. At any rate, thanks a lot for your help.
I think it is rather strange that you are criticising R because the mean or sum functions won't coerce factors to numeric class. R is already very loosely typed. It has a fairly limited number of object classes and there is widespread class coercion when it is appropriate. Can you explain why you believed factors or by logical extension character classed variables should get implicitly coerced by all mathematical functions?
David.
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
David Winsemius, MD
Alameda, CA, USA
To find the proportion of "yes"s in pp you can use
mean(pp == "yes")
and avoid the conversion of a factor to integer (and subtracting 1). This works whether pp is a character vector or a factor.
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
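As a quick check of the comparison approach, here is a sketch using the pp from the original post. The names prop_factor and prop_char are illustrative only:

```r
pp <- factor(rep(0:1, 10), levels = 0:1, labels = c("no", "yes"))

# pp == "yes" compares against the level labels and returns a logical vector;
# mean() of a logical vector is the proportion of TRUE values.
prop_factor <- mean(pp == "yes")

# The same expression works on a plain character vector.
prop_char <- mean(as.character(pp) == "yes")
```

Because the comparison is made on the labels, the result does not depend on the internal integer codes or on the order of the levels.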
Given that your labels are "no" and "yes", what do you expect R to
do? To quote a well-known fortune, "R is lacking a mind_read() function!"
cheers,
Rolf Turner
On 23 Jan 2013, at 21:36, "Francesco Sarracino" <f.sarracino at gmail.com> wrote:
.... what I meant refers to the fact that I've read in "An R and
S-Plus Companion to Applied Regression" about methods to alter the encoding
of factors when using contrasts in regressions. These are options (for
contrasts) that can easily be set with options("contrasts"). This command
changes the way R creates the dummies out of a factor, and various methods
are available.
I was expecting that R might have something similar that applied to my
case, changing the way R attaches numeric values to my dummy variable.
I am just surprised that such an option doesn't exist. My expectations
were wrong.
Such options do exist, but at modelling time, not factor creation/conversion time.
When created, by calls to 'factor' or in functions like 'read.table', factors are stored internally as integers with a list of labels (what you see as factor levels) that go with each integer. Those internal integers start at 1 and go up. You can set the ordering of those labels (by specifying the "levels" argument in factor()) so that, for example, yes and no can be associated with (numeric) factor levels 1 and 2 respectively instead of the default ordering which would put 'no' alphabetically before 'yes'. (I find this choice particularly useful for orderings like "high", "medium", "low" for which the alphabetic ordering is not exactly intuitive; similarly alphabetic ordering puts '1', '2', '10' in the order '1', '10', '2' and so on, so that often needs specifying manually. It's also useful to specify levels if you want things like boxplots to come out in a particular order, as boxplots by default use the order of the factor levels).
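The effect of the "levels" argument can be sketched as follows. The vector sizes and the factor names are made up for illustration; the alphabetic default is what factor() does when no levels are given:

```r
# Hypothetical data with a natural but non-alphabetic order.
sizes <- c("low", "high", "medium", "low", "high")

# Default: levels are the sorted unique values, i.e. alphabetic order.
default_f <- factor(sizes)
levels(default_f)      # "high" "low" "medium"
as.integer(default_f)  # 2 1 3 2 1

# Specifying levels fixes which label goes with which internal integer.
ordered_f <- factor(sizes, levels = c("low", "medium", "high"))
levels(ordered_f)      # "low" "medium" "high"
as.integer(ordered_f)  # 1 3 2 1 3

# Number-like strings are also sorted as text unless levels are given.
num_f <- factor(c("1", "2", "10"))
levels(num_f)          # "1" "10" "2"
```

Plots such as boxplot(y ~ ordered_f) would then draw the groups in the specified order rather than alphabetically.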
The internal integer values are returned by 'as.numeric'. If your factor level labels - which are always character - are also interpretable as numbers, you need 'as.character' to return the character strings and then 'as.numeric' to convert those.
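A small sketch of the difference, using a made-up factor f whose labels happen to look like numbers:

```r
# Hypothetical factor with number-like labels; levels sort as text:
# "10" "20" "5".
f <- factor(c("10", "20", "5", "20"))

# as.numeric() on the factor returns the internal codes, not the labels.
codes <- as.numeric(f)                 # 1 2 3 2

# Going through as.character() first recovers the labelled values.
values <- as.numeric(as.character(f))  # 10 20 5 20
```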
Now, up to this point you just have more or less arbitrary integers associated with the original factor levels (the degree of arbitrariness depends on whether you specified the level order or let R use its default). These integers are not the contrasts used in model fitting. Contrasts are set at model matrix building time; they are not a fixed attribute of the factor. The internal numbering of levels affects contrasts only to the extent that the numerical values used in setting contrasts are usually in the same order as the factor levels. You can inspect the functions used to associate contrasts with factor levels by using options("contrasts"). You can inspect the numerical values that would currently be used for a given factor with a call to contrasts(). You can change the contrast assignments globally using options() or explicitly in some model calls (lm, for example, has a contrasts argument) and if you like you can write your own contrast functions to set any values you like. The most common are probably treatment contrasts, which set the first factor level as intercept and the rest as (unit) differences from that, and sum-to-zero contrasts, which do what they say, setting contrasts that sum to zero by choosing a set like (-1, 0, 1).
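The inspection and override steps described above might look like this in practice. This is only a sketch: the response y is simulated for illustration, and the contrasts(pp) comment assumes a session where options("contrasts") is still at its default:

```r
pp <- factor(rep(0:1, 10), levels = 0:1, labels = c("no", "yes"))

# Which contrast functions are currently in use (unordered, ordered factors).
options("contrasts")

# The values that would be used for pp right now; with default treatment
# contrasts this is a single column with no = 0, yes = 1.
trt <- contrasts(pp)

# Sum-to-zero coding for a two-level factor: 1 and -1.
sm <- contr.sum(2)

# Simulated response, for illustration only.
set.seed(1)
y <- rnorm(20) + (pp == "yes")

# Same model, two codings: default contrasts vs. contr.sum set per call.
fit_trt <- lm(y ~ pp)
fit_sum <- lm(y ~ pp, contrasts = list(pp = "contr.sum"))

coef(fit_trt)
coef(fit_sum)
```

The coefficients differ between the two fits because they are expressed in different codings, but the fitted values are the same. None of this changes the integers stored inside pp.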
So you actually have a great deal of control over both the order in which labels are associated with factor levels and the (separate) values of contrasts associated with those factor levels at modelling time.
The cost of that control is some complexity, and the time needed to learn what's going on to use it all properly.
Hope that helps ...
S Ellison