Regression stars
On 02/12/2013 08:20 AM, peter dalgaard wrote:
On Feb 12, 2013, at 17:05, Brian Lee Yung Rowe wrote:
I thought that the default was the way it was for performance reasons. For large data.frames or repeated applications, using factors should be faster for non-trivial strings.
I think not. Historically, it's more like "In statistics we have two kinds of variables, numerical and categorical. OK, so we have the occasional truly character-type variable like name or address; let's handle those as a special case".
<sarcasm> Since character vectors are sooooo bad and people use them where they should instead use a factor, I propose to go all the way by adding the stringsAsFactors arg to character() too. That way people are put on the right track from the very start. </sarcasm>

No, seriously: if my variable is categorical, it's already in a factor and that's how I pass it to data.frame(). But if I have it in a character vector, it's because that's how I want it. It's my choice. How could anybody ever think that having data.frame() alter his/her data is a good thing?

Please *remove* the stringsAsFactors arg of data.frame() in R 3.0. You'll do a big favor to your user base.

Thanks,
H.
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319