Very confused with class
Hi Robin,

You haven't said where you're getting the data from. But if the answer is that you're using read.table, read.csv or similar to read the data into R, then I advise you to go back to that stage and get it right from the outset. It's very, very common to see people who are relatively new to R splattering their code with calls to as.numeric, just because they haven't read the data in properly in the first place. It's also common in those who aren't new to R...

So, e.g., if you are using read.table, then use the colClasses argument to specify the classes of your columns, and use str() on the result until you're happy with the data frame produced.

It's not entirely clear why you would have ended up with factors if your data are numeric. That often happens when people mix characters with numbers. Perhaps you have mixed the header row up with the data?

Anyway, what you are seeing are the integer encodings of the factors. E.g.

f <- factor(11:20)
str(f)
 Factor w/ 10 levels "11","12","13",..: 1 2 3 4 5 6 7 8 9 10
as.numeric(f)
 [1]  1  2  3  4  5  6  7  8  9 10

But don't mess with them. Just make sure that things which shouldn't be factors never become factors.

Dan
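
A minimal sketch of the advice above, for illustration only. The inline text standing in for the weather file, and its column names and values, are invented here and are not Robin's actual data. It shows reading with colClasses and checking with str(), and, for a column that has already become a factor, converting through the level labels and not through the integer codes.

```r
# Hypothetical stand-in for the weather file; columns and values are invented.
txt <- "date pressure maxtemp
2008-08-01 1012.5 21.3
2008-08-02 1040.0 19.8
2008-08-03 998.2 23.1"

# Declare the class of every column up front so nothing is guessed.
southwest <- read.table(text = txt, header = TRUE,
                        colClasses = c("character", "numeric", "numeric"))
str(southwest)            # check the column classes before going further
max(southwest$pressure)

# If a numeric column has already become a factor, go via the labels.
f <- factor(c("1012.5", "1040", "998.2"))
codes  <- as.numeric(f)                # integer codes, not the measurements
values <- as.numeric(as.character(f))  # the original values
same   <- as.numeric(levels(f))[f]     # equivalent conversion via the levels
```

With a real file, the text = txt argument would be replaced by the file name. If read.table stops with an error when a column is declared numeric, that usually points at the stray non-numeric entry that produced the factor in the first place.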
On Thu, Aug 21, 2008 at 03:40:58PM +0100, Williams, Robin wrote:
Hi all,

I am very confused with class. I am looking at some weather data which I want to use as explanatory variables in an lm. R has treated these variables as factors (i.e. with different levels), whereas I want them treated as discretely measured continuous variables. So I need to reassign the class of these variables, right? Indeed, doing class(southwest$pressure) (pressure being air pressure), I get "factor". Now what class should I use to reassign them so that my model fitting process goes as I want it to?

I have obviously done something wrong. I did

southwest$pressure <- as(southwest$pressure, "numeric")

numeric seeming like a reasonable class to assign to this variable. However, summary stats now give 341 for mean(southwest$pressure) and 761 for max(southwest$pressure), which is clearly nonsense, as my maximum value is around 1040. Something similar has happened to maxtemp (maximum temperature), which I also reassigned from a factor to class numeric, and which now apparently has a maximum value of 147!

Clearly it must be the reassignment of class that has caused these problems, as summary stats on the data before I reassigned the classes were fine. What is wrong with the class numeric? Reading the numeric help page didn't reveal anything to me. Can someone suggest the correct class?

Many thanks for any help.

Robin Williams
Met Office summer intern - Health Forecasting
robin.williams at metoffice.gov.uk
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.