R 1.2.1 - read.table - factors problem or is it a data.frame problem - R-help

Wed, Jan 31, 2001 1:11 AM

Patrick Connolly refers to the read.table help page to show how to coerce
input columns to character or to numeric. Indeed, coercion with a logical
vector will set the mode regardless of the column content. He also notes
that one can create factors with factor().

However, the problem I encountered is not one of setting factors but of
unsetting them. The manual states that variables of mode or type character
will become factors, yet my data input showed no relationship between type
and factor status. For no evident reason, most character variables did not
become factors while many real-valued variables did. It is a bit
disconcerting to get output with thousands of floating-point factor levels,
or error messages saying that one's data are of the wrong mode for any
analysis whatsoever.

How does one undo the factor assignment, and how does one avoid the problem
of automatic misassignment with other datasets?

Gordon

|> 
|> R-1.2.1 Suse 7.0 binary
|> 
|> > fooframe <- read.table("foo", header=FALSE, as.is=c(1:22,398),
|>      col.names=foo.colheads)
|> 
|> cols 1-9 are alphabetic, 10-22 and 398 are numbers but unordered categorical
|>      23-375 are numeric with and without decimal points
|> 
|> As I read the description the "as.is" index numbers should force those columns
|> to be "character" and "factor". However only the 1-9 alpha become "character"
|> but they did not become "factor". Everything else shows mode "numeric" but 

Here is your explanation:

   as.is: the default behavior of `read.table' is to convert
          non-numeric variables to factors.  The variable `as.is'
          controls this conversion.  Its value is either a vector of
          logicals (values are recycled if necessary), or a vector of
          numeric indices which specify which columns should be left as
          character strings.

Since your columns 10 etc. are not character, as.is will not have an
effect on them. I think it is simple enough to convert numeric
columns into factors (as distinct from continuous variables) with
factor().
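A small illustration of both points, written for a current R rather than
1.2.1 (the text= argument postdates it). The three-row file is made up and
only stands in for Gordon's data; cat_cols is a hypothetical name for the
list of category-code columns.

```r
# Made-up stand-in for Gordon's file: V1 and V2 are letters, V3 holds
# unordered category codes, V4 holds measurements.
txt <- "a x 3 1.5
b y 1 2.25
c x 3 0.75"

# Numeric indices: keep columns 1 and 3 "as is". Column 3 is numeric anyway,
# so naming it changes nothing; column 2 is not listed, so it becomes a factor.
df_idx <- read.table(text = txt, header = FALSE, as.is = c(1, 3))

# The equivalent logical form, one value per column.
df_lgl <- read.table(text = txt, header = FALSE,
                     as.is = c(TRUE, FALSE, TRUE, FALSE))
identical(df_idx, df_lgl)   # TRUE

sapply(df_idx, class)       # V1 character, V2 factor, V3 integer, V4 numeric

# as.is never turns numbers into factors; do that explicitly with factor().
cat_cols <- "V3"            # hypothetical list of category-code columns
df_idx[cat_cols] <- lapply(df_idx[cat_cols], factor)
levels(df_idx$V3)           # "1" "3"
```

In other words, as.is only decides whether character columns stay
character or become factors; numbers that are really category codes have
to be declared with factor() after reading.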


|> "is.factor" distributes TRUE to various variables in no pattern discernible to
|> me either in distribution or in the data content of the columns. (I tried
|> giving as.is a type vector but that just made everything "numeric" with no
|> pattern to factors.) No "as.is" parameter still leaves the odd distribution of
|> factors.
|> 
|> The main effects are that for some statistical functions on data subsets, one
|> is warned one cannot perform the operations on categorical data while others
|> stop for NA's. There are no NA's in the dataset! Running "unique" on each
|> variate and collecting outside the frame shows adequate dispersion for analysis
|> with no zero variances. "cor" will only run "pairwise" though "complete.cases"
|> finds no NA's.
|> 
|> What am I missing?

My guess is that something unplanned is happening when you try as.is
on numeric columns.

Gordon M. Harrington		Mail:	3720 Village Place, #6308
Professor Emeritus			Waterloo, IA 50702-5848
University of Northern Iowa 	Phone:	319-291-8535
gordon.harrington at uni.edu	Fax:	319-291-8491
dryfly at aya.yale.edu			319-291-8324

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

Thomas Lumley

Wed, Jan 31, 2001 7:41 AM

On Wed, 31 Jan 2001 gordon.harrington at uni.edu wrote:

You can convert a factor to the correct numeric values with
   as.numeric(as.character(the.factor))
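The as.character() step matters because as.numeric() on a factor returns
the internal level codes, not the values. The sketch below uses made-up
values; unfactor_numeric() is a hypothetical helper (not part of R) that
applies the same conversion to every factor column whose levels all look
like numbers.

```r
# Made-up numeric values that were read in as a factor.
f <- factor(c("10.5", "2", "10.5", "7"))

as.numeric(f)                 # 1 2 1 3: the level codes, not the data
as.numeric(as.character(f))   # 10.5 2 10.5 7: the intended values

# Hypothetical helper (not part of R): convert back every factor column
# whose levels all parse as numbers; leave real categories untouched.
unfactor_numeric <- function(df) {
  for (nm in names(df)) {
    x <- df[[nm]]
    if (is.factor(x)) {
      num <- suppressWarnings(as.numeric(levels(x)))
      # Same result as as.numeric(as.character(x)), but each level is
      # converted once instead of each element.
      if (!anyNA(num)) df[[nm]] <- num[as.integer(x)]
    }
  }
  df
}

d <- data.frame(id = factor(c("a", "b", "c")),
                score = factor(c("1.5", "20", "3")))
str(unfactor_numeric(d))      # id stays a factor, score becomes numeric
```

A column whose levels include a stray code such as "." is left as a factor
by this helper, which is usually the signal to look at na.strings.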

We don't have enough information to tell what happened in your case, but in
my experience the most common reason for a numeric variable to be read as a
factor has been misspecifying the missing-value codes in the na.strings
argument. This argument lists the strings that should be converted to NAs;
any other non-numeric string will trigger a conversion to factor.
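A sketch of the na.strings pitfall, assuming a file that codes missing
values as "." (both the file and the code are made up). stringsAsFactors =
TRUE reproduces the old default, under which non-numeric columns became
factors; since R 4.0.0 they stay character unless asked. Note that
complete.cases() would see no NAs in bad, because "." is stored there as an
ordinary factor level rather than as NA.

```r
# Made-up file in which one missing value is coded ".".
txt <- "id score
1 2.5
2 .
3 4.0"

# With the default na.strings = "NA", "." is an ordinary string, so the
# score column cannot be numeric and is turned into a factor.
bad <- read.table(text = txt, header = TRUE, stringsAsFactors = TRUE)
class(bad$score)   # "factor"

# Finding the culprit: values that do not parse as numbers.
vals <- as.character(bad$score)
unique(vals[is.na(suppressWarnings(as.numeric(vals)))])   # "."

# Declaring the code as missing keeps the column numeric.
good <- read.table(text = txt, header = TRUE, na.strings = c("NA", "."))
class(good$score)  # "numeric"
```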

	-thomas

Thomas Lumley			Asst. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle
