RFC: type conversion in read.table
Currently read.table is rather limited in its type conversion. The algorithm is

0) Read as character.
1) Try to convert to numeric. If that works, quit.
2) Convert to factor unless as.is.

I am thinking about adding more flexibility and more classes by the following two changes.

A) Anticipating the arrival of classes for all R objects, add an argument, say `colClasses', that allows the user to specify the desired class for every column. This could default to "auto", or to NA if people think "auto" might be a relevant class name one day. The effect would be equivalent to running

    data[[i]] <- as(data[[i]], colClasses[i])

instead of

    data[[i]] <- type.convert(data[[i]], as.is = as.is[i], dec = dec)

except that standard classes such as "numeric", "factor", "logical" and "character" would be dispatched directly, and argument `dec' would be consulted where appropriate. colClasses = "character" would suppress all conversions, which cannot currently be done.

B) Make the default "auto" option somewhat cleverer. I am thinking of trying the following in turn:

    logical
    integer
    numeric
    complex
    factor (only if !as.is[i], for backwards compatibility)

The `dec' option needs to be used for numeric/complex. This would be done by a documented typeConvert function, and should normally be fast (just look at the first item to rule out much of the list).

This does mean that data frames would be much more likely to end up containing integer or logical variables (although they already can). I have already fixed model.frame/matrix to handle logical variables, and would need to check that they do handle integer variables.

Questions:

1) Is this desirable?
2) Are the names sensible?
3) Is there any need to allow users to specify either the set of classes used by "auto" or lists of classes on a column-specific basis?
4) Currently the default is to get something without much information loss, and that would remain.
My intention is that if a class is specified and conversion is not possible, the result would be (mainly?) NAs. Any problem with that?

Brian
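
To make the proposal concrete, here is a minimal sketch of the "auto" cascade in (B) and the per-column dispatch in (A). It is an illustration only, not the planned implementation: the function names `auto_convert` and `convert_column` are hypothetical, the string "NA" is assumed to be the only missing-value marker, and the checks scan the whole column rather than using the first-item shortcut mentioned above.

```r
# Hypothetical helper: try logical, integer, numeric, complex, then factor.
# Not the real type.convert / typeConvert; a sketch under stated assumptions.
auto_convert <- function(x, as.is = FALSE, dec = ".") {
  stopifnot(is.character(x))
  x[x %in% "NA"] <- NA            # assumption: "NA" is the only missing marker
  miss <- is.na(x)
  nonmiss <- x[!miss]

  # logical: every non-missing entry is an accepted spelling
  if (all(nonmiss %in% c("T", "F", "TRUE", "FALSE")))
    return(as.logical(x))

  # integer: optional sign plus digits only, and within integer range
  if (all(grepl("^[+-]?[0-9]+$", nonmiss)) &&
      all(abs(as.numeric(nonmiss)) <= .Machine$integer.max))
    return(as.integer(x))

  # numeric and complex: replace the decimal mark `dec` by "." before parsing
  y <- if (dec != ".") sub(dec, ".", x, fixed = TRUE) else x
  num <- suppressWarnings(as.numeric(y))
  if (!anyNA(num[!miss])) return(num)
  cpx <- suppressWarnings(as.complex(y))
  if (!anyNA(cpx[!miss])) return(cpx)

  # fall back to factor unless as.is, for backwards compatibility
  if (as.is) x else factor(x)
}

# Hypothetical per-column dispatch: NA means "auto"; standard classes are
# handled directly; anything else goes through as().
convert_column <- function(x, colClass = NA, as.is = FALSE, dec = ".") {
  if (is.na(colClass)) return(auto_convert(x, as.is = as.is, dec = dec))
  switch(colClass,
    character = x,                 # suppress all conversion
    logical   = as.logical(x),
    # entries that cannot be parsed become NA
    numeric   = suppressWarnings(as.numeric(sub(dec, ".", x, fixed = TRUE))),
    factor    = factor(x),
    methods::as(x, colClass))
}

# Usage: inspect the class chosen for a few character columns.
cols <- list(a = c("1", "2", "NA"), b = c("1,5", "2"),
             c = c("T", "F"), d = c("x", "y"))
str(lapply(cols, auto_convert, dec = ","))
```

Scanning the whole column keeps the sketch short. A real implementation would presumably test the first item to discard most candidate classes before doing any full conversion.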
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595