Dealing with factors ???

5 messages · eric, Jeff Newmiller, Berend Hasselman +1 more

#
I have a data frame x that came from read.csv. It seemed to read in OK, but
when I tried plotting the values I ran into difficulties. The plot command
seems to be plotting factors instead of the values. How do I get rid of these
factors? The plot command I use is: plot(x$dat, x$TX, type='l'). I also tried
plot(x$dat, levels(x$TX), type='l') but got an error:

What am I doing wrong here?

Error in plot.window(...) : need finite 'ylim' values
In addition: Warning messages:
1: In xy.coords(x, y, xlabel, ylabel, log) : NAs introduced by coercion
2: In min(x) : no non-missing arguments to min; returning Inf
3: In max(x) : no non-missing arguments to max; returning -Inf

 head(x,4)
  Period       PA          NJ        MD          TX         All         dat
1 200812  903,231   1,985,460   905,422   3,312,088   7,106,201  2008-12-31
2 200901  880,491   1,924,111   892,980   3,006,050   6,703,631  2009-01-31
3 200902  883,994   1,926,169   890,021   3,247,530   6,947,714  2009-03-03
4 200903  888,021   1,901,182   892,593   3,216,730   6,898,526  2009-03-31
 str(x)
'data.frame':	41 obs. of  7 variables:
 $ Period: int  200812 200901 200902 200903 200904 200905 200906 200907 200908 200909 ...
 $ PA    : Factor w/ 41 levels " 818,037 "," 823,191 ",..: 26 22 23 25 19 7 10 2 1 12 ...
 $ NJ    : Factor w/ 41 levels " 1,599,113 ",..: 31 28 29 27 22 19 20 17 14 16 ...
 $ MD    : Factor w/ 41 levels " 800,827 "," 807,154 ",..: 27 25 23 24 15 13 11 6 5 3 ...
 $ TX    : Factor w/ 41 levels " 2,472,690 ",..: 41 23 40 39 35 34 32 21 18 27 ...
 $ All   : Factor w/ 41 levels " 6,111,993 ",..: 40 27 38 36 25 21 19 13 11 16 ...
 $ dat   :Class 'Date'  num [1:41] 14244 14275 14306 14334 14365 ...





--
View this message in context: http://r.789695.n4.nabble.com/Dealing-with-factors-tp4649686.html
Sent from the R help mailing list archive at Nabble.com.
#
The table is much bigger than what was shown. I just displayed a few rows.
It seems like there should be a better way than the approach you are proposing.
What is also not clear to me is why the factors are appearing at all. I do a
read.csv on a table full of numbers from Excel and I'm seeing factors
everywhere.



#
Your numeric data appears to have commas (thousands separators) in it. You don't say where you got the data, but Excel does this, and if this is the case then a straightforward way to fix it is to load it in Excel and set the formatting of all numeric columns to "general" before saving again.

You can also fix it in R using gsub to replace commas with empty strings and as.numeric to convert to numeric form.  There are examples of this in the mailing list archives.
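
The gsub/as.numeric route mentioned above could look roughly like the sketch below. The vector tx is a made-up three-value sample shaped like the TX column in the question, not the poster's actual data. The sketch also shows why calling as.numeric directly on a factor does not help: it returns the internal level codes rather than the printed values.

```r
# Hypothetical sample mimicking the TX column: a factor whose labels
# contain padding spaces and thousands separators.
tx <- factor(c(" 3,312,088 ", " 3,006,050 ", " 3,247,530 "))

# Pitfall: as.numeric() on a factor returns the integer level codes,
# not the numbers shown in the labels.
codes <- as.numeric(tx)

# Convert the labels instead: go through character, drop the commas,
# then parse. as.numeric() tolerates the surrounding spaces.
tx_num <- as.numeric(gsub(",", "", as.character(tx), fixed = TRUE))
tx_num
```

With a real data frame the same expression would be applied to each affected column, for example x$TX.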
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
--------------------------------------------------------------------------- 
Sent from my phone. Please excuse my brevity.
eric <ericstrom at aol.com> wrote:

            
#
On 16-11-2012, at 03:18, eric wrote:

            
?read.csv

Use the dec argument of read.csv and/or friends to set the decimal separator for the input numbers.
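
As a small illustration of what dec does, here is a sketch using an invented two-row input in which ';' separates the fields and ',' is the decimal mark. The sample text and the name csv_text are assumptions made for the example. Note that dec only sets the decimal mark; as far as I can tell it does not strip thousands separators such as those in the data shown earlier in this thread, so those commas would still need to be removed separately.

```r
# Hypothetical input: ';' separates fields, ',' is the decimal mark.
csv_text <- "Period;TX\n200812;3312,088\n200901;3006,050"

# dec = "," tells the reader to treat the comma as a decimal point.
d <- read.csv(text = csv_text, sep = ";", dec = ",")
str(d)
```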

Berend
#
Hi

Please include context.

Your "numbers" are not numbers. They are strings in the csv file, e.g. "1,200,300", and are converted to factors during reading.

First, do not convert them to factors: use the stringsAsFactors=FALSE option in read.table.

If you are sure that all commas are thousands separators (in my country the comma is used as the decimal point) you can do

as.numeric(paste(unlist(strsplit("1,200,300", ",")), collapse=""))

or

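# split each value on commas, rejoin its pieces, then convert;
# as.character() lets this work on factor columns as well
tonum <- function (x) sapply(strsplit(as.character(x), ","), function (p) as.numeric(paste(p, collapse="")))

tonum(x$TX)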
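
For a table with many such columns, the conversion can be applied to all of them in one step. The sketch below is one possible way to do it, not the only one: the inline csv_text is an invented excerpt modeled on the data in the question, and to_number is a hypothetical helper that uses gsub instead of strsplit.

```r
# Hypothetical excerpt of the file; quoting keeps the embedded commas
# inside a single field.
csv_text <- 'Period,PA,TX
200812," 903,231 "," 3,312,088 "
200901," 880,491 "," 3,006,050 "
200902," 883,994 "," 3,247,530 "'

# Keep the text columns as character rather than factor.
x <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Hypothetical helper: remove thousands separators, then parse.
# as.character() makes it safe for factor input as well.
to_number <- function(v) as.numeric(gsub(",", "", as.character(v), fixed = TRUE))

# Convert every affected column at once.
cols <- c("PA", "TX")
x[cols] <- lapply(x[cols], to_number)
str(x)
```

After this step a call such as plot(x$dat, x$TX, type='l') on the full data would receive numeric values instead of factor codes.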

Regards
Petr