I tried to do this all at once and failed:
Data1 Data2 Data3 Weight
101 Sam Red Banana 1.1
102 Sam Green Banana 2.1
103 Sam Blue Orange 2.1
104 Fred Red Orange 2.1
105 Fred Green Guava 2.1
106 Fred Blue Guava 2.1
107 <NA> Red Pear 50.1
108 <NA> Green Pear 50.1
109 <NA> Blue <NA> 1000.2
ToyData <- factor(ToyData, levels(c(levels(ToyData), NA), exclude=NULL, na.action=na.pass))
Error in levels(c(levels(ToyData), NA), exclude = NULL, na.action = na.pass) :
unused argument(s) (exclude = NULL, na.action = function (object, ...)
ToyData <- factor(ToyData, levels(c(levels(ToyData), NA)))
ToyData
Data1 Data2 Data3 Weight
<NA> <NA> <NA> <NA>
Levels:
But it didn't work. Don't I need to do this separately for each variable?
Is there a way to get read.spss to insert "NA" levels for each variable when I create the data frame? Is this because SPSS (and STATA) allow "NA" as an "undeclared level" and R does not?
Will this be a problem with read.dta as well?
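The second attempt above fails because factor() is applied to the whole data frame. levels() of a data frame is NULL, so levels(c(levels(ToyData), NA)) is also NULL, and the call amounts to factor(ToyData, levels = NULL): every value becomes NA. (In the first attempt, exclude= and na.action= end up as arguments to levels(), which accepts only x; hence the "unused argument" error.) The NA level has to be added one column at a time. Below is a minimal sketch. It assumes the toy data, rebuilt inline instead of being read from the CSV file, and uses base R's addNA(), which appends NA as a level of a factor.

```r
# Toy data printed above, rebuilt inline with factor columns
# (read.table() created factors by default before R 4.0.0)
ToyData <- data.frame(
  Data1  = factor(c("Sam", "Sam", "Sam", "Fred", "Fred", "Fred", NA, NA, NA)),
  Data2  = factor(c("Red", "Green", "Blue", "Red", "Green", "Blue",
                    "Red", "Green", "Blue")),
  Data3  = factor(c("Banana", "Banana", "Orange", "Orange", "Guava", "Guava",
                    "Pear", "Pear", NA)),
  Weight = c(1.1, 2.1, 2.1, 2.1, 2.1, 2.1, 50.1, 50.1, 1000.2),
  row.names = 101:109
)

# Find the factor columns ...
is_fac <- vapply(ToyData, is.factor, logical(1))

# ... and give each one an explicit NA level. ifany = TRUE adds the level
# only to columns that actually contain NAs (Data2 is left unchanged).
ToyData[is_fac] <- lapply(ToyData[is_fac], addNA, ifany = TRUE)

str(ToyData)
```

The loop over columns also answers the first question: yes, each variable needs its own call. The lapply() simply makes those calls for you.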
Robert Farley
Metro
www.Metro.net
-----Original Message-----
From: William Dunlap [mailto:wdunlap at tibco.com]
Sent: Thursday, May 28, 2009 20:39
To: Farley, Robert
Subject: RE: [R] Still can't find missing data
In R, factors don't save space over character vectors: only
one copy of any given string is kept in memory in either case.
Factors do let you order the levels the way you want, and
that is often important in presentations.
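Here is a tiny illustration of the ordering point. The vector sizes and its values are hypothetical. table() on a character vector lists the categories alphabetically, while table() on a factor follows the level order you supply.

```r
# Hypothetical example values
sizes <- c("Medium", "Small", "Large", "Small")

# Character vector: table() sorts the categories alphabetically
table(sizes)

# Factor: the levels, and therefore the table, follow the order given
sizes_f <- factor(sizes, levels = c("Small", "Medium", "Large"))
table(sizes_f)
```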
You can add NA to the list of levels of a factor by doing
x <- factor(x, levels = c(levels(x), NA), exclude = NULL)
where 'x' represents each factor in your dataset. After
doing that is.na(x) will be all FALSE and you may not
want that for other situations.
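Here is a quick check of that suggestion on a made-up factor (x and x_na are hypothetical names). The NA level gets its own count in table(), is.na() returns FALSE for every element, and base R's addNA() produces the same set of levels.

```r
# Hypothetical factor with one missing value
x <- factor(c("a", NA, "b", "a"))

# Add NA as an explicit level; exclude = NULL stops factor() from dropping it
x_na <- factor(x, levels = c(levels(x), NA), exclude = NULL)

levels(x_na)   # "a" "b" NA
table(x_na)    # the NA level is counted like any other level
is.na(x_na)    # all FALSE: the NA is now a regular level, not a missing value

# addNA() is the base R shortcut for the same conversion
identical(levels(addNA(x)), levels(x_na))
```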
Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap at tibco.com
-----Original Message-----
From: r-help-bounces at r-project.org
[mailto:r-help-bounces at r-project.org] On Behalf Of Farley, Robert
Sent: Thursday, May 28, 2009 5:27 PM
To: R-help
Subject: Re: [R] Still can't find missing data
That seems to work for the toy data. How do I implement this
change with my real data, which are read from very large
Stata and SPSS files and keep the factor definitions? Won't
I be losing information (and creating a larger dataset) by
not using the factor levels?
How do I recover the factor values? I read my data file
(read.spss with use.value.labels = FALSE) and got this:
connector
Mode_orig_only 1 9
1 17.814338 0.000000
3 49.128982 0.000000
4 525.978899 0.000000
5 913.295370 0.000000
6 114.302764 0.000000
7 298.151438 0.000000
8 93.088049 0.000000
9 233.794168 0.000000
10 20.764539 0.000000
11 424.120506 0.000000
12 8.054528 0.000000
13 6.010790 0.000000
14 1832.748525 0.000000
15 10191.284139 0.000000
16 2099.771923 0.000000
17 1630.148576 0.000000
<NA> 0.000000 9491.013249
which does have the "NA" row, but not the factor labels. If
I read the file with use.value.labels=TRUE I can see what I'm
summarizing, but not the NAs. Can't I have both?
The top summary also omits (of course) every level of the
summarized variable whose total is 0.
The same summary using factors:
                                                             connector
Mode_orig_only                                               OD Passenger Connector
Walked/Biked                                                    17.814338  0.000000
I flew in from another a place/connected                         0.000000  0.000000
Amtrak                                                          49.128982  0.000000
Bus - Chartered bus or van                                     525.978899  0.000000
Bus - Hotel Courtesy van                                       913.295370  0.000000
Bus - MTA (Metro) or other public transit bus                  114.302764  0.000000
Bus - Scheduled airport bus or van (e.g. Airport bus or Disn   298.151438  0.000000
Bus - Union Station Flyaway                                     93.088049  0.000000
Bus - Van Nuys Flyaway                                         233.794168  0.000000
Green line/light rail                                           20.764539  0.000000
Limousine/town car                                             424.120506  0.000000
Metrolink                                                        8.054528  0.000000
Motorcycle                                                       6.010790  0.000000
On-call shuttle/van (e.g. Super Shuttle, Prime Time)          1832.748525  0.000000
Car/truck/van - Private                                      10191.284139  0.000000
Car/truck/van - Rental                                        2099.771923  0.000000
Taxi                                                          1630.148576  0.000000
..Refused                                                        0.000000  0.000000
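Here is one way to get both the labels and the NAs, sketched under stated assumptions. If the code-to-label mapping is available as a named vector, you can convert the numeric codes to a labelled factor and then give that factor an explicit NA level. The vectors mode_code and mode_labels below are hypothetical. They were typed in by hand, with a few codes and labels paired up by matching their totals in the two summaries above. How to extract the mapping from the SPSS file is not shown. Unlike the numeric summary, the resulting table also keeps levels whose count is zero.

```r
# Hypothetical numeric codes, as if read with use.value.labels = FALSE
mode_code <- c(1, 3, 15, NA, 15, 3, NA)

# Hypothetical code -> label mapping (names are the labels); code 17 does
# not occur in mode_code, so it shows up as a zero-count level
mode_labels <- c("Walked/Biked" = 1,
                 "Amtrak" = 3,
                 "Car/truck/van - Private" = 15,
                 "Taxi" = 17)

# Map codes to labels; any code missing from mode_labels would become NA
mode_f <- factor(mode_code, levels = mode_labels, labels = names(mode_labels))

# Keep the missing values as a level of their own
mode_f <- addNA(mode_f, ifany = TRUE)

table(mode_f)
```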
-----Original Message-----
From: William Dunlap [mailto:wdunlap at tibco.com]
Sent: Thursday, May 28, 2009 16:26
To: Farley, Robert
Subject: RE: [R] Still can't find missing data
Try reading it in with read.table's argument stringsAsFactors=FALSE.
I think the underlying problem is that xtabs() uses exclude= only
when the classifying variables are not already factors. I haven't studied
the help file well enough to see whether that is what it is documented
to do, but it seems misleading.
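You can check the point about exclude= on the toy data with its classifying columns kept as character, which is what stringsAsFactors = FALSE gives you. This sketch rebuilds the data inline instead of reading the CSV file. ToyChar and tab are hypothetical names.

```r
# Toy data with character (not factor) classifying columns
ToyChar <- data.frame(
  Data1  = c("Sam", "Sam", "Sam", "Fred", "Fred", "Fred", NA, NA, NA),
  Data2  = c("Red", "Green", "Blue", "Red", "Green", "Blue",
             "Red", "Green", "Blue"),
  Weight = c(1, 2, 2, 2, 2, 2, 50, 50, 1000),
  stringsAsFactors = FALSE
)

# For non-factor columns, xtabs() passes exclude = NULL to factor(), so NA
# becomes a level; na.action = na.pass keeps the NA rows in the model frame
tab <- xtabs(Weight ~ Data1 + Data2, data = ToyChar,
             exclude = NULL, na.action = na.pass)
tab
sum(tab)   # should equal the total weight, 1111
```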
-----Original Message-----
From: r-help-bounces at r-project.org
[mailto:r-help-bounces at r-project.org] On Behalf Of Farley, Robert
Sent: Thursday, May 28, 2009 4:10 PM
To: R-help
Subject: Re: [R] Still can't find missing data
In this toy data set, each of the tables should sum to 1111.
None of the tables shows NA columns or rows.
################################
ToyData <- read.table("C:/Data/R/Toy.csv", header=TRUE,
sep=",", na.strings="NA", dec=".", row.names="ID_Num")
Data1 Data2 Data3 Weight
101 Sam Red Banana 1
102 Sam Green Banana 2
103 Sam Blue Orange 2
104 Fred Red Orange 2
105 Fred Green Guava 2
106 Fred Blue Guava 2
107 <NA> Red Pear 50
108 <NA> Green Pear 50
109 <NA> Blue <NA> 1000
xtabs(Weight ~ Data1 + Data2, exclude=NULL,
na.action=na.pass, ToyData)
Data2
Data1 Blue Green Red
Fred 2 2 2
Sam 2 2 1
xtabs(Weight ~ Data1 + Data2, exclude=NULL,
na.action=na.pass,drop.unused.levels = FALSE, ToyData)
Data2
Data1 Blue Green Red
Fred 2 2 2
Sam 2 2 1
xtabs(Weight ~ Data1 + Data3, exclude=NULL,
na.action=na.pass,drop.unused.levels = FALSE, ToyData)
Data3
Data1 Banana Guava Orange Pear
Fred 0 4 2 0
Sam 3 0 2 0
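From R 3.4.0 onward, xtabs() has an addNA argument. It gives NA its own level even when the classifying variables are already factors. The R used in this thread predates that argument, so there the NA level must be added to each factor before calling xtabs(). This sketch rebuilds the factor version of the toy data inline. tab13 is a hypothetical name.

```r
# Toy data with factor columns and integer weights (each table sums to 1111)
ToyData <- data.frame(
  Data1  = factor(c("Sam", "Sam", "Sam", "Fred", "Fred", "Fred", NA, NA, NA)),
  Data3  = factor(c("Banana", "Banana", "Orange", "Orange", "Guava", "Guava",
                    "Pear", "Pear", NA)),
  Weight = c(1, 2, 2, 2, 2, 2, 50, 50, 1000)
)

# addNA = TRUE adds an NA level to each classifying factor; na.pass keeps
# the rows containing NAs (passed explicitly here for clarity)
tab13 <- xtabs(Weight ~ Data1 + Data3, data = ToyData,
               addNA = TRUE, na.action = na.pass)
tab13
```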
-----Original Message-----
From: r-help-bounces at r-project.org
[mailto:r-help-bounces at r-project.org] On Behalf Of Dieter Menne
Sent: Thursday, May 28, 2009 05:46
To: r-help at r-project.org
Subject: Re: [R] Still can't find missing data
Farley, Robert wrote:
I can't get the syntax that will allow me to show NA values in xtabs.
lengthy non-reproducible example removed
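On the reproducibility point, you can post the toy data in a form others can run by using read.table(text = ...) instead of a file on a local drive. In this sketch, the CSV text is rebuilt from the toy data printed earlier in the thread (ID_Num column, integer weights). stringsAsFactors = TRUE is set explicitly to match the pre-4.0.0 default that produced the factor columns discussed here.

```r
ToyData <- read.table(text = "
ID_Num,Data1,Data2,Data3,Weight
101,Sam,Red,Banana,1
102,Sam,Green,Banana,2
103,Sam,Blue,Orange,2
104,Fred,Red,Orange,2
105,Fred,Green,Guava,2
106,Fred,Blue,Guava,2
107,NA,Red,Pear,50
108,NA,Green,Pear,50
109,NA,Blue,NA,1000
", header = TRUE, sep = ",", na.strings = "NA", row.names = "ID_Num",
   stringsAsFactors = TRUE)

ToyData
```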