Still can't find missing data
That seems to work for the toy data. How do I implement this change with my real data, which are read from very large Stata and SPSS files and keep the factor definitions? Won't I be losing information (and creating a larger dataset) by not using the factor levels?
How do I recover the factor values? I read my data file with read.spss (using use.value.labels = FALSE) and got this:
                connector
Mode_orig_only             1            9
1                  17.814338     0.000000
3                  49.128982     0.000000
4                 525.978899     0.000000
5                 913.295370     0.000000
6                 114.302764     0.000000
7                 298.151438     0.000000
8                  93.088049     0.000000
9                 233.794168     0.000000
10                 20.764539     0.000000
11                424.120506     0.000000
12                  8.054528     0.000000
13                  6.010790     0.000000
14               1832.748525     0.000000
15              10191.284139     0.000000
16               2099.771923     0.000000
17               1630.148576     0.000000
<NA>                0.000000  9491.013249
which does have the "NA" row, but not the factor labels. If I read the file with use.value.labels=TRUE I can see what I'm summarizing, but not the NAs. Can't I have both?
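A minimal sketch of one way to get both. It assumes the column comes back from read.spss(..., use.value.labels = FALSE) as numeric codes carrying a "value.labels" attribute (names are the labels, values are the codes). The vector, labels and weights below are made up to stand in for the real file. The codes are turned into a labelled factor, and addNA() then makes NA an explicit level so xtabs keeps it. A factor stores integer codes plus one copy of the labels, so this should not noticeably enlarge the data.

```r
# Stand-in for one column as read with use.value.labels = FALSE.
# ASSUMPTION: codes, labels and weights here are invented for illustration.
mode_codes <- c(1, 3, 15, NA, 15, NA)
attr(mode_codes, "value.labels") <- c("Car/truck/van - Private" = 15,
                                      "Amtrak" = 3,
                                      "Walked/Biked" = 1)
weight <- c(10, 20, 30, 40, 50, 60)

# Sort the label table by code so the factor levels follow code order.
vl <- sort(attr(mode_codes, "value.labels"))

# Attach the labels, then make NA an explicit level (only if NAs occur).
mode_f <- factor(mode_codes, levels = vl, labels = names(vl))
mode_f <- addNA(mode_f, ifany = TRUE)

# Weighted one-way table: labelled rows plus an <NA> row.
tab <- xtabs(weight ~ mode_f)
print(tab)
```

In a real file the attribute may hold the codes as character strings, so it is worth checking str(attr(x, "value.labels")) before sorting.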
The top summary also omits every level whose total is 0 in the summarized variable (as expected, since those codes never occur in the data).
The same summary using factors:
                                                              connector
Mode_orig_only                                                OD Passenger  Connector
Walked/Biked                                                     17.814338   0.000000
I flew in from another a place/connected                          0.000000   0.000000
Amtrak                                                           49.128982   0.000000
Bus - Chartered bus or van                                      525.978899   0.000000
Bus - Hotel Courtesy van                                        913.295370   0.000000
Bus - MTA (Metro) or other public transit bus                   114.302764   0.000000
Bus - Scheduled airport bus or van (e.g. Airport bus or Disn    298.151438   0.000000
Bus - Union Station Flyaway                                      93.088049   0.000000
Bus - Van Nuys Flyaway                                          233.794168   0.000000
Green line/light rail                                            20.764539   0.000000
Limousine/town car                                              424.120506   0.000000
Metrolink                                                         8.054528   0.000000
Motorcycle                                                        6.010790   0.000000
On-call shuttle/van (e.g. Super Shuttle, Prime Time)           1832.748525   0.000000
Car/truck/van - Private                                       10191.284139   0.000000
Car/truck/van - Rental                                         2099.771923   0.000000
Taxi                                                           1630.148576   0.000000
..Refused                                                         0.000000   0.000000
Robert Farley
Metro
www.Metro.net
-----Original Message-----
From: William Dunlap [mailto:wdunlap at tibco.com]
Sent: Thursday, May 28, 2009 16:26
To: Farley, Robert
Subject: RE: [R] Still can't find missing data
Try reading it in with read.table's argument stringsAsFactors=FALSE.
I think the underlying problem is that exclude= is used only if
the classifying variables are not already factors. I haven't studied
the help file well enough to see if that is what it is documented
to do, but it seems misleading.
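If exclude= really is ignored for variables that are already factors, one workaround that does not rely on it is to make NA a level of the factor itself with addNA() before tabulating. The sketch below rebuilds the toy data from this thread inline (rather than reading Toy.csv) so it runs on its own. Newer versions of R also document an addNA argument to xtabs; check ?xtabs for your installation.

```r
# Toy data from the thread, built inline instead of read from Toy.csv.
ToyData <- data.frame(
  Data1  = factor(c("Sam", "Sam", "Sam", "Fred", "Fred", "Fred", NA, NA, NA)),
  Data2  = factor(rep(c("Red", "Green", "Blue"), times = 3)),
  Data3  = factor(c("Banana", "Banana", "Orange", "Orange",
                    "Guava", "Guava", "Pear", "Pear", NA)),
  Weight = c(1, 2, 2, 2, 2, 2, 50, 50, 1000),
  row.names = 101:109
)

# Make NA an explicit level in every factor column that contains NAs.
is_fac <- sapply(ToyData, is.factor)
ToyData[is_fac] <- lapply(ToyData[is_fac], addNA, ifany = TRUE)

# The NA level is now an ordinary category, so it gets its own row/column.
tab12 <- xtabs(Weight ~ Data1 + Data2, data = ToyData)
tab13 <- xtabs(Weight ~ Data1 + Data3, data = ToyData)
print(tab12)
print(tab13)

# Both tables should now account for the full weight of 1111.
print(c(sum(tab12), sum(tab13)))
```

Because the factor labels are kept, the same approach should carry over to data read with use.value.labels=TRUE.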
Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Farley, Robert
Sent: Thursday, May 28, 2009 4:10 PM
To: R-help
Subject: Re: [R] Still can't find missing data
In this toy data, each of the tables should sum to 1111. None of the tables shows NA columns or rows.
################################
ToyData <- read.table("C:/Data/R/Toy.csv", header=TRUE,
sep=",", na.strings="NA", dec=".", row.names="ID_Num")
ToyData
    Data1 Data2  Data3 Weight
101   Sam   Red Banana      1
102   Sam Green Banana      2
103   Sam  Blue Orange      2
104  Fred   Red Orange      2
105  Fred Green  Guava      2
106  Fred  Blue  Guava      2
107  <NA>   Red   Pear     50
108  <NA> Green   Pear     50
109  <NA>  Blue   <NA>   1000
xtabs(Weight ~ Data1 + Data2, exclude=NULL,
na.action=na.pass, ToyData)
      Data2
Data1  Blue Green Red
  Fred    2     2   2
  Sam     2     2   1
xtabs(Weight ~ Data1 + Data2, exclude=NULL,
na.action=na.pass,drop.unused.levels = FALSE, ToyData)
      Data2
Data1  Blue Green Red
  Fred    2     2   2
  Sam     2     2   1
xtabs(Weight ~ Data1 + Data3, exclude=NULL,
na.action=na.pass,drop.unused.levels = FALSE, ToyData)
      Data3
Data1  Banana Guava Orange Pear
  Fred      0     4      2    0
  Sam       3     0      2    0
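A quick way to see how much weight a table has silently lost is to compare its total with the sum of the weight column. This is only a sketch on the toy data, rebuilt inline; the variable names total_weight, tabled_weight and dropped_weight are illustrative, and na.action = na.omit is written out to make explicit that rows with NA classifiers are discarded.

```r
# Toy data from this thread, rebuilt inline so the check runs on its own.
ToyData <- data.frame(
  Data1  = factor(c("Sam", "Sam", "Sam", "Fred", "Fred", "Fred", NA, NA, NA)),
  Data2  = factor(rep(c("Red", "Green", "Blue"), times = 3)),
  Weight = c(1, 2, 2, 2, 2, 2, 50, 50, 1000)
)

# Rows whose classifying variables are NA are dropped before tabulating.
tab <- xtabs(Weight ~ Data1 + Data2, data = ToyData, na.action = na.omit)

# Compare what the table accounts for with what the data contain.
total_weight   <- sum(ToyData$Weight)
tabled_weight  <- sum(tab)
dropped_weight <- total_weight - tabled_weight
cat("total:", total_weight,
    " tabled:", tabled_weight,
    " dropped:", dropped_weight, "\n")
```

On the toy data the dropped weight is exactly the weight of the three rows where Data1 is NA.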
Robert Farley
Metro
www.Metro.net
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Dieter Menne
Sent: Thursday, May 28, 2009 05:46
To: r-help at r-project.org
Subject: Re: [R] Still can't find missing data
Farley, Robert wrote:
I can't get the syntax that will allow me to show NA values (rows) in the xtabs.
lengthy non-reproducible example removed
If you want a reproducible answer, prepare a reproducible result. And check that the syntax is na.action=na.pass
Dieter
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.