
Still can't find missing data

That seems to work for the toy data.  How do I implement this change with my real data, which are read from very large Stata and SPSS files, and still keep the factor definitions?  Won't I be losing information (and creating a larger dataset) by not using the factor levels?


How do I recover the factor values?  I read my data file with read.spss (using use.value.labels = FALSE) and got this:

              connector
Mode_orig_only            1            9
          1       17.814338     0.000000
          3       49.128982     0.000000
          4      525.978899     0.000000
          5      913.295370     0.000000
          6      114.302764     0.000000
          7      298.151438     0.000000
          8       93.088049     0.000000
          9      233.794168     0.000000
          10      20.764539     0.000000
          11     424.120506     0.000000
          12       8.054528     0.000000
          13       6.010790     0.000000
          14    1832.748525     0.000000
          15   10191.284139     0.000000
          16    2099.771923     0.000000
          17    1630.148576     0.000000
          <NA>     0.000000  9491.013249

which does have the "NA" row, but not the factor labels.  If I read the file with use.value.labels=TRUE I can see what I'm summarizing, but not the NAs.  Can't I have both?

The first summary also omits (of course) any level of the summarized variable with a total of 0.
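One way to get both, sketched on made-up data and assuming the summary is a weighted xtabs(): keep the numeric codes, rebuild the factor from a code-to-label mapping, and turn NA into an explicit level with addNA() before tabulating.  The vectors, the weights and the labs mapping below are hypothetical stand-ins, not values from the real file.  The help page for read.spss describes a "label.table" attribute that holds value labels, but how to pull the mapping from it for a particular file is something to verify.

```r
# Hypothetical stand-ins for the real survey columns
codes     <- c(1, 3, 3, NA, 1, NA)      # Mode_orig_only as numeric codes
connector <- c(1, 1, 1, 9, 1, 9)        # 1 = OD Passenger, 9 = Connector
w         <- c(2, 1.5, 0.5, 4, 1, 3)    # made-up weights

# Hypothetical code-to-label mapping (name = label, value = code)
labs <- c("Walked/Biked" = 1, "I flew in" = 2, "Amtrak" = 3)

d <- data.frame(
  # Rebuild the labelled factor, then make NA an explicit level
  Mode_orig_only = addNA(factor(codes, levels = unname(labs),
                                labels = names(labs))),
  connector = factor(connector, levels = c(1, 9),
                     labels = c("OD Passenger", "Connector")),
  w = w
)

# Weighted cross-tabulation; the <NA> row and the unused level both appear
tab <- xtabs(w ~ Mode_orig_only + connector, data = d)
print(tab)
```

Because NA is now an ordinary level, the rows are not dropped as missing, and levels with no observations still show up with a total of 0.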


The same summary using factors:
                                                               connector
Mode_orig_only                                                 OD Passenger    Connector
  Walked/Biked                                                    17.814338     0.000000
  I flew in from another a place/connected                         0.000000     0.000000
  Amtrak                                                          49.128982     0.000000
  Bus - Chartered bus or van                                     525.978899     0.000000
  Bus - Hotel Courtesy van                                       913.295370     0.000000
  Bus - MTA (Metro) or other public transit bus                  114.302764     0.000000
  Bus - Scheduled airport bus or van (e.g. Airport bus or Disn   298.151438     0.000000
  Bus - Union Station Flyaway                                     93.088049     0.000000
  Bus - Van Nuys Flyaway                                         233.794168     0.000000
  Green line/light rail                                           20.764539     0.000000
  Limousine/town car                                             424.120506     0.000000
  Metrolink                                                        8.054528     0.000000
  Motorcycle                                                       6.010790     0.000000
  On-call shuttle/van (e.g. Super Shuttle, Prime Time)          1832.748525     0.000000
  Car/truck/van - Private                                      10191.284139     0.000000
  Car/truck/van - Rental                                        2099.771923     0.000000
  Taxi                                                          1630.148576     0.000000
  ..Refused                                                        0.000000     0.000000







Robert Farley
Metro
www.Metro.net


-----Original Message-----
From: William Dunlap [mailto:wdunlap at tibco.com]
Sent: Thursday, May 28, 2009 16:26
To: Farley, Robert
Subject: RE: [R] Still can't find missing data

Try reading it in with read.table's argument stringsAsFactors=FALSE.

I think the underlying problem is that exclude= is used only if
the classifying variables are not already factors.  I haven't studied
the help file well enough to see if that is what it is documented
to do, but it seems misleading.
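A minimal sketch of the symptom on a toy factor (the vector is made up).  Rather than relying on how exclude= treats factors in a given R version, it uses table()'s useNA argument, which asks for the NA count directly:

```r
# Toy factor with one missing value (hypothetical data)
f <- factor(c("a", "b", NA, "a"))

counts_default <- table(f)                   # NA is silently left out
counts_with_na <- table(f, useNA = "ifany")  # NA gets its own cell

print(counts_default)
print(counts_with_na)
```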

Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com