Consistent test for NAs in a factor when exclude = NULL?

6 messages · Andrew Hoerner, Jeff Newmiller, David Winsemius +1 more

#
Dear folks,

Is there a function to correctly find (and count) the NAs in a factor when
exclude=NULL, regardless of whether their origin is in the original data or
by subsequent assignment?

In example number 1 below, where NAs are assigned with is.na<-, testing the
factor with is.na() finds the correct number of NAs. In example number 2,
where the NAs come from the data, none of is.na(), == NA, or == "NA"
correctly identifies the NAs. In example number 3, which mixes NAs from
assignment with NAs from the data, is.na() does not even find the NAs
created by assignment, although it did find them in example 1.
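The difference between examples 1 and 2 can be reproduced in a few lines. In
this sketch (the vectors x, f and g are illustrative, not the original code
behind the examples), a data NA kept by exclude = NULL becomes an ordinary
integer code pointing at an NA level, while is.na<- sets the code itself to
NA. is.na() only reports the second kind:

```r
# Two copies each of "A" and "B", plus two NAs coming from the data.
x <- c("A", "A", "B", "B", NA, NA)

# With exclude = NULL, NA becomes a real level, so the data NAs are stored
# as an ordinary integer code (3) that points at the NA level.
f <- factor(x, exclude = NULL)
levels(f)       # "A" "B" NA
as.integer(f)   # 1 1 2 2 3 3
is.na(f)        # all FALSE: no integer code is NA

# is.na<- on the whole factor sets the integer code itself to NA,
# which is the only kind of missingness is.na() reports.
g <- f
is.na(g)[1] <- TRUE
as.integer(g)   # NA 1 2 2 3 3
is.na(g)        # TRUE FALSE FALSE FALSE FALSE FALSE
```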

I'm running R 2.13.2 on Windows XP with Service Pack 3.

Any assistance would be greatly appreciated.

Appreciatively, andrewH


Example #1
[1] C C
Levels: A B C
[1] A    A    B    B    <NA> <NA>
Levels: A B C
[1] "A" "B" "C"
[1] "A" "A" "B" "B" NA  NA
[1] NA NA NA NA NA NA
[1] NA
[1] FALSE FALSE FALSE FALSE    NA    NA
[1] NA
[1] FALSE FALSE FALSE FALSE  TRUE  TRUE
[1] 2

Example #2
[1] A    A    B    B    <NA> <NA>
Levels: A B <NA>
[1] "A" "B" NA
[1] "A" "A" "B" "B" NA  NA
[1] NA NA NA NA NA NA
[1] NA
[1] FALSE FALSE FALSE FALSE FALSE FALSE
[1] 0
[1] FALSE FALSE FALSE FALSE FALSE FALSE

Example #3
[1] A    A    B    B    <NA> <NA> <NA>
Levels: A B C <NA>
[1] "A" "B" "C" NA
[1] "A" "A" "B" "B" NA  NA  NA
[1] NA NA NA NA NA NA NA
[1] NA
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[1] 0
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[1] 0
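One way to get a test that behaves the same in all three examples is to check
both places an NA can live: the integer code, and the level that code points
to. The helper factor_is_missing() below is a hypothetical name for a minimal
sketch of that idea; is.na(as.character(f)) should give the same answer,
because both kinds of NA come out of as.character() as NA_character_. The
factor loosely mimics example 3, with one NA from the data and two assigned
afterwards:

```r
# Hypothetical helper: TRUE wherever a factor element is missing, whether
# its integer code is NA or the code points at a level that is itself NA.
factor_is_missing <- function(f) {
  stopifnot(is.factor(f))
  codes <- as.integer(f)
  # Indexing with an NA code yields NA, but is.na(codes) is TRUE there,
  # so the OR below is never NA.
  is.na(codes) | is.na(levels(f))[codes]
}

# Like example 3: NA is a level, one NA comes from the data, two are assigned.
f <- factor(c("A", "A", "B", "B", NA, "C", "C"),
            levels = c("A", "B", "C", NA), exclude = NULL)
f[6] <- NA            # matched to the NA level, so the code points at NA
is.na(f)[7] <- TRUE   # sets the integer code itself to NA

sum(is.na(f))                  # 1
sum(factor_is_missing(f))      # 3
sum(is.na(as.character(f)))    # 3
```

Here is.na(f) sees only the single NA code, while the combined test counts
all three missing elements regardless of how they got there.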

--
View this message in context: http://r.789695.n4.nabble.com/Consistant-test-for-NAs-in-a-factor-when-exclude-NULL-tp3942755p3942755.html
Sent from the R help mailing list archive at Nabble.com.
#
Thanks Jeff! I appreciate you sharing your experience.

My data set is survey data (13,209 records over nine years) collected by
someone else and converted from SPSS format. It includes missing values,
coded however SPSS codes them, which the import process translated to NAs.
It also includes values along the lines of "none of your business" or
"beats me" that are missing as far as I am concerned, and I have assigned
NAs to these values. Now I am trying to find out where these missing values
are: whether they are disproportionately located in any period or group. I
have been trying to get counts for subsets, but I have not been able to make
the subset counts add up to the total counts that I get from, e.g.,
summary().

So I wrote these simplified versions, and even for the simplest examples, I
could not find a function that correctly identified the NAs that I knew were
there because I put them there myself. That is why I am looking for help.
Does this make sense?
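When subtotals fail to add up, one possible culprit (an assumption; the
thread does not diagnose it) is that tapply() and table() silently drop rows
whose grouping variable is NA. A sketch with a made-up survey data frame (the
column names year and answer are hypothetical) that counts missing answers
per year both ways:

```r
# Hypothetical survey extract: 'year' groups the records and 'answer' is a
# factor in which NA is also a level (built with exclude = NULL).
survey <- data.frame(
  year   = c(2001, 2001, 2002, 2002, NA, 2003),
  answer = factor(c("yes", NA, "no", NA, NA, "yes"), exclude = NULL)
)

# as.character() maps both kinds of factor NA to NA_character_,
# so this flags every missing answer whatever its origin.
missing_answer <- is.na(as.character(survey$answer))
sum(missing_answer)                  # 3

# Grouping by factor(year) drops the record whose year is NA ...
tapply(missing_answer, factor(survey$year), sum)   # 1 1 0, total 2

# ... while addNA() keeps it as its own group, so the subtotals add up.
by_year <- tapply(missing_answer, addNA(factor(survey$year)), sum)
sum(by_year) == sum(missing_answer)  # TRUE
```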

Warmest regards, andrewH


--
View this message in context: http://r.789695.n4.nabble.com/Consistant-test-for-NAs-in-a-factor-when-exclude-NULL-tp3942755p3943157.html
Sent from the R help mailing list archive at Nabble.com.
#
On Oct 27, 2011, at 12:21 AM, andrewH wrote:

            
You might consider looking at the Hmisc package. I think it provides
facilities for handling multiple kinds of missing values imported from SAS
datasets. The help page to consult is sas.get {Hmisc}; I see no indication
that a direct SPSS read facility was contemplated, so it may take some extra
work to get use out of this application of R attributes to store
type-of-missingness information alongside R NAs.
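Hmisc's attribute mechanism is not reproduced here. As a simpler base-R
alternative, a sketch can keep the reason for missingness in a parallel
variable and use plain NA codes (no NA level) in the analysis variable. The
responses and the nonanswers vector below are hypothetical:

```r
# Hypothetical raw responses, including non-answers treated as missing.
raw <- c("yes", "no", "none of your business", NA, "beats me", "yes")
nonanswers <- c("none of your business", "beats me")

# Record why each value is missing before that information is discarded.
reason <- ifelse(is.na(raw), "missing in source",
                 ifelse(raw %in% nonanswers, raw, NA))

# The analysis variable uses plain NA codes (no NA level),
# so is.na() alone finds every missing answer.
answer <- factor(ifelse(raw %in% nonanswers, NA, raw))

sum(is.na(answer))               # 3
table(reason, useNA = "ifany")   # one of each reason, plus 3 <NA> (answered)
```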
#
Note that for factors with NA in the levels, is.na(f)[2] <- TRUE
and is.na(f[2]) <- TRUE give different results:

  > f <- factor(c("A","A",NA), levels=c(NA, "A"), exclude=NULL)
  > str(f)
   Factor w/ 2 levels NA,"A": 2 2 1
  > is.na(f)
  [1] FALSE FALSE FALSE

  > is.na(f[2]) <- TRUE
  > str(f)
   Factor w/ 2 levels NA,"A": 2 1 1
  > is.na(f)
  [1] FALSE FALSE FALSE

  > is.na(f)[2] <- TRUE
  > str(f)
   Factor w/ 2 levels NA,"A": 2 NA 1
  > is.na(f)
  [1] FALSE  TRUE FALSE

  > f[2] <- NA
  > str(f)
   Factor w/ 2 levels NA,"A": 2 1 1
  > is.na(f)
  [1] FALSE FALSE FALSE

You may find it easiest to change the NAs to a string
with a different name.
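A minimal sketch of that recoding, applied to a factor built like the one in
the transcript above; the label "(Missing)" is a hypothetical choice and
should be something that does not already occur in the data:

```r
# Like the transcript: NA is a level (used by element 3), and element 2
# has an NA integer code after is.na(f)[2] <- TRUE.
f <- factor(c("A", "A", NA), levels = c(NA, "A"), exclude = NULL)
is.na(f)[2] <- TRUE

# Convert to character: both kinds of NA become NA_character_.
chr <- as.character(f)
# "(Missing)" is a hypothetical label; choose one absent from the data.
chr[is.na(chr)] <- "(Missing)"
f2 <- factor(chr)

table(f2)         # counts: "(Missing)" 2, "A" 1
sum(is.na(f2))    # 0
```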

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com