
Advice on recoding a variable depending on another which contains NAs

4 messages · Anthony Staines, David Winsemius, Jeff Newmiller

#
Dear colleagues,

I would be very grateful for your help with the following. I 
have banged my head off this question several times in the 
past, and repeatedly over the last week. I have looked in 
the usual places and found no obvious solution. I fear that 
this just means I didn't recognize it, but I'd be very 
grateful for your help.

I am scoring 8000 psychometric tests - the SCQ, if you have 
heard of it. On this test the scoring rule depends on one 
variable, SCQ1 - if this is answered yes, the final score is 
a function of 39 variables, and if no, of 31 variables.

I've calculated both of these scores (SCQScore1 and 
SCQScore2) for all the children in my study, and I wish to 
create a final score, which is SCQScore1 when SCQ1 is 1, and 
SCQScore2 when SCQ1 is 2. There are also missing values for 
SCQ1, and I have chosen, for the moment, to set the final 
score to SCQScore1 for these. [[This is a debatable choice, 
but I am not asking your advice on that choice!]]

d$SCQScore <- 99
	##Distinct value for any other values I've missed

d$SCQScore[SCQ1 == 1] <- d$SCQScore1[SCQ1 == 1]
	## Talks using phrases/sentences, so sum SCQ2:SCQ40

d$SCQScore[SCQ1 == 2] <- d$SCQScore2[SCQ1 == 2]
	## Can't do this, so sum SCQ8:SCQ40

d$SCQScore[is.na(d$SCQ1)] <- d$SCQScore1 [is.na(d$SCQ1)]
	## SCQ1 is missing

This fails on line 2
(d$SCQScore[SCQ1 == 1] <- d$SCQScore1[SCQ1 == 1])
  with the error message
"NAs are not allowed in subscripted assignments",
presumably because SCQ1 does indeed contain missing values.
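
The error can be reproduced on a small, made-up data frame (the values below are invented for illustration and are not from the study). A comparison such as SCQ1 == 1 gives NA wherever SCQ1 is NA, and R rejects an NA in the index on the left-hand side of an assignment when the replacement has more than one element. One possible workaround, sketched here, is to wrap the test in which(), which keeps only the positions where the test is TRUE and leaves SCQ1 itself untouched:

```r
# Toy data; column names follow the post, values are invented.
d <- data.frame(SCQ1      = c(1, 2, NA, 1, 2),
                SCQScore1 = c(10, 11, 12, 13, 14),
                SCQScore2 = c(20, 21, 22, 23, 24))
d$SCQScore <- NA_real_

# d$SCQ1 == 1 is TRUE FALSE NA TRUE FALSE; the NA in the index causes the error.
failed <- inherits(
  try(d$SCQScore[d$SCQ1 == 1] <- d$SCQScore1[d$SCQ1 == 1], silent = TRUE),
  "try-error")

# which() returns only the positions where the test is TRUE, so NAs drop out.
i1 <- which(d$SCQ1 == 1)
i2 <- which(d$SCQ1 == 2)
d$SCQScore[i1] <- d$SCQScore1[i1]
d$SCQScore[i2] <- d$SCQScore2[i2]

# Missing SCQ1: fall back to SCQScore1, as chosen in the post.
miss <- is.na(d$SCQ1)
d$SCQScore[miss] <- d$SCQScore1[miss]
```

Here is.na() never returns NA, so the last assignment is safe as written, and d$SCQ1 keeps its missing values for later analysis.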

This can be fixed, got around, or otherwise bypassed, by 
creating a new variable SCQ1, with no missing values, as 
shown :-

SCQ1 <- d$SCQ1
SCQ1[is.na(SCQ1)] <- 3

d$SCQScore[SCQ1 == 1] <- d$SCQScore1[SCQ1 == 1]
	## Talks using phrases/sentences so sum SCQ2:SCQ40
d$SCQScore[SCQ1 == 2] <- d$SCQScore2[SCQ1 == 2]
	## Can't do this, so sum SCQ8:SCQ40
d$SCQScore[SCQ1 == 3] <- d$SCQScore1[SCQ1 == 3]
	## We don't know if he/she can talk, so guess - sum S2:S40

This type of thing is a common problem in my little world. 
Is there a better/less klutzy/smarter way of solving it than 
creating a new variable each time? Please bear in mind that 
it is critical, for later analysis, to keep the missing 
values in SCQ1.

Best wishes,
Anthony Staines
#
On Nov 19, 2011, at 6:31 PM, Anthony Staines wrote:

This would seem to be an obvious task for ifelse()

d$SCQScore <- NA
d$SCQScore <- ifelse(d$SCQ1 == 1, d$SCQScore1, d$SCQScore2)

(And don't use 99 for missing. Use NA. It will protect you better than  
"99".)


I suppose you could enforce the two level testing with:

d$SCQScore <- ifelse(d$SCQ1 == 1, d$SCQScore1,
                     ifelse(d$SCQ1 == 2, d$SCQScore2, NA))
David Winsemius, MD
West Hartford, CT
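
One point worth checking when using ifelse() here: wherever the test is NA, the result is NA, so rows with a missing SCQ1 would not receive SCQScore1 as the original post intended. A minimal sketch, on invented toy data, of one way to fold the missing case into the test (this is an illustration, not something proposed in the thread):

```r
# Toy data; column names follow the post, values are invented.
d <- data.frame(SCQ1      = c(1, 2, NA, 1, 2),
                SCQScore1 = c(10, 11, 12, 13, 14),
                SCQScore2 = c(20, 21, 22, 23, 24))

# Plain ifelse(): an NA test gives an NA result (third element here).
plain <- ifelse(d$SCQ1 == 1, d$SCQScore1, d$SCQScore2)

# Treat a missing SCQ1 like SCQ1 == 1. TRUE | NA is TRUE, so the test
# passed to ifelse() contains no NA and d$SCQ1 itself is left unchanged.
d$SCQScore <- ifelse(is.na(d$SCQ1) | d$SCQ1 == 1, d$SCQScore1, d$SCQScore2)
```

Note that this version gives SCQScore2 to any SCQ1 value other than 1 or NA, so it assumes SCQ1 contains only 1, 2 and NA.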
#
Ah,
so that's how ifelse gets used...

Presumably if I had more than 2 non-missing values in the 
control variable, I could use it several times.
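
For a control variable with more than two codes, one common pattern is to nest ifelse() calls. Each call still works on the whole vector at once, so no loop over rows is needed. A sketch with a hypothetical third code and hypothetical score vectors (ctrl, s1, s2 and s3 are made-up names, not variables from the study):

```r
# Hypothetical control variable with codes 1, 2, 3 and a missing value.
ctrl <- c(1, 2, 3, NA, 2)
s1 <- c(10, 11, 12, 13, 14)
s2 <- c(20, 21, 22, 23, 24)
s3 <- c(30, 31, 32, 33, 34)

# Nested ifelse(): each call is vectorised over all five elements.
# Elements where ctrl is NA (or matches no code) end up as NA.
score <- ifelse(ctrl == 1, s1,
         ifelse(ctrl == 2, s2,
         ifelse(ctrl == 3, s3, NA)))
```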

Thank you very much, for a really useful answer, and thanks 
for getting back so quickly!

All the best,
Anthony Staines
On 11/19/11 23:55, David Winsemius wrote:

#
No, you use it once and it does the whole vector at once.
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
--------------------------------------------------------------------------- 
Sent from my phone. Please excuse my brevity.
Anthony Staines <anthony.staines at dcu.ie> wrote: