Skip to content

set dataframe field value from lookup table

12 messages · Jon Erik Ween, David Winsemius, William Dunlap

#
Hi

This is (hopefully) a bit more cogent phrasing of a previous post. I'm
trying to compute a z-score to rows in a large dataframe based on values in
another dataframe. Here's the script (that does not work). 2 questons,

1) Anyone know of a more elegant way to calculate the "rounded" age value
than the nested ifelse's I've used?

2) how to reference the lookup table based on computed indices?

Thanks

Jon

# Define tables
DSTzlook <-
read.table("/Users/jween/Documents/ResearchProjects/ABC/data/DSTz.txt",
header=TRUE, sep="\t", na.strings="NA", dec=".", strip.white=TRUE)
df<-stroke

# Compute rounded age.
df$Agetmp
<-ifelse(df$Age>=89,89,ifelse(df$Age>=84,84,ifelse(df$Age>=79,79,ifelse(df$Age>=74,74,ifelse(df$Age>=69,69,ifelse(df$Age>=64,64,ifelse(df$Age>=54,54,ifelse(df$Age>=44,44,ifelse(df$Age>=34,34,ifelse(df$Age>=29,29,ifelse(df$Age>=24,24,ifelse(df$Age>=19,19,17))))))))))))

# Reference the lookup table based on computed indices
df$DSTz
<-DSTzlook[which(DSTzlook[,1]==df$Agetmp),which(DSTzlook[1,]==df$DSF+df$DSB)]

# Cleanup
#rm(df)
#df$Agetmp<-NULL
#
On Dec 9, 2010, at 9:34 AM, Jon Erik Ween wrote:

            
Ew, painful. If you want categorized ages (since what the above coding  
is producing is not "rounded" in any sense of that word as I  
understand it, then why not findInterval() as an index into the ages  
you wnat to label these case with?

df$Agetmp <- c(17,19,24,29,34,44,54,64,69,74,79,84)[  # note Extract  
operation
              findInterval(runif(100,0,100),  
c(17,19,24,29,34,44,54,64,69,74,79,84,110) )
             ]  # close extraction


The other option, of course, and a more "honest" one in this instance  
would be

cut(vec, breaks=c(...), labels=c(...) )

(It's not clear why you are not picking midpoint ages within those  
brackets to me.)
I have not been able to figure out what you are trying to do here.  
Trying to use a 2d lookup looks promising a a way to emulate what an  
Excel user might attempt, but an example (as requested in the message  
at the bottom of every posting) would really be of great help in  
making this more concrete for those of us with insufficient  
abstractive abilities.
#
Thanks David

What I am trying to do is set up a script that assigns z-scores to a large dataframe (2500x300, but has Age in years and test scores as columns.) from a published table of age-corrected standard scores on this cognitive test.

1) The age intervals in the lookup table are given and not my choice.

2) Sorry I didn't post an example table, it looks something like this ("Age" is in the first row, standard scores in the first column):

        17   19   24   29   34   44 ....
30   2.6  2.6  2.6  2.6  2.6  2.6
29  1.8  1.8  1.8  2.0  2.6  2.6
28  1.0  1.0  1.8  1.8  2.6  2.6
27  0.0  0.5  1.0  1.8  2.6  2.6
26   -.5  0.0  0.0  1.0  1.8  2.6
.
.
.
.

So, if a subject (row) has age==29 and a standard score of 28, the value should be 1.8, etc.

Thanks


Jon

Soli Deo Gloria

Jon Erik Ween, MD, MS
Scientist, Kunin-Lunenfeld Applied Research Unit 
Director, Stroke Clinic, Brain Health Clinic, Baycrest Centre
Assistant Professor, Dept. of Medicine, Div. of Neurology
    University of Toronto Faculty of Medicine

Kimel Family Building, 6th Floor, Room 644 
Baycrest Centre
3560 Bathurst Street 
Toronto, Ontario M6A 2E1
Canada 

Phone: 416-785-2500 x3648
Fax: 416-785-2484
Email: jween at klaru-baycrest.on.ca


Confidential: This communication and any attachment(s) may contain confidential or privileged information and is intended solely for the address(es) or the entity representing the recipient(s). If you have received this information in error, you are hereby advised to destroy the document and any attachment(s), make no copies of same and inform the sender immediately of the error. Any unauthorized use or disclosure of this information is strictly prohibited.
On 2010-12-09, at 10:33 AM, David Winsemius wrote:

            
#
On Dec 9, 2010, at 10:33 AM, David Winsemius wrote:

            
c(17, 17, 19, 24, 29, 34, 44, 54, 64, 69, 74, 79, 84, 89)[
        findInterval( 19, c(0, 17, 19, 24, 29, 34, 44, 54, 64, 69, 74,  
79, 84, 89) ) ]

I see my pasting of a test vector didn't get properly replaced. Also  
added the missing "89" and fixed the mis-registration of "label  
vector" with interval vector. (This reply was better tested than the  
first, but you clearly need to check my work here.)
#
On Dec 9, 2010, at 10:51 AM, Jon Erik Ween wrote:

            
You may want to skip the intermediate translation to the row and  
column labels and just use the results of findInterval:

 >  findInterval( 16, c(0, 17, 19, 24, 29, 34, 44, 54, 64, 69, 74, 79,  
84, 89) )
[1] 1
 >  findInterval( 90, c(0, 17, 19, 24, 29, 34, 44, 54, 64, 69, 74, 79,  
84, 89) )
[1] 14

Those look like appropriate indices for the column argument
Looks like a job for two findInterval indices to be used used with  
"[ r , c ] ".
#
David

I see how findInterval is a more elegant way of doing 1). I'd need to change the indices in the lookup table, as
[1] 6

should be 7, not 6. The age range for the 7th column 35-44. But that's easy.

I can't see how findInterval will help me for 2), though. The standard score is integer and not a range. So it maps 1 to 1. The real problem, though, is setting the value in the main dataframe (df) with the value from the lookup table based on the identified age and score indices.

My initial guess was: 

df$DSTz <-DSTzlook[which(DSTzlook[,1]==df$Agetmp),which(DSTzlook[1,]==df$DSF+df$DSB)]

which could be rewritten:

df$DSTz <-DSTzlook[which(DSTzlook[,1]== findInterval(df$Age, c(0, 17, 19, 24, 29, 34, 44, 54, 64, 69, 74, 79, 84, 89))),which(DSTzlook[1,]==df$DSF+df$DSB)]

But it is the indirect referencing of the lookup in the main table that causes me trouble.

Jon

Soli Deo Gloria

Jon Erik Ween, MD, MS
Scientist, Kunin-Lunenfeld Applied Research Unit 
Director, Stroke Clinic, Brain Health Clinic, Baycrest Centre
Assistant Professor, Dept. of Medicine, Div. of Neurology
    University of Toronto Faculty of Medicine

Kimel Family Building, 6th Floor, Room 644 
Baycrest Centre
3560 Bathurst Street 
Toronto, Ontario M6A 2E1
Canada 

Phone: 416-785-2500 x3648
Fax: 416-785-2484
Email: jween at klaru-baycrest.on.ca


Confidential: This communication and any attachment(s) may contain confidential or privileged information and is intended solely for the address(es) or the entity representing the recipient(s). If you have received this information in error, you are hereby advised to destroy the document and any attachment(s), make no copies of same and inform the sender immediately of the error. Any unauthorized use or disclosure of this information is strictly prohibited.
On 2010-12-09, at 11:06 AM, David Winsemius wrote:

            
#
Sorry, I should have included the error I get when using the initial vesion of step 2):

Error in `$<-.data.frame`(`*tmp*`, "DSTz", value = list(Age7 = c(-1.55,  : 
  replacement has 20 rows, data has 955
In addition: Warning message:
In DSTzlook[, 1] == df$DSF + df$DSB :
  longer object length is not a multiple of shorter object length

So, regardless of how you calculate [r,c], the step

df$DSTz<-DSTzlook[r,c]

doesn't work. I've tried various permutations with "apply", but that didn't work either. Any suggestions?

Jon

Soli Deo Gloria

Jon Erik Ween, MD, MS
Scientist, Kunin-Lunenfeld Applied Research Unit 
Director, Stroke Clinic, Brain Health Clinic, Baycrest Centre
Assistant Professor, Dept. of Medicine, Div. of Neurology
    University of Toronto Faculty of Medicine

Kimel Family Building, 6th Floor, Room 644 
Baycrest Centre
3560 Bathurst Street 
Toronto, Ontario M6A 2E1
Canada 

Phone: 416-785-2500 x3648
Fax: 416-785-2484
Email: jween at klaru-baycrest.on.ca


Confidential: This communication and any attachment(s) may contain confidential or privileged information and is intended solely for the address(es) or the entity representing the recipient(s). If you have received this information in error, you are hereby advised to destroy the document and any attachment(s), make no copies of same and inform the sender immediately of the error. Any unauthorized use or disclosure of this information is strictly prohibited.
On 2010-12-09, at 11:06 AM, David Winsemius wrote:

            
#
On Dec 9, 2010, at 11:27 AM, Jon Erik Ween wrote:

            
Possible that mapply would offer a mechanism. The devil is in the  
details.
I have yet to see a fully responsive reply to the request for a  
reproducible example. That is my suggestion at this point.
#
Offlist comments  No reply needed .. This is just for emphasis and  
clarification.
On Dec 9, 2010, at 11:19 AM, Jon Erik Ween wrote:

            
As you say, changing the output to agree with your expectations is  
"easy" but to be clear, R _is_ delivering the correct response to the  
question "which interval is 36 located in ... the 6th. Any ambiguity  
is due to your not formulating a good question (and my errors).
"2)" was never very clear. I do think findInterval must be what is  
needed, but I am repeating my call for you to post a full example and  
more complete explanation to the list.
Which is _not_ how statisticians usually think of a "z score". So it  
may need some further background or use of less misleading  
terminology. You are probably tasked with using a table handed to you  
that at one time was a "z-score" for <something> but has been recast  
in tabular form.
#
You probably want to use [cbind(r,c)], where r
and c are vectors of row and column numbers.

Supplying an example that helpers could copy and
paste into an R session would really help.  E.g.,
instead of showing the usual printout of the table
of zscores, show the output of dput(thatTable) or
the command you used to build it.  Here is my
guess, given your printout

  ZScoreTable <- matrix(byrow=TRUE,
     c( 2.6, 2.6, 2.6, 2.6, 2.6, 2.6,
        1.8, 1.8, 1.8, 2.0, 2.6, 2.6,
        1.0, 1.0, 1.8, 1.8, 2.6, 2.6,
        0.0, 0.5, 1.0, 1.8, 2.6, 2.6,
        -.5, 0.0, 0.0, 1.0, 1.8, 2.6),
     nrow=5,
     ncol=6,
     dimnames = list(
        StdScore=c("30", "29", "28", "27", "26"),
        AgeClass=c("17", "19", "24", "29", "34", "44")
     )
  )

Your structure may be different, but given that
that is your table of that encodes the mapping of
the order pair (StdScore,AgeClass) to a z-score
here is some code to do the mapping:

  ZScore <- function(age, stdScore) {
     AgeToColumnNumber <- function(age,
        ageClassBottoms=as.numeric(colnames(ZScoreTable)))
     {
        retval <- findInterval(age, c(ageClassBottoms, Inf))
        retval[retval==0] <- NA
        retval
     }

     StdScoreToRowNumber <- function(stdScore,
        knownScores = as.numeric(rownames(ZScoreTable)))
     {
        match(stdScore, knownScores)
     }

     ZScoreTable[cbind(StdScoreToRowNumber(stdScore),
AgeToColumnNumber(age))]
  }
 
where a typical usage would be

  > ZScore(age=c(29,44, 10), stdScore=c(28,29,30))
  [1] 1.8 2.6  NA

(Age 10 is not in the table so it gets an NA for a z-score).

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
79,ifelse(df$Age>=74,74,ifelse(df$Age>=69,69,ifelse(df$Age>=>
64,64,ifelse(df$Age>=54,54,ifelse(df$Age>=44,44,ifelse(df$Age>
df$DSF+df$DSB)]
ookup-table-tp3080245p3080245.html
#
Thanks to David and William for helpful comments. I'm not sure if the list will accept attachments, but am trying with this. Z_example.txt is a pared-down sample of the target table in text format. What is needed is a z-score based on the DSF and DSB fields relative to the age field. The second table is the full z-score table (DSTz.txt), age group in first row and score (2*DSF + 2*DSB) in the first column.

Jon

Soli Deo Gloria

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: Z_example.txt
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20101209/b23a3b23/attachment.txt>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: DSTz.txt
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20101209/b23a3b23/attachment-0001.txt>
-------------- next part --------------
On 2010-12-09, at 12:53 PM, William Dunlap wrote:

            
#
Just for clarification: This all has to do with the Wechsler Memory Scale revised (WMS-R). The task is digit span forward and backwards (digit span total). A raw score is recorded based on subject performance and translated into an age-adjusted standard score in one table and the standard score is then translated into a z-score in a second table. My DSTz table combines these two steps, going from the raw score to the tabulated age-adjusted z-score that was calculated based on a normative sample by the WMS-R folks.

Jon

Soli Deo Gloria

Jon Erik Ween, MD, MS
Scientist, Kunin-Lunenfeld Applied Research Unit 
Director, Stroke Clinic, Brain Health Clinic, Baycrest Centre
Assistant Professor, Dept. of Medicine, Div. of Neurology
    University of Toronto Faculty of Medicine

Kimel Family Building, 6th Floor, Room 644 
Baycrest Centre
3560 Bathurst Street 
Toronto, Ontario M6A 2E1
Canada 

Phone: 416-785-2500 x3648
Fax: 416-785-2484
Email: jween at klaru-baycrest.on.ca


Confidential: This communication and any attachment(s) may contain confidential or privileged information and is intended solely for the address(es) or the entity representing the recipient(s). If you have received this information in error, you are hereby advised to destroy the document and any attachment(s), make no copies of same and inform the sender immediately of the error. Any unauthorized use or disclosure of this information is strictly prohibited.
On 2010-12-09, at 12:53 PM, William Dunlap wrote: