-----Original Message-----
From: r-help-bounces at r-project.org
[mailto:r-help-bounces at r-project.org] On Behalf Of Jon Erik Ween
Sent: Thursday, December 09, 2010 8:27 AM
To: David Winsemius
Cc: r-help at r-project.org
Subject: Re: [R] set dataframe field value from lookup table
Sorry, I should have included the error I get when using the
initial version of step 2):
Error in `$<-.data.frame`(`*tmp*`, "DSTz", value = list(Age7
= c(-1.55, :
replacement has 20 rows, data has 955
In addition: Warning message:
In DSTzlook[, 1] == df$DSF + df$DSB :
longer object length is not a multiple of shorter object length
So, regardless of how you calculate [r,c], the step
df$DSTz<-DSTzlook[r,c]
You probably want to use [cbind(r,c)], where r
and c are vectors of row and column numbers.
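To make the difference concrete, here is a minimal sketch on a made-up matrix (m, rows and cols are illustrative names, not objects from your script). Indexing with m[rows, cols] returns every combination of the requested rows and columns, which is probably why the assignment complained about the number of rows, while m[cbind(rows, cols)] picks one element per (row, column) pair.

```r
# Toy 3x4 matrix; each value encodes its position as row*10 + column.
m <- outer(1:3, 1:4, function(i, j) i * 10 + j)

rows <- c(1, 3, 2)   # hypothetical row numbers, one per observation
cols <- c(4, 1, 2)   # hypothetical column numbers, one per observation

block <- m[rows, cols]         # 3x3 submatrix: all row/column combinations
pairs <- m[cbind(rows, cols)]  # vector: m[1,4], m[3,1], m[2,2]

dim(block)
pairs
```

Only the second form gives a vector with one value per observation, which is what a data frame column needs.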
Supplying an example that helpers could copy and
paste into an R session would really help. E.g.,
instead of showing the usual printout of the table
of zscores, show the output of dput(thatTable) or
the command you used to build it. Here is my
guess, given your printout
ZScoreTable <- matrix(byrow=TRUE,
    c( 2.6, 2.6, 2.6, 2.6, 2.6, 2.6,
       1.8, 1.8, 1.8, 2.0, 2.6, 2.6,
       1.0, 1.0, 1.8, 1.8, 2.6, 2.6,
       0.0, 0.5, 1.0, 1.8, 2.6, 2.6,
       -.5, 0.0, 0.0, 1.0, 1.8, 2.6),
    nrow=5,
    ncol=6,
    dimnames = list(
        StdScore=c("30", "29", "28", "27", "26"),
        AgeClass=c("17", "19", "24", "29", "34", "44")
    )
)
Your structure may be different, but given that
that is your table, which encodes the mapping of
the ordered pair (StdScore, AgeClass) to a z-score,
here is some code to do the mapping:
ZScore <- function(age, stdScore) {
    AgeToColumnNumber <- function(age,
        ageClassBottoms=as.numeric(colnames(ZScoreTable)))
    {
        retval <- findInterval(age, c(ageClassBottoms, Inf))
        retval[retval==0] <- NA
        retval
    }
    StdScoreToRowNumber <- function(stdScore,
        knownScores = as.numeric(rownames(ZScoreTable)))
    {
        match(stdScore, knownScores)
    }
    ZScoreTable[cbind(StdScoreToRowNumber(stdScore),
                      AgeToColumnNumber(age))]
}
where a typical usage would be
ZScore(age=c(29,44, 10), stdScore=c(28,29,30))
[1] 1.8 2.6 NA
(Age 10 is not in the table so it gets an NA for a z-score).
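Because the lookup is vectorized, it can probably be assigned straight into a data frame column. The sketch below repeats the guessed table and a condensed ZScore() so it runs on its own; the data frame dat and its column names Age and StdScore are assumptions for illustration, not your actual data.

```r
# Same guessed table as above (rows: standard score, columns: age-class lower bound).
ZScoreTable <- matrix(byrow = TRUE,
    c( 2.6, 2.6, 2.6, 2.6, 2.6, 2.6,
       1.8, 1.8, 1.8, 2.0, 2.6, 2.6,
       1.0, 1.0, 1.8, 1.8, 2.6, 2.6,
       0.0, 0.5, 1.0, 1.8, 2.6, 2.6,
       -.5, 0.0, 0.0, 1.0, 1.8, 2.6),
    nrow = 5, ncol = 6,
    dimnames = list(StdScore = c("30", "29", "28", "27", "26"),
                    AgeClass = c("17", "19", "24", "29", "34", "44")))

# Condensed version of ZScore(): one (row, column) pair per observation.
ZScore <- function(age, stdScore) {
    colNum <- findInterval(age, c(as.numeric(colnames(ZScoreTable)), Inf))
    colNum[colNum == 0] <- NA          # ages below the first class are not in the table
    rowNum <- match(stdScore, as.numeric(rownames(ZScoreTable)))
    ZScoreTable[cbind(rowNum, colNum)]
}

# Hypothetical data frame; the column names are assumptions.
dat <- data.frame(Age = c(29, 44, 10, 18), StdScore = c(28, 29, 30, 26))
dat$DSTz <- ZScore(dat$Age, dat$StdScore)
dat
```

The new column has one value per row of dat, with NA wherever the age or the score is not covered by the table.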
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
doesn't work. I've tried various permutations with "apply",
but that didn't work either. Any suggestions?
Jon
Soli Deo Gloria
Jon Erik Ween, MD, MS
Scientist, Kunin-Lunenfeld Applied Research Unit
Director, Stroke Clinic, Brain Health Clinic, Baycrest Centre
Assistant Professor, Dept. of Medicine, Div. of Neurology
University of Toronto Faculty of Medicine
Kimel Family Building, 6th Floor, Room 644
Baycrest Centre
3560 Bathurst Street
Toronto, Ontario M6A 2E1
Canada
Phone: 416-785-2500 x3648
Fax: 416-785-2484
Email: jween at klaru-baycrest.on.ca
Confidential: This communication and any attachment(s) may
contain confidential or privileged information and is
intended solely for the address(es) or the entity
representing the recipient(s). If you have received this
information in error, you are hereby advised to destroy the
document and any attachment(s), make no copies of same and
inform the sender immediately of the error. Any unauthorized
use or disclosure of this information is strictly prohibited.
On 2010-12-09, at 11:06 AM, David Winsemius wrote:
On Dec 9, 2010, at 10:51 AM, Jon Erik Ween wrote:
Thanks David
What I am trying to do is set up a script that assigns
z-scores to a large dataframe (2500x300, but has Age in years
and test scores as columns.) from a published table of
age-corrected standard scores on this cognitive test.
1) The age intervals in the lookup table are given and not
You may want to skip the intermediate translation to the
row and column labels and just use the results of findInterval:
findInterval( 16, c(0, 17, 19, 24, 29, 34, 44, 54, 64,
findInterval( 90, c(0, 17, 19, 24, 29, 34, 44, 54, 64,
[1] 14
Those look like appropriate indices for the column argument
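A small self-contained sketch of that idea follows. The break points are an assumption: they are the age thresholds from the nested ifelse() code quoted further down, with 0 prepended so that ages under 17 land in the first class. The example ages are made up.

```r
# Lower bounds of the age classes (assumed from the ifelse() thresholds),
# with 0 prepended so that every non-negative age gets an index >= 1.
ageBreaks <- c(0, 17, 19, 24, 29, 34, 44, 54, 64, 69, 74, 79, 84, 89)

ages <- c(16, 17, 29, 90)                  # made-up example ages
ageIndex <- findInterval(ages, ageBreaks)  # intervals are closed on the left
ageIndex
```

Each age maps to the position of the last break that does not exceed it, so an age exactly on a boundary, such as 17, falls into the class that starts there.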
2) Sorry I didn't post an example table, it looks
something like this ("Age" is in the first row, standard
scores in the first column):
      17   19   24   29   34   44  ....
30   2.6  2.6  2.6  2.6  2.6  2.6
29   1.8  1.8  1.8  2.0  2.6  2.6
28   1.0  1.0  1.8  1.8  2.6  2.6
27   0.0  0.5  1.0  1.8  2.6  2.6
26   -.5  0.0  0.0  1.0  1.8  2.6
.
.
.
.
So, if a subject (row) has age==29 and a standard score of
28, the value should be 1.8, etc.
Looks like a job for two findInterval indices to be used
Thanks
Jon
On 2010-12-09, at 10:33 AM, David Winsemius wrote:
On Dec 9, 2010, at 9:34 AM, Jon Erik Ween wrote:
Hi
This is (hopefully) a bit more cogent phrasing of a
trying to compute a z-score to rows in a large dataframe
another dataframe. Here's the script (that does not
1) Anyone know of a more elegant way to calculate the
than the nested ifelse's I've used?
2) how to reference the lookup table based on computed indices?
Thanks
Jon
# Define tables
DSTzlook <-
read.table("/Users/jween/Documents/ResearchProjects/ABC/data/DSTz.txt",
header=TRUE, sep="\t", na.strings="NA", dec=".",
df<-stroke
# Compute rounded age.
df$Agetmp <-
ifelse(df$Age>=89,89,ifelse(df$Age>=84,84,ifelse(df$Age>=79,79,
ifelse(df$Age>=74,74,ifelse(df$Age>=69,69,ifelse(df$Age>=64,64,
ifelse(df$Age>=54,54,ifelse(df$Age>=44,44,ifelse(df$Age>=34,34,
ifelse(df$Age>=29,29,ifelse(df$Age>=24,24,ifelse(df$Age>=19,19,
17))))))))))))
Ew, painful. If you want categorized ages (what the
above coding produces is not "rounded" in any sense of
that word as I understand it), then why not use findInterval() as
an index into the ages you want to label these cases with?
df$Agetmp <- c(17,19,24,29,34,44,54,64,69,74,79,84)[ #
findInterval(runif(100,0,100),
c(17,19,24,29,34,44,54,64,69,74,79,84,110) )
] # close extraction
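One caveat with this pattern: findInterval() returns 0 for values below the first break, and an index of 0 silently drops the element, so the result can come out shorter than the input. A hedged variant that keeps the length by turning those cases into NA might look like this (the vector names and example ages are made up for illustration):

```r
# Labels for each age class and the break points that delimit them,
# copied from the snippet above.
classLabels <- c(17, 19, 24, 29, 34, 44, 54, 64, 69, 74, 79, 84)
classBreaks <- c(17, 19, 24, 29, 34, 44, 54, 64, 69, 74, 79, 84, 110)

ages <- c(10, 17, 50, 84, 109)   # made-up ages; 10 is below the first break

idx <- findInterval(ages, classBreaks)
idx[idx == 0] <- NA              # keep a placeholder instead of dropping the case
agetmp <- classLabels[idx]
agetmp
```

With the NA placeholder the result has one entry per input age, so it can be assigned to a data frame column without a length mismatch.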
The other option, of course, and a more "honest" one in
cut(vec, breaks=c(...), labels=c(...) )
(It's not clear to me why you are not picking midpoint ages
within those brackets.)
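For what it's worth, a minimal sketch of the cut() route could look like the following. The breaks are assumed from the thresholds in the nested ifelse() code, the closing Inf is my assumption that everyone aged 89 and over shares one class, and the example ages are made up.

```r
# Lower bounds of the age brackets (assumed from the ifelse() thresholds);
# Inf closes the last bracket.
ageBreaks <- c(17, 19, 24, 29, 34, 44, 54, 64, 69, 74, 79, 84, 89, Inf)

ages <- c(18, 29, 90, 10)   # made-up ages; 10 falls below the first bracket

# right = FALSE makes the intervals [lower, upper), so 29 belongs to the
# class that starts at 29. Each class is labelled by its lower bound.
ageClass <- cut(ages, breaks = ageBreaks,
                labels = as.character(head(ageBreaks, -1)),
                right = FALSE)
as.character(ageClass)
```

The result is a factor, which makes it explicit that these are categories rather than numbers; ages outside every bracket come back as NA.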
# Reference the lookup table based on computed indices
df$DSTz
<-DSTzlook[which(DSTzlook[,1]==df$Agetmp),which(DSTzlook[1,]==
I have not been able to figure out what you are trying to
do here. Trying to use a 2d lookup looks promising as a way to
emulate what an Excel user might attempt, but an example (as
requested in the message at the bottom of every posting)
would really be of great help in making this more concrete
for those of us with insufficient abstractive abilities.
# Cleanup
#rm(df)
#df$Agetmp<-NULL
David Winsemius, MD
West Hartford, CT