Conditional looping over a set of variables in R
10 messages · Adrienne Wootten, William Dunlap, David Herzberg +3 more
You were a bit vague about the format of your data.
I'm assuming all columns were numeric and the entries
are one of 0, 1, and NA (missing value). I made a
little function to generate random data of that format
for testing purposes:
makeData <- function (nrow = 1500, ncol = 140, pMissing = 0.1)
{
    # pMissing is the proportion of missing values
    m <- matrix(sample(c(1, 0), size = nrow * ncol, replace = TRUE),
                nrow, ncol)
    m[runif(nrow * ncol) < pMissing] <- NA
    data.frame(m)
}
E.g.,
> set.seed(168)
> d <- makeData(15,3)
> d
   X1 X2 X3
1   1  1  1
2   0  0 NA
3   0  1  0
4   0  0 NA
5   0  1  1
6   0  0 NA
7   1  0  0
8   0  1  1
9   0  0  1
10  1  1 NA
11  0  0  1
12  0  0  0
13 NA NA NA
14  0  0  0
15  1  0  0
I think the following function does what you want.
The algorithm is pretty similar to what you showed.
columnOfFirstOne <- function(data) {
    # col will be the return value, one entry per row of data.
    # Fill it with NA's: NA in output will mean there were no 1's in the row.
    col <- rep(as.integer(NA), nrow(data))
    for (j in seq_len(ncol(data))) { # loop over columns
        # For each entry in 'col', if it has not been set yet
        # and this entry in the j'th column of data is 1 (and not missing)
        # then set it to the column number.
        col[is.na(col) & !is.na(data[, j]) & data[, j] == 1] <- j
    }
    col # return this from function
}
With the above data we get
> columnOfFirstOne(d)
[1] 1 NA 2 NA 2 NA 1 2 3 1 3 NA NA NA 1
It seems quick enough for a dataset of your size
> dd <- makeData(nrow=1500, ncol=140)
> system.time(columnOfFirstOne(dd)) # time in seconds
   user  system elapsed
   0.08    0.00    0.08
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
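
The original question codes missing responses as "." rather than NA, while the functions above assume numeric 0/1/NA columns. The following is a minimal sketch of one way to bridge that gap, assuming the item columns arrive as character vectors holding "0", "1", and "."; the data frame `raw` and its column names are made up for illustration and are not from the thread.

```r
# Hypothetical raw data: three items, responses stored as text, "." = missing.
raw <- data.frame(item1 = c("1", "0", "."),
                  item2 = c("0", ".", "."),
                  item3 = c("1", "1", "."),
                  stringsAsFactors = FALSE)

# Recode "." to NA, then convert every column to numeric.
scores <- raw
scores[scores == "."] <- NA
scores[] <- lapply(scores, as.numeric)

str(scores)
```

When the data are read from a text file, passing `na.strings = "."` to `read.table()` or `read.csv()` should achieve the same recoding at import time.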
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of David Herzberg
Sent: Friday, October 22, 2010 8:34 AM
To: r-help at r-project.org
Subject: [R] Conditional looping over a set of variables in R

Here's the problem I'm trying to solve in R: I have a data frame that consists of about 1500 cases (rows) of data from kids who took a test of listening comprehension. The columns are their scores (1 = correct, 0 = incorrect, . = missing) on 140 test items. The items are numbered sequentially and are ordered by increasing difficulty as you go from left to right across the columns.

I want R to go through the data and find the first correct response for each case. Because of basal and ceiling rules, many cases have missing data on many items before the first correct response appears.

For each case, I want R to evaluate the item responses sequentially starting with item 1. If the score is 0 or missing, proceed to the next item and evaluate it. If the score is 1, stop the operation for that case, record the item number of that first correct response in a new variable, proceed to the next case, and restart the operation.

In SPSS, this operation would be carried out with LOOP, VECTOR, and DO IF, as follows (assuming the data set is already loaded):

* DECLARE A NEW VARIABLE TO HOLD THE ITEM NUMBER OF THE FIRST CORRECT RESPONSE, SET IT EQUAL TO 0.
numeric LCfirst1.
comp LCfirst1 = 0
* DECLARE A VECTOR TO HOLD THE 140 ITEM RESPONSE VARIABLES.
vector x=LC1a_score to LC140a_score.
* SET UP A LOOP THAT WILL RUN FROM 1 TO 140, AS LONG AS LCfirst1 = 0. "#i" IS AN INDEX VARIABLE THAT INCREASES BY 1 EACH TIME THE LOOP RUNS.
loop #i=1 to 140 if (LCfirst1 = 0).
* SET UP A CONDITIONAL TRANSFORMATION THAT IS EVALUATED FOR EACH ELEMENT OF THE VECTOR. THUS, WHEN #i = 1, THE EXPRESSION EVALUATES THE FIRST ELEMENT OF THE VECTOR (THAT IS, THE FIRST OF THE 140 ITEM RESPONSES). AS THE LOOP RUNS AND #i INCREASES, SUBSEQUENT VECTOR ELEMENTS ARE EVALUATED. THE do if STATEMENT RETAINS CONTROL AND KEEPS LOOPING THROUGH THE VECTOR UNTIL A '1' IS ENCOUNTERED.
+ do if x(#i) = 1.
* WHEN A '1' IS ENCOUNTERED, CONTROL PASSES TO THE NEXT STATEMENT, WHICH RECODES THE VALUE OF THAT VECTOR ELEMENT TO '99'.
+ comp x(#i) = 99.
* AND THEN CONTROL PASSES TO THE NEXT STATEMENT, WHICH RECODES THE VALUE OF LCfirst1 TO THE CURRENT INDEX VALUE, THUS CAPTURING THE ITEM NUMBER OF THE FIRST CORRECT RESPONSE FOR THAT CASE. CHANGING THE VALUE OF LCfirst1 ALSO CAUSES THE LOOP TO STOP EXECUTING FOR THAT CASE, AND THE PROGRAM MOVES TO THE NEXT CASE AND RESTARTS THE LOOP.
+ comp LCfirst1 = #i.
+ end if.
end loop.
exe.

After several hours of trying to translate this procedure to R, I'm stumped. I played around with creating a list to hold the item response variables (analogous to 'vector' in SPSS), but when I tried to use the list in an R procedure, I kept getting a warning along the lines of 'the list contains > 1 element, only the first element will be used'. So perhaps a list is not the appropriate class to 'hold' these variables?

It seems that some nested arrangement of 'for', 'while' and/or 'lapply' will allow me to recreate the operation described above? How do I set up the indexing operation analogous to 'loop #i' in SPSS?

Any help is appreciated, and I'm happy to provide more information if needed.

David S. Herzberg, Ph.D.
Vice President, Research and Development
Western Psychological Services
12031 Wilshire Blvd.
Los Angeles, CA 90025-1251
Phone: (310)478-2061 x144
FAX: (310)478-7838
email: davidh at wpspublish.com
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Bill, thanks so much for this. I'll get a chance to test it later today, and will post the outcome.
David S. Herzberg, Ph.D.
Vice President, Research and Development
Western Psychological Services
12031 Wilshire Blvd.
Los Angeles, CA 90025-1251
Phone: (310)478-2061 x144
FAX: (310)478-7838
email: davidh at wpspublish.com
1 day later
This won't be as quick as Bill's elegant solution, but it's a one-liner:
apply(d, 1, function(x), match(1, x))
See ?match.
-Peter Ehlers
Whoops, got an extra comma in there somehow; should be:
apply(d, 1, function(x) match(1, x))
-Peter Ehlers
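
The one-liner relies on `match(1, x)` returning the position of the first element of `x` equal to 1, or NA when there is none; missing entries in `x` simply never match. A small sketch with made-up rows (the vectors and the data frame `d_small` are illustrative only, not the thread's data):

```r
# match() returns the index of the first match, or NA if nothing matches.
match(1, c(0, NA, 1, 1))   # 3: the first 1 is in position 3
match(1, c(0, 0, NA))      # NA: no 1 at all
match(1, c(NA, NA, NA))    # NA: everything is missing

# Applied row by row to a small data frame of 0/1/NA scores.
d_small <- data.frame(X1 = c(1, 0, NA),
                      X2 = c(1, 0, NA),
                      X3 = c(1, 1, NA))
first_one <- apply(d_small, 1, function(x) match(1, x))
first_one
```

Because `apply()` converts the data frame to a matrix and calls the function once per row, this is expected to be slower than the column-wise loop on large data, as noted above.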
On Sun, Oct 24, 2010 at 2:54 PM, Peter Ehlers <ehlers at ucalgary.ca> wrote:
Whoops, got an extra comma in there somehow; should be:
apply(d, 1, function(x) match(1, x))

A slight variation on this would be:
apply(d, 1, match, x = 1)

Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
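
The variation works because `apply()` forwards extra arguments to the function it calls. A brief sketch, using a made-up data frame `d_small` (not from the thread), of why the two spellings should agree:

```r
# Small made-up 0/1/NA data frame for illustration.
d_small <- data.frame(X1 = c(1, 0, NA, 0),
                      X2 = c(1, 0, NA, 1),
                      X3 = c(1, 1, NA, NA))

# Explicit anonymous function: each row is searched for the value 1.
a <- apply(d_small, 1, function(x) match(1, x))

# Extra arguments to apply() are passed on to FUN. Naming x = 1 fills
# match()'s first parameter, so each row lands in the next free
# parameter, 'table'.
b <- apply(d_small, 1, match, x = 1)

identical(a, b)
```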
Hi

r-help-bounces at r-project.org wrote on 25.10.2010 20:41:55:

Adrienne, there's one glitch when I implement your solution below. When the loop encounters a case with no data at all (that is, all 140 item responses are missing), it aborts and prints this error message: "ERROR: argument is of length zero".

I wonder if there's a logical condition I could add that would enable R to skip these empty cases and continue executing on the next case that contains data.

Thanks, Dave

David S. Herzberg, Ph.D.
Vice President, Research and Development
Western Psychological Services
12031 Wilshire Blvd.
Los Angeles, CA 90025-1251
Phone: (310)478-2061 x144
FAX: (310)478-7838
email: davidh at wpspublish.com

From: wootten.adrienne at gmail.com [mailto:wootten.adrienne at gmail.com] On Behalf Of Adrienne Wootten
Sent: Friday, October 22, 2010 9:09 AM
To: David Herzberg
Cc: r-help at r-project.org
Subject: Re: [R] Conditional looping over a set of variables in R

David,

here I'm referring to your data as testmat, a matrix of 140 columns and 1500 rows, but the same or similar notation can be applied to data frames in R. If I understand correctly, you are looking for the first response (column) where you got a value of 1. I'm assuming also that since your missing values are characters then your two numeric values are also characters. Keeping all this in mind, try something like this.
If you really only want to know which column in each row has the first occurrence of 1 (or any other value), you can get rid of looping and use other R capabilities.

set.seed(111)
mat<-matrix(sample(1:3, 20, replace=T),5,4)
mat
     [,1] [,2] [,3] [,4]
[1,]    2    2    2    2
[2,]    3    1    2    1
[3,]    2    2    1    3
[4,]    2    2    1    1
[5,]    2    1    1    2

mat.w<-which(mat==1, arr.ind=T)
tapply(mat.w[,2], mat.w[,1], min)
2 3 4 5
2 3 3 2

mat[2, ]<-NA
mat
     [,1] [,2] [,3] [,4]
[1,]    2    2    2    2
[2,]   NA   NA   NA   NA
[3,]    2    2    1    3
[4,]    2    2    1    1
[5,]    2    1    1    2

and this approach works smoothly with NA values too

mat.w<-which(mat==1, arr.ind=T)
tapply(mat.w[,2], mat.w[,1], min)
3 4 5
3 3 2

You can then use or modify such output, as you have info about columns and rows.

I am sure there are other, maybe better, options, e.g.

lll<-as.list(as.data.frame(t(mat)))
unlist(lapply(lll, function(x) min(which(x==1))))
 V1  V2  V3  V4  V5
Inf Inf   3   3   2

Regards
Petr
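
The `which()`/`tapply()` result only has entries for rows that contain a 1, so rows without one are silently dropped. A minimal sketch of one way to expand it into a full-length vector with NA for such rows; the small matrix and the names `hits`, `first_col`, and `result` are made up for illustration:

```r
# Made-up matrix: row 1 has no 1, row 2 is entirely missing.
mat <- matrix(c( 2,  2,  2,
                NA, NA, NA,
                 2,  1,  3,
                 1,  1,  2), nrow = 4, byrow = TRUE)

# Row/column positions of every 1; NA cells are ignored by which().
hits <- which(mat == 1, arr.ind = TRUE)

# Smallest column index per row; names are the row numbers that had a 1.
first_col <- tapply(hits[, "col"], hits[, "row"], min)

# Start with NA for every row, then fill in the rows that had a 1.
result <- rep(NA_integer_, nrow(mat))
result[as.integer(names(first_col))] <- first_col
result
```

Using the names of the `tapply()` result as row indices keeps the output aligned with the original rows, which makes it straightforward to attach as a new column.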
first = c() # your extra variable which will eventually contain the first correct response for each case
for(i in 1:nrow(testmat)){
  c = 1
  while( c<=ncol(testmat) | testmat[i,c] != "1" ){
    if( testmat[i,c] == "1"){
      first[i] = c
      break # will exit the while loop once it finds the first correct answer, and then jump to the next case
    } else {
      c=c+1 # proceed to the next column if not
    }
  }
}
Hope this helps you out a bit.
Adrienne Wootten
NCSU
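
Regarding the reported failure on cases with no data: a plausible contributor is that the `|` in the `while` condition keeps the loop running after the column index has passed the last column, although the exact error will depend on how the data are stored. The following is a sketch only, not the posted solution, of how the loop might be adjusted so that cases without any correct response are left as NA. It assumes character scores with "." for missing; the small `testmat` is made up for illustration, and the index is renamed `j` to avoid masking the `c()` function.

```r
# Hypothetical stand-in for 'testmat': character scores, "." = missing.
testmat <- matrix(c("0", "1", "1",
                    ".", ".", ".",
                    ".", "0", "1",
                    "0", "0", "0"), nrow = 4, byrow = TRUE)

# NA means "no correct response found for this case".
first <- rep(NA_integer_, nrow(testmat))

for (i in seq_len(nrow(testmat))) {
  j <- 1L
  # '&&' stops evaluating once j runs past the last column, so
  # testmat[i, j] is never indexed out of range. isTRUE() treats an
  # NA comparison as "not a 1".
  while (j <= ncol(testmat) && !isTRUE(testmat[i, j] == "1")) {
    j <- j + 1L
  }
  # Record the column only if a 1 was actually found.
  if (j <= ncol(testmat)) first[i] <- j
}
first
```

Cases that are entirely missing, or that contain only 0s, simply keep their initial NA, so no separate check for empty rows is needed.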