Conditional looping over a set of variables in R

You were a bit vague about the format of your data.
I'm assuming all columns were numeric and the entries
are one of 0, 1, and NA (missing value).  I made a
little function to generate random data of that format
for testing purposes:

makeData <- function (nrow = 1500, ncol = 140, pMissing = 0.1) 
{
    # pMissing is the proportion of missing values
    m <- matrix(sample(c(1, 0), size = nrow * ncol, replace = TRUE), 
        nrow, ncol)
    m[runif(nrow * ncol) < pMissing] <- NA
    data.frame(m)
}

E.g.,

  > set.seed(168)
  > d <- makeData(15,3)
  > d
      X1 X2 X3
   1   1  1  1
   2   0  0 NA
   3   0  1  0
   4   0  0 NA
   5   0  1  1
   6   0  0 NA
   7   1  0  0
   8   0  1  1
   9   0  0  1
  10   1  1 NA
  11   0  0  1
  12   0  0  0
  13  NA NA NA
  14   0  0  0
  15   1  0  0
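If the raw file codes missing responses as "." (as in the original question), one way to get to the 0/1/NA layout assumed here is to declare "." as a missing-value string when reading the data. This is only a sketch: the three-item extract is invented for illustration, and the column names are borrowed from the SPSS syntax in the question.

```r
# Hypothetical three-item extract; "." marks a missing response.
raw <- "LC1a_score,LC2a_score,LC3a_score
0,.,1
.,.,.
1,0,."

# na.strings = "." turns every "." into NA, so the columns are read
# as numbers rather than as character strings.
scores <- read.csv(text = raw, na.strings = ".")

str(scores)              # numeric columns holding 0, 1 and NA
colSums(is.na(scores))   # number of missing responses per item
```

With a real file, `text = raw` would be replaced by the file name; everything else stays the same.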

I think the following function does what you want.
The algorithm is pretty similar to what you showed.

  columnOfFirstOne <- function(data) {
      # col will be return value, one entry per row of data.
      # Fill it with NA's: NA in output will mean there were no 1's in row
      col <- rep(as.integer(NA), nrow(data))
      for (j in seq_len(ncol(data))) { # loop over columns
          # For each entry in 'col', if it has not been set yet
          # and this entry in the j'th column of data is 1 (and not missing)
          # then set to the column number.
          col[is.na(col) & !is.na(data[, j]) & data[, j] == 1] <- j
      }
      col # return this from function
  }

With the above data we get
  > columnOfFirstOne(d)
   [1]  1 NA  2 NA  2 NA  1  2  3  1  3 NA NA NA  1

It seems quick enough for a dataset of your size
  > dd <- makeData(nrow=1500, ncol=140)
  > system.time(columnOfFirstOne(dd)) # time in seconds
     user  system elapsed 
     0.08    0.00    0.08
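
For comparison, the same answer can probably be obtained without an explicit loop over columns. The following is only a sketch, not something posted in this thread: `firstOneMaxCol` is a made-up name, and it assumes the same numeric 0/1/NA layout. It relies on `max.col()` with `ties.method = "first"` to pick the leftmost column that holds a 1.

```r
# Sketch under the same assumptions as above (numeric 0/1/NA columns).
# 'firstOneMaxCol' is a hypothetical name, not part of the original post.
firstOneMaxCol <- function(data) {
    # TRUE where an entry is a non-missing 1; NA entries become FALSE
    hit <- as.matrix(!is.na(data) & data == 1)
    # index of the first column attaining the row maximum
    col <- max.col(hit * 1, ties.method = "first")
    # rows with no 1 at all would otherwise report column 1
    col[rowSums(hit) == 0] <- NA_integer_
    col
}

small <- data.frame(X1 = c(1, 0, NA, 0),
                    X2 = c(1, 1, NA, 0),
                    X3 = c(NA, 0, NA, 1))
firstOneMaxCol(small)   # first 1 is in column 1, 2, none, 3
```

The explicit `rowSums(hit) == 0` step matters: without it, a row containing no 1 would be reported as column 1, because every column ties for the maximum.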

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
-----Original Message-----
From: r-help-bounces at r-project.org 
[mailto:r-help-bounces at r-project.org] On Behalf Of David Herzberg
Sent: Friday, October 22, 2010 8:34 AM
To: r-help at r-project.org
Subject: [R] Conditional looping over a set of variables in R

Here's the problem I'm trying to solve in R: I have a data 
frame that consists of about 1500 cases (rows) of data from 
kids who took a test of listening comprehension. The columns 
are their scores (1 = correct, 0 = incorrect,  . = missing) 
on 140 test items. The items are numbered sequentially and 
are ordered by increasing difficulty as you go from left to 
right across the columns. I want R to go through the data and 
find the first correct response for each case. Because of 
basal and ceiling rules, many cases have missing data on many 
items before the first correct response appears.

For each case, I want R to evaluate the item responses 
sequentially starting with item 1. If the score is 0 or 
missing, proceed to the next item and evaluate it. If the 
score is 1, stop the operation for that case, record the item 
number of that first correct response in a new variable, 
proceed to the next case, and restart the operation.

In SPSS, this operation would be carried out with LOOP, 
VECTOR, and DO IF, as follows (assuming the data set is 
already loaded):

* DECLARE A NEW VARIABLE TO HOLD THE ITEM NUMBER OF THE FIRST 
CORRECT RESPONSE, SET IT EQUAL TO 0.
numeric LCfirst1.
comp LCfirst1 = 0.

* DECLARE A VECTOR TO HOLD THE 140 ITEM RESPONSE VARIABLES.
vector x=LC1a_score to LC140a_score.

* SET UP A LOOP THAT WILL RUN FROM 1 TO 140, AS LONG AS 
LCfirst1 = 0. "#i" IS AN INDEX VARIABLE THAT INCREASES BY 1 
EACH TIME THE LOOP RUNS.
loop #i=1 to 140 if (LCfirst1 = 0).

* SET UP A CONDITIONAL TRANSFORMATION THAT IS EVALUATED FOR 
EACH ELEMENT OF THE VECTOR.  THUS, WHEN #i = 1, THE 
EXPRESSION EVALUATES THE FIRST ELEMENT OF THE VECTOR (THAT 
IS, THE FIRST OF THE 140 ITEM RESPONSES). AS THE LOOP RUNS 
AND #i INCREASES, SUBSEQUENT VECTOR ELEMENTS ARE EVALUATED. 
THE do if STATEMENT RETAINS CONTROL AND KEEPS LOOPING THROUGH 
THE VECTOR UNTIL A '1' IS ENCOUNTERED.
+ do if x(#i) = 1.

* WHEN A '1' IS ENCOUNTERED, CONTROL PASSES TO THE NEXT 
STATEMENT, WHICH RECODES THE VALUE OF THAT VECTOR ELEMENT TO '99'.
+ comp x(#i) = 99.

* AND THEN CONTROL PASSES TO THE NEXT STATEMENT, WHICH 
RECODES THE VALUE OF LCfirst1 TO THE CURRENT INDEX VALUE, 
THUS CAPTURING THE ITEM NUMBER OF THE FIRST CORRECT RESPONSE 
FOR THAT CASE. CHANGING THE VALUE OF LCfirst1 ALSO CAUSES 
THE LOOP TO STOP EXECUTING FOR THAT CASE, AND THE PROGRAM 
MOVES TO THE NEXT CASE AND RESTARTS THE LOOP.
+ comp LCfirst1 = #i.
+ end if.
end loop.
exe.

After several hours of trying to translate this procedure to 
R, I'm stumped. I played around with creating a list to hold 
the item responses variables (analogous to 'vector' in SPSS), 
but when I tried to use the list in an R procedure, I kept 
getting a warning along the lines of  'the list contains > 1 
element, only the first element will be used'. So perhaps a 
list is not the appropriate class to 'hold' these variables?

It seems that some nested arrangement of 'for' 'while' and/or 
'lapply' will allow me to recreate the operation described 
above? How do I set up the indexing operation analogous to 
'loop #i' in SPSS?

Any help is appreciated, and I'm happy to provide more 
information if needed.

David S. Herzberg, Ph.D.
Vice President, Research and Development
Western Psychological Services
12031 Wilshire Blvd.
Los Angeles, CA 90025-1251
Phone: (310)478-2061 x144
FAX: (310)478-7838
email: davidh at wpspublish.com

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Bill, thanks so much for this. I'll get a chance to test it later today, and will post the outcome.

David S. Herzberg, Ph.D.
Vice President, Research and Development 
Western Psychological Services
12031 Wilshire Blvd.
Los Angeles, CA 90025-1251
Phone: (310)478-2061 x144
FAX: (310)478-7838
email: davidh at wpspublish.com

This won't be as quick as Bill's elegant solution, but it's a one-liner:

  apply(d, 1, function(x), match(1, x))

See ?match.

   -Peter Ehlers

Whoops, got an extra comma in there somehow; should be:

   apply(d, 1, function(x) match(1, x))

   -Peter Ehlers
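
To see why the one-liner works, it helps to look at what `match()` returns for a single row. The sketch below uses a small invented data frame (not the thread's data) and stores the result as a new column; `first1` is a hypothetical column name.

```r
# match(1, x) gives the position of the first element of x equal to 1;
# 0 and NA entries are simply passed over.
match(1, c(0, NA, 1))   # 3
match(1, c(NA, 0, 0))   # NA, because there is no 1 in the row

# Invented example data with the same 0/1/NA coding.
d <- data.frame(X1 = c(1, 0, NA, 0),
                X2 = c(1, 1, NA, 0),
                X3 = c(NA, 0, NA, 1))

# Compute first, then attach, so the new column is not scanned itself.
first1 <- apply(d, 1, function(x) match(1, x))
d$first1 <- first1
d
```

Rows without any correct response come out as NA, which matches the convention used by columnOfFirstOne().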

A slight variation on this would be:

   apply(d, 1, match, x = 1)
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
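
The shorter form works because of how apply() passes arguments: each row is handed to the function as the first unnamed argument, and since `x = 1` is already supplied by name, the row ends up in match()'s second parameter, `table`. A small sketch on an invented matrix, checking that both spellings agree:

```r
# Invented 3 x 3 example with the same 0/1/NA coding.
m <- rbind(c(0, 1, 1),
           c(NA, 0, 0),
           c(1, NA, 0))

a <- apply(m, 1, function(row) match(1, row))
b <- apply(m, 1, match, x = 1)   # each row becomes match()'s 'table' argument

identical(a, b)   # both spellings give the same vector
```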
Hi

r-help-bounces at r-project.org wrote on 25.10.2010 20:41:55:
Adrienne, there's one glitch when I implement your solution below. When the
loop encounters a case with no data at all (that is, all 140 item responses
are missing), it aborts and prints this error message: "ERROR: argument is
of length zero".

I wonder if there's a logical condition I could add that would enable R to
skip these empty cases and continue executing on the next case that
contains data.

Thanks, Dave

David S. Herzberg, Ph.D.
Vice President, Research and Development
Western Psychological Services
12031 Wilshire Blvd.
Los Angeles, CA 90025-1251
Phone: (310)478-2061 x144
FAX: (310)478-7838
email: davidh at wpspublish.com

From: wootten.adrienne at gmail.com [mailto:wootten.adrienne at gmail.com] On Behalf Of Adrienne Wootten
Sent: Friday, October 22, 2010 9:09 AM
To: David Herzberg
Cc: r-help at r-project.org
Subject: Re: [R] Conditional looping over a set of variables in R

David,

Here I'm referring to your data as testmat, a matrix of 140 columns and 1500
rows, but the same or similar notation can be applied to data frames in R. If
I understand correctly, you are looking for the first response (column) where
you got a value of 1. I'm also assuming that since your missing values are
characters, your two numeric values are characters as well. Keeping all this
in mind, try something like this.
If you really only want to know which column in each row holds the first
occurrence of 1 (or of any other value), you can drop the explicit loop and
use other R capabilities.

> set.seed(111)
> mat <- matrix(sample(1:3, 20, replace=TRUE), 5, 4)
> mat
     [,1] [,2] [,3] [,4]
[1,]    2    2    2    2
[2,]    3    1    2    1
[3,]    2    2    1    3
[4,]    2    2    1    1
[5,]    2    1    1    2
> mat.w <- which(mat==1, arr.ind=TRUE)
> tapply(mat.w[,2], mat.w[,1], min)
2 3 4 5 
2 3 3 2 
> mat[2, ] <- NA
> mat
     [,1] [,2] [,3] [,4]
[1,]    2    2    2    2
[2,]   NA   NA   NA   NA
[3,]    2    2    1    3
[4,]    2    2    1    1
[5,]    2    1    1    2

This approach also works smoothly with NA values:

> mat.w <- which(mat==1, arr.ind=TRUE)
> tapply(mat.w[,2], mat.w[,1], min)
3 4 5 
3 3 2 

You can then use or modify this output, since it carries both the row numbers
(as names) and the column numbers (as values). I am sure there are other, maybe
better, options, e.g.

> lll <- as.list(as.data.frame(t(mat)))
> unlist(lapply(lll, function(x) min(which(x==1))))
 V1  V2  V3  V4  V5 
Inf Inf   3   3   2 
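
Both outputs above need a little post-processing if one value per row is wanted: tapply() drops rows without a 1, and min(which(...)) reports them as Inf. A possible way around both, offered only as a sketch, is to index the result of which() with `[1]`, which yields NA when nothing matches. The matrix is retyped by hand from the printout above so the example does not depend on the random seed; `first` and `res` are hypothetical names.

```r
# Matrix copied from the printed example, with row 2 set to NA.
mat <- rbind(c(2, 2, 2, 2),
             c(NA, NA, NA, NA),
             c(2, 2, 1, 3),
             c(2, 2, 1, 1),
             c(2, 1, 1, 2))

# which(...)[1] is NA when there is no match, so no Inf and no warning.
first <- apply(mat, 1, function(x) which(x == 1)[1])
first

# The tapply() result can also be expanded to one entry per row.
mat.w <- which(mat == 1, arr.ind = TRUE)
firstByRow <- tapply(mat.w[, 2], mat.w[, 1], min)
res <- rep(NA_integer_, nrow(mat))
res[as.integer(names(firstByRow))] <- firstByRow
res
```

Either vector can then be attached to the data as a new column, with NA marking rows that have no 1.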

Regards
Petr
first = c() # your extra variable which will eventually contain the first correct response for each case

for(i in 1:nrow(testmat)){
  c = 1
  while( c<=ncol(testmat) | testmat[i,c] != "1" ){
    if( testmat[i,c] == "1"){
      first[i] = c
      break # will exit the while loop once it finds the first correct answer, and then jump to the next case
    } else {
      c=c+1 # proceed to the next column if not
    }
  }
}
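
A variant of this loop that is meant to tolerate missing values and cases with no correct response might look like the sketch below. It is not Adrienne's code and has not been run against David's data; the test matrix is invented, and the index is called `j` rather than `c` to avoid masking the c() function. The loop condition only checks the column bound, and isTRUE() keeps if() from ever seeing an NA.

```r
# Hypothetical test data: character codes "0", "1" and ".", plus one NA,
# as a stand-in for the real 1500 x 140 matrix.
testmat <- rbind(c("0", ".", "1"),
                 c(".", ".", "."),
                 c("1", "0", "."),
                 c(NA,  "0", "1"))

# Pre-fill with NA so cases without any "1" keep a missing value.
first <- rep(NA_integer_, nrow(testmat))

for (i in seq_len(nrow(testmat))) {
  j <- 1L
  while (j <= ncol(testmat)) {          # never index past the last column
    if (isTRUE(testmat[i, j] == "1")) { # NA or "." counts as "not a 1"
      first[i] <- j
      break                             # first correct answer found
    }
    j <- j + 1L
  }
}
first
```

Cases that are entirely missing simply fall out of the while loop and keep their NA, so nothing has to be skipped explicitly.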

Hope this helps you out a bit.

Adrienne Wootten
NCSU
