Conditional looping over a set of variables in R

10 messages · Adrienne Wootten, William Dunlap, David Herzberg +3 more

#
You were a bit vague about the format of your data.
I'm assuming all columns were numeric and the entries
are one of 0, 1, and NA (missing value).  I made a
little function to generate random data of that format
for testing purposes:

makeData <- function (nrow = 1500, ncol = 140, pMissing = 0.1) 
{
    # pMissing is the proportion of missing values
    m <- matrix(sample(c(1, 0), size = nrow * ncol, replace = TRUE), 
        nrow, ncol)
    m[runif(nrow * ncol) < pMissing] <- NA
    data.frame(m)
}

E.g.,

  > set.seed(168)
  > d <- makeData(15,3)
  > d
      X1 X2 X3
   1   1  1  1
   2   0  0 NA
   3   0  1  0
   4   0  0 NA
   5   0  1  1
   6   0  0 NA
   7   1  0  0
   8   0  1  1
   9   0  0  1
  10   1  1 NA
  11   0  0  1
  12   0  0  0
  13  NA NA NA
  14   0  0  0
  15   1  0  0

I think the following function does what you want.
The algorithm is pretty similar to what you showed.

  columnOfFirstOne <- function(data) {
      # col will be return value, one entry per row of data.
      # Fill it with NA's: NA in output will mean there were no 1's in row
      col <- rep(as.integer(NA), nrow(data))
      for (j in seq_len(ncol(data))) { # loop over columns
          # For each entry in 'col', if it has not been set yet
          # and the entry in the j'th column of data is 1 (and not missing)
          # then set to the column number.
          col[is.na(col) & !is.na(data[, j]) & data[, j] == 1] <- j
      }
      col # return this from function
  }

With the above data we get
  > columnOfFirstOne(d)
   [1]  1 NA  2 NA  2 NA  1  2  3  1  3 NA NA NA  1

It seems quick enough for a dataset of your size
  > dd <- makeData(nrow=1500, ncol=140)
  > system.time(columnOfFirstOne(dd)) # time in seconds
     user  system elapsed 
     0.08    0.00    0.08
 
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
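
The default algorithm behind sample() changed in R 3.6.0, so set.seed(168) may no longer reproduce the 15 x 3 table shown above. A minimal sketch for checking the function on a current R: the data frame below is typed in by hand from the printed table (an assumption that the printout is complete and exact), and columnOfFirstOne is repeated unchanged so the block runs on its own.

```r
# Example data copied by hand from the printed table (not regenerated).
d <- data.frame(
    X1 = c(1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, NA, 0, 1),
    X2 = c(1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, NA, 0, 0),
    X3 = c(1, NA, 0, NA, 1, NA, 0, 1, 1, NA, 1, 0, NA, 0, 0)
)

# Same function as in the message above.
columnOfFirstOne <- function(data) {
    col <- rep(as.integer(NA), nrow(data))
    for (j in seq_len(ncol(data))) {
        # Only rows not yet assigned, whose j'th entry is a non-missing 1.
        col[is.na(col) & !is.na(data[, j]) & data[, j] == 1] <- j
    }
    col
}

# Result printed in the message above, for comparison.
expected <- c(1L, NA, 2L, NA, 2L, NA, 1L, 2L, 3L, 1L, 3L, NA, NA, NA, 1L)
result <- columnOfFirstOne(d)
identical(result, expected)
```

The comparison should come back TRUE if the hand-typed data matches the original.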
#
Bill, thanks so much for this. I'll get a chance to test it later today, and will post the outcome.


David S. Herzberg, Ph.D.
Vice President, Research and Development 
Western Psychological Services
12031 Wilshire Blvd.
Los Angeles, CA 90025-1251
Phone: (310)478-2061 x144
FAX: (310)478-7838
email: davidh at wpspublish.com



1 day later
#
This won't be as quick as Bill's elegant solution, but it's a one-liner:

  apply(d, 1, function(x), match(1, x))

See ?match.

   -Peter Ehlers
On 2010-10-22 10:36, David Herzberg wrote:
#
Whoops, got an extra comma in there somehow; should be:

   apply(d, 1, function(x) match(1, x))

   -Peter Ehlers
On 2010-10-24 08:17, Peter Ehlers wrote:
#
On Sun, Oct 24, 2010 at 2:54 PM, Peter Ehlers <ehlers at ucalgary.ca> wrote:
A slight variation on this would be:

   apply(d, 1, match, x = 1)
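
A short sketch of why the two forms agree: apply() hands each row to the function as its first unnamed argument and forwards any extra arguments. Because x = 1 is supplied by name, the row is matched to the next formal of match(), which is table. The small data frame here is made up for illustration and is not from the thread.

```r
# Hypothetical 4 x 3 example with 0/1/NA entries.
d <- data.frame(
    X1 = c(0, 0, NA, 1),
    X2 = c(1, 0, NA, 1),
    X3 = c(1, NA, 0, 0)
)

# Explicit form: look up the value 1 in each row.
a <- apply(d, 1, function(x) match(1, x))

# Shorter form: x = 1 is passed through, so each row becomes 'table'.
b <- apply(d, 1, match, x = 1)

# Drop any names before comparing the two results.
a <- unname(a)
b <- unname(b)
identical(a, b)
# a and b are both: 2 NA NA 1
```

Rows without a 1 give NA because that is the default nomatch value of match().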
#
Hi

r-help-bounces at r-project.org wrote on 25.10.2010 20:41:55:
If you really only want to know which column in each row has the first 
occurrence of 1 (or any other value), you can get rid of looping and use 
other R capabilities.
[,1] [,2] [,3] [,4]
[1,]    2    2    2    2
[2,]    3    1    2    1
[3,]    2    2    1    3
[4,]    2    2    1    1
[5,]    2    1    1    2
2 3 4 5 
2 3 3 2
[,1] [,2] [,3] [,4]
[1,]    2    2    2    2
[2,]   NA   NA   NA   NA
[3,]    2    2    1    3
[4,]    2    2    1    1
[5,]    2    1    1    2

and this approach smoothly works with NA values too
3 4 5 
3 3 2 

You can then modify such output, since you have the information about columns and 
rows. I am sure there are other, maybe better, options, e.g.

lll<-as.list(as.data.frame(t(mat)))
V1  V2  V3  V4  V5 
Inf Inf   3   3   2

Regards
Petr
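
One possible loop-free variant, offered as a sketch rather than as the code from the message above: base R's max.col() with ties.method = "first" returns, for each row, the first column holding the row maximum. Applied to a 0/1 indicator of "non-missing and equal to 1", that is the first column containing a 1. The function name firstOneVectorized and the test matrix are made up for illustration.

```r
# Hypothetical helper: first column holding a 1 in each row, NA if none.
firstOneVectorized <- function(data) {
    m <- as.matrix(data)
    hit <- !is.na(m) & m == 1       # TRUE only for non-missing 1's
    # hit * 1 turns the logical matrix into a numeric 0/1 matrix.
    first <- max.col(hit * 1, ties.method = "first")
    first[rowSums(hit) == 0] <- NA  # rows without any 1
    first
}

# Small made-up matrix covering the edge cases.
m <- rbind(c(0, 1, 1),
           c(NA, 0, 1),
           c(0, 0, 0),
           c(NA, NA, NA),
           c(1, NA, 0))
res <- firstOneVectorized(m)
res
# expected: 2 3 NA NA 1
```

Rows with no 1 have to be reset explicitly, because max.col() would otherwise report column 1 for an all-zero row.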