Efficient way to use data frame of indices to initialize matrix

5 messages · gene, Whit Armstrong, Greg Snow +1 more

gene

Tue, Dec 7, 2010 10:31 AM #

I have a data frame with three columns, x, y, and a.  I want to create a matrix from these values such that for matrix m:
m[x,y] == a

Obviously, I can go row by row through the data frame and insert the value a at the correct x,y location in the matrix.  I can make that slightly more efficient (perhaps), by doing something like this:

for (each.x in unique(df$x)) m[each.x, df$y[df$x == each.x]] <- df$a[df$x == each.x]

But I feel that there must be a more efficient, or at least more elegant way to do this.

--
Gene

Tue, Dec 7, 2010 10:40 AM #

index m as a vector and do the assignment in one step

# x is the row index and y the column index; R stores matrices column by column
i <- df$x + (df$y-1)*nrow(m)
m[i] <- df$a

or something along those lines.

-Whit
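
A minimal sketch of the linear-index idea above, assuming x holds row numbers and y holds column numbers. The names df, m and i follow the thread; the sample values and the 3 x 4 shape are made up for illustration. Because R stores a matrix column by column, element [x, y] sits at position x + (y - 1) * nrow(m).

```r
# Made-up sample data: x = row index, y = column index, a = value
df <- data.frame(x = c(1, 2, 3), y = c(2, 3, 1), a = c(10, 20, 30))
m <- matrix(NA_real_, nrow = 3, ncol = 4)

# Column-major storage: the stride between columns is the number of rows
i <- df$x + (df$y - 1) * nrow(m)
m[i] <- df$a

m[1, 2]  # 10
m[2, 3]  # 20
m[3, 1]  # 30
```

Using a non-square matrix here makes it easier to spot a stride mistake, since multiplying by the number of columns instead would put values in the wrong cells.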

On Tue, Dec 7, 2010 at 1:31 PM, Cutler, Gene <gcutler at amgen.com> wrote:

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Greg Snow

Tue, Dec 7, 2010 10:49 AM #

tmpdf <- data.frame( x = c(1,2,3), y=c(2,3,1), a=c(10,20,30) )
mymat <- matrix(0, ncol=3, nrow=3)
mymat[ as.matrix(tmpdf[,c('x','y')]) ] <- tmpdf$a

Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111
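
A small check, offered as a sketch rather than part of the original reply, that indexing with a two-column matrix gives the same result as the row-by-row loop from the question. The names by_loop, by_index and idx are hypothetical; tmpdf is the sample data from the message above.

```r
tmpdf <- data.frame(x = c(1, 2, 3), y = c(2, 3, 1), a = c(10, 20, 30))

# Row-by-row reference version
by_loop <- matrix(0, nrow = 3, ncol = 3)
for (k in seq_len(nrow(tmpdf))) {
  by_loop[tmpdf$x[k], tmpdf$y[k]] <- tmpdf$a[k]
}

# One-step version: each row of the index matrix is one (row, column) pair
idx <- as.matrix(tmpdf[, c("x", "y")])
by_index <- matrix(0, nrow = 3, ncol = 3)
by_index[idx] <- tmpdf$a

identical(by_loop, by_index)  # TRUE
```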


> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> On Behalf Of Cutler, Gene
> Sent: Tuesday, December 07, 2010 11:31 AM
> To: r-help at r-project.org
> Subject: [R] Efficient way to use data frame of indices to initialize
> matrix
> 
> I have a data frame with three columns, x, y, and a.  I want to create
> a matrix from these values such that for matrix m:
> m[x,y] == a
> 
> Obviously, I can go row by row through the data frame and insert the
> value a at the correct x,y location in the matrix.  I can make that
> slightly more efficient (perhaps), by doing something like this:
> for (each.x in unique(df$x))
>   m[each.x, df$y[df$x == each.x]] <- df$a[df$x == each.x]
> 
> But I feel that there must be a more efficient, or at least more
> elegant way to do this.
> 
> --
> Gene

David Winsemius

Tue, Dec 7, 2010 10:59 AM #

On Dec 7, 2010, at 1:49 PM, Greg Snow wrote:

cbind is also useful for assembling the index argument to the matrix `[<-` function:

tmpdf <- data.frame( x = c(1,2,3), y=c(2,3,1), a=c(10,20,30) )
mymat <- matrix(NA, ncol=max(tmpdf$y), nrow=max(tmpdf$x))
mymat[ cbind(tmpdf$x,tmpdf$y) ] <- tmpdf$a

mymat
     [,1] [,2] [,3]
[1,]   NA   10   NA
[2,]   NA   NA   20
[3,]   30   NA   NA

David Winsemius, MD
West Hartford, CT

gene

Wed, Dec 8, 2010 9:51 AM #

Thanks for the three great answers!  For those who are curious, I timed the three approaches:

nr <- 15812
nc <- 64636
mymat <- matrix(nrow=nr, ncol=nc)
mymat[1,1] <- 1 # see note below

# mydf is created elsewhere
dim(mydf)
# 10910263        3
colnames(mydf)
# "x" "y" "a"

# approach 1 (linear index; matrices are column-major, so the stride is nr):
# mymat[ mydf$x + (mydf$y-1) * nr ] <- mydf$a

# approach 2 (as.matrix needs both index columns passed as one object, in x, y order):
# mymat[ as.matrix(mydf[,c('x','y')]) ] <- mydf$a

# approach 3:
# mymat[ cbind(mydf$x, mydf$y) ] <- mydf$a


system.time( for (i in 1:10) mymat[ mydf$x + (mydf$y-1) * nr ] <- mydf$a )
system.time( for (i in 1:10) mymat[ as.matrix(mydf[,c('x','y')]) ] <- mydf$a )
system.time( for (i in 1:10) mymat[ cbind(mydf$x, mydf$y) ] <- mydf$a )


#   user  system elapsed 
# 10.478   3.837  14.317 <- #1
#  9.064   1.711  10.777 <- #2
# 10.747   2.702  13.450 <- #3

In this run of 10 repetitions each, approach #2 was the fastest (about 10.8 s elapsed, against 14.3 s for #1 and 13.5 s for #3).  Note that assigning the first value into the new matrix takes about 8 elapsed seconds all on its own, which is why I have that initialization line above.
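
One likely explanation for the cost of that first assignment, offered as an assumption rather than something measured in the thread: matrix(nrow=nr, ncol=nc) with no data argument fills the matrix with logical NA, so the first numeric assignment makes R convert the whole matrix to double. A small sketch with reduced dimensions (the names m_logical and m_double are hypothetical):

```r
m_logical <- matrix(nrow = 4, ncol = 5)   # default fill is logical NA
typeof(m_logical)                         # "logical"

m_logical[1, 1] <- 1                      # coerces the whole matrix to double
typeof(m_logical)                         # "double"

# Allocating as double up front avoids the later conversion
m_double <- matrix(NA_real_, nrow = 4, ncol = 5)
typeof(m_double)                          # "double"
```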

--
Gene
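
Before comparing timings it may be worth confirming that the three approaches fill the matrix identically. The sketch below does that on small synthetic data; the sizes, the seed, the helper fill() and the way mydf is generated are all assumptions made for illustration, since the real mydf is created elsewhere. No timing figures are implied.

```r
set.seed(1)
nr <- 200
nc <- 300
n  <- 5000

# Synthetic stand-in for mydf: distinct (x, y) cells, so assignment order cannot matter
cells <- sample(nr * nc, n)
mydf <- data.frame(x = (cells - 1) %% nr + 1,
                   y = (cells - 1) %/% nr + 1,
                   a = runif(n))

# Hypothetical helper: allocate a double matrix and fill it through the given index
fill <- function(index) {
  m <- matrix(NA_real_, nrow = nr, ncol = nc)
  m[index] <- mydf$a
  m
}

m1 <- fill(mydf$x + (mydf$y - 1) * nr)        # approach 1: linear index
m2 <- fill(as.matrix(mydf[, c("x", "y")]))    # approach 2: as.matrix
m3 <- fill(cbind(mydf$x, mydf$y))             # approach 3: cbind

stopifnot(identical(m1, m2), identical(m2, m3))

# The same helper can then be timed, one approach at a time
system.time(for (i in 1:10) fill(cbind(mydf$x, mydf$y)))
```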
