Performance problems to fill up a dataframe

3 messages · Florian Jansen, Adrian Dusa, Wayne.W.Jones at shell.com

#
Dear Listmembers,

I'm trying to fill a data frame based on an arbitrary list of 
references:

Here is my code, which works:

dat <- data.frame(c(60001,60001,60050,60050,60050),c(27,129,618,27,1579))
LR <- sort(unique(dat[,1]))
LC <- sort(unique(dat[,2]))
m <- as.data.frame(matrix(data=NA, nrow=length(LR), ncol=length(LC), 
dimnames=list(LR,LC)))

for(i in 1:nrow(dat)){
  m[as.character(dat[i,1]), as.character(dat[i,2])] <- 1
  }
m[is.na(m)] <- 0

Now I'm trying to avoid the loop, because it takes ages for a list of 
20000 entries, but I have run out of ideas.
Should I expand my list beforehand, and if so, how? Can I address the data frame 
fields more efficiently?

Thanks for your help.
#
Try to assign some names to your initial variables:
dat <- data.frame(A=c(60001,60001,60050,60050,60050), B=c(27,129,618,27,1579))

And what you want is simply:
       B
A       27 129 618 1579
  60001  1   1   0    0
  60050  1   0   1    1
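
The call that produced the output above is not shown in the archived message. One call that would give this result, assuming the named data frame dat defined above, is table(dat). A minimal sketch (the as.data.frame.matrix() step is an addition, only for the case where a data frame is really required):

```r
# Named version of the sample data, as suggested above.
dat <- data.frame(A=c(60001,60001,60050,60050,60050), B=c(27,129,618,27,1579))

# Cross-tabulate A against B; each cell holds the number of matching rows.
tab <- table(dat)
print(tab)

# Assumption: a wide data frame is wanted, as in the original loop.
# as.data.frame.matrix() keeps the row/column layout of the table.
m <- as.data.frame.matrix(tab)
```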

Why do you need it as a dataframe anyway?
Hth,
Adrian
On Monday 24 September 2007, Florian Jansen wrote:

  
    
#
Use table: 

dat <- data.frame(c(60001,60001,60050,60050,60050),c(27,129,618,27,1579))
table(dat[,1],dat[,2])

       
        27 129 618 1579
  60001  1   1   0    0
  60050  1   0   1    1
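
Note that table() counts occurrences, so a pair that appears more than once in dat would give a value above 1, whereas the original loop always stores 1. If strict 0/1 entries matter, one loop-free option is to index the matrix with a two-column matrix of positions. This is a minimal sketch on the sample data from the original post, not something taken from either reply:

```r
# Sample data from the original post.
dat <- data.frame(c(60001,60001,60050,60050,60050), c(27,129,618,27,1579))
LR <- sort(unique(dat[,1]))
LC <- sort(unique(dat[,2]))

# Start from an all-zero matrix instead of NA, so no is.na() pass is needed.
m <- matrix(0, nrow=length(LR), ncol=length(LC), dimnames=list(LR, LC))

# match() turns each value into its row/column position; indexing with a
# two-column matrix of positions sets all referenced cells in one step.
m[cbind(match(dat[,1], LR), match(dat[,2], LC))] <- 1

# Convert only at the end, and only if a data frame is really required.
m <- as.data.frame(m)
```

Filling a plain matrix and converting once at the end avoids the repeated single-cell assignments into a data frame that the loop performs.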

Regards

Wayne


-----Original Message-----
From: r-help-bounces at r-project.org
[mailto:r-help-bounces at r-project.org]On Behalf Of Florian Jansen
Sent: 24 September 2007 10:37
To: r-help at r-project.org
Subject: [R] Performance problems to fill up a dataframe

