Dear all,
suppose I have a data.frame that gives the number of people in a specific year at a specific age:
n <- 10
# yr = year (1-10), age = age (1-12), no = number of people
mydf <- data.frame(yr=sample(1:10, size=n, replace=FALSE),
                   age=sample(1:12, size=n, replace=FALSE),
                   no=sample(1:10, size=n, replace=FALSE))
Now I would like to make a matrix with (in this simple example)
10 columns (for the years) and 12 rows (for the ages). In each cell,
I would like to put the corresponding number of individuals, or 0 if
that age/year combination does not occur.
So far I have been doing this as follows:
mymatrix <- matrix(0, ncol=10, nrow=12)   # rows = ages, columns = years
for (year in unique(mydf$yr)) {
  for (age in unique(mydf$age)) {
    # copy the count for this year/age combination, if there is one
    if (length(mydf$no[mydf$yr==year & mydf$age==age]) > 0) {
      mymatrix[age,year] <- mydf$no[mydf$yr==year & mydf$age==age]
    } else {
      mymatrix[age,year] <- 0
    }
  }
}
This is fast enough in such a simple setting, but with more years and
ages (and for roughly 300 datasets) it becomes pretty slow. In
addition, it is not really elegant R code.
Can somebody point me in the direction of a more elegant way to do
this, possibly avoiding the loops?
Thanks,
Roland
+++++
This mail has been sent through the MPI for Demographic Rese...{{dropped}}
Reshaping data
3 messages · Rau, Roland, Dimitris Rizopoulos, Peter Dalgaard
Just try

mymatrix <- matrix(0, 12, 10)
# each row of cbind(age, yr) addresses one (age, year) cell
mymatrix[cbind(mydf$age, mydf$yr)] <- mydf$no
mymatrix

I hope it helps.

Best,
Dimitris

----
Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven
Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/(0)16/336899
Fax: +32/(0)16/337015
Web: http://www.med.kuleuven.be/biostat/
     http://www.student.kuleuven.be/~m0390867/dimitris.htm

----- Original Message -----
From: "Rau, Roland" <Rau at demogr.mpg.de>
To: <r-help at stat.math.ethz.ch>
Sent: Thursday, December 08, 2005 9:50 AM
Subject: [R] Reshaping data
______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
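Matrix indexing works because each row of the two-column matrix built by `cbind()` is read as one (row, column) address, so a single assignment fills every observed cell. Dimitris's version relies on ages and years being usable directly as row and column numbers; real data with calendar years (say 1990 onward) or ages starting at 0 would need translating first. Below is a sketch of one way to do that with `match()`, using a hypothetical helper `to_age_year_matrix()` (not from the thread) and made-up data, plus a check against the loop from the question (the seed is arbitrary):

```r
# Hypothetical helper (not from the thread): build an age-by-year matrix
# of counts from a long data frame with columns age, yr and no.
to_age_year_matrix <- function(df, ages = sort(unique(df$age)),
                               years = sort(unique(df$yr))) {
  M <- matrix(0, nrow = length(ages), ncol = length(years),
              dimnames = list(age = as.character(ages),
                              yr  = as.character(years)))
  # match() converts each age/year value into a row/column position
  idx <- cbind(match(df$age, ages), match(df$yr, years))
  if (anyNA(idx)) stop("some ages or years are outside the given ranges")
  M[idx] <- df$no
  M
}

# Check against the loop from the question, on the same kind of toy data
set.seed(42)
n <- 10
mydf <- data.frame(yr  = sample(1:10, size = n, replace = FALSE),
                   age = sample(1:12, size = n, replace = FALSE),
                   no  = sample(1:10, size = n, replace = FALSE))
mymatrix <- matrix(0, ncol = 10, nrow = 12)
for (year in unique(mydf$yr)) {
  for (age in unique(mydf$age)) {
    hit <- mydf$no[mydf$yr == year & mydf$age == age]
    if (length(hit) > 0) mymatrix[age, year] <- hit
  }
}
all(unname(to_age_year_matrix(mydf, ages = 1:12, years = 1:10)) == mymatrix)
# TRUE

# Made-up data with calendar years and ages starting at 0
df <- data.frame(yr  = c(1990, 1990, 1991, 1992),
                 age = c(0, 5, 0, 3),
                 no  = c(12, 7, 9, 4))
res <- to_age_year_matrix(df, ages = 0:5, years = 1990:1992)
res["0", "1991"]  # 9
```

Passing the full `ages` and `years` ranges explicitly keeps rows and columns for combinations that never occur; the `anyNA()` check stops early instead of silently misplacing values that fall outside those ranges.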
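This almost gets you there:
with(mydf, tapply(no, list(age, yr), sum))
except that it puts NA where you want 0, and it only has rows and
columns for the ages and years that actually occur in the data. Both
can be fixed with
m <- with(mydf, tapply(no, list(factor(age, levels=1:12), factor(yr, levels=1:10)), sum))
m[is.na(m)] <- 0
m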
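In R 3.4.0 and later, `tapply()` also has a `default` argument that fills empty cells directly, which removes the separate NA-replacement step. A sketch on toy data shaped like the question's (the seed is arbitrary), assuming the full ranges are ages 1:12 and years 1:10; the explicit factor levels keep unobserved ages and years in the result:

```r
set.seed(7)  # arbitrary seed for reproducible toy data
n <- 10
mydf <- data.frame(yr  = sample(1:10, size = n, replace = FALSE),
                   age = sample(1:12, size = n, replace = FALSE),
                   no  = sample(1:10, size = n, replace = FALSE))

# Without explicit levels only observed ages/years become rows/columns:
# 10 of the 12 possible ages occur here, so the result is 10 x 10
dim(with(mydf, tapply(no, list(age, yr), sum)))

# Explicit factor levels keep every age (1:12) and year (1:10);
# default = 0 (R >= 3.4.0) fills the empty cells instead of NA
m <- with(mydf, tapply(no,
                       list(age = factor(age, levels = 1:12),
                            yr  = factor(yr,  levels = 1:10)),
                       sum, default = 0))
dim(m)  # 12 10
```

Because `sum` is applied per cell, this version also adds up counts if an age/year combination appears more than once, whereas plain matrix indexing would keep only the last value.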
Other options include matrix indexing:
with(mydf, {
  M <- matrix(0, 12, 10)
  M[cbind(age, yr)] <- no
  M   # return the filled matrix, not the value of the assignment
})
or (tada...) the reshape() function, especially if you want a data
frame as output.
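For the data-frame route, `reshape()` with `direction = "wide"` spreads the years into columns named `no.<year>`. A sketch on toy data shaped like the question's (the seed is arbitrary). The new columns follow the order in which the years first appear, hence the sort; ages that never occur get no row; and missing age/year combinations come out as NA rather than 0:

```r
set.seed(3)  # arbitrary seed for reproducible toy data
n <- 10
mydf <- data.frame(yr  = sample(1:10, size = n, replace = FALSE),
                   age = sample(1:12, size = n, replace = FALSE),
                   no  = sample(1:10, size = n, replace = FALSE))

# Sort by year so the new columns come out as no.1, no.2, ..., no.10
mydf_sorted <- mydf[order(mydf$yr, mydf$age), ]
wide <- reshape(mydf_sorted, idvar = "age", timevar = "yr",
                direction = "wide")
wide <- wide[order(wide$age), ]   # one row per observed age
names(wide)

# Optional: a plain matrix with ages as row names and 0 instead of NA
wm <- as.matrix(wide[, -1])
rownames(wm) <- wide$age
wm[is.na(wm)] <- 0
```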
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907