building a spatial matrix
Sorry, you're right.
The result line should be:
    result.m[cbind(factor(result$fcell), factor(result$cellneigh))] <-
        result$distance

    # create some fake data
    idcell <- data.frame(
        id = seq_len(5),
        fcell = sample(1:100, 5))
    censDist <- expand.grid(fcell=seq_len(100), cellneigh=seq_len(100))
    censDist$distance <- runif(nrow(censDist))

    # assemble the non-symmetric distance matrix
    result <- subset(censDist, fcell %in% idcell$fcell &
        cellneigh %in% idcell$fcell)
    result.m <- matrix(NA, nrow=nrow(idcell), ncol=nrow(idcell))
    result.m[cbind(factor(result$fcell), factor(result$cellneigh))] <-
        result$distance
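
The difference between the two forms of indexing can be seen on a tiny example. This is only an illustrative sketch: the names i, j, v, block and paired are made up for the demonstration and are not part of the data discussed in this thread. With two separate index vectors, R assigns to the whole block of rows by columns and recycles the values, which gives identical columns; with a two-column index matrix, it assigns one value per (row, column) pair.

```r
# Toy data: three (row, column, value) triples for a 3 x 3 matrix.
i <- c(1, 2, 3)
j <- c(2, 3, 1)
v <- c(10, 20, 30)

# Two separate index vectors address the full block of rows i by columns j,
# so all 9 cells are filled, with v recycled down each column.
block <- matrix(NA_real_, nrow = 3, ncol = 3)
block[i, j] <- v

# A two-column index matrix addresses one cell per row of cbind(i, j),
# so only the 3 intended cells are filled and the rest stay NA.
paired <- matrix(NA_real_, nrow = 3, ncol = 3)
paired[cbind(i, j)] <- v
```

Printing block and paired side by side shows why the first version of the result line produced a matrix with repeated columns.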
It's just about instantaneous on the dataset you sent me:
    system.time({
        result <- subset(censDist, f_cell %in% id_cell$f_cell &
            f_cell_neigh %in% id_cell$f_cell)
        result.m <- matrix(NA, nrow=nrow(id_cell), ncol=nrow(id_cell))
        result.m[cbind(factor(result$f_cell), factor(result$f_cell_neigh))] <-
            result$distance
    })
       user  system elapsed
      0.361   0.007   0.368
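
Two things may be worth checking with this approach. First, factor() numbers its levels in sorted order, so row k of result.m corresponds to the k-th smallest cell code rather than to row k of id_cell. Second, if several firms share an f_cell, there are fewer distinct cells than rows in id_cell. A possible variant, sketched here on made-up toy data, uses match() against an explicit vector of cells and then expands the cell-level matrix to firm level. The names cells, cell.m, pos and firm.m are hypothetical, and the toy distances are invented for the illustration.

```r
# Hypothetical toy data: four firms, two of which share cell 30.
id_cell <- data.frame(id = 1:4, f_cell = c(30, 10, 30, 20))
censDist <- expand.grid(f_cell = c(10, 20, 30, 40),
                        f_cell_neigh = c(10, 20, 30, 40))
censDist$distance <- abs(censDist$f_cell - censDist$f_cell_neigh)

# Keep only pairs where both ends are cells that hold a firm.
cells <- unique(id_cell$f_cell)
result <- subset(censDist, f_cell %in% cells & f_cell_neigh %in% cells)

# Cell-by-cell matrix; match() gives positions in `cells` explicitly,
# instead of relying on the sorted level order that factor() produces.
cell.m <- matrix(NA_real_, nrow = length(cells), ncol = length(cells),
                 dimnames = list(as.character(cells), as.character(cells)))
cell.m[cbind(match(result$f_cell, cells),
             match(result$f_cell_neigh, cells))] <- result$distance

# Expand to a firm-by-firm matrix, in the row order of id_cell.
pos <- match(id_cell$f_cell, cells)
firm.m <- cell.m[pos, pos]
dimnames(firm.m) <- list(as.character(id_cell$id), as.character(id_cell$id))
```

Here firm.m has one row and one column per firm, and two firms in the same cell get whatever distance censDist records from that cell to itself.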
Sarah
On Fri, May 13, 2016 at 10:36 AM, A M Lavezzi <mario.lavezzi at unipa.it> wrote:
PLEASE IGNORE THE PREVIOUS EMAIL, IT WAS SENT BY MISTAKE

Hello Sarah,

thanks a lot for your advice.

I followed your suggestions until the creation of "result". The allocation of the values of result$distance to the matrix result.m, however, does not seem to work: it produces a matrix with identical columns corresponding to the last values of result$distance. Maybe my description of the dataset was not clear enough.

I produced the final matrix spat_dist with a loop, which I report below (it takes about 1 hour on my MacBook Pro):
set_i = -1  # create a variable to store the i values already examined
for (i in unique(result$id)) {
    set_i = c(set_i, i)  # store the value of i
    # identify the locations connected to i. If the distance between i and j
    # was examined before, don't look for the distance between j and i
    set_neigh = result$id_neigh[result$id == i & !result$id_neigh %in% set_i]
    for (j in set_neigh) {
        if (i != j) {
            spat_dist[i, j] = result$distance[result$id == i & result$id_neigh == j]
            spat_dist[j, i] = spat_dist[i, j]
        } else {
            spat_dist[i, j] = 0
        }
    }
}
It is not the most elegant and efficient solution in the world, that's for
sure.
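
For comparison, a loop of this kind can probably be replaced by a single matrix-indexed assignment followed by a mirroring step. The following is only a sketch on invented toy data, under three assumptions that are not confirmed by the thread: the ids run from 1 to n, unique(result$id) is in increasing order, and each (id, id_neigh) pair appears at most once. The names n and lower are hypothetical.

```r
# Hypothetical toy input with the columns used in the loop above.
# The distance from 1 to 2 (5) deliberately differs from 2 to 1 (6).
result <- data.frame(
  id       = c(1, 1, 2, 2, 3, 3),
  id_neigh = c(2, 3, 1, 3, 1, 2),
  distance = c(5, 7, 6, 2, 7, 2)
)
n <- max(result$id, result$id_neigh)

# Start from NA so that pairs missing from `result` stay visible.
spat_dist <- matrix(NA_real_, nrow = n, ncol = n)

# Element-wise fill: one cell per row of the two-column index matrix.
spat_dist[cbind(result$id, result$id_neigh)] <- result$distance

# Copy the upper triangle onto the lower one to force symmetry,
# then put zeros on the diagonal.
lower <- lower.tri(spat_dist)
spat_dist[lower] <- t(spat_dist)[lower]
diag(spat_dist) <- 0
```

Note that the mirroring step overwrites the lower triangle, so a pair recorded only as (j, i) with j > i would end up as NA; whether that matters depends on the data.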
I would be grateful if you could suggest an alternative instruction to:

result.m[factor(result$fcell), factor(result$cellneigh)] <- result$distance

so that I can learn a faster procedure (I tried many times to modify this structure, but I did not manage). I don't want to abuse your time, so forget it if you are busy.

Thank you so much anyway,
Mario

ps I attach the data. Notice that the 1327 units in id_cell are firms, indexed by id, located in location f_cell. Different firms can be located in the same f_cell. With respect to your suggestion, I added two columns to "result" with the ids of the firms.

On Fri, May 13, 2016 at 3:26 PM, A M Lavezzi <mario.lavezzi at unipa.it> wrote:
Hello Sarah,

thanks a lot for your advice.

I followed your suggestions until the creation of "result". The allocation of the values of result$distance to the matrix result.m, however, does not seem to work: it produces a matrix with identical columns corresponding to the last values of result$distance. Maybe my description of the dataset was not clear enough.

I produced the final matrix with a loop, which I report below (it takes about 1 hour on my MacBook Pro):
set_i = -1  # create a variable to store the i values already examined
for (i in unique(result$id)) {
    set_i = c(set_i, i)  # store the value of i
    # identify the locations connected to i. Exclude those
    set_neigh = result$id_neigh[result$id == i & !result$id_neigh %in% set_i]
    for (j in set_neigh) {
        if (i != j) {
            spat_dist[i, j] = result$distance[result$id == i & result$id_neigh == j]
            spat_dist[j, i] = spat_dist[i, j]
        } else {
            spat_dist[i, j] = 0
        }
    }
}
It is not the most elegant and efficient solution in the world, that's for sure.

On Thu, May 12, 2016 at 2:51 PM, Sarah Goslee <sarah.goslee at gmail.com> wrote:
I don't see any reason why a loop is out of the question, and answering would have been much easier if you'd included the requested reproducible data, but what about this?

This solution is robust to pairs from idcell being absent in censDist, and to the distance from A to B being different from the distance from B to A, but not to A-B appearing twice. If that's possible, you'll need to figure out how to manage it.

# create some fake data
idcell <- data.frame(
    id = seq_len(5),
    fcell = sample(1:100, 5))
censDist <- expand.grid(fcell=seq_len(100), cellneigh=seq_len(100))
censDist$distance <- runif(nrow(censDist))

# assemble the non-symmetric distance matrix
result <- subset(censDist, fcell %in% idcell$fcell & cellneigh %in% idcell$fcell)
result.m <- matrix(NA, nrow=nrow(idcell), ncol=nrow(idcell))
result.m[factor(result$fcell), factor(result$cellneigh)] <- result$distance

Sarah

On Thu, May 12, 2016 at 5:26 AM, A M Lavezzi <mario.lavezzi at unipa.it> wrote:
Hello,

I have a sample of 1327 locations, each one identified by an id and a numerical code.

I need to build a spatial matrix, say M, i.e. a 1327x1327 matrix collecting the distances among the locations. M(i,i) should be 0, and M(i,j) should contain the distance between locations i and j.

I should use data organized in the following way:

1) id_cell contains the identifier (id) of each location (1...1327) and the numerical code of the location (f_cell) (see head of id_cell below)

head(id_cell)
   id f_cell
1   1   2120
12  2    204
22  3   2546
24  4   1327
34  5   1729
43  6   2293

2) censDist contains, for each location identified by its numerical code, the distance to other locations (censDist has 1.5 million rows). The head(censDist) below, for example, reads like this: location 2924 has a distance to 2732 of 1309.7525, location 2924 has a distance to 2875 of 696.2891, etc.

head(censDist)
  f_cell f_cell_neigh  distance
1   2924         2732 1309.7525
2   2924         2875  696.2891
3   2924         2351 1346.0561
4   2924         2350 1296.9804
5   2924         2725 1278.1877
6   2924         2721 1346.9126

Basically, for every location in id_cell I should pick up the distance to other locations in id_cell from censDist, and allocate it in M.

I have not come up with a satisfactory vectorization of this problem, and using a loop is out of the question.

Thanks for your help
Mario
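
On the caveat raised in the reply about a pair A-B appearing twice in censDist: before filling M, it may help to check for repeated (f_cell, f_cell_neigh) pairs and decide how to collapse them. The sketch below uses invented toy data, not the real censDist, and averaging the repeats is just one possible policy chosen for illustration. The names pair, is.dup, n.dup and censDist.unique are hypothetical.

```r
# Hypothetical toy data in the shape of censDist, with one repeated pair:
# the pair (1, 2) appears in rows 1 and 3 with different distances.
censDist <- data.frame(
  f_cell       = c(1, 1, 1, 2),
  f_cell_neigh = c(2, 3, 2, 1),
  distance     = c(10, 20, 12, 10)
)

# Flag every row whose (f_cell, f_cell_neigh) pair occurs more than once.
pair <- censDist[c("f_cell", "f_cell_neigh")]
is.dup <- duplicated(pair) | duplicated(pair, fromLast = TRUE)
n.dup <- sum(is.dup)

# One possible policy (an assumption, not the only choice): average repeats
# so that each ordered pair is left with a single distance.
censDist.unique <- aggregate(distance ~ f_cell + f_cell_neigh,
                             data = censDist, FUN = mean)
```

If n.dup is zero on the real data, the matrix-indexed assignment can be used as is; otherwise the collapsed table can be used in place of censDist.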