Some colleagues and I developed a hierarchical clustering algorithm in R
to find the main clusters of agricultural industries in a particular city
(e.g. London), and it works as intended. Depending on the filters we
apply in the algorithm, we were able to generate 6 clustering scenarios
for London. For example, the first scenario produced 2 clusters, the
second produced 5 clusters, and so on. I would like some help on how to
choose the most appropriate scenario. I saw that some packages can help
with this, such as `pvclust`, but I couldn't get it to work for my case.
Below is a short executable example that shows the essence of what I want.

Any help is welcome! If you know how to do this with another package, feel
free to describe it.

Best regards.
library(rdist)
library(geosphere)
library(fpc)

df <- structure(list(Industries = c(1, 2, 3, 4, 5, 6),
                     Latitude = c(-23.8, -23.8, -23.9, -23.7, -23.7, -23.7),
                     Longitude = c(-49.5, -49.6, -49.7, -49.8, -49.6, -49.9),
                     Waste = c(526, 350, 526, 469, 534, 346)),
                class = "data.frame", row.names = c(NA, -6L))
df1 <- df

# clusters
# distm() expects longitude first, then latitude, hence columns 2:1
coordinates <- df[c("Latitude", "Longitude")]
d <- as.dist(distm(coordinates[, 2:1]))
fit.average <- hclust(d, method = "average")

# first scenario: 2 clusters
clusters <- cutree(fit.average, k = 2)
df$cluster <- clusters
df
#   Industries Latitude Longitude Waste cluster
# 1          1    -23.8     -49.5   526       1
# 2          2    -23.8     -49.6   350       1
# 3          3    -23.9     -49.7   526       1
# 4          4    -23.7     -49.8   469       2
# 5          5    -23.7     -49.6   534       1
# 6          6    -23.7     -49.9   346       2

# second scenario: 5 clusters
clusters1 <- cutree(fit.average, k = 5)
df1$cluster <- clusters1
df1
#   Industries Latitude Longitude Waste cluster
# 1          1    -23.8     -49.5   526       1
# 2          2    -23.8     -49.6   350       1
# 3          3    -23.9     -49.7   526       2
# 4          4    -23.7     -49.8   469       3
# 5          5    -23.7     -49.6   534       4
# 6          6    -23.7     -49.9   346       5
Find the ideal cluster
4 messages · Jovani T. de Souza, David L Carlson, Michael Dewey
Look at the Cluster Analysis Task View, particularly the section
"Additional Functionality"
(https://cran.r-project.org/web/views/Cluster.html).

Maybe package clValid:

The R package clValid contains functions for validating the results of a
clustering analysis. There are three main types of cluster validation
measures available: "internal", "stability", and "biological". The user
can choose from nine clustering algorithms in existing R packages,
including hierarchical, K-means, self-organizing maps (SOM), and model
based clustering. In addition, we provide a function to perform the
self-organizing tree algorithm (SOTA) method of clustering. Any
combination of validation measures and clustering methods can be requested
in a single function call. This allows the user to simultaneously evaluate
several clustering algorithms while varying the number of clusters, to
help determine the most appropriate method and number of clusters for the
dataset of interest. Additionally, the package can automatically make use
of the biological information contained in the Gene Ontology (GO) database
to calculate the biological validation measures, via the annotation
packages available in Bioconductor. The function returns an object of S4
class clValid, which has summary, plot, print, and additional methods
which allow the user to display the optimal validation scores and extract
clustering results.

David L Carlson
Professor Emeritus
Texas A&M University

On Sat, Dec 12, 2020 at 9:27 AM Jovani T. de Souza
<jovanisouza5 at gmail.com> wrote:
______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
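
One way to make the "internal" validation idea concrete for the example in
the original post is to compare the candidate cuts by average silhouette
width. The sketch below is only an illustration under assumptions that are
not stated in the thread: it assumes the scenarios are different cuts of
the same average-linkage dendrogram, it uses `silhouette()` from the
`cluster` package rather than clValid, and the names `ks`, `avg_sil`,
`scores`, and `best_k` are made up for the example.

```r
library(geosphere)  # distm(), as in the original post
library(cluster)    # silhouette()

# Coordinates from the original post (Waste is not used for clustering there)
df <- data.frame(
  Industries = 1:6,
  Latitude   = c(-23.8, -23.8, -23.9, -23.7, -23.7, -23.7),
  Longitude  = c(-49.5, -49.6, -49.7, -49.8, -49.6, -49.9)
)

# Same distance and linkage as the post: geodesic distances in metres and
# average linkage. distm() expects longitude first, then latitude.
lonlat <- as.matrix(df[, c("Longitude", "Latitude")])
d      <- as.dist(distm(lonlat))
fit    <- hclust(d, method = "average")

# Candidate numbers of clusters (assumption: scenarios differ only in k).
# silhouette() needs at least 2 clusters and at most n - 1.
ks <- 2:5

# Average silhouette width for each cut of the same dendrogram
avg_sil <- sapply(ks, function(k) {
  sil <- silhouette(cutree(fit, k = k), d)
  mean(sil[, "sil_width"])
})

scores <- data.frame(k = ks, avg_sil = avg_sil)
print(scores)

# Candidate with the highest average silhouette width
best_k <- ks[which.max(avg_sil)]
print(best_k)
```

With only six points the scores are coarse, and singleton clusters get a
silhouette width of 0, so treat the result as a rough guide rather than a
verdict. If the scenarios come from different filters rather than different
values of k, the same `silhouette()` call can be applied to each scenario's
cluster vector, provided all scenarios cover the same industries and use
the same distance object.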
Dear Jovani,

If you cross-post on CrossValidated as well as here, it is polite to give
a link so people do not answer here when someone has already answered
there, or vice versa.

Michael
http://www.dewey.myzen.co.uk/home.html

On 12/12/2020 15:27, Jovani T. de Souza wrote:
2 days later
Thank you so much! Sorry, Michael, I will include the link next time.

Best regards.

On Sat, Dec 12, 2020 at 14:06, Michael Dewey <lists at dewey.myzen.co.uk> wrote: