Statistical analysis of olive dataset
Inline.

Cheers,
Bert

Bert Gunter
"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)
On Sat, Mar 12, 2016 at 9:39 AM, Axel <axeldibert at alice.it> wrote:
Hi to all the members of the list! I am a novice in both statistical analysis and the use of the R software, so I am experimenting with the dataset "olive" included in the package "tourr".
Stop experimenting and spend time with an R tutorial or two? There are many good ones on the Web. See also https://www.rstudio.com/online-learning/#R for some recommendations.
This dataset contains the results of the determination of the fatty acids in 572 samples of olive oil from Italy (columns 3 to 10), along with the area and the region of origin of the oil (columns 1 and 2, respectively). The main goal of my analysis is to determine which fatty acids characterize the origin of an oil. As a secondary goal, I would like to insert the results of the chemical analysis of an oil that I analyzed (I am a Chemistry student) in order to determine its region of production. I do not know if this last thing is possible.

I am using R 3.2.4 on MacOS X El Capitan with the packages "tourr" and "psych" loaded. Here are the commands I have used up to now:

olivenum <- olive[,c(3:10)]
mean <- colMeans(olivenum)
sd <- sapply(olivenum,sd)
describeBy(olivenum, olive[2])
pairs(olivenum)
R <- cor(olivenum)
eigen(R)

# Since the first three eigenvalues are greater than 1, these are the main components (column 1, 2 and 3). But I can also determine them using a scree diagram as follows. Right?

autoval <- eigen(R)$values
autovec <- eigen(R)$vectors
pvarsp <- autoval/ncol(olivenum)
plot(autoval,type="b",main="Scree diagram",xlab="Number of components",ylab="Autovalues")
abline(h=1,lwd=3,col="red")
eigen(R)$vectors[, 1:3]
olive.scale <- scale(olivenum,T,T)
points <- olive.scale%*%autovec[,1:3]

# Since I selected three main components (three columns), how should I plot the dispersion graph? I do not think that what I have done is right:

plot(points, main="Dispersion graph",xlab="Component 1",ylab="Component 2")
princomp(olivenum,cor=T)

# With the following command I obtain a summary of the importance of the components. For example, the proportion of variance of component 1 is about 0.465, of component 2 is 0.220 and of component 3 is 0.127, with a cumulative proportion of 0.812. This means that the values in the first three columns of the matrix "olivenum" mostly characterize the differences between the observations. Right?

summary(princomp(olivenum,cor=T))
screeplot(princomp(olivenum,cor=T))
plot(princomp(olivenum,cor=T)$scores,rownames(olivenum))
abline(h=0,v=0)

I determined that three components can explain a great part of the variability, but I don't know which these components are. How should I continue?

Thank you for your attention,
Axel
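
On the question of "which are these components": each principal component is a weighted combination of all eight fatty acids, not one of the original columns, and the weights (loadings) show which acids contribute most to it. The following is only a minimal sketch of one way to inspect them, not a definitive analysis. The helper name pca_summary is hypothetical and not part of any package, and the sketch assumes that the "olive" data from "tourr" has the layout described in the post (measurements in columns 3 to 10, an origin label in column 2).

```r
# Hypothetical helper (not from any package): PCA on standardized variables.
pca_summary <- function(x, n_comp = 3) {
  # scale. = TRUE standardizes each column, so the squared sdev values
  # match the eigenvalues of the correlation matrix, eigen(cor(x))$values.
  pca <- prcomp(x, center = TRUE, scale. = TRUE)
  eig <- pca$sdev^2
  keep <- seq_len(min(n_comp, length(eig)))
  list(
    eigenvalues = eig,                                 # variance of each component
    prop_var    = eig / sum(eig),                      # proportion of total variance
    loadings    = pca$rotation[, keep, drop = FALSE],  # weight of each variable
    scores      = pca$x[, keep, drop = FALSE]          # samples in component space
  )
}

# Usage on the olive data; skipped if the "tourr" package is not installed.
if (requireNamespace("tourr", quietly = TRUE)) {
  data("olive", package = "tourr")
  res <- pca_summary(olive[, 3:10])
  print(round(res$prop_var, 3))   # compare with summary(princomp(..., cor = TRUE))
  print(round(res$loadings, 2))   # large absolute values mark influential acids
  # Assumption: column 2 holds the origin label used with describeBy() above.
  origin <- factor(olive[[2]])
  # With three components, a scatterplot matrix shows every pair of scores.
  pairs(res$scores, col = as.integer(origin), main = "PCA scores by origin")
}
```

The signs of the loadings are arbitrary, so only their relative sizes and patterns are meaningful. Coloring the scores by origin gives a visual impression of whether the groups separate; it does not by itself classify a new sample.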