Cluster analysis

Download Shawn Graham's XRD dataset of stamped bricks here and save it as a plain text file. The commands below assume the file is tab-separated and named bricks_xrd.txt.

> xrd <- read.delim("bricks_xrd.txt", sep="\t")
> summary(xrd)
     Sample       Quartz           Augite        Haematite       Gehlenite        Calcite         Analcime    Muscovite    
 se 14  : 2   Min.   : 11.00   Min.   : 10.0   Min.   : 3.00   Min.   : 2.00   Min.   :  2.0   Min.   : 5   Min.   : 3.00  
 fal 1  : 1   1st Qu.: 36.75   1st Qu.: 29.5   1st Qu.: 8.00   1st Qu.:11.00   1st Qu.: 40.0   1st Qu.:11   1st Qu.: 9.00  
 fal 2  : 1   Median : 52.00   Median : 54.0   Median :11.00   Median :30.00   Median : 66.0   Median :15   Median :12.00  
 fal 3  : 1   Mean   : 59.24   Mean   : 60.1   Mean   :12.98   Mean   :30.46   Mean   : 64.9   Mean   :17   Mean   :14.24  
 fnv13  : 1   3rd Qu.: 86.00   3rd Qu.: 91.5   3rd Qu.:17.00   3rd Qu.:47.00   3rd Qu.: 95.5   3rd Qu.:20   3rd Qu.:17.00  
 fnv14  : 1   Max.   :117.00   Max.   :120.0   Max.   :34.00   Max.   :90.00   Max.   :127.0   Max.   :50   Max.   :42.00  
 (Other):89                    NA's   :  9.0   NA's   : 8.00   NA's   :37.00   NA's   : 33.0   NA's   :51   NA's   :34.00  
    Dolomite      Anorthoclase       Sanidine         Albite     
 Min.   : 3.00   Min.   : 20.00   Min.   :  5.0   Min.   :  3.0  
 1st Qu.:17.00   1st Qu.: 42.00   1st Qu.: 30.0   1st Qu.: 50.0  
 Median :23.50   Median : 66.00   Median : 50.0   Median : 65.5  
 Mean   :24.43   Mean   : 65.33   Mean   : 49.5   Mean   : 65.2  
 3rd Qu.:32.00   3rd Qu.: 90.00   3rd Qu.: 65.0   3rd Qu.: 86.5  
 Max.   :48.00   Max.   :115.00   Max.   :114.0   Max.   :117.0  
 NA's   :26.00   NA's   : 47.00   NA's   : 70.0   NA's   : 50.0  

hclust

Before creating the actual cluster dendrogram, we have to calculate the distance matrix from our data frame. For this task we use the dist() function:

> dist_xrd <- dist(xrd[-1])

(Note that the first column, which holds the sample labels, is intentionally left out: xrd[-1] selects all columns but the first.)
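The summary above shows many NA's in most columns, so it helps to see how dist() treats them. The following is a minimal sketch on a small made-up data frame (the name toy and its values are hypothetical, not taken from the brick dataset). As far as the dist() documentation describes it, columns with a missing value are skipped for the affected pair of rows and the sum of squares is scaled up in proportion to the number of columns actually used.

```r
# Hypothetical toy data, not the real brick measurements
toy <- data.frame(
  Sample = c("a", "b", "c"),
  Quartz = c(10, 13, 20),
  Augite = c(20, 24, NA)   # sample "c" has no Augite value
)

# Drop the label column, then compute Euclidean distances (the default)
d <- dist(toy[-1])
print(d)

# a-b uses both columns: sqrt(3^2 + 4^2)
# a-c and b-c can only use Quartz, so the squared difference is
# multiplied by 2/1 (columns total / columns used) before the square root
expected <- c(sqrt(3^2 + 4^2), sqrt(10^2 * 2), sqrt(7^2 * 2))
print(all.equal(as.vector(d), expected))
```

With as many missing values as this dataset has, many pairwise distances rest on only a few minerals, which is worth keeping in mind when reading the dendrogram.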

We are ready to create the dendrogram. The call is simple, although the console output is not very informative. The cluster object is saved to another variable because we are going to plot it.

> clust_xrd <- hclust(dist_xrd)
> clust_xrd

Call:
hclust(d = dist_xrd)

Cluster method   : complete 
Distance         : euclidean 
Number of objects: 96 
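The printed summary hides most of what hclust() computed. As a small sketch, again on hypothetical toy data rather than the brick measurements, the components of the returned object can be inspected directly: merge lists which items or clusters were joined at each step, and height gives the distance at which each join happened.

```r
# Hypothetical toy data: two obvious pairs of similar samples
toy <- data.frame(
  Sample = c("a", "b", "c", "d"),
  Quartz = c(10, 12, 50, 53),
  Augite = c(20, 21, 80, 78)
)

clust <- hclust(dist(toy[-1]))

# Linkage method used; "complete" is the default
print(clust$method)

# One row per merge step; negative numbers are single observations,
# positive numbers refer to clusters formed in an earlier row
print(clust$merge)

# Distance at which each merge happened, in increasing order
print(clust$height)
```

Here the first two merges join the close pairs and the last one joins the two resulting clusters at a much greater height, which is the same information the dendrogram shows graphically.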

And now plot it:

> plot(clust_xrd)

By default the leaves are labelled with row numbers. To show the sample name on each leaf instead, pass the Sample column as labels:

> plot(clust_xrd, labels = xrd$Sample)
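An alternative, sketched here on hypothetical toy data, is to store the sample names as row names before calling dist(), so that the labels travel with the cluster object. Row names must be unique, and the summary above shows that "se 14" occurs twice, so this sketch assumes make.unique() is an acceptable way to disambiguate the duplicates.

```r
# Hypothetical toy data with a duplicated sample name, as in the real file
toy <- data.frame(
  Sample = c("se 14", "se 14", "fal 1", "fal 2"),
  Quartz = c(10, 12, 50, 53),
  Augite = c(20, 21, 80, 78)
)

# Row names must be unique, so duplicates get a numeric suffix
rownames(toy) <- make.unique(as.character(toy$Sample))

# dist() keeps the row names and hclust() stores them as labels
clust <- hclust(dist(toy[-1]))
print(clust$labels)
```

Calling plot(clust) on such an object uses these labels without a separate labels argument.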

And here's the result:

[Figure: hierarchical cluster dendrogram]

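Once you are familiar with these functions, you can also produce the plot in a single line. In this version hang = -1 makes all the labels hang down from the baseline, and cex = 0.7 shrinks the label text so that more labels stay readable:

> plot(hclust(dist(xrd[-1])), labels = xrd$Sample, hang = -1, cex = 0.7)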
