Harris Matrix with Graphviz

On most excavations the large number of stratigraphic units and contexts makes it necessary to use some sort of representation of the relative chronological sequence to keep track of what has already been excavated (not to mention building archaeology). The standard tool for this is the Harris Matrix.

It can be defined as a directed graph running from the most recent deposits down to the oldest, in which the nodes represent layers and the edges represent the stratigraphic relations between them. This year in Gortyna I tried using the Graphviz software to automate the creation of the Harris Matrix for the excavation area I was working in.

UPDATE: I’ve published a first draft of a simple application to automate the generation of the Harris Matrix. Read more here.

An example of a Harris Matrix

There are two steps involved here:

keeping the stratigraphic information stored in some way
processing information to obtain the graph

To install Graphviz, follow the instructions for the operating system you are using: it is straightforward in most cases. The most important thing to know before diving into this brief tutorial is that Graphviz is not a GUI application, i.e. you don’t draw your graph, but rather you describe it in a text file. The file is then processed using one of the many programs that Graphviz provides. If you can’t live without buttons and menus, try Dia, which can also export to the Graphviz .dot format. Dia is a GTK+ based diagram creation program for Linux, Unix and Windows, released under the GPL license.

Graphviz has its own native plain-text format, which is documented on the website. Graphviz .dot files can be read and written with any text editor, such as Kate, gedit, jEdit or Notepad++. Keeping a single file of this kind is the obvious choice for an experiment, but of course the single-file approach also has a lot of problems.
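
As an aside, the same plain-text file could just as well be written by a script. Here is a minimal sketch in Python, assuming the relations are kept as (later, earlier) pairs; the function name to_dot and the list layout are made up for this illustration and are not part of Graphviz:

```python
def to_dot(relations, name="matrix"):
    """Return DOT source for a list of (later, earlier) context pairs."""
    lines = [f"digraph {name} {{"]
    for later, earlier in relations:
        # One edge per pair, pointing from the later unit to the earlier one
        lines.append(f"    {later} -> {earlier}")
    lines.append("}")
    return "\n".join(lines) + "\n"


# A few relations taken from the sample file in this post
relations = [(723, 722), (722, 731), (731, 730), (730, 729)]
dot_source = to_dot(relations)
print(dot_source)
```

The printed text can be saved to a .dot file and processed exactly like a hand-written one.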

This is a sample from the final .dot file I had compiled during the excavation weeks:

digraph matrix {
    723->722
    505->732
    729->732
    731->730->729
    726->729
    730->726
    726->810->725
    729->810->725
    729->733->792->793
    722->731
    732->737->736->733
    733->810->725
    729->505
    736->506
    505->506
    179->759
    759->725
    759->737
    759->769->768->778
    768->303
    737->739->736->778
    736->769
    778->303
    506->303
    769->506
    769->780
    778->779
    736->773->774->779->780
    779->303
    780->303
    506->780
    505->724
}

Apart from the initial preamble, it’s a ridiculously easy syntax. The Harris Matrix is to be read top-down, so, for example, A -> B means “A comes after B”, i.e. A is more recent than B. You can also chain multiple relations on the same row. Indentation is not mandatory, but it helps keep your file clean. You can write comments after //, either on a line of their own or at the end of a line (a line that starts with # is also ignored, but a # in the middle of a line is not treated as a comment), like

// this is a comment
A -> B -> C
A -> D -> E // this one too!
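
A chained row is just shorthand for a series of pairwise relations. The following Python sketch spells that out; the helper name expand_chain is hypothetical, and it only handles a bare chain of node names, not the full DOT grammar:

```python
def expand_chain(statement):
    """Split a chain like 'A -> B -> C' into [('A', 'B'), ('B', 'C')]."""
    nodes = [part.strip() for part in statement.strip().rstrip(";").split("->")]
    if len(nodes) < 2 or not all(nodes):
        raise ValueError(f"not a chain of relations: {statement!r}")
    # Pair every node with the one that follows it
    return list(zip(nodes, nodes[1:]))


pairs = expand_chain("729->733->792->793")
print(pairs)  # [('729', '733'), ('733', '792'), ('792', '793')]
```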

It’s really not that difficult to keep this file updated by hand. One thing you might worry about is redundant relations, which can easily make your graph ugly and unreadable. But this is about automation, so it isn’t going to be a problem: we’ll record every relation, even the redundant ones.
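
To make “redundant” concrete: a relation A -> C is redundant when C can already be reached from A through other relations, for instance A -> B -> C. The sketch below is only an illustration in plain Python, assuming the relations contain no cycles; the function names are invented here and this is not how Graphviz itself is implemented:

```python
def reachable(graph, start, goal, skip_edge):
    """Depth-first search from start to goal that ignores one given edge."""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        if node in seen:
            continue
        seen.add(node)
        for nxt in graph.get(node, ()):
            if (node, nxt) != skip_edge:
                stack.append(nxt)
    return False


def redundant_relations(edges):
    """Return the edges that are implied by a longer path (acyclic input assumed)."""
    graph = {}
    for later, earlier in edges:
        graph.setdefault(later, set()).add(earlier)
    return sorted(
        edge for edge in set(edges)
        if reachable(graph, edge[0], edge[1], skip_edge=edge)
    )


# Three relations from the sample file: 729->732 is implied by 729->505->732
edges = [(729, 505), (505, 732), (729, 732)]
print(redundant_relations(edges))  # [(729, 732)]
```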

We said at the beginning that the Harris Matrix is a directed graph. Graphviz comes with a lot of tools, but only one does what we need, and it’s named dot. From the command line we can just run

dot harris-matrix.dot -Tpng -o harris-matrix.png

and almost instantly get our data compiled as a graph. The -Tpng command line option selects PNG among the many available output formats. The -o option precedes the output filename.
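
If you would rather launch the same command from a script, something like the following Python sketch could work. The helper dot_command is hypothetical; it only assembles the arguments shown above, and the actual call is skipped unless the dot program and the input file are both present:

```python
import shutil
import subprocess
from pathlib import Path


def dot_command(source, fmt="png"):
    """Build the dot command line; the output name is derived from the input name."""
    output = Path(source).with_suffix(f".{fmt}")
    return ["dot", str(source), f"-T{fmt}", "-o", str(output)]


cmd = dot_command("harris-matrix.dot")
print(" ".join(cmd))

# Only run Graphviz when it is installed and the input file exists
if shutil.which("dot") and Path("harris-matrix.dot").exists():
    subprocess.run(cmd, check=True)
```

Passing another format name, for example "svg", changes both the -T option and the output file extension.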

So far, the result is quite good. But redundant relations are still there, and I promised it wouldn’t be a problem at all.

This is where the power of UNIX comes in handy. tred is another of the many tools provided by Graphviz; it acts as a “transitive reduction filter for directed graphs”. It therefore has to run before dot reads the input file. A pipe (represented by the | character) is the easiest way to pass data from one program to another in UNIX style. Here’s how I did it:

tred harris-matrix.dot | dot -Tpng -o harris-matrix-tred.png

Note that dot reads its input from stdin when no input file is given, while tred writes its output to stdout by default. Many simple programs, each doing one single operation well: this is the core of the UNIX philosophy, and Graphviz follows it. Once you understand this concept, things will be much easier. The output of this second command is slightly different from the first one:

You can play around with some general options to change the graphic layout of your graph. These are two options I often use to get better looking Harris Matrices:

digraph matrix { // these two options go at the beginning of the graph file
    concentrate=true;
    node[shape=rect];

That’s enough for now. In the next tutorial, we’ll go further, using Graphviz as a programming library through Python. This means that we will no longer need to enter the relations manually, we will have a GUI, and our data will be stored in a database.

Stefano Costa