Harris Matrix with Graphviz
On most excavations, the large number of stratigraphic units and contexts makes it necessary to represent their relative chronological sequence in some way, to keep track of what has already been excavated (not to mention building archaeology). The standard tool for this is the Harris Matrix.
It can be defined as a directed graph running from the most recent deposits down to the oldest, where the nodes represent layers and the edges represent the stratigraphic relations that connect them. This year in Gortyna I tried to use the Graphviz software to automate the creation of the Harris Matrix for the excavation area I was working in.
UPDATE: I’ve published a first draft of a simple application that automates the generation of the Harris Matrix.
There are two steps involved:
- storing the stratigraphic information in some form
- processing that information to obtain the graph
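The first step can be sketched in Python, the language the next tutorial moves to. This is a minimal sketch with hypothetical function names, not the code I used on site: relations are kept as (later, earlier) pairs, in the same direction as later -> earlier in a .dot file, and a topological sort (Kahn’s algorithm) both checks them and yields one possible order from the most recent unit to the oldest. A cycle means the records contradict each other, e.g. A above B and B above A.

```python
from collections import defaultdict, deque


def relative_sequence(pairs):
    """Return one ordering of units from most recent to oldest.

    Each pair is (later, earlier), like `later -> earlier` in a .dot file.
    Uses Kahn's topological sort; raises ValueError if the relations form
    a cycle (a contradiction such as A above B and B above A).
    Unit IDs are assumed to be mutually comparable (all ints or all strings).
    """
    succ = defaultdict(set)
    indegree = defaultdict(int)
    units = set()
    for later, earlier in pairs:
        units.update((later, earlier))
        if earlier not in succ[later]:  # ignore duplicate records
            succ[later].add(earlier)
            indegree[earlier] += 1

    # Start with the units that have nothing recorded above them.
    queue = deque(sorted(u for u in units if indegree[u] == 0))
    order = []
    while queue:
        unit = queue.popleft()
        order.append(unit)
        for earlier in sorted(succ[unit]):
            indegree[earlier] -= 1
            if indegree[earlier] == 0:
                queue.append(earlier)

    if len(order) != len(units):
        placed = set(order)
        stuck = sorted(u for u in units if u not in placed)
        raise ValueError(f"contradictory relations involving units {stuck}")
    return order


# A few relations from the sample .dot file, stored as (later, earlier).
relations = {(723, 722), (722, 731), (731, 730), (730, 729)}
print(relative_sequence(relations))  # [723, 722, 731, 730, 729]
```

Any other valid order is equally correct; sorting the queue only makes the result repeatable.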
To install Graphviz, follow the instructions for your operating system: in most cases it is straightforward. The most important thing to know before diving into this brief tutorial is that Graphviz is not a GUI application, i.e. you don’t draw your graph, but rather describe it in a text file. The file is then processed by one of the many programs that Graphviz provides.
If you can’t live without buttons and menus, try Dia, which can also export to the Graphviz .dot format. Dia is a GTK+ based diagram creation program for Linux, Unix and Windows, released under the GPL license.
Graphviz has its own native plain-text format (the DOT language), which is documented on the website. Graphviz .dot files can be read and written with any text editor, such as kate, gedit, jEdit or Notepad++. Keeping a single file of this kind is the obvious choice for an experiment, although of course the single-file approach also has plenty of problems.
This is a sample from the final .dot file I compiled during the excavation weeks:
digraph matrix {
723->722
505->732
729->732
731->730->729
726->729
730->726
726->810->725
729->810->725
729->733->792->793
722->731
732->737->736->733
733->810->725
729->505
736->506
505->506
179->759
759->725
759->737
759->769->768->778
768->303
737->739->736->778
736->769
778->303
506->303
769->506
769->780
778->779
736->773->774->779->780
779->303
780->303
506->780
505->724
}
Apart from the initial preamble, the syntax is ridiculously easy. The Harris Matrix is read top-down, so A -> B means “A comes after B” (A is the more recent unit). You can also chain multiple relations on the same line. Indenting is not mandatory, but it helps keep your file clean. Comments start with // and run to the end of the line (/* ... */ blocks work too). A line that begins with # is also discarded, but # does not start a comment in the middle of a line. For example:
// this is a comment
A -> B -> C
A -> D -> E // this one too!
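Because the syntax is so regular, the relations can be read back from the file with very little code. The sketch below uses a hypothetical helper name and is not a full DOT parser: it assumes one statement per line and unquoted unit IDs, as in the sample above, and expands chained relations such as 731->730->729 into separate (later, earlier) pairs while skipping comments and non-edge lines.

```python
def parse_relations(text):
    """Expand chained edges such as '731->730->729' into (later, earlier) pairs.

    Minimal sketch for the simple subset used in this post: one statement
    per line, unquoted IDs, '//' comments and lines starting with '#'.
    It does not handle quoted IDs, edge attributes or /* */ comments.
    """
    pairs = []
    for raw in text.splitlines():
        # Drop a trailing '//' comment, surrounding spaces and a final ';'.
        line = raw.split("//", 1)[0].strip().rstrip(";")
        if not line or line.startswith("#") or "->" not in line:
            continue  # blank line, comment, 'digraph matrix {', '}', options
        units = [u.strip() for u in line.split("->")]
        # Each pair of consecutive units in a chain is one relation.
        pairs.extend(zip(units, units[1:]))
    return pairs


sample = """digraph matrix {
731->730->729
# a comment line
A -> D -> E // a trailing comment
}"""
print(parse_relations(sample))
# [('731', '730'), ('730', '729'), ('A', 'D'), ('D', 'E')]
```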
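It’s not that difficult to keep this file updated by hand, really. One thing you might worry about is redundant relations: if A -> B and B -> C are recorded, an extra A -> C adds no information but still gets drawn (in the sample above, 729->810 is already implied by 729->733 and 733->810), and many such edges can make your graph ugly and unreadable. But this is about automation, so it isn’t going to be a problem: we’ll record every relation, even the redundant ones.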
We said at the beginning that the Harris Matrix is a directed graph. Graphviz comes with a lot of tools, but the one we need is dot, which draws directed graphs as hierarchical layers. From the command line we can just run
dot harris-matrix.dot -Tpng -o harris-matrix.png
and get our data compiled as a graph almost instantly. The -Tpng option selects PNG from the many available output formats. The -o flag (that is, option) precedes the output filename.
So far, the result is quite good. But the redundant relations are still there, and I promised they wouldn’t be a problem at all.
This is where the power of UNIX comes to the rescue. tred is another of the many tools provided by Graphviz, and it acts as a “transitive reduction filter for directed graphs”: it removes every edge that is already implied by a longer path. So it has to run before dot reads the input file. A pipe (represented by the | character) is the easiest way to pass data from one program to another in UNIX style. Here’s how I did it:
tred harris-matrix.dot | dot -Tpng -o harris-matrix-tred.png
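To see what tred computes, here is a minimal pure-Python sketch of transitive reduction, with a hypothetical function name. It assumes the relations form a directed acyclic graph, as a consistent Harris Matrix should: an edge later -> earlier is dropped when earlier can also be reached through some other successor of later. It only illustrates the idea; in practice tred does the job.

```python
from collections import defaultdict


def transitive_reduction(pairs):
    """Return the (later, earlier) pairs not implied by a longer path.

    Assumes an acyclic graph: in a DAG, an edge u -> v is redundant exactly
    when v is reachable from another direct successor of u.
    """
    succ = defaultdict(set)
    for u, v in pairs:
        succ[u].add(v)

    def implied_by_longer_path(u, v):
        # Depth-first search starting from u's other direct successors.
        stack = [w for w in succ[u] if w != v]
        seen = set(stack)
        while stack:
            node = stack.pop()
            if node == v:
                return True
            for nxt in succ.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return False

    return {(u, v) for u, v in set(pairs) if not implied_by_longer_path(u, v)}


# From the sample: 729->810 is implied by 729->733 and 733->810.
edges = [(729, 810), (729, 733), (733, 810), (810, 725)]
print(sorted(transitive_reduction(edges)))
# [(729, 733), (733, 810), (810, 725)]
```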
Note that dot reads its input from stdin when no input file is given, while tred writes its output to stdout by default. Many small programs, each doing a single job well: this is the core of the UNIX philosophy, and Graphviz follows it. Once you understand this concept, things get much easier. The output of this second command is slightly different from the first one: the redundant edges are gone.
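The same tred | dot pipeline can also be driven from Python with the standard subprocess module, connecting tred’s stdout to dot’s stdin just as the shell’s | does. This is a sketch with hypothetical function names; it assumes the Graphviz tools tred and dot are on the PATH, and it uses the same file names as the command above.

```python
import shutil
import subprocess


def pipeline_commands(dot_file, png_file):
    """Argument lists equivalent to: tred DOT_FILE | dot -Tpng -o PNG_FILE"""
    return ["tred", dot_file], ["dot", "-Tpng", "-o", png_file]


def render_reduced(dot_file, png_file):
    """Run tred and dot as a pipeline; raise if either tool is missing or fails."""
    tred_cmd, dot_cmd = pipeline_commands(dot_file, png_file)
    for tool in (tred_cmd[0], dot_cmd[0]):
        if shutil.which(tool) is None:
            raise FileNotFoundError(f"Graphviz tool '{tool}' not found on PATH")

    # tred writes the reduced graph to its stdout...
    tred = subprocess.Popen(tred_cmd, stdout=subprocess.PIPE)
    # ...and dot, given no input file, reads it from its stdin.
    dot = subprocess.Popen(dot_cmd, stdin=tred.stdout)
    tred.stdout.close()  # let tred receive SIGPIPE if dot exits early
    dot_status = dot.wait()
    tred_status = tred.wait()
    if tred_status != 0:
        raise subprocess.CalledProcessError(tred_status, tred_cmd)
    if dot_status != 0:
        raise subprocess.CalledProcessError(dot_status, dot_cmd)
```

Calling render_reduced("harris-matrix.dot", "harris-matrix-tred.png") should produce the same image as the shell command.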
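You can play around with some general options to change the layout of your graph. These are two options I often use to get better-looking Harris Matrices: concentrate=true merges edges that run partly in parallel so that they share part of their path, and node[shape=rect] draws each unit as a rectangle.
digraph matrix { // these two options go at the beginning of the graph file
concentrate=true;
node[shape=rect];
// ... stratigraphic relations as before ...
}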
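Going the other way, stored relations can be written out as a .dot file that already carries these two options. A minimal sketch with a hypothetical function name: it quotes every unit ID so that labels containing spaces stay valid DOT, and it assumes no ID contains a double quote.

```python
def to_dot(pairs, name="matrix"):
    """Write (later, earlier) pairs as DOT text with the two layout options."""
    lines = [
        f"digraph {name} {{",
        "  concentrate=true;",
        "  node[shape=rect];",
    ]
    # Sort for a stable, diff-friendly file; str() keeps mixed IDs sortable.
    for later, earlier in sorted(pairs, key=lambda p: (str(p[0]), str(p[1]))):
        lines.append(f'  "{later}" -> "{earlier}";')
    lines.append("}")
    return "\n".join(lines) + "\n"


print(to_dot([(723, 722), (722, 731)]))
```

Saved as harris-matrix.dot, the result can go through the same tred | dot pipeline as before.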
That’s enough for now. In the next tutorial we’ll go further, using Graphviz as a programming library from Python. This means we won’t need to enter the relations manually anymore, we will have a GUI, and our data will be stored in a database.