A few days ago I started my first Python 3 program. It's written from scratch using modules from Python's standard library. I'm using the Debian package from experimental, which works fine.
The program is meant to extract meaningful data about Samian Pottery in Britain from a dataset available at ADS, published by Steven Willis. I discovered this dataset some months ago through an ABZU entry that pointed to the ADS page (you have to agree to the site's use conditions before you can access the page).
Given my past and current work on pottery distribution studies, I'm interested in playing with data very similar to those I produced, albeit from a different period and region.
I plan to report on my progress in analyzing Willis' data, from both a digital and an archaeological point of view, in the next few weeks. I currently have no access to Willis' paper, so my efforts are blind in the sense that I don't know which kind of analysis he performed or which conclusions he eventually came to. However, this sounds to me like an interesting experiment about the nature of archaeological data and the assumptions we make about them.
Comments
Pingback
[...] Samian Pottery data are available in various formats, namely XLS, SXC (OpenOffice.org 1.0 format) and TXT (in fact, tab-separated values). There is no actual difference in content or structure among the different formats: the spreadsheet files just pack many contexts into 3 files (each context is a single sheet), while the tab-separated values files are one per context. That said, and given that I had already planned to extract data using the Python standard library modules, I thought the text files would be the best choice to start with. Unfortunately, there are 173 files and, because of the ADS site's conditions of use, there's no easy way to automagically download them using command line tools like wget or similar (which I otherwise use frequently). You will be better off with a Firefox extension like FlashGot. I assume that if you want to follow this series you will manage to download the files to your machine. I wonder how difficult it would have been to zip those files into a single compressed file. That would have saved me some 5 minutes and at least 3.5 MB on the server (not to mention the fact that, using the recently released XZ compression format, I could compress them down to 74K; yes, I did). [...]
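As a rough illustration of the approach described in the excerpt above, here is a minimal sketch that reads one tab-separated file per context using only the standard library, and bundles the files into a single XZ-compressed archive. The function names, the `*.txt` pattern, the directory layout and the UTF-8 encoding are all assumptions made for the example, not details of the actual dataset or of the program mentioned in the post; `tarfile`'s `"w:xz"` mode needs Python 3.3 or later.

```python
import csv
import tarfile
from pathlib import Path


def read_context(path, encoding="utf-8"):
    """Return the rows of one tab-separated file as lists of strings.

    The encoding is an assumption; adjust it to match the real files.
    """
    with open(path, newline="", encoding=encoding) as handle:
        return list(csv.reader(handle, delimiter="\t"))


def read_all_contexts(directory):
    """Map each file name (without extension) to its rows.

    Assumes one *.txt file per context, all in the same directory.
    """
    return {
        path.stem: read_context(path)
        for path in sorted(Path(directory).glob("*.txt"))
    }


def bundle_contexts(directory, archive):
    """Pack every *.txt file in `directory` into one .tar.xz archive."""
    with tarfile.open(archive, "w:xz") as tar:
        for path in sorted(Path(directory).glob("*.txt")):
            # Store only the file name, not the full local path.
            tar.add(path, arcname=path.name)
```

Calling `read_all_contexts("samian")` on a hypothetical `samian` directory would give a dictionary with one entry per context file, and `bundle_contexts("samian", "samian.tar.xz")` would write the single compressed archive.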