Projects and Software

I’ve worked on quite a few open and closed source projects. Feel free to take a look at my GitHub profile.

Below is a selection of open source projects that I am most proud of or have been the primary contributor on. I don’t have as much time as I’d like to maintain these projects, but I do when I get the chance. Feel free to contact me on Twitter if you’re interested in helping out with any of them.


I recently built an annotation tool for building a large dataset of newspaper articles that discuss scientific work. Newspaper journalists often decline to cite scientific work that they discuss in their articles but usually mention the name of the author or academic institute in passing. The aim of this tool is to build a corpus that could be used to formalise links between news and science in order to improve transparency and reporting quality.


I built a scientific paper indexing system called Partridge, which classifies papers by type (e.g. Review Paper, Research Paper, Discussion/Opinion Paper, Case Study and so on) and also makes papers searchable by core scientific concept. Core Scientific Concepts (CoreSCs) allow us to label the semantic intent behind sentences in a scientific paper, for example Hypothesis, Methodology, Results or Conclusion. Using Partridge you can run queries like “Show papers where ‘malaria’ appears in the hypothesis and ‘quinine’ (a malaria drug) appears in the methodology”. I still maintain Partridge, but recently I haven’t had much time to keep it up to date.
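To make the idea of a CoreSC-scoped query concrete, here is a minimal sketch in Python. It is not Partridge's actual code or API: the `Sentence` and `Paper` classes, the `search` function and the two sample papers are all made up for illustration. It only assumes that each sentence carries a single CoreSC label.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Sentence:
    text: str
    coresc: str  # CoreSC label, e.g. "Hypothesis" or "Methodology"


@dataclass
class Paper:
    title: str
    sentences: List[Sentence]


def matches(paper: Paper, constraints: Dict[str, str]) -> bool:
    """True if, for every (label, term) pair, some sentence with that
    CoreSC label contains the term (case-insensitive substring match)."""
    return all(
        any(
            s.coresc == label and term.lower() in s.text.lower()
            for s in paper.sentences
        )
        for label, term in constraints.items()
    )


def search(papers: List[Paper], constraints: Dict[str, str]) -> List[str]:
    """Return the titles of all papers that satisfy every constraint."""
    return [p.title for p in papers if matches(p, constraints)]


# Hypothetical sample data, invented purely for this example.
PAPERS = [
    Paper(
        "Paper A",
        [
            Sentence("We hypothesise that malaria relapse is dose dependent.", "Hypothesis"),
            Sentence("Patients received quinine for seven days.", "Methodology"),
        ],
    ),
    Paper(
        "Paper B",
        [
            Sentence("Quinine was first isolated in 1820.", "Background"),
            Sentence("We surveyed malaria incidence by region.", "Methodology"),
        ],
    ),
]

query = {"Hypothesis": "malaria", "Methodology": "quinine"}
result = search(PAPERS, query)
```

With this sample data only "Paper A" satisfies both constraints. "Paper B" mentions both terms, but not under the requested labels, which is exactly the distinction a plain keyword search cannot make.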


SAPIENTA is the underlying classification engine that Partridge uses to label new scientific papers with CoreSCs. My supervisor Maria originally built SAPIENTA with colleagues from Aberystwyth University and the European Bioinformatics Institute (EBI). In the summer of 2013, I spent about 3 months rewriting SAPIENTA from a series of Perl scripts into a Python web service. We run a web service that applies CoreSC labels to PDFs and XML documents, but it is a little bit flaky. Contact me if you are interested in using this system.