GitHub - mikekestemont/DeepScript: Code for the DeepScript Submission to ICFHR2016 Competition on the Classification of Medieval Handwritings in Latin Script

DeepScript

Introduction

This repository holds the code developed for the ICFHR2016 Competition on the Classification of Medieval Handwritings in Latin Script, whose results are described in the following paper:

Florence Cloppet, Véronique Eglin, Van Cuong Kieu, Dominique Stutzmann, Nicole Vincent. 'ICFHR2016 Competition on the Classification of Medieval Handwritings in Latin Script'. In: Proceedings of the 15th International Conference on Frontiers in Handwriting Recognition (2016), 590-595. DOI 10.1109/ICFHR.2016.106.

The main task for this competition was to correctly predict the script type of samples of medieval handwriting, as represented by single-page, grayscale photographic reproductions of codices. A random selection of examples is shown below:

Random examples

The 'DeepScript' approach described here scored best in the second task of this competition ("fuzzy classification"). The system uses a ‘vanilla’ neural network model, i.e. a single-color-channel variation of the popular VGG architecture. The model takes the form of a stack of convolutional layers, each with a 3x3 receptive field and an increasingly large number of filters at each block of layers (2 x 64 > 3 x 128 > 3 x 256). This convolutional stack feeds into two fully-connected dense layers with a dimensionality of 1048, before feeding into the final softmax layer, where the normalized scores for each class label are predicted. Because of the small size of the datasets, DeepScript borrowed the augmentation approach from the work by Sander Dieleman and colleagues, which is described in this awesome blog post. The original code is available from this repository. We would like to thank Sander for his informal help and advice on this project. Below is an example of the augmented patches on which the model was trained (see example_crops.py):

Examples of augmented patches
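To make the layer layout described above more concrete, here is a minimal sketch that lists the layers and counts their parameters in plain Python. It is not the repository's model code. The block structure (2 x 64, 3 x 128, 3 x 256), the single input channel, and the dense size of 1048 come from the description; the 160-pixel patch size, the 2x2 max pooling after each block, the 12 output classes, and the names `conv_params` and `describe_network` are assumptions made for illustration.

```python
def conv_params(in_channels, out_channels, kernel=3):
    # Weights plus one bias per filter for a kernel x kernel convolution.
    return (kernel * kernel * in_channels + 1) * out_channels


def describe_network(input_size=160, n_classes=12, dense_units=1048):
    # input_size and n_classes are illustrative assumptions, not repo values.
    # (number of conv layers, filters per layer) for each block, as in the text.
    blocks = [(2, 64), (3, 128), (3, 256)]
    layers = []
    channels, size = 1, input_size  # single color channel (grayscale input)
    for n_layers, filters in blocks:
        for _ in range(n_layers):
            layers.append(("conv3x3", filters, conv_params(channels, filters)))
            channels = filters
        size //= 2  # assumed 2x2 max pooling after each block
        layers.append(("maxpool2x2", channels, 0))
    units_in = size * size * channels  # flattened feature map
    for _ in range(2):  # two fully-connected layers
        layers.append(("dense", dense_units, (units_in + 1) * dense_units))
        units_in = dense_units
    layers.append(("softmax", n_classes, (units_in + 1) * n_classes))
    return layers


layers = describe_network()
for kind, width, params in layers:
    print("{:<11} width={:<5} params={}".format(kind, width, params))
```

Under these assumptions, most parameters sit in the first dense layer, since it connects the whole flattened feature map to 1048 units.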

Scripts

The following top-level scripts included in the repository are useful at a higher level:

  • prepare_data.py: prepare and preprocess train/dev/test sets of the original images
  • train.py: train a new model
  • test.py: test/apply a previously trained model
  • filter_viz.py: visualize what filters were learned during training
  • example_crops.py: generate examples of the sort of augmented crops used in training
  • crop_activations.py: find out which patches from a test set maximally activate a particular class

By default, each new model is stored in its own directory under the models directory in the repository. A pretrained model can be downloaded as a ZIP archive from Google Drive: unzip it and place it under a models directory in the top-level directory of the repository. The original data can be obtained by registering on the competition's website.

Visualizations

Of special interest was the ability to visualize the knowledge inferred by a trained neural network (i.e. the question: what does the network 'see' after training? Which sort of features has it become sensitive to?). For this visualization, we drew heavily on the excellent set of example scripts offered in the keras library. Below, we show examples of randomly initialized images which were adjusted via gradient ascent to maximally activate single neurons in the final convolutional layer (see filter_viz.py). The generated images have been annotated with a couple of interesting paleographic features that seem to emerge:

Visualization of filter activations
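The gradient ascent principle mentioned above can be illustrated with a toy example. The sketch below is not the code in filter_viz.py: it replaces the convolutional neuron with a hypothetical stand-in (the tanh of a weighted sum), uses an analytic gradient, and normalizes each step by the gradient norm. The functions `activation`, `gradient`, and `gradient_ascent` and all parameter values are assumptions chosen for illustration.

```python
import math
import random


def activation(x, w):
    # Toy stand-in for a neuron: tanh of a weighted sum of the inputs.
    return math.tanh(sum(wi * xi for wi, xi in zip(w, x)))


def gradient(x, w):
    # Analytic gradient of tanh(w . x) with respect to the input x.
    a = activation(x, w)
    return [(1.0 - a * a) * wi for wi in w]


def gradient_ascent(x, w, steps=20, step_size=0.1):
    # Repeatedly move the input in the direction that increases the activation.
    x = list(x)
    for _ in range(steps):
        g = gradient(x, w)
        norm = math.sqrt(sum(gi * gi for gi in g)) + 1e-8
        x = [xi + step_size * gi / norm for xi, gi in zip(x, g)]
    return x


rng = random.Random(0)
weights = [rng.uniform(-1, 1) for _ in range(16)]
start = [rng.uniform(-0.1, 0.1) for _ in range(16)]  # "randomly initialized image"
result = gradient_ascent(start, weights)

print("activation before:", activation(start, weights))
print("activation after: ", activation(result, weights))
```

In the real setting, the input is an image patch and the gradient is obtained by backpropagating through the trained network, but the update loop has the same shape.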

Additionally, it is possible to select the patches from the unseen test images which maximally activate the response of a certain class in the output layer (see crop_activations.py). Examples of top-activating patches (without augmentation) are given below.

Best test patches for classes
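Selecting top-activating patches amounts to ranking patches by the score the output layer assigns to one class. The following is a minimal sketch of that ranking step only, under the assumption that per-patch class probabilities are already available; the `top_patches` helper, the patch identifiers, and the scores are all made up for illustration and do not come from crop_activations.py.

```python
import heapq


def top_patches(scores, class_index, k=3):
    # scores maps a patch identifier to its list of per-class probabilities.
    return heapq.nlargest(k, scores, key=lambda patch: scores[patch][class_index])


# Hypothetical softmax outputs for four test patches and three classes.
scores = {
    "page1_patch0": [0.10, 0.85, 0.05],
    "page1_patch1": [0.70, 0.20, 0.10],
    "page2_patch0": [0.05, 0.90, 0.05],
    "page2_patch1": [0.30, 0.40, 0.30],
}

best = top_patches(scores, class_index=1, k=2)
print(best)
```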

The confusion matrix obtained for the development data shows that the model generally makes solid predictions, and mostly makes understandable errors (e.g. the confusion between different types of textualis variants):

Confusion matrix validation set
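For readers unfamiliar with confusion matrices, the sketch below shows how one can be tallied from true and predicted labels. The labels are toy data, not results from this project, and the `confusion_matrix` helper is written for illustration rather than taken from the repository.

```python
from collections import Counter


def confusion_matrix(true_labels, predicted_labels, classes):
    # Rows are true classes, columns are predicted classes.
    counts = Counter(zip(true_labels, predicted_labels))
    return [[counts[(t, p)] for p in classes] for t in classes]


# Toy labels for illustration only.
classes = ["caroline", "semitextualis", "textualis"]
true_labels = ["textualis", "textualis", "semitextualis",
               "semitextualis", "caroline", "caroline"]
predicted = ["textualis", "semitextualis", "semitextualis",
             "textualis", "caroline", "caroline"]

matrix = confusion_matrix(true_labels, predicted, classes)
for name, row in zip(classes, matrix):
    print("{:<14} {}".format(name, row))
```

Off-diagonal cells count the errors; in this toy data they fall only between the two textualis variants.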

Dependencies

Major dependencies for this code include:

These packages can be easily installed via pip. I recommend Continuum's Anaconda environment, which ships with most of these dependencies. The code has been tested on Mac OS X and Ubuntu Linux under Python 2.7. Note that the Python version might affect your ability to load some of the pretrained model's components, which are stored as pickled objects.

Publication

The preliminary results of our specific approach were presented at the 2016 ESTS conference in Antwerp. A dedicated publication describing our approach is on the way. We gratefully acknowledge the support of NVIDIA Corporation with the donation of the TITAN X used for this research.
