Have you ever sent a letter that never arrived? Between 1652 and 1815, the British navy and privateers seized all kinds of documents from enemy ships: ship logs, cargo lists and even private letters. Hundreds of years later, the Huygens Institute in the Netherlands started the Dutch Prize Papers project to digitise and analyse these historical documents and to transcribe them with the help of Transkribus.
Marijcke Schillings, historian, researcher and coordinator of the Dutch Prize Papers project, told us about the project from its start in 2016, about its documents and about how Transkribus was used to create an AI text recognition model.
The Dutch Prize Papers project
What are the Prize Papers?
From ship logs, cargo lists, plantation records, and crew interrogations to letters, the documents making up the Dutch Prize Papers are anything but trivial. These papers are kept at The National Archives in Kew (London).
Marijcke Schillings explains that “the Prize Papers are documents seized by the British navy and privateers from enemy ships in the period 1652-1815”. As a powerful player on the seas, the British Navy, together with privately owned and operated ships, engaged in naval warfare to disrupt enemy trade.
“This collection also contains approximately 38,000 Dutch business and private letters,” says Schillings.¹ Given this great variety of documents, the Prize Papers offer the possibility of “different types of international research” and insights into “all social strata of society”.
What is the Dutch Prize Papers project?
As the national institute for research on the history and culture of the Netherlands, the Huygens Institute (HI) is dedicated to innovative and collaborative research on historical sources and literary texts. The aim of the Dutch Prize Papers project was, firstly, to make a large selection of digitised (Dutch) documents available for research and, secondly, “to make printed and handwritten texts more searchable and readable”.
At the end of 2015, the Huygens Institute received a substantial subsidy that made it possible to achieve the first step.
In June 2019, 72,000 scans (140,000 pages) of mainly Dutch documents from the seventeenth to the early nineteenth century, together with their metadata, became available online on the Dutch Prize Papers website. To improve access to the digitised documents, the Huygens Institute also created a Virtual Research Environment (VRE).
Working towards the second step, Marijcke Schillings and her colleagues from the Dutch Prize Papers (DPP) project turned to Transkribus’ text recognition software. Since several Huygens Institute projects “already had experience with the user-friendly HTR platform and achieved good results”, the team decided to start a pilot project with the primary goal of exploring automatic text recognition.
Prize Papers collection, folder 1800-1810_24/HCA32-1210-0016b: fragment with layout analysis in the Transkribus platform.
Creating an AI model with Transkribus
For this pilot project, 100 scans of documents from different centuries, written in various languages, were selected to train a custom text recognition model.
Layout Ground Truth
After selecting the material, the team started creating Ground Truth pages for the layout, specifically the text regions and baselines of the historical pages. Schillings explains that the baselines were first placed automatically and then checked manually, as the lines of text tended to be broken up or crooked.
Using the P2PaLA layout analysis tool, the team then used these Ground Truth pages to train three structure recognition models. However, when these models were tested, the results were less accurate than hoped, indicating that more training material was needed. Recognising the challenges of the P2PaLA layout analysis tool, Transkribus has since introduced trainable layout models, such as Field Models and Table Models, which require less training data and are more precise.
Text Ground Truth
The next step was to create Ground Truth pages of transcribed text to train the text recognition model. These pages were generated using existing models and then checked and corrected manually. Based on 100 pages of Ground Truth, the DPP team created two custom text recognition models: “We decided to first create a model including a base model (i.e. IJsberg) and then a second one without a base model.”
Comparison of model results:
DPP (with base model IJsberg) = ede gescheept in het Schip de Gesina Mana, Comyn Cannelis
DPP2 (without base model) = Dene gescheept en her Schip de Gesena Aana, Comin Corneeir
Manual transcription = ende gescheept in het schip De Gesina Maria, Captyn Cornelis
As the team expected, the first model, built on the base model IJsberg, produced the better result, as the comparison of text recognition results shows.
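One common way to quantify a comparison like this is the character error rate (CER): the minimum number of character insertions, deletions and substitutions needed to turn a model's output into the manual transcription, divided by the length of the manual transcription. The Python sketch below applies that definition to the three lines above. It is an illustration under stated assumptions, not the evaluation the DPP team or Transkribus actually performed: the function names `levenshtein` and `cer` are hypothetical, and the comparison is case-sensitive, with spaces and punctuation counted as characters.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def cer(hypothesis: str, reference: str) -> float:
    """Hypothetical helper: edit distance relative to the reference length.
    Case-sensitive; spaces and punctuation count as characters."""
    return levenshtein(hypothesis, reference) / len(reference)


manual = "ende gescheept in het schip De Gesina Maria, Captyn Cornelis"
outputs = {
    "DPP": "ede gescheept in het Schip de Gesina Mana, Comyn Cannelis",
    "DPP2": "Dene gescheept en her Schip de Gesena Aana, Comin Corneeir",
}

for name, text in outputs.items():
    print(f"{name}: CER = {cer(text, manual):.1%}")
```

Because both outputs are scored against the same reference, the lower CER marks the closer transcription; on this line, DPP scores lower than DPP2, which matches the team's assessment. A single line is far too small a sample to judge a model, so a real evaluation would average over many held-out pages.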
Working with the Transkribus Platform
“The experience with the Transkribus tools was very good,” summarises Marijcke Schillings. By creating two multilingual models, the team explored the potential of Handwritten Text Recognition (HTR), the primary objective of the pilot project. The pilot resulted in a positive evaluation report that showed a significant improvement in readability.
Because of the limited accuracy of the layout analysis, a different tool called “Loghi” was applied to the documents of the Dutch Prize Papers project in June 2023, which considerably improved their readability and searchability.
Listening to the feedback of our users, Transkribus now offers an improved and more efficient way of recognising layouts: trainable layout models. The trainable Field Models and Table Models are designed to produce accurate results even with complex layouts such as those found in newspapers, index cards or spreadsheets.
Creating opportunities for further research
Marijcke Schillings concludes that with this project, the DPP team was able to allow “interested people anywhere to view a small selection of papers”: more than 100,000 images that are legible and digitally available.
The next step of the DPP project is to make one specific type of document accessible: bills of lading. As Schillings explains, bills of lading were not usually kept after goods had been shipped by sea. They do, however, reappear in the cargoes seized by British privateers.²
We at Transkribus are delighted to have been part of this pilot and wish the DPP project team continued success in their research into the bills of lading.
Thank you Marijcke Schillings for taking the time to talk to us!
1 R. van Gelder, Zeepost. Nooit bezorgde brieven uit de 17de en 18de eeuw (Amsterdam/Antwerpen 2008) 20-21.
2 “Flessen op papier”, A.P. v[an] V[liet], in: Buitgemaakt en teruggevonden. Nederlandse brieven en scheepspapieren in een Engels archief. Sailing Letters Journaal V. Onder redactie van E. van der Doe, P. Moree, D.J. Tang, met medewerking van P. de Bode (Zutphen 2013) 196-197.