A collaborative approach: READ-COOP and the Europeana Foundation join forces to enhance the Transcribathon platform

By Fiona Park

Not everyone who works with history is a professional historian. From hobby genealogists to volunteers in local museums, laypeople have always played an important role in keeping history alive. And in the digital age, there is a new way for volunteers to get involved.

The EnrichEuropeana+ project is a citizen science initiative. It brings ordinary people together to work towards a common scientific or academic goal. In the case of Enrich Europeana, this involves creating fully annotated digital versions of the Europeana Collections. To achieve this, volunteers from all over Europe transcribe and enrich handwritten sources using the Transcribathon platform: a custom-made website that allows volunteers to transcribe from home using their normal computer.

In 2021, EnrichEuropeana+ decided to update the Transcribathon platform with new technology and a new look. As the experts in transcription software, READ-COOP were asked to be part of the project, and we gladly accepted. Here is what happened.

Enriching European cultural heritage

The Europeana project was launched back in 2008. The goal was to preserve pieces of cultural heritage, such as letters, portraits and official documents, from across the continent and make them accessible to the public. This resulted in the Europeana Collections: digital collections of items grouped by topic or time period. For example, if you look at the “Building” collection, you can find a photo of the Manhattan Life Insurance building in New York (housed at the Swedish National Museum of Science and Technology) as well as a newspaper article about the construction of a new student house in Bulgaria (housed at the Pencho Slaveykov Public Library in Varna). By making artefacts like these digitally available, it means everyone can enjoy and learn from them, without having to take a trip to Stockholm or Varna to do so.

Manhattan Life Insurance Building in New York © Okänd

But the biggest advantage of digital collections is that they are fully searchable. If a user is looking for newspaper articles about constructions in Bulgaria, they can simply type in those search terms and find what they are looking for much more quickly than they would if searching through a physical collection. The searching process is made possible through metadata — extra information about the artefact that is programmed into its digital version. Metadata isn’t just the title, date and description that you would find in a regular museum but also many other entities such as names and places that are mentioned in the artefact, or tags summarising its content. When the user types in a search term, the collection searches through the metadata of all the items, finds the ones matching the search term and shows the item to the user.

A large citizen science initiative

However, transcribing digital artefacts and enriching them with metadata requires a human being to look at or read through the material, assign tags and other metadata and input these into a computer system. Ideally, the transcription and metadata should also be checked by a second human being, to ensure that everything is inputted correctly. Of course, this takes quite a bit of time, and most museums, libraries, and archives simply don’t have the resources to input transcriptions and metadata themselves.

So Europeana came up with a novel solution to this problem: citizen science. The transcriptions and metadata would be added to the digital artefacts by a team of volunteers, leaving museum staff free to do more specialist work. The volunteers would be trained on how to input the data using their own computer at home, making it possible for anyone around the world to contribute to digitising the Europeana Collections.

Documents from the “Saxony At Work” run © Europeana Foundation

A key part of the project is the Europeana Transcribathon platform, where volunteers can view materials, transcribe texts, and enrich them with metadata using just their regular computer at home. Europeana also organise transcription events known as “runs”. Each run has a particular theme, for example, Saxony’s industrial culture or theatrical manuscripts in Portuguese, and a specified time period, usually several days or weeks. During the run, volunteers can transcribe the documents on that theme and often also compete against each other to see who can process the most documents in the time period. While the “winners” often don’t win anything more than the honour of being at the top of the leaderboard, the sense of competition increases volunteers’ motivation and makes the whole event more fun for everyone.

Incorporating Transkribus into Transcribathon

The original Transcribathon platform, which was created in 2016, was a pure transcription editor. Volunteers could manually transcribe text using their computer, but no automatic transcriptions were possible. In 2021, Europeana decided to update the platform with handwriting recognition software. This would mean that volunteers no longer had to do time-consuming manual transcriptions, they could simply proofread an automatic transcription. As proofreading generally takes much less time, volunteers would be able to process more documents in the same amount of time, helping the online collections to expand more rapidly.

The easiest way to create a new digital platform is to base it on something that already exists, and that is exactly what Europeana did. READ-COOP already had a functioning platform for the transcription and enrichment of historical documents (Transkribus) and a way for other platforms to communicate directly with Transkribus (the metagrapho API). This would form the basis of the new Transcribathon platform.

The metagrapho API allows other platforms to access Transkribus technology © READ-COOP

For the uninitiated, an API is a piece of software that acts as a messenger between two different platforms. A user requests information on one platform, and the platform sends this request to the API of another platform. Once this second platform has a response to the request, the API brings it back to the first platform and the person gets the information they need. A good example of this is a flight booking site. A user wants to find out what flights are available between two different cities, so they input a departure airport and destination on a flight booking site. An API then sends this message to a second platform, in this case, the computer system of the airline. This computer system finds the possible flights and the API sends this information back to the flight booking site. The user can then see all the available flights.

The new Transcribathon platform works in a similar way. When a volunteer wants to get an automatic transcription of a text, they request this on the Transcribathon platform. Transcribathon then sends this request to the metagrapho API, which uses handwriting recognition technology to process the image and generate an automatic transcription. Finally, once the processing is complete, the Transcribathon platform can access the transcription and show it to the volunteer, again via the metagrapho API.

Using an existing API in this way meant the Europeana team didn’t have to build their own text recognition system from scratch. They simply had to build a platform that the metagrapho API could interact with, enabling them to access the technology in the main Transkribus platform. This meant that Transkribus’ text recognition technology could be integrated into the platform quite quickly, without too much costly development.

A Croatian postcard from the Europeana Collections. © Dragutin Hirc

An easy-to-use transcription editor

Updating the technology behind Transcribathon meant that the transcription editor — the part a volunteer uses to input or proofread transcriptions — was no longer able to cope with the richer data format that it was receiving back from the metagrapho API. Therefore, it was necessary to build a new transcription editor for Transcribathon. This would, among other things, allow volunteers to click on a line of the transcription, and see the corresponding line in the image of the text.

Again, it was decided to not create a new editor right from scratch. Instead, READ-COOP took the existing editor in the Transkribus software, modified it to fit the requirements of Transcribathon, and turned it into a widget. The widget was then simply inserted into the Transcribathon platform, making it possible for users to access and edit the transcriptions generated by the metagrapho API. Like with the API, using the existing Transkribus editor and simply modifying it also saved precious development time and costs.

The new-look Transcribathon editor © Europeana Foundation

The power of collaboration

In short, by using the existing Transkribus technology, the EnrichEuropeana+ project was able to update the Transcribathon platform much more quickly and efficiently than would have been possible if they had developed everything from scratch. With the metagrapho API and custom transcription editor widget, Transcribathon could take the best of READ-COOP’s technology and modify it to suit the requirements of this unique citizen science project.

And the project has already been a success. The new version of the platform has recently been used for several runs, including the transcription of historical documents in Croatian as well as a multilingual run of 19th-century documents, in which volunteers processed over 1400 documents in just 6 weeks. We look forward to seeing what future collaborations between EnrichEuropeana+ and Transkribus will bring!

One of the many documents from the Zagreb run © Ivan Ulčnik

This project was a Europeana Generic Services project and it was co-financed by the Connecting Europe Facility of the European Union.

Start unlocking the past with Transkribus

Leverage the power of Transkribus to get the most out of your historical documents.