3 AI Models For Transcribing German Text In Fraktur, Kurrent and Sütterlin

If you regularly work with German historical documents, you are probably very familiar with three types of German script: Fraktur, Kurrent, and Sütterlin. These scripts were used from the 16th century right up until the Second World War, covering several centuries of German and Central European history. Nowadays, however, they are almost impossible for the untrained eye to read, which makes transcribing these documents a time-consuming process.

Thankfully, technology can now speed things up. Platforms such as Transkribus use AI models to recognise Fraktur, Kurrent, Sütterlin, and other scripts, and automatically create a digital version of the text. These digital versions can then be easily searched for certain words or phrases and shared with colleagues and the general public.

If you are new to using Transkribus to read historical documents in German, this post will introduce you to these three key scripts and show you three AI models that are ideal for transcribing them.

What is Fraktur?

Fraktur was widely used in German print from the early 16th century until it was outlawed by the Nazi Party in 1941. It is a form of blackletter typeface: its letters are angular rather than curved, which is why it is often known in German as “gebrochene Schrift”, or “broken script”. Fraktur typefaces also often contain ligatures, most of which have their roots in German cursive handwriting.

How is it different to Kurrent?

In contrast to Fraktur, “Kurrentschrift”, as it is known in German, is a handwritten script. It also developed in the early 16th century and remained in use until the beginning of the 20th century, when it was replaced by the newly developed Sütterlin script (see below). Until then, it was the standard handwriting taught in schools throughout Germany.

And what is Sütterlin?

As mentioned, Sütterlin is another type of German handwriting and the successor to Kurrent. At the beginning of the 20th century, the Prussian Ministry of Science, Art and Culture decided it was time to replace Kurrent with a form of handwriting that was easier to read. In 1911, the ministry commissioned the designer Ludwig Sütterlin to create such a script. The Sütterlin script was first introduced into Berlin schools in 1914 and soon spread to become the dominant handwritten script throughout Germany. You can find out more on Ludwig Sütterlin’s Wikipedia page.

3 AI models for reading Fraktur, Kurrent and Sütterlin

The German Giant

If there is one model that is useful for documents in Kurrent, Fraktur, or Sütterlin, it is this one. Trained on over 15 million words from a very diverse range of handwritten and printed documents, the German Giant can transcribe almost any handwritten or printed German document with reasonable accuracy and without extra training. In addition to Kurrent and Sütterlin documents, the training data included some German-language documents written in Latin script, making the model well suited to manuscripts containing multiple types of handwriting. It has a character error rate (CER) of 8.3%, meaning that roughly 8 in every 100 characters are transcribed incorrectly.

→ Go to model

German Fraktur 19th-20th Centuries

This AI model focuses on a particular type of Fraktur text: documents printed in the 19th and 20th centuries. Developed by the Austrian National Library and the NewsEye project, the model was trained on 442,121 words from a wide variety of historical newspapers and publications. It has a CER of just 1%, outperforming most standard OCR engines on these types of documents. However, please note that the model was trained exclusively on German-language documents, making it less suitable for Swedish or Finnish Fraktur, for example.

→ Go to model

German Kurrent 17th-18th Centuries

This Transkribus Kurrent model is based on 1,840,000 words from a diverse set of documents, including council minutes of the Pomeranian government of Stralsund, the assessor votes of the Wismar High Court and various private letter collections. It was developed by the University of Greifswald, has a CER of 5.5% and is suitable for transcribing all manner of Kurrent documents from the 17th and 18th centuries.

→ Go to model
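
The CER figures quoted for the three models above can be made concrete with a short calculation. The sketch below uses the common definition of character error rate: the edit distance between the correct text and the automatic transcription, divided by the length of the correct text. It is an illustration only. The function names are our own, and Transkribus may normalise text or measure errors differently when it reports a CER.

```python
def levenshtein(reference: str, hypothesis: str) -> int:
    """Minimum number of single-character insertions, deletions
    and substitutions needed to turn `hypothesis` into `reference`."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,         # character missing from the hypothesis
                current[j - 1] + 1,      # extra character in the hypothesis
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance divided by the length of the correct text."""
    if not reference:
        raise ValueError("reference must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Hypothetical example: the model confuses one letter in an 18-character phrase.
correct_text = "gebrochene Schrift"
model_output = "gebrochene Schrist"
cer = character_error_rate(correct_text, model_output)
print(f"CER: {cer:.1%}")
```

A single wrong character in a short phrase already produces a CER of several percent, which is why a figure such as 1% over whole newspaper pages indicates a very clean transcription.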

How do I use a public AI model with Transkribus?

Transkribus’ transcriptions are based on AI models. Each model has been trained to read a specific type of handwritten or printed text in a certain language, and often a certain time period or genre too.

To transcribe a document with Transkribus, you first upload a scan of the document and then choose a model. There are over a hundred public models available, covering a wide range of languages, scripts, and materials. Transkribus then applies what the model has learned to your document, creating an automatic transcription.
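
For the three models described in this post, the choice can be sketched as a small lookup by script and century. This is a hypothetical helper, not part of Transkribus: the class, the function, and the script keys are our own names, and only the model names, periods, and CER values come from the descriptions in this post. It also assumes that a lower quoted CER means a better fit, which is a simplification, since each CER was presumably measured on that model's own test material.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class PublicModel:
    name: str
    scripts: Tuple[str, ...]
    centuries: Optional[Tuple[int, int]]  # inclusive range; None = no period stated
    cer_percent: float


# Figures as quoted in this post; the lowercase script keys are an assumption.
MODELS = (
    PublicModel("The German Giant", ("fraktur", "kurrent", "suetterlin"), None, 8.3),
    PublicModel("German Fraktur 19th-20th Centuries", ("fraktur",), (19, 20), 1.0),
    PublicModel("German Kurrent 17th-18th Centuries", ("kurrent",), (17, 18), 5.5),
)


def suggest_model(script: str, century: int) -> PublicModel:
    """Return the matching model with the lowest quoted CER."""
    script = script.lower()
    candidates = [
        model for model in MODELS
        if script in model.scripts
        and (model.centuries is None
             or model.centuries[0] <= century <= model.centuries[1])
    ]
    if not candidates:
        raise ValueError(f"no model in this post covers script {script!r}")
    return min(candidates, key=lambda model: model.cer_percent)


print(suggest_model("fraktur", 19).name)
print(suggest_model("kurrent", 19).name)
```

A specialised model is suggested when the script and period both match. Otherwise the general-purpose German Giant is the fallback, for example for 19th-century Kurrent or for Sütterlin.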

But what if no model is suitable for the text in your documents? In that case, you can train your own. To do this, you need a set of documents that have already been transcribed, collectively known as “Ground Truth”. In general, the more Ground Truth you use to train your model, the more accurate it will be when transcribing new documents. To save time, many people use a public model as the base for their custom model and then fine-tune it with further Ground Truth.

For more information about text recognition and training models, check out our Help Center.
