If you regularly work with German historical documents, you are probably very familiar with three types of German script: Fraktur, Kurrent, and Sütterlin. These scripts were used from the 16th century right up until the Second World War, spanning several centuries of German and Central European history. Nowadays, however, they are almost impossible for the untrained eye to read, which makes transcribing such documents a slow and laborious process.
Thankfully, technology can now speed things up. Platforms such as Transkribus use AI models to recognise Fraktur, Kurrent, Sütterlin, and other scripts and automatically create a digital version of the text. This digital text can then be searched for specific words or phrases and easily shared with colleagues and the general public.
If you are new to using Transkribus to read historical documents in German, this post will introduce you to these three key scripts and show you three AI models that are ideal for transcribing them.
The Fraktur typeface was used widely in German print from the early 16th century until it was banned by the Nazi regime in 1941. A form of blackletter, Fraktur has angular rather than rounded letterforms, which is why it is often described in German as a “gebrochene Schrift”, or “broken script”. Fraktur typefaces also often contain ligatures, most of which have their roots in German cursive handwriting.
In contrast to Fraktur, “Kurrentschrift”, as it is known in German, is a handwritten script. It was also developed in the early 16th century and remained in use until the beginning of the 20th century, when it was replaced by the newly developed Sütterlin script (see below). Until then, it was the standard handwriting taught in schools throughout Germany.
As mentioned, Sütterlin was another German handwriting script and the successor to Kurrent. At the beginning of the 20th century, the Prussian Ministry of Science, Art and Culture decided it was time to replace Kurrent with a form of handwriting that was easier to read. In 1911, it commissioned the designer Ludwig Sütterlin to create such a script. The Sütterlin script was first introduced in Berlin schools in 1914 and soon spread to become the dominant handwritten script throughout Germany. You can find more information on Ludwig Sütterlin’s Wikipedia page.
If there is one model that is useful for documents written in Kurrent and/or Sütterlin, it is this one. Trained on a whopping 3,610,922 words from a very diverse range of handwritten manuscripts, Transkribus German Handwriting M1 can transcribe almost any German-language handwritten document with reasonable accuracy and without extra training. In addition to Kurrent and Sütterlin documents, the training data also included some German-language documents written in Latin script, making the model ideal for manuscripts that mix different types of handwriting. For such a diverse model, it has a low character error rate (CER) of just 4.7%.
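Character error rate is usually defined as the minimum number of single-character substitutions, deletions and insertions needed to turn a model's output into the correct transcription, divided by the number of characters in the correct transcription. The short Python sketch below computes it that way. It assumes the textbook Levenshtein definition; Transkribus may normalise the text (for example whitespace or line breaks) differently before measuring, and the function names here are illustrative rather than part of any Transkribus tool.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    # prev[j] holds the edit distance between the previous prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = edit distance / number of characters in the reference (Ground Truth)."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


# A typical misreading in old German scripts: long s read as f
cer = character_error_rate("Stralsund", "Stralfund")
print(f"{cer:.1%}")  # 11.1%
```

Put another way, a CER of 4.7% means roughly 47 character errors per 1,000 characters of correct text, while a CER of 1% means about 10.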
This AI model focuses on a particular type of Fraktur text: documents printed in the 19th and 20th centuries. Developed by the Austrian National Library and the NewsEye project, the model is based on 442,121 words from a wide variety of historical newspapers and publications. It has a CER of just 1%, outperforming most standard OCR engines on this type of document. However, please note that the model was trained exclusively on German-language documents, which makes it less suitable for Swedish or Finnish Fraktur, for example.
This Transkribus Kurrent model is what we sometimes call a “super model”: it is based on 1,840,000 words from a diverse set of documents, including council minutes of the Pomeranian government of Stralsund, the assessor votes of the Wismar High Court and various private letter collections. It was developed by the University of Greifswald, has a CER of 5.5% and is suitable for transcribing all manner of Kurrent documents from the 17th and 18th centuries.
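The recommendations for the three models above can be summarised as a simple decision rule. The hypothetical Python helper below encodes only what this post states about each model's script, period and language; the script labels and return strings are descriptive placeholders, not official Transkribus model identifiers or part of any Transkribus API.

```python
from typing import Optional


def suggest_model(script: str, century: int, language: str = "German") -> Optional[str]:
    """Return a label for the model this post recommends, or None if
    none of the three models described here covers the case.

    script:  hypothetical labels "fraktur", "kurrent" or "suetterlin"
    century: e.g. 18 for the 1700s
    """
    script = script.lower()
    if script == "fraktur":
        # Trained only on German-language prints of the 19th and 20th centuries
        if language == "German" and century in (19, 20):
            return "Fraktur model (Austrian National Library / NewsEye)"
        return None
    if script == "kurrent" and century in (17, 18):
        # Specialised Kurrent model for the 17th and 18th centuries
        return "Kurrent model (University of Greifswald)"
    if script in ("kurrent", "suetterlin"):
        # General-purpose handwriting model; also covers mixed Latin script
        return "Transkribus German Handwriting M1"
    return None


print(suggest_model("Kurrent", 18))
print(suggest_model("fraktur", 19, language="Swedish"))
```

Returning None for uncovered cases, such as Swedish Fraktur, mirrors the post's advice that in those situations you may need to look for another public model or train your own.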
Transkribus produces its transcriptions with AI models. Each model has been trained to read a specific type of handwritten or printed text in a certain language, and often from a certain time period or genre as well.
To transcribe a document with Transkribus, you first upload a scan of the document and then choose a model. There are currently 94 public models available, all of which are completely free to use. Transkribus applies what the model has learned to your document and automatically creates a transcription.
But what if none of the models suits the text in your documents? In that case, you can train your own. To do this, you need a set of accurately transcribed documents, collectively known as “Ground Truth”. In general, the more Ground Truth you use to train your model, the more accurate it will be when transcribing new documents. To save time, many people use a public model as the base for their custom model and then fine-tune it with additional Ground Truth.