Creator(s)
Achim Rabus, Martin Meindl & Milanka Matić-Chalkitis (MultiHTR project)
Language(s)
Centuries
CER on Validation Set
9.5%
Size (Nr. of Words)
144,709
Model ID
47882
This is the first version of a combined model for the Deutsche Einheitskurzschrift (DEK), trained on both natural and synthetic data. The natural ground-truth (GT) data consists of several diaries written by a private individual and was kindly provided by the German Diary Archive (DTA) (https://tagebucharchiv.de/). Special thanks go to the director of the DTA, Marlene Kayen. The synthetic training data (electronically available longhand texts converted into German standard shorthand) consists of Goethe's “Faust” (https://jens-wawrczeck.de/stenogenerator/goethe/Faust%201%20(Goethe)%20-%20A4%20oL.pdf and https://www.projekt-gutenberg.org/goethe/faust1/) and Grimm's fairy tales. The model was trained by Achim Rabus. Martin Meindl and Milanka Matić-Chalkitis also contributed to the creation of this model as part of the MultiHTR project at the Department of Slavic Languages and Literatures of the University of Freiburg (Germany). The model is suitable for transcribing natural manuscripts written in DEK. It may also be useful as a base model for other German shorthand systems.
You can use this model to automatically transcribe handwritten documents with Handwritten Text Recognition in Transkribus.
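For readers unfamiliar with the "CER on Validation Set" figure listed above, the following is a minimal sketch of how a character error rate is conventionally computed: the Levenshtein edit distance between the reference transcription and the model output, divided by the length of the reference. The function names `levenshtein` and `cer` are illustrative, and the exact normalization Transkribus applies (for example to whitespace or punctuation) is not stated here, so this should be read as the textbook definition rather than the platform's implementation.

```python
def levenshtein(reference: str, hypothesis: str) -> int:
    """Minimum number of character insertions, deletions and
    substitutions needed to turn `hypothesis` into `reference`."""
    # prev[j] holds the distance between the reference prefix processed
    # so far and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            curr.append(min(
                prev[j] + 1,         # character missing from the hypothesis
                curr[j - 1] + 1,     # extra character in the hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate as a fraction of the reference length."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


# One wrong character in an eight-character word.
print(cer("Tagebuch", "Tagebach"))  # 0.125
```

Under this definition, a CER of 9.5% corresponds to roughly one character edit per ten or eleven reference characters on the validation pages.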