Creator(s)
Milanka Matić-Chalkitis (MultiHTR project)
Language(s)
Turkish Ottoman (1500-1928)
Centuries
CER on Validation Set
11.6%
Size (Nr. of Words)
201,096
Model ID
56496
The model is based on handwritten and printed data in Arabic-Persian script and the Ottoman-Turkish language. It was trained by Milanka Matić-Chalkitis as part of the MultiHTR project (project leader: Prof. Dr. Achim Rabus) at the Department of Slavic Languages and Literatures of the University of Freiburg (Germany). The handwritten training data largely comprises the poetry collection 'Mecmua' (https://mecmua.acdh.oeaw.ac.at/toc.html), the poetry collection of the Ottoman poet Keşfī, and a smaller collection of travelogs and correspondence of the Ottoman military apparatus from the QHoD project (Digital Edition of Sources on Habsburg-Ottoman Diplomacy 1500-1918; https://qhod.net/). We would like to thank Prof. Dr. Hülya Çelik (University of Bochum), Prof. Dr. Yavuz Köse (University of Vienna) and Dr. Stephan Kurz (Austrian Academy of Sciences) for kindly providing the data and for their close cooperation. The printed data includes parts of various newspapers and journals from the late Ottoman period, which were provided by Suphan Kirmizialtin (Digital Ottoman Corpora; https://www.digitalottomancorpora.org/). Many thanks to Suphan for the great support and helpfulness. The ground truth was reused according to the 'data recycling' principle, so the training data is highly diverse in terms of physical quality, layout, font, age and transcription rules. The model is intended as an auxiliary transcription model for users with little or no knowledge of the Ottoman-Turkish language and/or Arabic-Persian script.
You can use this model to automatically transcribe handwritten documents with Handwritten Text Recognition in Transkribus.
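To help interpret the CER figure listed above, here is a minimal sketch of how a character error rate is commonly computed: the Levenshtein edit distance between a reference transcription and the model output, divided by the length of the reference. The function names `edit_distance` and `cer` are illustrative, and this is an assumption about the general metric, not a description of how Transkribus normalizes text or computes its reported value.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum number of character insertions,
    deletions and substitutions needed to turn ref into hyp."""
    prev = list(range(len(hyp) + 1))  # distances from the empty prefix of ref
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning ref[:i] into the empty string costs i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r
                            curr[j - 1] + 1,      # insert h
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate relative to the reference length (illustrative)."""
    if not ref:
        raise ValueError("reference transcription must not be empty")
    return edit_distance(ref, hyp) / len(ref)
```

Under this definition, a CER of 11.6% corresponds to roughly 12 character edits per 100 reference characters, averaged over the validation set.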