Ottoman-Turkish Generic

Model details

Creator(s)

Milanka Matić-Chalkitis (MultiHTR project)

Language(s)

Turkish Ottoman (1500-1928)

Centuries

CER on Validation Set

11.6%

Size (Nr. of Words)

201,096

Model ID

56496
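
For readers unfamiliar with the CER figure listed above: a character error rate is conventionally the character-level edit distance between the model output and the ground-truth transcription, divided by the length of the ground truth. The sketch below illustrates that conventional definition only. It is an assumption that it matches how the reported value was calculated; details such as text normalisation or whitespace handling are not stated in this description, and the function names `levenshtein` and `cer` are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb (or match)
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length.

    Assumption: no normalisation is applied before comparison.
    """
    if not reference:
        raise ValueError("reference must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)
```

Under this definition, one wrong character in a five-character reference gives a CER of 20%, so a CER of 11.6% corresponds to roughly one character edit for every nine ground-truth characters.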

About this Model

The model was trained on handwritten and printed material in the Ottoman-Turkish language, written in Arabic-Persian script. It was trained by Milanka Matić-Chalkitis as part of the MultiHTR project (project leader: Prof. Dr. Achim Rabus) at the Department of Slavic Languages and Literatures of the University of Freiburg (Germany).

The handwritten training data consists largely of the poetry collection 'Mecmua' (https://mecmua.acdh.oeaw.ac.at/toc.html), the poetry collection of the Ottoman poet Keşfī, and a smaller collection of travelogues and correspondence of the Ottoman military apparatus from the QHoD project (Digital Edition of Sources on Habsburg-Ottoman Diplomacy 1500-1918; https://qhod.net/). We would like to thank Prof. Dr. Hülya Çelik (University of Bochum), Prof. Dr. Yavuz Köse (University of Vienna) and Dr. Stephan Kurz (Austrian Academy of Sciences) for kindly providing the data and for their close cooperation.

The printed data includes parts of various newspapers and journals from the late Ottoman period, provided by Suphan Kirmizialtin (Digital Ottoman Corpora, https://www.digitalottomancorpora.org/). Many thanks to Suphan for the great support and helpfulness.

The ground truth was reused according to the 'data recycling' principle, so the training data is highly diverse in physical quality, layout, font, age and transcription rules. The model is intended as an auxiliary transcription model for users with little or no knowledge of the Ottoman-Turkish language and/or the Arabic-Persian script.

Try it out

Ottoman-Turkish Generic is freely available to everyone

You can use this model to automatically transcribe handwritten documents with Handwritten Text Recognition in Transkribus.