Creator(s)
NOSCEMUS project
Language(s)
German, Greek Ancient (to 1453)
Centuries
CER on Validation Set
0.8%
Size (Nr. of Words)
667,127
Model ID
52640
The “NOSCEMUS General Model” is tailored towards recognizing Latin prints from the early modern period. Although the model is designed to recognize Latin prints set in Antiqua-based typefaces, it is also capable of recognizing passages in Greek and passages set in (German) Fraktur.
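The character error rate (CER) listed for the validation set is conventionally defined as the edit distance between the recognized text and the Ground Truth, divided by the length of the Ground Truth. The following is a minimal sketch of that standard definition, assuming plain Levenshtein distance over characters; the function names are illustrative and this is not necessarily the exact procedure Transkribus uses internally.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (cost 0 if the characters match)
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate of hypothesis against reference (assumed standard definition)."""
    if not reference:
        raise ValueError("reference must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)
```

Under this definition, a CER of 0.8% corresponds to roughly 8 character errors per 1,000 characters of Ground Truth.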
In creating the Ground Truth, the following transcription guidelines were followed:
– ligatures (e.g. Æ or æ, Œ or œ) and standard abbreviations (e.g. -que, -us, -tur, …mm…, …nn…) were expanded
– long s (ſ) was transcribed as a normal s
– small caps were transcribed as majuscules
– special characters and diacritics (e.g. &, ë, ï or ę) were kept
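When comparing the model's output with an existing transcription, it can help to bring that transcription in line with the guidelines above first. The following is a rough sketch covering only the purely character-level rules (ligatures and long s). The function name and the mapping are hypothetical, and the choice to expand uppercase ligatures to two capitals (Æ to AE) is an assumption not stated in the guidelines. Abbreviation expansion and small caps require context or typographic information and are not handled.

```python
# Hypothetical character mapping derived from the guidelines listed above.
# Assumption: uppercase ligatures expand to two capitals (Æ -> AE).
_NOSCEMUS_TABLE = str.maketrans({
    "Æ": "AE",
    "æ": "ae",
    "Œ": "OE",
    "œ": "oe",
    "ſ": "s",  # long s is transcribed as a normal s
})


def normalize_to_guidelines(text: str) -> str:
    """Expand ligatures and replace long s; special characters and diacritics stay untouched."""
    return text.translate(_NOSCEMUS_TABLE)


if __name__ == "__main__":
    print(normalize_to_guidelines("Cæſar & poëta"))
```

Characters such as &, ë, ï or ę pass through unchanged, matching the rule that special characters and diacritics were kept.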
The model was released by Stefan Zathammer and is based on training data from the Digital Sourcebook of the NOSCEMUS project (https://transkribus.eu/r/noscemus/#).
If you use the Noscemus model as a base model for your own model, or if your edition is based on a transcription made with the help of the Noscemus model, you are kindly requested to mention the Noscemus model.
The NOSCEMUS project (https://www.uibk.ac.at/projects/noscemus) has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. 741374).
You can use this model to automatically transcribe handwritten documents with Handwritten Text Recognition in Transkribus.