Noscemus GM 6

Name: Noscemus GM 6
Author: NOSCEMUS project

Description

The "Noscemus General Model" is tailored towards recognizing Latin prints from the early modern period. Although the model is designed to recognize Latin prints set in Antiqua-based typefaces, it is also capable of recognizing passages in Greek and passages set in (German) Fraktur.

In creating the Ground Truth, the following transcription guidelines were followed:

- ligatures (e.g. Æ or æ, Œ or œ) and standard abbreviations (e.g. -que, -us, -tur, …mm…, …nn…) have been expanded
- long s (ſ) was transcribed as a normal s
- small caps were transcribed as majuscules
- special characters and diacritics (e.g. &, ë, ï or ę) were kept

The CER in GM 6 is slightly higher than in GM 5, which is due to a more varied validation set.

The model was released by Stefan Zathammer and is based on training data from the Digital Sourcebook of the NOSCEMUS project (https://transkribus.eu/r/noscemus/#/). If you use the Noscemus model as a base model for your own model, or if your edition is based on a transcription made with the help of the Noscemus model, you are kindly requested to mention the Noscemus model.

The NOSCEMUS project (https://www.uibk.ac.at/projects/noscemus/) has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. 741374).
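The ligature and long-s conventions from the transcription guidelines can be illustrated with a small normalisation routine, which may be useful when comparing your own diplomatic transcription against the model's output. This is only a sketch: the function name `normalize_like_ground_truth` and the capitalisation rule for Æ/Œ ("Ae" before a lowercase letter, "AE" otherwise) are assumptions, not something the description specifies. Abbreviation expansion and small caps are left out because they cannot be resolved by simple character substitution.

```python
# Hypothetical helper: applies only the character-level guidelines
# (ligature expansion, long s -> s). Diacritics and "&" are left untouched.
LIGATURES = {"æ": "ae", "œ": "oe", "Æ": "AE", "Œ": "OE"}


def normalize_like_ground_truth(text: str) -> str:
    out = []
    for i, ch in enumerate(text):
        if ch == "ſ":
            # long s is transcribed as a normal s
            out.append("s")
        elif ch in LIGATURES:
            expanded = LIGATURES[ch]
            next_char = text[i + 1:i + 2]
            # Assumption: an uppercase ligature followed by a lowercase
            # letter becomes title case ("Ægyptus" -> "Aegyptus").
            if ch.isupper() and next_char.islower():
                expanded = expanded.capitalize()
            out.append(expanded)
        else:
            # special characters and diacritics are kept as they are
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(normalize_like_ground_truth("Cæſar in Ægyptum"))
```

Applying the same normalisation to a reference transcription before measuring errors avoids counting guideline differences (such as ſ versus s) as recognition mistakes.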

Very low error rate: 0.8% CER

Character Error Rate (CER) measures the percentage of characters incorrectly recognised. Lower is better. This model scored 0.8% on its validation set. As a rule of thumb, a CER below 10% is considered good for most handwritten material. This is a larger model trained on diverse material, which generally makes it more robust across different handwriting styles. That said, larger training sets also make it harder to push the CER down further.

Measured on the model's own validation data. Results on your documents may differ depending on handwriting style, document condition, language, and how closely your material resembles the training data.
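To check how the model performs on your own pages, CER can be estimated by comparing a recognised line with a manually corrected reference. The sketch below uses the common definition (Levenshtein edit distance divided by the reference length); the page does not state exactly how Transkribus computes its figure, so treat this as an approximation. The function names are illustrative.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of character insertions, deletions and substitutions
    needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = cur
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate as a fraction of the reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    ref = "scientia"
    hyp = "scientla"  # one substituted character
    print(f"CER: {cer(ref, hyp):.1%}")
```

A CER of 0.8% corresponds to roughly 8 character errors per 1,000 characters of reference text.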

Words: 667,127

Lines: 101,526

Training Pages: 3,270

Model ID: 52640

Related models

Noscemus GM 5

Transkribus Print M1

SKOBOK 5

19th century Greek 8.0