19th century Greek 8.0

Model details

Creator(s)

coidacis@gmail.com

Language(s)

Greek Modern (1453-), Greek Ancient (to 1453)

Centuries

19th, 20th

CER on Validation Set

0.93%

Size (Nr. of Words)

187,561

Model ID

148525

About this Model

“19th century Greek 8.0” is designed to recognize Greek printed texts of the 19th century. It is the eighth in a series of models developed specifically for digitizing Greek literary texts published from the 19th century onwards. It is the first model in the series to be released publicly and is trained on the largest ground-truth dataset in the series.

Ground Truth Data

The model was trained on samples from 7 different books with varying typographic and scanning features (font families, font sizes, colors, etc.). The dataset for this version comprises 894 pages, 24,836 lines, and 187,561 words.

In the ground truth creation, the following transcription guidelines were followed:

- all the Greek diacritics were kept (άὰᾶἀἁᾱᾰϊᾳσ̌)

- the Greek ligatures ϛ and ϗ were transcribed as “ς” and “κ”, respectively

- the Greek ligature ȣ was transcribed as “ου”

- the single-character ellipsis … was transcribed as three separate dots ...
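
The character substitutions in the guidelines above can be sketched as a small normalization step. This is only an illustration and an assumption about how one might prepare text to match the model's ground truth: the helper name `apply_guidelines` and the mapping table are hypothetical and are not part of Transkribus or of the model's actual training pipeline. Diacritics are deliberately left untouched, as the guidelines keep them all.

```python
# Hypothetical helper illustrating the transcription guidelines listed above.
# It is not taken from Transkribus or from the model's training code.

# One entry per substitution rule; keys are single Unicode characters.
GUIDELINE_TABLE = str.maketrans({
    "\u03db": "ς",    # ϛ (stigma)        -> ς
    "\u03d7": "κ",    # ϗ (kai symbol)    -> κ
    "\u0223": "ου",   # ȣ (ou ligature)   -> ου
    "\u2026": "...",  # … (ellipsis char) -> three separate dots
})


def apply_guidelines(text: str) -> str:
    """Replace ligatures and the ellipsis character; keep all diacritics."""
    return text.translate(GUIDELINE_TABLE)


if __name__ == "__main__":
    # Diacritics such as ῦ survive unchanged; only ȣ and … are rewritten.
    print(apply_guidelines("το\u0223 κόσμ\u0223\u2026"))
```

Applying the same substitutions to reference transcriptions before comparing them with the model's output may help avoid counting these guideline differences as recognition errors.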

The “19th century Greek” series is maintained by Fotini Koidaki, who began developing the model during research hosted at the Aristotle University of Thessaloniki and completed and published it at the TALOS AI4SSH lab.

For more information, please contact Fotini Koidaki at coidacis@gmail.com.

The project acknowledges funding from the TALOS-AI4SSH ERA Chair in Artificial Intelligence for Humanities and Social Sciences grant (grant agreement: 101087269).

Try it out

19th century Greek 8.0 is freely available to everyone

You can use this model to automatically transcribe handwritten documents with Handwritten Text Recognition in Transkribus.