Spanish print XVIII-XIX

Model details

Creator(s)

evasanchezsalido@gmail.com

Language(s)

Castilian

Centuries

18th-19th

CER on Validation Set

1%

Size (Number of Words)

91,640

Model ID

48440

About this Model

A PyLaia model trained on Ground Truth data produced by manually segmenting and transcribing a sample of 193 pages of 18th- and 19th-century Spanish newspapers, specifically volumes of the “Diario de Madrid 1788-1825” (https://hemerotecadigital.bne.es/hd/card?oid=0001510462).

This model was developed within the CLARA-HD project (https://clara-nlp.uned.es/home/dh), funded by the Spanish Ministry, and is suitable for automatically transcribing similar Spanish prints from the same period. Manual segmentation is recommended, since newspapers usually contain tables and columns. The model achieves a character error rate (CER) of 1% on the validation set.
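A CER of 1% means roughly one wrong character per hundred characters of ground truth. CER is conventionally computed as the character-level edit distance (insertions, deletions and substitutions) between the model output and the reference transcription, divided by the reference length. The Python sketch below assumes this standard definition. The card does not say how Transkribus aggregates CER over the validation set, so `corpus_cer`, which pools edits and characters over all lines, is an assumption, as are the function names themselves.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions and substitutions
    needed to turn hyp into ref (dynamic-programming edit distance)."""
    prev = list(range(len(hyp) + 1))  # distances from empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # ref character missing from hyp
                curr[j - 1] + 1,     # extra character in hyp
                prev[j - 1] + cost,  # substitution (or match if cost == 0)
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate of one line: edit distance / reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)


def corpus_cer(pairs: list[tuple[str, str]]) -> float:
    """Hypothetical set-level CER: total edits over total reference characters
    (assumed aggregation; other tools may average per-line CER instead)."""
    total_edits = sum(levenshtein(ref, hyp) for ref, hyp in pairs)
    total_chars = sum(len(ref) for ref, _ in pairs)
    return total_edits / total_chars


if __name__ == "__main__":
    # One dropped "i" in a 16-character reference line.
    print(cer("Diario de Madrid", "Diaro de Madrid"))  # 0.0625
```

Pooling over the whole set weights long lines more heavily than averaging per-line CER, which is why the two aggregation choices can give slightly different figures for the same validation set.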

For further details, please contact Eva Sánchez Salido at evasan@lsi.uned.es or Ana García Serrano at agarcia@lsi.uned.es.

Please cite this model as: Menta, A., Sánchez-Salido, E., & García-Serrano, A. (2022). Transcripción de periódicos históricos: Aproximación CLARA-HD. Proceedings of the Annual Conference of the Spanish Association for Natural Language Processing 2022: Projects and Demonstrations (SEPLN-PD 2022).

Try it out

Spanish print XVIII-XIX is freely available to everyone

You can use this model to automatically transcribe documents with Handwritten Text Recognition (HTR) in Transkribus.