Spanish print XVIII-XIX

Model details

Creator(s)

evasanchezsalido@gmail.com

Language(s)

Castilian

Centuries

CER on Validation Set

1%

Size (Nr. of Words)

91,640

Model ID

48440

About this Model

PyLaia model trained on Ground Truth data produced by transcribing and manually segmenting a sample of 193 pages of the 18th- and 19th-century Spanish press, in particular volumes of “Diario de Madrid 1788-1825” (https://hemerotecadigital.bne.es/hd/card?oid=0001510462).

This model was developed within the CLARA-HD project (https://clara-nlp.uned.es/home/dh), funded by the Spanish Ministry, and is suitable for automatically transcribing similar Spanish prints of the same period. Manual segmentation is recommended because newspapers usually contain tables and columns. The model achieves a CER of 1% on the validation set.
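
For readers unfamiliar with the metric, the CER (character error rate) mentioned above is commonly defined as the number of character-level edits (insertions, deletions, substitutions) needed to turn the model output into the reference transcription, divided by the number of characters in the reference. The sketch below illustrates that common definition only. It is not taken from Transkribus or PyLaia, and the function names and sample strings are hypothetical; the platform's own computation may differ in details such as whitespace handling or normalization.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn hyp into ref."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # character missing from hyp
                            curr[j - 1] + 1,      # extra character in hyp
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def cer(references: list[str], hypotheses: list[str]) -> float:
    """Character error rate over paired lines: total edits / total reference characters."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_edits = sum(levenshtein(r, h) for r, h in zip(references, hypotheses))
    return total_edits / total_chars


# Hypothetical example: one missing character in a 16-character reference line.
example_cer = cer(["Diario de Madrid"], ["Diario de Madrd"])
```

Under this definition, a CER of 1% corresponds to roughly one character error per 100 reference characters.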

For more information or details please contact Eva Sánchez Salido at evasan@lsi.uned.es or Ana García Serrano at agarcia@lsi.uned.es.

Please cite this model as: Menta, A., Sánchez-Salido, E., & García-Serrano, A. (2022). Transcripción de periódicos históricos: Aproximación CLARA-HD. Proceedings of the Annual Conference of the Spanish Association for Natural Language Processing 2022: Projects and Demonstrations (SEPLN-PD 2022).

Try it out

Spanish print XVIII-XIX is freely available to everyone

You can use this model to automatically transcribe handwritten documents with Handwritten Text Recognition in Transkribus.