Early Portuguese Printing

Model details

Creator(s)

Language(s)

Centuries

CER on Validation Set

%

Size (Nr. of Words)

NaN

Model ID

About this Model

This model was trained on a dataset of selected Portuguese grammars and linguistic publications from the 16th to the 18th centuries. These documents, along with many others, are publicly accessible through the Portuguese National Digital Library (bndigital.bnportugal.gov.pt). The training set for this version comprises 122,754 words (676 pages) of text printed in Portuguese from 1536 onward. The texts contain unique letters, diacritics, historical acronyms, typography, and fleurons characteristic of the historical Portuguese writing system as it was adapted to the then-new printing press, all of which this model has been trained to recognize. Because its training set has a linguistic focus, both grammatical and historical, the model can also recognize certain Greek letters, Latin text, table patterns, and simple initial capitals. However, its training in these areas is limited, so it is not recommended for those uses.

This model was developed as part of a master's degree project in the postgraduate linguistics program at the Universidade Federal de Santa Catarina (UFSC). The author was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior – Brasil (CAPES).

Try it out

Early Portuguese Printing is freely available to everyone

You can use this model to automatically transcribe early printed Portuguese documents with Handwritten Text Recognition in Transkribus.