Sámi OCR

Model details

Creator(s)

National Library of Norway

Language(s)

Norwegian, Sámi languages

Centuries

CER on Validation Set

1.18%

Size (Nr. of Words)

485,407

Model ID

179305

About this Model

This is a multilingual Sámi model trained on printed text in North Sámi, South Sámi, Lule Sámi and Inari Sámi. The training data consists of pages from books and newspapers in the National Library of Norway's collection, written in the contemporary written standards of the four languages. In total, the model is trained on 485,407 words, and it achieves a character error rate (CER) of 1.18% on the validation set.
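For readers unfamiliar with the metric, CER is commonly defined as the character-level edit distance between the model output and the reference transcription, divided by the number of characters in the reference. The sketch below illustrates that common definition only. It is an assumption that the reported 1.18% was computed this way, and the function names and the sample strings are hypothetical.

```python
import unicodedata


def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution_cost = 0 if ref_char == hyp_char else 1
            current.append(
                min(
                    previous[j] + 1,  # deletion
                    current[j - 1] + 1,  # insertion
                    previous[j - 1] + substitution_cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = edit distance / number of characters in the reference."""
    # Normalise to NFC so that letters such as "á" count as one character
    # whether they were typed precomposed or as base letter plus accent.
    reference = unicodedata.normalize("NFC", reference)
    hypothesis = unicodedata.normalize("NFC", hypothesis)
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical example: the OCR output drops the accent on one letter.
example_cer = character_error_rate("sámegiella", "samegiella")
print(f"{example_cer:.2%}")
```

Under this definition, a CER of 1.18% corresponds to roughly one character error per 85 characters of reference text.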

Try it out

Sámi OCR is freely available to everyone

You can use this model to automatically transcribe printed Sámi documents with Handwritten Text Recognition in Transkribus.