National Archives Netherlands · PyLaia · Published December 17, 2021
IJsberg_PyLaia
Text Recognition
Description
This is the second model created by the National Archives of the Netherlands. It is based on careful transcriptions of dozens of different handwriting styles from the 17th, 18th and 19th centuries. The training material comprises scans from the Incoming Documents of the Dutch East India Company (Overgekomen Brieven en Papieren van de VOC) of the National Archives of the Netherlands, and 19th-century notarial deeds from the Noord-Hollands Archief and eight other State Archives in the provinces.
DocIDs:
146280
146321
158134
165671
165672
192776
192777
269375
Every 100th scan is ground truth (GT).
Very low error rate: 4.1% CER
Character Error Rate (CER) measures the percentage of characters incorrectly recognised. Lower is better. This model scored 4.1% on its validation set. As a rule of thumb, a CER below 10% is considered good for most handwritten material. This is a larger model trained on diverse material, which generally makes it more robust across different handwriting styles. That said, larger training sets also make it harder to push the CER down further.
Measured on the model's own validation data. Results on your documents may differ depending on handwriting style, document condition, language, and how closely your material resembles the training data.
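To make the CER figure concrete, the following is a minimal sketch of how a character error rate is commonly computed: the Levenshtein edit distance between a reference transcription and the recognised text, divided by the length of the reference. This page does not state how Transkribus computes its score, so treat the exact definition here as an assumption. The function names `levenshtein` and `cer` and the sample strings are illustrative only.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn `ref` into `hyp`."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty string
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r from ref
                curr[j - 1] + 1,     # insert h into ref
                prev[j - 1] + cost,  # substitute r with h (or match)
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate as a fraction of the reference length
    (assumed definition, not confirmed by the model page)."""
    if not ref:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Hypothetical example: one wrong character in a 19-character line.
    reference = "Overgekomen Brieven"
    recognised = "Overgekomen Brieuen"
    print(f"CER: {cer(reference, recognised):.1%}")
```

Under this definition, a 4.1% CER corresponds to roughly four character edits per hundred reference characters. Because insertions count as errors, the value can exceed 100% on very poor output.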
Words: 1,538,478
Lines: 247,861
Training Pages: 5,917
Model ID: 38769