Train AI models for your documents

Every collection of historical documents has its own unique handwriting. When public models aren't enough, Transkribus lets you train a custom AI model — tailored to your specific handwriting, language, and document style. No machine learning expertise required.

300+ public AI models
2–5% character error rate (CER) achievable
25–50 pages to start training

How model training works

Training a custom model in Transkribus follows a proven, iterative workflow. Each cycle improves your model's accuracy.

Step 1: Upload your documents

Start by uploading scans of the handwritten or printed documents you want to transcribe. Transkribus accepts JPEG, PNG, PDF, and TIFF. Organize your documents into collections for easy management.

Tip: Start with 25–50 representative pages that cover the range of handwriting styles in your collection.

4 file formats supported
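
If you prepare your scans on your own machine before uploading, a short script can filter out unsupported files and draw a starter batch. The sketch below is only an illustration: the function names, the extension list (our reading of "JPEG, PNG, PDF, and TIFF"), and the random sampling are assumptions, not part of Transkribus. Random sampling does not guarantee that every handwriting style is covered, so review the selection by eye.

```python
import random
from pathlib import Path

# Assumed file extensions for the four formats named above (JPEG, PNG, PDF, TIFF).
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".pdf", ".tif", ".tiff"}


def is_supported(filename: str) -> bool:
    """Return True if the file extension matches one of the assumed formats."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


def pick_starter_pages(filenames, count=50, seed=0):
    """Pick up to `count` supported files at random (hypothetical helper).

    A fixed seed keeps the selection reproducible between runs.
    """
    supported = sorted(name for name in filenames if is_supported(name))
    if len(supported) <= count:
        return supported
    rng = random.Random(seed)
    return sorted(rng.sample(supported, count))
```

For example, `pick_starter_pages(names, count=25)` returns a reproducible batch of 25 file names that you can then upload as your first training set.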

Text Recognition Models

Train a custom text model with PyLaia

PyLaia is the deep learning engine behind Transkribus text recognition models. It handles handwritten text from any century, any language, and any script — from medieval Latin manuscripts to 20th-century Kurrent. You provide the Ground Truth; PyLaia learns the handwriting.
  • Works for any script: Latin, Cyrillic, Arabic, Hebrew, Chinese, and more
  • Handles mixed print and handwriting on the same page
  • 25–50 transcribed pages are enough to start training
  • Models automatically improve with more Ground Truth data
  • Share your model with colleagues or the entire Transkribus community

Train models for structured tables

Historical documents are full of tabular data — census records, church registers, ship manifests, accounting ledgers. Table models detect row and column structures and extract cell contents into structured data you can export to Excel, CSV, or XML.

[Image: document with detected table structure]

Extracted Table Data

| Institution | Town | Amount | Object | Date | Disposition |
| --- | --- | --- | --- | --- | --- |
| Franklin College (6) | New Athen, O. | | General | 3/23/16 | |
| Fargo College (3) | Fargo, N.D. | 100,000 | Endowment | 4/27/16 | Gen 1914, 5/18/16 |
| Franklin Academy (2) | Franklin, Neb. | 5,000 | Library Building | 8/3/16 | Gen 1914, 8/7/16 |
| Fessenden Acad. & Ind. School | Fessenden, Fla. | | General | 12/22/16 | |
| Ferris Institute (2) | Big Rapids, Mich. | 50,000 | Buildings | 2/12/17 | |
| Findlay College (2) | Findlay, O. | 100,000 | Endowment | 5/23/17 | Gen 1914, 5/28/17 |
| Fairmount College | Wichita, Kan. | 200,000 | Endowment | 6/7/17 | 6/14/17 |
| Franklin College | Franklin, Ind. | 50,000 | General | 9/13/17 | Gen 1914, 9/17/17 |
| Fisk University | Nashville, Tenn. | 1,000,000 | Endowment | 6/14/18 | |
| Friends University | Wichita, Kan. | 200,000 | Endowment | 6/20/18 | Gen 1914, 8/8/18 |
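
Once table data like the sample above is exported, you can process it with ordinary tools. The sketch below uses Python's standard `csv` module to write two of the sample rows as CSV and to turn an amount such as "100,000" into a number. The function names are hypothetical, and the column layout simply mirrors the sample rather than a guaranteed export format.

```python
import csv
import io

# Column names and two rows copied from the sample table above.
HEADER = ["Institution", "Town", "Amount", "Object", "Date", "Disposition"]
ROWS = [
    ["Fargo College (3)", "Fargo, N.D.", "100,000", "Endowment", "4/27/16", "Gen 1914, 5/18/16"],
    ["Fisk University", "Nashville, Tenn.", "1,000,000", "Endowment", "6/14/18", ""],
]


def rows_to_csv(header, rows):
    """Serialize rows to CSV text; cells containing commas are quoted automatically."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


def parse_amount(text):
    """Turn '100,000' into 100000; return None for an empty cell."""
    cleaned = text.replace(",", "").strip()
    return int(cleaned) if cleaned else None


csv_text = rows_to_csv(HEADER, ROWS)
```

Because values such as "Fargo, N.D." contain commas, letting the `csv` module handle the quoting is safer than joining cells by hand.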

Extract specific fields from forms

When you need to extract specific data points — names, dates, addresses, amounts — from structured or semi-structured documents, field models locate and read individual fields. Ideal for census forms, registration cards, and administrative records.

[Image: document with detected fields and the extracted field values]

Coming Soon

Named Entity Recognition (NER)

Soon you'll be able to train models that automatically identify and tag named entities in your transcriptions — persons, places, dates, organizations, and custom entity types. NER transforms raw text into structured, searchable data without manual annotation.
  • Automatically detect persons, places, dates, and organizations
  • Define custom entity types for your research domain
  • Train on your own annotated examples
  • Link entities across documents for network analysis
  • Combine with search to build powerful research databases

Ground Truth Tips

How to produce training data efficiently

The quality and quantity of your Ground Truth directly determine model accuracy. Here are proven strategies to create training data faster.

Run a public model first

Use Text Titan or a language-specific public model for an initial transcription. Correcting a draft transcription is 3–5x faster than transcribing from scratch.

Correct systematically

Work through each page and fix all errors. Pay special attention to unusual characters, abbreviations, and line breaks.

Pick diverse samples

Include pages from different writers, time periods, and document types. Diversity in training data leads to a more robust model.

Train, evaluate, repeat

After your first model, use it to pre-transcribe more pages, correct those, and retrain. Each cycle adds data and improves accuracy.
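
To make the evaluation step concrete: character error rate (CER) is usually defined as the number of character insertions, deletions, and substitutions needed to turn the model output into the correct text, divided by the length of the correct text. The sketch below implements that standard definition in plain Python. It is an illustration only; the function names are our own, and the figures Transkribus reports may treat whitespace or case differently.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum single-character edits between two strings."""
    # previous[j] holds the distance between the reference prefix seen so far
    # and the first j characters of the hypothesis.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_char != hyp_char)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = edit distance divided by the length of the reference text."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

Here `reference` is your corrected Ground Truth line and `hypothesis` is the model's output for the same line. A result of 0.05 corresponds to a CER of 5%.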

Start training your first model

Create a free account, upload your documents, and train a custom AI model — no machine learning background needed.

Free: 50 credits every month
No code: no ML expertise needed
GPU-powered: training in hours