Train AI models for your documents

Every collection of historical documents has its own handwriting. When public models aren't accurate enough, Transkribus lets you train a custom AI model tailored to your specific handwriting, language, and document style. No machine learning expertise is required.

300+ public AI models

2–5% CER (character error rate) achievable

25–50 pages to start training

Upload your documents

Start by uploading scans of the handwritten or printed documents you want to transcribe. Transkribus accepts JPEG, PNG, PDF, and TIFF. Organize your documents into collections for easy management.

Tip: Start with 25–50 representative pages that cover the range of handwriting styles in your collection.

4 file formats supported
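Before uploading, it can help to check a local folder for files in the four accepted formats (JPEG, PNG, PDF, and TIFF). The sketch below is hypothetical: the helper name `split_by_format` and the mapping of formats to file extensions (for example `.jpg` and `.jpeg` for JPEG, `.tif` and `.tiff` for TIFF) are assumptions, not part of Transkribus.

```python
from pathlib import PurePath

# Assumed extensions for the four accepted formats: JPEG, PNG, PDF, and TIFF.
ACCEPTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".pdf", ".tif", ".tiff"}


def split_by_format(filenames: list[str]) -> tuple[list[str], list[str]]:
    """Return (accepted, other): files ready to upload and files to convert first."""
    accepted, other = [], []
    for name in filenames:
        # Compare case-insensitively so "scan_01.JPG" is accepted too.
        if PurePath(name).suffix.lower() in ACCEPTED_EXTENSIONS:
            accepted.append(name)
        else:
            other.append(name)
    return accepted, other


accepted, other = split_by_format(["p001.JPG", "p002.tiff", "ledger.pdf", "notes.docx"])
print(accepted)  # → ['p001.JPG', 'p002.tiff', 'ledger.pdf']
print(other)     # → ['notes.docx']
```

Files that land in `other` would need converting to one of the accepted formats before upload.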

Text Recognition Models

Train a custom text model with PyLaia

PyLaia is the deep learning engine behind Transkribus text recognition models. It handles handwritten text from any century, any language, and any script — from medieval Latin manuscripts to 20th-century Kurrent. You provide the Ground Truth; PyLaia learns the handwriting.

Works for any script: Latin, Cyrillic, Arabic, Hebrew, Chinese, and more

Handles mixed print and handwriting on the same page

25–50 transcribed pages are enough to start training

Models improve as you retrain them with more Ground Truth data

Share your model with colleagues or the entire Transkribus community

Train models for structured tables

Historical documents are full of tabular data — census records, church registers, ship manifests, accounting ledgers. Table models detect row and column structures and extract cell contents into structured data you can export to Excel, CSV, or XML.

Extracted Table Data

Institution	Town	Amount	Object	Date	Disposition
Franklin College (6)	New Athen, O.		General	3/23/16
Fargo College (3)	Fargo, N.D.	100,000	Endowment	4/27/16	Gen 1914, 5/18/16
Franklin Academy (2)	Franklin, Neb.	5,000	Library Building	8/3/16	Gen 1914, 8/7/16
Fessenden Acad. & Ind. School	Fessenden, Fla.		General	12/22/16
Ferris Institute (2)	Big Rapids, Mich.	50,000	Buildings	2/12/17
Findlay College (2)	Findlay, O.	100,000	Endowment	5/23/17	Gen 1914, 5/28/17
Fairmount College	Wichita, Kan.	200,000	Endowment	6/7/17	6/14/17
Franklin College	Franklin, Ind.	50,000	General	9/13/17	Gen 1914, 9/17/17
Fisk University	Nashville, Tenn.	1,000,000	Endowment	6/14/18
Friends University	Wichita, Kan.	200,000	Endowment	6/20/18	Gen 1914, 8/8/18
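Once a table has been extracted, further processing is ordinary data handling. As a minimal sketch, assuming the export arrives as tab-separated text with one header row (the actual export format and layout may differ), the following Python uses the standard `csv` module to pad short rows, convert comma-grouped amounts to integers, and write CSV. The sample rows are copied from the table above; the helper names `load_rows`, `parse_amount`, and `to_csv` are hypothetical.

```python
import csv
import io
from typing import Optional

# Three rows copied from the extracted table above. Assumption: the export is
# TSV with a header row, and trailing empty cells may be missing entirely.
TSV = (
    "Institution\tTown\tAmount\tObject\tDate\tDisposition\n"
    "Franklin College (6)\tNew Athen, O.\t\tGeneral\t3/23/16\n"
    "Fargo College (3)\tFargo, N.D.\t100,000\tEndowment\t4/27/16\tGen 1914, 5/18/16\n"
    "Fisk University\tNashville, Tenn.\t1,000,000\tEndowment\t6/14/18\n"
)


def load_rows(tsv_text: str) -> tuple[list[str], list[dict[str, str]]]:
    """Parse TSV into (header, rows), padding rows that lack trailing cells."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)
    rows = []
    for cells in reader:
        cells += [""] * (len(header) - len(cells))
        rows.append(dict(zip(header, cells)))
    return header, rows


def parse_amount(text: str) -> Optional[int]:
    """Convert '100,000' to 100000; return None for an empty cell."""
    return int(text.replace(",", "")) if text.strip() else None


def to_csv(header: list[str], rows: list[dict[str, str]]) -> str:
    """Serialize rows as CSV; cells containing commas are quoted automatically."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=header)
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


header, rows = load_rows(TSV)
amounts = [parse_amount(row["Amount"]) for row in rows]
print(sum(a for a in amounts if a is not None))  # → 1100000
print(to_csv(header, rows))
```

Padding matters because rows such as Franklin College (6) have no Disposition cell, and quoting matters because town names like "Fargo, N.D." contain commas.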

Extract specific fields from forms

When you need to extract specific data points — names, dates, addresses, amounts — from structured or semi-structured documents, field models locate and read individual fields. Ideal for census forms, registration cards, and administrative records.

Coming Soon

Named Entity Recognition (NER)

Soon you'll be able to train models that automatically identify and tag named entities in your transcriptions: persons, places, dates, organizations, and custom entity types. Once trained, an NER model turns raw text into structured, searchable data without tagging every document by hand.

Automatically detect persons, places, dates, and organizations

Define custom entity types for your research domain

Train on your own annotated examples

Link entities across documents for network analysis

Combine with search to build powerful research databases

Ground Truth Tips

How to produce training data efficiently

The quality and quantity of your Ground Truth directly determine model accuracy. The following strategies help you create training data faster.

Run a public model first

Use Text Titan or a language-specific public model for an initial transcription. Correcting is 3–5x faster than transcribing from scratch.

Correct systematically

Work through each page and fix all errors. Pay special attention to unusual characters, abbreviations, and line breaks.

Pick diverse samples

Include pages from different writers, time periods, and document types. Diversity in training data leads to a more robust model.
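One way to put this into practice, as a minimal sketch, is to group candidate pages by writer and decade and then take pages round-robin from each group, so that no single hand or period dominates the training set. The page records and their `id`, `writer`, and `year` fields are hypothetical metadata you would supply yourself; Transkribus does not necessarily store pages this way.

```python
from collections import defaultdict
from itertools import zip_longest


def pick_diverse(pages: list[dict], n: int) -> list[int]:
    """Pick up to n page ids, cycling through (writer, decade) groups.

    Each page is a dict with hypothetical keys 'id', 'writer', and 'year'.
    """
    if n <= 0:
        return []
    groups = defaultdict(list)
    for page in pages:
        decade = page["year"] // 10 * 10
        groups[(page["writer"], decade)].append(page["id"])
    picked = []
    # zip_longest yields the 1st page of every group, then the 2nd, and so on;
    # groups that run out early are padded with None, which we skip.
    for round_ in zip_longest(*groups.values()):
        for page_id in round_:
            if page_id is not None:
                picked.append(page_id)
                if len(picked) == n:
                    return picked
    return picked


# Hypothetical example pages: three by writer A in the 1850s, one by writer B.
pages = [
    {"id": 1, "writer": "A", "year": 1850},
    {"id": 2, "writer": "A", "year": 1851},
    {"id": 3, "writer": "A", "year": 1852},
    {"id": 4, "writer": "B", "year": 1901},
]
print(pick_diverse(pages, 2))  # → [1, 4]
```

Asking for two pages returns one from each group instead of two from writer A, which is the kind of spread the tip recommends.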

Train, evaluate, repeat

After your first model, use it to pre-transcribe more pages, correct those, and retrain. Each cycle adds data and improves accuracy.
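To evaluate each cycle you need an accuracy measure, and the figures on this page use the character error rate (CER). A common definition is the edit distance (insertions, deletions, and substitutions) between the model's output and the Ground Truth, divided by the number of Ground Truth characters. The Python sketch below implements that textbook definition; it is an assumption that Transkribus computes CER exactly this way (its handling of whitespace or line breaks, for example, may differ).

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits that turn a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        prev = curr
    return prev[-1]


def cer(prediction: str, ground_truth: str) -> float:
    """Character error rate of one line against its Ground Truth."""
    if not ground_truth:
        raise ValueError("Ground Truth must not be empty")
    return levenshtein(prediction, ground_truth) / len(ground_truth)


def corpus_cer(pairs: list[tuple[str, str]]) -> float:
    """CER over many lines: total edits divided by total Ground Truth length."""
    edits = sum(levenshtein(p, g) for p, g in pairs)
    chars = sum(len(g) for _, g in pairs)
    if chars == 0:
        raise ValueError("Ground Truth must not be empty")
    return edits / chars


print(cer("Fargo Colege", "Fargo College"))  # one missing "l" out of 13 characters
```

Pooling edits across lines, as `corpus_cer` does, keeps very short lines from skewing the average. For scale, a hypothetical page with 2,000 characters of Ground Truth would contain about 40 character errors at 2% CER and about 100 at 5% CER.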

Start training your first model

Create a free account, upload your documents, and train a custom AI model — no machine learning background needed.

Free: 50 credits every month

No code: no ML expertise needed

GPU-powered: training in hours

300+ community models — start without training

The Text Titan I (Super Model)

Dutch Dean (Super Model)

Dansk Dokumentalist (Super Model)
