This second version of a transcription assistance model for handwritten Yiddish texts was developed by Aleksej Tikhonov as part of the MultiHTR project (Freiburg/Germany, PI: Achim Rabus).
The bulk of the ground-truth (GT) data was supplied by the DYBBUK project, funded by the European Union (ERC StG, No. 958150). The texts were sourced from dramas written by Moyshe Hurwitz (1844-1910) and Joseph Lateiner (1853-1935). We extend our gratitude to Ruthie Abeliovich and Sinai Rusinek (the DYBBUK project, Tel Aviv University).
Another part of the GT data comes from Astrid Lembke (University of Mannheim) and consists of Yiddish texts from the 16th century. They are taken from the manuscript MS Cambridge, Trinity College, F.12.45, which includes two poems by Elia Levita (ca. 1469-1549) as well as three narrative texts: the Mayse mi-Danzek (story from Danzig), the Mayse mi-Menz (story from Mainz) and the Mayse fun Würms (story from Worms). We would like to thank Astrid Lembke for the professional exchange and advice.
For the semi-automatic transfer of the GT from the Hebrew to the Latin alphabet, the tool Protea t3xt conv3rt3r by Gal Abramovitz was used.
The second model represents a phonetically and orthographically more accurate transcription attempt in which special characters and signs of the International Phonetic Alphabet were used in a slightly modified function. The "Transcription aid model 1 for handwritten Yiddish (Hebrew to Latin)" should be used for a more straightforward but less accurate transcription aid.
The aid model can open up the text's content or enable a limited keyword search, even for people who do not know the Hebrew alphabet in its Yiddish applications or are learning it.
In the transcription model, which should be understood as a reading, learning, and indexing aid (not as a transliteration model), some Hebrew graphemes can be read differently depending on convention, tradition, time and region of origin, and phonetic environment. For this reason, six special graphemes or digraphs were built into the transcription model; each signals to the reader that the sound at that position admits more than one reading. The final decision on how to read it lies with the reader, who knows the individual context in which the text was written and the phonetic context. The special graphemes and the options for reading them are summarized below:
<ɒy>: double Yod <יי> or <ײַ> can be read as [ay] or [ey].
<ɐ>: Aleph <א> can be read as [a], [o], or an intermediate vowel between [a] and [o], or it can remain silent.
<b̤>: Beth <ב> can be read as either [b] or [v]. In phonetics, <b̤> stands for a breathy [b]; here it is used only to indicate the optional reading.
<ʊ>: Waw <ו> can be read as [u], [o], or an intermediate vowel between the two. In addition, the consonantal readings are [f] or [v]. Phonetically, [ʊ] is a short back vowel, like the first vowel in German "Butter" or the vowel in English "book".
<ph>: Pe <פ> without diacritics can be read as [p] or [f]. Note for English readers: the combination does not always represent [f].
<sh>: Shin <ש> without diacritics can be read as [s] or [š]/[ʃ]. Note for English readers: the combination does not always represent [š]/[ʃ].
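
For the limited keyword search mentioned above, the reading options in this list could be expanded into candidate spellings. The following is a minimal Python sketch and not part of the model itself: the names READINGS, tokenize and candidate_readings are hypothetical, the sample token "phʊn" is invented for illustration, intermediate vowels are left out because they have no plain-letter spelling, and it is assumed that "ph" and "sh" in the model output always stand for Pe and Shin.

```python
from itertools import product

# Reading options copied from the list above (hypothetical table name).
# "" stands for a silent Aleph; intermediate vowels are omitted (assumption).
READINGS = {
    "ɒy": ["ay", "ey"],
    "ɐ": ["a", "o", ""],
    "b\u0324": ["b", "v"],  # b + combining diaeresis below
    "ʊ": ["u", "o", "f", "v"],
    "ph": ["p", "f"],
    "sh": ["s", "š"],
}


def tokenize(word):
    """Split a transcribed word into units, preferring the longest special grapheme."""
    keys = sorted(READINGS, key=len, reverse=True)
    units = []
    i = 0
    while i < len(word):
        for key in keys:
            if word.startswith(key, i):
                units.append(key)
                i += len(key)
                break
        else:
            # No special grapheme starts here: keep the single character as is.
            units.append(word[i])
            i += 1
    return units


def candidate_readings(word):
    """Return every spelling obtained by choosing one reading per unit."""
    options = [READINGS.get(unit, [unit]) for unit in tokenize(word)]
    return sorted({"".join(combo) for combo in product(*options)})


# Usage: does the keyword "fun" match the invented token "phʊn"?
matches = "fun" in candidate_readings("phʊn")
```

The number of candidates grows multiplicatively with each ambiguous grapheme, so a sketch like this suits single words rather than whole lines; the final choice of reading still rests with the reader.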