The Dynamic Lexicon
The Dynamic Lexicon is an NEH-funded project to automatically create bilingual dictionaries (Greek/English and Latin/English) from parallel texts (source texts in Greek or Latin aligned with their English translations), together with the syntactic data encoded in treebanks. From these raw materials, we can construct a lexical entry that illustrates how a word is used not only across all of Greek or Latin literature, but also in any subset of that collection (e.g., Greek drama or the works of Cicero).
Dynamic Lexicon Entry for the Greek noun δύναμις.
While the automatically induced information naturally contains noise (e.g., the misclassification of ἔχις or the mistranslation of the second example sentence), it reveals larger patterns of usage consistent with traditional lexica. In particular, we have automatically induced three categories of information:
- Morphology. This entry has correctly categorized δύναμις as a noun. Some lexemes have multiple parts of speech: the very common word καί, for example, can be used both as a conjunction ("and") and as an adverb ("even"), and its senses and syntactic behavior differ according to which part of speech it is.
- Sense. By aligning all our Greek source texts with their English translations at the level of individual sentences and then words, we have induced that δύναμις has three predominant senses in all of Greek literature – "power," "force," and "army" – and that "army" itself is an especially dominant sense in the works of Flavius Josephus.
- Syntax. The availability of syntactically parsed data allows us to calculate that the most common attributes for δύναμις are ναυτικός ("naval") and πεζικός ("on foot"), both especially dominant in the works of Polybius. Because the texts are aligned with their translations, we can also present the appropriate English rendering of each combination (e.g., "naval force" rather than "naval army").
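The sense and syntax categories above both come down to counting evidence around each occurrence of a headword. The following is a minimal sketch of that counting idea, not the project's actual pipeline: it assumes each token already carries a lemma, a dependency head and label, and at most one aligned English word, and it treats each distinct aligned translation as a rough proxy for a sense (a real system would group translations such as "power" and "force" into sense clusters and weight noisy alignments). The `Token` class, the `build_entry` function, the 0-based head indexing with -1 for the root, and the use of the label "ATR" for attributes are all assumptions for illustration; the sentences are invented toy data.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set, Tuple


@dataclass
class Token:
    # Hypothetical token record; field names are illustrative assumptions.
    work: str                   # label used to define subcorpora (author or work)
    lemma: str                  # headword assigned by morphological tagging
    head: int                   # 0-based index of syntactic head; -1 marks the root
    relation: str               # dependency label; "ATR" assumed for attributes
    translation: Optional[str]  # English word aligned to this token, if any


def build_entry(
    sentences: Iterable[List[Token]],
    lemma: str,
    works: Optional[Set[str]] = None,
) -> Tuple[Counter, Counter]:
    """Count aligned translations (sense evidence) and attributive modifiers
    (syntactic evidence) for `lemma`, optionally restricted to a subcorpus."""
    senses: Counter = Counter()
    attributes: Counter = Counter()
    for sent in sentences:
        for i, tok in enumerate(sent):
            if tok.lemma != lemma:
                continue
            if works is not None and tok.work not in works:
                continue  # outside the requested subcorpus
            if tok.translation:
                senses[tok.translation.lower()] += 1
            # Attributive modifiers are dependents of this occurrence labelled ATR.
            for dep in sent:
                if dep.head == i and dep.relation == "ATR":
                    attributes[dep.lemma] += 1
    return senses, attributes


# Toy, invented sentences: (work, lemma, head, relation, aligned English word)
corpus = [
    [Token("Polybius", "ναυτικός", 1, "ATR", "naval"),
     Token("Polybius", "δύναμις", -1, "PRED", "force")],
    [Token("Josephus", "δύναμις", -1, "PRED", "army")],
    [Token("Josephus", "πεζικός", 1, "ATR", None),
     Token("Josephus", "δύναμις", -1, "SBJ", "army")],
]

senses, attrs = build_entry(corpus, "δύναμις")
print(senses.most_common())  # [('army', 2), ('force', 1)]
josephus_senses, josephus_attrs = build_entry(corpus, "δύναμις", works={"Josephus"})
print(josephus_senses.most_common())  # [('army', 2)]
```

Passing a set of works is what makes the entry "dynamic" in this sketch: the same counts are simply recomputed over whatever subcorpus the reader selects.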
In addition, because our Greek/English and Latin/English parallel texts are aligned at the level of individual sentences, we can supplement each lexical entry with several examples of the word in actual use, presenting not only the source passage but also its automatically aligned translation.
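Presenting such an example amounts to pairing a source sentence with its translation and using the word-level alignment to locate the English words that correspond to the headword. A minimal sketch follows. It assumes alignments are stored as space-separated `i-j` pairs (0-based source index, target index), a common plain-text convention that is not necessarily the layout of the published files. The functions `parse_alignment` and `make_example` are hypothetical helpers, and the sentence is a toy example.

```python
from typing import List, Set, Tuple


def parse_alignment(line: str) -> List[Tuple[int, int]]:
    """Parse space-separated 'i-j' pairs (0-based source index i, target index j)."""
    pairs = []
    for item in line.split():
        i, j = item.split("-")
        pairs.append((int(i), int(j)))
    return pairs


def make_example(src_tokens, tgt_tokens, alignment, lemmas, lemma):
    """Bracket each occurrence of `lemma` in the source sentence and the
    English words aligned to it in the translation."""
    positions: Set[int] = {i for i, lem in enumerate(lemmas) if lem == lemma}
    aligned = {j for i, j in alignment if i in positions}
    src = " ".join(f"[{t}]" if i in positions else t for i, t in enumerate(src_tokens))
    tgt = " ".join(f"[{t}]" if j in aligned else t for j, t in enumerate(tgt_tokens))
    return src, tgt


# Toy example: "ἡ ναυτικὴ δύναμις" / "the naval force"
src, tgt = make_example(
    ["ἡ", "ναυτικὴ", "δύναμις"],
    ["the", "naval", "force"],
    parse_alignment("0-0 1-1 2-2"),
    ["ὁ", "ναυτικός", "δύναμις"],
    "δύναμις",
)
print(src)  # ἡ ναυτικὴ [δύναμις]
print(tgt)  # the naval [force]
```

Matching on the lemma rather than the surface form means that inflected occurrences (e.g., genitive δυνάμεως) would be highlighted as well, provided the tagger assigned them the right lemma.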
Data
The published form of the Dynamic Lexicon includes the automatically generated lexical entries along with the intermediate analyses used to generate them: word-level alignments between source texts and their translations, and automatic morphological tagging and syntactic analysis of the Greek and Latin originals. All data is licensed under a Creative Commons Attribution-ShareAlike license.
- Dynamic Lexicon entries. XML files for all automatically generated lexical entries. [Greek] [Latin]
- Automatically word-aligned parallel texts. [Greek] [Latin]
- Automatically aligned, tagged and parsed source texts. [Greek] [Latin]
- Treebank data. [Greek] [Latin]
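The lexical entries are distributed as XML, which can be read with any standard XML parser. The sketch below uses Python's built-in `xml.etree.ElementTree`, but the element and attribute names (`entry`, `lemma`, `pos`, `sense`, `translation`) are invented for illustration. Consult the downloaded files for the actual schema and adjust the paths accordingly. The three senses listed are the ones described above for δύναμις.

```python
import xml.etree.ElementTree as ET

# HYPOTHETICAL layout: element and attribute names are assumptions,
# not the schema of the published Dynamic Lexicon files.
SAMPLE = """<entry lemma="δύναμις" pos="noun">
  <sense translation="power"/>
  <sense translation="force"/>
  <sense translation="army"/>
</entry>"""


def read_entry(xml_text: str):
    """Return (lemma, part of speech, list of sense translations in document order)."""
    root = ET.fromstring(xml_text)
    senses = [s.get("translation") for s in root.findall("sense")]
    return root.get("lemma"), root.get("pos"), senses


lemma, pos, senses = read_entry(SAMPLE)
print(pos, senses)  # noun ['power', 'force', 'army']
```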
Publications
Bamman, David, and Gregory Crane (2011), "Measuring Historical Word Sense Variation," Proceedings of the 11th ACM/IEEE Joint Conference on Digital Libraries (JCDL 2011).
Bamman, David, and Gregory Crane (2011), "The Ancient Greek and Latin Dependency Treebanks," in Caroline Sporleder, Antal van den Bosch, and Kalliopi Zervanou (eds.), Language Technology for Cultural Heritage, Springer.
Bamman, David, and Gregory Crane (2009), "Computational Linguistics and Classical Lexicography," Digital Humanities Quarterly 3.1.
Bamman, David, and Gregory Crane (2008), "Building a Dynamic Lexicon from a Digital Library," Proceedings of the 8th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL 2008), Pittsburgh.
Acknowledgments
Grants from the National Endowment for the Humanities (PR-50013-08, "The Dynamic Lexicon: Cyberinfrastructure and the Automated Analysis of Historical Languages") and NEH/DOE/NERSC ("Large-Scale Learning and the Automatic Analysis of Historical Texts") provided support for this work. This research used resources of the National Energy Research Scientific Computing Center, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.