Perseus Vocabulary Tool Help (deprecated)

last revised 10/25/02

The Perseus Vocabulary Tool is designed to allow users to explore the vocabulary of the Greek and Latin texts in the Perseus Digital Library. Using the Vocabulary Tool you can select a set of documents or document sections and then view a list of all of the words that appear in those texts.


Setting Up Your List

A Guide To Using the Vocabulary Tool

Work Selection: When you first use the vocabulary tool, you are presented with a selection box that shows all of the works in a language in a Perseus collection. Choose the documents for your vocabulary list by clicking them in this box. Macintosh users can select more than one work by holding the Command key as they click; Windows users can do the same by holding the Control key. You may select as many works or parts of works as you like.

Sort Order: You can choose among several sort orders for your list. Different sort orders are useful for different tasks.

List Length: The tool also allows you to select the percentage of the words in a document that you want to include in your list. As with the sort orders, the different percentages are useful for different purposes. The vast majority of words in any text appear only once. If you are looking for a list that contains the essential vocabulary for your selected texts, pick a lower percentage. If you want a comprehensive list, pick a higher percentage or the "all words" option. An alphabetical listing always displays all of the words in your selection.

Output Formats: The vocabulary tool provides two different ways to format your output. You can choose a table that will provide attractive output in a web browser or a comma-delimited list that you can import into other software programs. Note that some browsers have problems displaying very large tables; if you are requesting a very long list, the comma-delimited version may work better.
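As an illustration of importing the comma-delimited format into other software, here is a minimal Python sketch that reads such a list with the standard csv module. The column names (word, count, definition) and the sample rows are hypothetical stand-ins, not actual tool output; the columns in a real list depend on the options you selected.

```python
import csv
import io

# Made-up sample standing in for a saved comma-delimited list.
# The column names and values are hypothetical, not actual tool output.
sample = """word,count,definition
et,120,and
in,95,"in, on"
sum,80,to be
"""

def load_vocabulary(handle):
    """Read a comma-delimited vocabulary list into a list of dicts."""
    rows = list(csv.DictReader(handle))
    for row in rows:
        # Convert the (assumed) frequency column from text to an integer.
        row["count"] = int(row["count"])
    return rows

rows = load_vocabulary(io.StringIO(sample))

# Sort by frequency, most frequent word first.
most_frequent = sorted(rows, key=lambda r: r["count"], reverse=True)
for row in most_frequent:
    print(row["word"], row["count"], row["definition"])
```

To read a saved file instead of the embedded sample, pass an open file handle, for example open(path, newline="", encoding="utf-8"), to load_vocabulary.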

Column Selection: The vocabulary tool can provide a great deal of information such as word frequencies, key term scores, percentages, and short definitions. This option lets you select the data that appears in the vocabulary table so that you can use the format that is best suited for your needs. A complete description of all of the available columns is provided below.

The defaults for these features are to sort by weighted frequency and to display the top 50%. For a typical text this gives a list of 100 to 300 distinct words. These are also the values used if you create a vocabulary list using the "Vocabulary in this document" link in the sidebar of a Perseus classical text.

Viewing the Results

After you make your selection, the system will calculate a custom vocabulary list for your documents.

Vocabulary Results Screen

Vocabulary Size and Density: Several numbers will appear at the top of your vocabulary list to help you understand general characteristics about the vocabulary of your selection.


These three numbers (the total word count, the number of unique words, and the vocabulary density ratio) are intended to help you understand the level of vocabulary complexity in your selection. A work with more complex vocabulary will have more unique words, while a work with simpler vocabulary will have fewer. The vocabulary density ratio expresses the same information in normalized form: it is the total number of words divided by the number of unique words. If the ratio is small, the vocabulary is more complex; as the number increases, the text becomes easier. Another way to think about this ratio is as the average number of words that you will encounter between one new word and the next.

Compare the word counts and vocabulary density scores for Aeschylus' Oresteia and Xenophon's Anabasis. The Oresteia contains 18,934 words and 6,974 unique words, for a vocabulary density score of 2.715. This means that, on average, roughly one out of every three words that a reader encounters will be new. Xenophon's Anabasis, on the other hand, contains 57,193 words and 4,358 unique words, for a vocabulary density score of 13.124. The higher score suggests a much simpler vocabulary; on average only one in every thirteen words will be new. In fact, the Anabasis is about three times as long as the Oresteia but contains only about 2/3 as many unique words.

Similarly, Livy's History, books 1-10, is 159,132 words long but contains only 8,735 unique words, so its vocabulary density is 18.218. Virgil's Aeneid, less than half as long (63,719 words), uses almost as many different words (7,531 of them), giving it a vocabulary density score of only 8.461. In other words, while Livy's vocabulary is larger than Virgil's, new words do not appear as frequently.
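
The figures quoted above are consistent with a vocabulary density equal to the total word count divided by the number of unique words. The short Python sketch below recomputes the four scores under that assumption; the function name is illustrative and not part of the Perseus tool.

```python
# Word counts quoted in the text: (total words, unique words, quoted density).
works = {
    "Aeschylus, Oresteia": (18934, 6974, 2.715),
    "Xenophon, Anabasis": (57193, 4358, 13.124),
    "Livy, History 1-10": (159132, 8735, 18.218),
    "Virgil, Aeneid": (63719, 7531, 8.461),
}

def vocabulary_density(total_words, unique_words):
    """Assumed definition: total words divided by unique words."""
    return total_words / unique_words

for title, (total, unique, quoted) in works.items():
    density = vocabulary_density(total, unique)
    # Show the recomputed score next to the one quoted in the text.
    print(f"{title}: {density:.3f} (quoted {quoted})")
```

A higher result means that, on average, more words pass between one new word and the next, which is why the Anabasis and Livy read as lexically simpler than the Oresteia and the Aeneid.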

The Vocabulary List: Below the table header, the vocabulary list will appear along with a series of numbers to give you information about each word in the context of your list. The actual contents of your list will vary based on the way that you customized the list and the sort order that you requested.

Refining Your Word List: At the bottom of your vocabulary list, you will find the same controls that you used to set up your initial vocabulary list. These allow you to select new works, change your sort order, or change the number of words that your list contains.

Vocabulary in Other Languages and Other Collections: At the very end of the initial selection screen and of each vocabulary list are links that display the vocabulary tool for other languages and other collections in the Perseus Digital Library.

Things You Can Do with the Vocabulary Tool

The Vocabulary Tool is very versatile and it can be used in several ways to help you read a text in the Perseus Digital Library.


revised 25-Oct-02, AEM