Tuesday, December 27, 2005

CALLT 1/5: Dictionaries

In this article, I'll describe various kinds of dictionaries as a continuum from the low end to the high end.

Basic dictionaries



Electronic dictionaries have two forefathers. One is the dead-wood dictionary. The other is the word list - a text file, opened and searched with your favorite text editor.

The basic feature of an electronic dictionary is automated search. Other common features are:

  • Reverse search: Search from the known language to the foreign one.

  • Multiple meanings: The word can have several known-language translations.

  • Extra information like grammatical class, pronunciation or examples.
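
As a rough illustration of the features above, here is a minimal Python sketch of a word-list dictionary with forward search, reverse search and multiple meanings. The tab-separated format, the Finnish sample entries and the names `WORD_LIST` and `load` are my own assumptions for the example, not taken from any particular dictionary program.

```python
from collections import defaultdict

# Hypothetical word list: one "foreign<TAB>known" pair per line.
# A word with several meanings simply appears on several lines.
WORD_LIST = """\
koira\tdog
kuu\tmoon
kuu\tmonth
hiiri\tmouse
"""

def load(text):
    """Build forward and reverse indexes from a tab-separated word list."""
    forward = defaultdict(list)  # foreign word -> known-language translations
    reverse = defaultdict(list)  # known-language word -> foreign words
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        foreign, known = line.split("\t")
        forward[foreign].append(known)
        reverse[known].append(foreign)
    # Return plain dicts so that a failed lookup does not create an entry.
    return dict(forward), dict(reverse)

forward, reverse = load(WORD_LIST)
print(forward.get("kuu", []))    # multiple meanings of one foreign word
print(reverse.get("mouse", []))  # reverse search from the known language
```

Building both indexes once at load time keeps each lookup a single dictionary access, which is the main thing the electronic version gains over searching the text file by hand.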



Multimedia dictionaries



Multimedia dictionaries can contain sound and images. The obvious use of sound is to play the pronunciation, either as a recorded waveform or with a speech synthesizer. Images are more useful in encyclopedias.

Hypertext is a more interesting extension to a traditional dictionary. Combining a dictionary and a grammar should be quite easy technologically. For example, if the user looks up a preposition, a grammar page would be more beneficial than a few example sentences.

Morphological dictionaries



Morphological dictionaries can parse inflected words. When the user types an inflected word, the dictionary is smart enough to look up the root word.
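
To make the idea concrete, here is a naive sketch of root lookup by suffix stripping. The tiny `LEXICON`, the `RULES` table and the function `find_root` are hypothetical names invented for this example; a real morphological dictionary would need a far richer rule set and would have to handle irregular forms.

```python
# Hypothetical mini-lexicon of root words.
LEXICON = {
    "walk": "to move on foot",
    "city": "a large town",
    "make": "to create",
}

# (suffix, replacement) rewrite rules, tried in order. These few English
# rules are an assumption for illustration only.
RULES = [
    ("ies", "y"),   # cities  -> city
    ("ing", "e"),   # making  -> make
    ("ing", ""),    # walking -> walk
    ("ed", ""),     # walked  -> walk
    ("es", ""),
    ("s", ""),      # walks   -> walk
]

def find_root(word):
    """Return the root form found in LEXICON, or None if no rule matches."""
    word = word.lower()
    if word in LEXICON:
        return word
    for suffix, replacement in RULES:
        if word.endswith(suffix):
            candidate = word[: len(word) - len(suffix)] + replacement
            # Accept a candidate only if the lexicon actually knows it.
            if candidate in LEXICON:
                return candidate
    return None

for typed in ["cities", "walking", "making", "walked"]:
    print(typed, "->", find_root(typed))
```

Checking every candidate against the lexicon is what keeps the naive rules from inventing roots that do not exist.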

The other direction is to produce inflected words. For example, Multitran, an English-Russian dictionary, is able to produce the most common inflected verb forms.
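
Generation can be sketched in the same naive style. The function `inflect` below is a hypothetical example that handles only the simplest regular English verbs; it is not how Multitran works, and it ignores irregular verbs, consonant doubling and similar spelling changes.

```python
def inflect(verb):
    """Return a few common forms of a regular English verb (naive rules)."""
    if verb.endswith("e"):
        # bake -> bakes, baked, baking (drop the final "e" before "ing")
        return {
            "third_person": verb + "s",
            "past": verb + "d",
            "present_participle": verb[:-1] + "ing",
        }
    # walk -> walks, walked, walking
    return {
        "third_person": verb + "s",
        "past": verb + "ed",
        "present_participle": verb + "ing",
    }

print(inflect("walk"))
print(inflect("bake"))
```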

A combination of a morphological and a multimedia dictionary could parse the word and offer the user grammatical information on the inflection.

Wiktionary: Collaborative collection of vocabularies



Wiktionary is a collaborative dictionary based on the MediaWiki engine, which also powers Wikipedia. Anyone can edit the articles describing a word.

The core of Wiktionary is its known-to-known articles, for example articles explaining English words in English. These known-to-known articles have a long list of translations into other languages; for example, the word "mouse" has translations into 29 languages.

There are also foreign word articles. They are shorter and contain links to the main language articles. Often they also contain links to related words, which is nice.

Personally, I consider the traditional 2-language dictionaries to be better than Wiktionary. They typeset the entries more beautifully and don't mix more than 2 languages in search. Their coverage is usually good, and sometimes they even contain grammatical information, phrases and example sentences. In the future, I expect that Wiktionary will become a good known-to-known dictionary, while the independent 2-language dictionaries will continue to beat it in the foreign-to-known area.
