Thursday, November 04, 2010

Blueprints for a translation sentence website

Summary: The previous post pointed out a huge gap in CALL: there are no tools to practise writing. This post addresses it by proposing a translation sentence website.

Why translation sentences?

When people start writing in a foreign language, they first form sentences in their fluent language and then translate them piece by piece. This piece-by-piece behavior can be conditioned. This lowers the barrier to writing, as the student is already fluent in the syntactic structures and only needs to slot in phrases from his domain area.

Aretae says that constructivism is the most overlooked aspect of teaching:
Take martial arts for a couple years, and watch how much the Sensei DOESN'T explain...but rather makes you practice until you have an internal representation of the system...then adds 3 words to clarify your mistakes.

Translation sentences are constructivist in the sense that if you have problems with syntax, they teach you syntax. If you have problems with prepositions, they teach their correct use. If you have problems with word inflection, they improve that area.

Checking correctness

Translation sentences have multiple correct answers. It takes someone fluent in the language to classify answers as correct or wrong. Simple string matching is out of the question. Finnish Annotator had an option to answer flashcards by writing, and string matching was insufficient even for single words. In the end, the site normalized away articles and the infinitive marker (a, an, the, to) and used edit distance to ignore typos in known-language words.

First of all, instead of listing every correct answer, there should be a word-level regular-expression notation for denoting the different options. This avoids combinatorial explosion when, say, one slot has 3 correct phrases and another has 5, which would otherwise require listing 15 full sentences.
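As a rough sketch of how such a notation might work (the `(option1|option2)` slot syntax here is an invented example, not an established format):

```python
import itertools
import re

def expand_pattern(pattern):
    """Expand a word-level pattern such as
    'There is (plenty of|a lot of) snow' into all concrete sentences."""
    # re.split with a capture group alternates literal runs and slot contents.
    parts = re.split(r'\(([^)]*)\)', pattern)
    choices = []
    for i, part in enumerate(parts):
        if i % 2 == 1:                      # inside parentheses: alternatives
            choices.append([alt.strip() for alt in part.split('|')])
        elif part.strip():                  # literal text between slots
            choices.append([part.strip()])
    return [' '.join(words) for words in itertools.product(*choices)]

def matches(pattern, answer):
    """True if the student's answer equals one expansion of the pattern."""
    normalized = ' '.join(answer.lower().split())
    return normalized in expand_pattern(pattern.lower())
```

With two slots of 3 and 5 phrases, `expand_pattern` yields all 15 combinations, but the problem author only writes one line.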

Since many phrases have synonyms, the site can make newly written sentences more complete from the first try by supporting them. For example, the sentences "There is plenty of snow" and "There is a lot of snow" are both correct. This could be denoted with "synonym:'plenty of'".

There should also be a list of wrong answers, which contain common errors. This enables high-quality constructivist feedback.

Preparing for the unknown

Nobody can guess all correct and interestingly wrong answers to a translation sentence. The site should deal with it instead of denying it.

First of all, it needs a social web interface for fine-tuning the sentences. This means adding new acceptable answers and new explanations for wrong answers. The interface is open to authorized users whose language skill has been verified. The site saves all unclassified answers to translation sentences and sorts them by frequency. If many students give a similar wrong answer, an explanation should be added.
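The review queue described above could be as simple as a frequency count over normalized answers (a minimal sketch; the normalization here is just lowercasing and whitespace collapsing):

```python
from collections import Counter

def rank_unclassified(answers):
    """Sort unclassified student answers by how many students gave them,
    most frequent first, so reviewers see the highest-impact gaps."""
    counts = Counter(' '.join(a.lower().split()) for a in answers)
    return counts.most_common()
```

A reviewer working from the top of this list spends effort exactly where it helps the most students.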

This social aspect also means that it is not enough to solve the technical challenge of building the website. Before a single line of code is written, there must be confirmed support from a steering group of CALL researchers. Writing such a site is a big effort and makes the programmer blind to simple but obvious shortcomings in it. The research group would review and criticize away such defects. The group would also kickstart sentence writing and introduce it to students until it reaches a critical mass of having enough sentences to be useful. The same people would be adding correct and wrong answers.

Automatic ways to prepare for the unknown

Finnish Annotator used edit distance to ignore typos. This can be applied to translation sentences in two ways. Firstly, correct sentence structure can be verified even if single words are mistyped, when edit distance is used to compare words.
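A minimal implementation of this word-level tolerance, using the classic Levenshtein dynamic program (the tolerance of one edit is an assumption; a real site might scale it with word length):

```python
def edit_distance(a, b):
    """Levenshtein distance between two words (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def same_word(expected, given, tolerance=1):
    """Treat a word as correct if it is within `tolerance` edits,
    so a single typo does not fail the whole sentence."""
    return edit_distance(expected.lower(), given.lower()) <= tolerance
```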

Secondly, sentence level edit distance can point out added or missing words after identifying the closest correct or wrong answer. Missing words are always errors, since problem authors should mark optional words as optional.

Added words may be errors, but they may also be just a symptom that the sentence is new and not enough variations have been added yet. The site should prepare for this by keeping a list of words that are usually innocent when added. This list may need context restrictions, so that certain added words are known to be innocent only in certain contexts.
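Both ideas, missing/added word detection and the innocent-word list, can be sketched with a word-level sequence alignment (here via Python's `difflib`; the whitelist contents are invented placeholders):

```python
import difflib

# Hypothetical whitelist: filler words that are usually harmless when added.
INNOCENT_ADDITIONS = {"please", "well", "just"}

def diff_sentences(correct, answer):
    """Align the student's answer against the closest known answer and
    report missing words (always errors), replaced words, and added
    words (errors unless whitelisted as innocent)."""
    a, b = correct.lower().split(), answer.lower().split()
    report = {'missing': [], 'added': [], 'replaced': []}
    for op, i1, i2, j1, j2 in difflib.SequenceMatcher(None, a, b).get_opcodes():
        if op == 'delete':
            report['missing'].extend(a[i1:i2])
        elif op == 'insert':
            report['added'].extend(b[j1:j2])
        elif op == 'replace':
            report['replaced'].append((a[i1:i2], b[j1:j2]))
    # Added words are errors unless they are on the innocent list.
    report['suspicious_additions'] = [w for w in report['added']
                                      if w not in INNOCENT_ADDITIONS]
    return report
```

The `replaced` pairs are exactly the cases where the similarity comparison of the next paragraph applies.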

Replaced words benefit from similarity comparison. The error is likely to be small if the word is a verb in both the correct answer and the student's answer, especially if the verb is inflected in the same way.

In 2006 I browsed some books about syntactic parsing, where a sentence is converted into a parse tree. The methods looked very difficult to implement, because syntax is much more complex than word inflection. For example the following sentences have the same meaning, but sentence-level parsing is necessary to identify it automatically: "If it rains tomorrow, the party is held inside." and "The party is held inside if it rains tomorrow." Sentence parsing is also useful for rating wrong but unclassified answers: answers which can be parsed into a parse tree are less wrong than syntax-violating sentences.

What kinds of sentences to train?

Simple sentences that deal with one topic at a time. The topic may be a syntactic structure, a preposition, a time phrase, etc. Simplicity leaves less room for unexpected variation and therefore gives better feedback.

Prior art and differences to flashcard programs

I'm not aware of any significant academic prior art since PLATO, a groundbreaking CALL system from the 70s. Unfortunately the details of PLATO's translation sentence engine are not available. Its technology may be obsolete, but the people building it faced the same challenges we face today. They were not stupid, and most importantly they established a feedback loop where they improved based on experience, while I'm just speculating theoretically.

The main difference from flashcard programs is that a single translation sentence is "difficult and slow" while a single flashcard is "easy and fast". Therefore a site for training tesuji skills in a board game is better prior art than Anki. You also hear anecdotes of people voluntarily banging at such a site for hours, stopping only when the remaining problems are too easy or too difficult. It has an automatic rating system for both users and problems. Problems are rated based on how many people get them right; users are rated based on how many problems they get right. The rating scale is given in kyu/dan levels, so that problems and users have comparable ratings.

Item response theory provides the mathematical formulas for implementing such ratings, although I don't know which exact method the site uses. The previous chapters have dealt with methods to distinguish grave errors from small typos and from errors in the problems themselves. A rating system based on item response theory benefits from having more information than just pass or fail.
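The simplest instance is the one-parameter Rasch model, where the probability of a correct answer depends only on the difference between user ability and problem difficulty. Here is a minimal sketch with an online update step (the gradient-step update and learning rate are my assumptions, not necessarily what any existing site does):

```python
import math

def p_correct(ability, difficulty):
    """Rasch model: probability that a user of the given ability
    answers a problem of the given difficulty correctly."""
    return 1.0 / (1.0 + math.exp(difficulty - ability))

def update(ability, difficulty, correct, lr=0.1):
    """Nudge ability and difficulty apart or together after one answer:
    a simple online gradient step on the log-likelihood."""
    surprise = (1.0 if correct else 0.0) - p_correct(ability, difficulty)
    return ability + lr * surprise, difficulty - lr * surprise
```

After a correct answer the user's rating rises and the problem's rating falls, and a surprising result (a weak user solving a hard problem) moves the ratings more than an expected one.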

Chatbots and communicative language teaching

How could each translation sentence also be meaningful communication? One way is to give the student a communicative task, for example "order a flight ticket to Melbourne", and to write a chatbot to hold the other end of the conversation.

This would explode the number of correct reactions. The user could start by greeting. An order for "ticket to Melbourne" could be formulated in tens of different ways.

The details about the time window, passenger class, etc. could be given right away, or the chatbot would have to ask for them. It is no longer enough to parse inflection and syntax; we need to pay attention to meaning. A hard-coded ontology would be needed for each chatbot.

In the last post I said that CLT is nice to have but comes with a hefty price tag. Chatbots are a good example. Each one would take a long time to write. Their feedback would be of inferior quality compared to translation sentences. Most importantly, they would not scale to cover large amounts of material.

They would not be enough to train students to write.
