Thursday, January 12, 2006

CALLT 5/5: Testing: When You Train Like You Fight, You Fight Like You Train

First, I'll define super-realistic testing: a form of computer-based testing that provides a uniform measure of skill, is sensitive to progress in language skill, makes cheating almost impossible, can be used for training, and scales to a large number of students. Then I was going to describe the only language testing project I have heard of [actually, no: I just heard that their specification is not ready].

First, an example of a super-realistic test for Chinese character recognition. The test material contains all the characters the student is supposed to recognize after the course. The test is administered with a flashcard-like web application, which randomly chooses 20-40 characters (a rough sketch follows the list below). As I promised earlier, this kind of test has the following properties:

  • Uniform measure of skill: It doesn't matter whether you learned the characters from the course material or some other way.

  • Sensitive to progress: If you learn more characters, you get better scores.

  • Impossible to cheat: If you memorize all the characters beforehand, you have learned exactly what you were supposed to learn.

  • Scales to a large number of students: One server is more than enough for everyone studying Chinese in Finland.


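To make the sampling concrete, here is a minimal Python sketch of such a test (the character list, the console interface and the helper names are my own assumptions, not a description of any existing program):

    import random

    def run_character_test(pool, min_items=20, max_items=40):
        """Quiz the student on a random sample drawn from the full character pool.

        pool maps each character to the set of accepted English meanings.
        """
        size = min(random.randint(min_items, max_items), len(pool))
        sample = random.sample(list(pool), k=size)
        correct = 0
        for char in sample:
            answer = input(f"Meaning of {char}? ").strip().lower()
            if answer in pool[char]:
                correct += 1
        return correct, len(sample)

    # Hypothetical test material: every character the course expects the student to know.
    pool = {"我": {"i", "me"}, "你": {"you"}, "好": {"good", "well"}}
    score, total = run_character_test(pool, min_items=2, max_items=3)
    print(f"{score}/{total} recognized")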
About the relationship between training and testing: in super-realistic testing, the same material can be used for training. The differences between training and testing are that (1) testing chooses only a sample of the material, and (2) in testing, the true identity of the student needs to be verified.
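A sketch of that split, reusing the pool from the example above (the identity check is only a placeholder; how students would actually be authenticated is left open here):

    import random

    def build_session(pool, mode, student_id=None, sample_size=30):
        """Return the items to present: training uses everything, testing samples."""
        if mode == "training":
            items = list(pool)        # the full material, just shuffled
            random.shuffle(items)
            return items
        if mode == "testing":
            if not verify_identity(student_id):
                raise PermissionError("student identity could not be verified")
            return random.sample(list(pool), k=min(sample_size, len(pool)))
        raise ValueError(f"unknown mode: {mode}")

    def verify_identity(student_id):
        """Placeholder for a real check (supervised exam, login against a registry, ...)."""
        return student_id is not None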

The close link between training and testing ensures that there is an "initial incentive" to collect material even before it reaches the level where cheating is impossible: while the amount of material is still small, it can already be used for training. And since a large number of students can use the same server, a large number of teachers can also divide the load of creating the material.

The same idea can be applied to sentence translation and reading comprehension, although not directly. For example, in sentence translation it is common that there are several correct responses. Common mistakes also need to be taken into account: if the student makes a common mistake, he or she should be told the nature of the mistake; just announcing that the answer is wrong does little good. In character recognition it is fine to show the result after a wrong answer, but in sentence translation it may be better to let the student try again. Since minor spelling mistakes are more likely in long sentences, several tries are reasonable when training. Some kind of loose matching should also be used, so that the student still gets credit when the point of the sentence is grammatical and he or she remembers some word slightly wrong.
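One way such feedback and loose matching could be wired together, using the standard difflib module for similarity (the 0.9 threshold, the example sentences and the two-tier design are assumptions of mine):

    from difflib import SequenceMatcher

    def check_translation(response, accepted, common_mistakes, threshold=0.9):
        """Return (verdict, feedback) for one translation attempt.

        accepted is a list of correct translations; common_mistakes maps a
        known wrong answer (already lowercased) to an explanation of the error.
        """
        normalized = " ".join(response.lower().split())
        # Known common mistakes are matched exactly so the feedback stays targeted.
        if normalized in common_mistakes:
            return "common mistake", common_mistakes[normalized]
        # Correct answers are matched loosely so a small spelling slip still counts.
        for correct in accepted:
            if SequenceMatcher(None, normalized, correct.lower()).ratio() >= threshold:
                return "correct", None
        return "wrong", "Not recognized; try again."

    verdict, feedback = check_translation(
        "he go to school every day",
        accepted=["he goes to school every day"],
        common_mistakes={"he go to school every day":
                         "Third person singular needs -s: 'he goes'."},
    )

Checking the common mistakes before the loose match is deliberate: otherwise a near-miss that happens to be a known error would be silently accepted as a spelling slip.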

In my last Chinese test, there were four kinds of questions: (1) multiple choice: given the character and the pinyin, identify the English meaning; (2) multiple choice: given the character, identify the pinyin; (3) arrange the characters so that they form a sentence; (4) given the English meaning and the pinyin, draw the character. Of these four tasks, only the fourth is something that can't be digitized. The number of questions required to reach the uncheatable level is quite low for the other question types. (There is a program for training character drawing, but it is too sensitive. It is good that the program insists on absolutely correct form when you train; in testing, however, the standards should be such that it is possible to get a character right on the first try.)
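For the first two question types, the multiple-choice options can be generated automatically from the same material; here is a sketch under the assumption that each character is stored with one pinyin reading (the three-distractor count is arbitrary):

    import random

    def pinyin_question(pool, n_choices=4):
        """Pick a character and build a 'which pinyin?' multiple-choice question.

        pool maps each character to its pinyin; the distractors are readings of
        other characters in the same pool.
        """
        char, correct = random.choice(list(pool.items()))
        distractors = [p for c, p in pool.items() if c != char and p != correct]
        options = random.sample(distractors, k=min(n_choices - 1, len(distractors)))
        options.append(correct)
        random.shuffle(options)
        return char, options, correct

    pool = {"我": "wǒ", "你": "nǐ", "好": "hǎo", "是": "shì", "不": "bù"}
    char, options, correct = pinyin_question(pool)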

The fact that progress in super-realistic testing is numerically measurable (number of characters recognized, number of sentences translated) makes it possible to draw charts of progress, provided the student takes the test at regular intervals (say, once every two weeks). These progress charts add motivation by making an implicit promise: if you keep putting in the same effort, the progress line will eventually reach the very top, and the work is finished.
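If each test run is stored as a (date, score) pair, the chart itself takes only a few lines, for example with matplotlib (the fortnightly dates and the pool size of 500 items are invented for the example):

    import matplotlib.pyplot as plt

    def plot_progress(results, total_items):
        """Plot test scores over time against the full amount of material."""
        dates = [d for d, _ in results]
        scores = [s for _, s in results]
        plt.plot(dates, scores, marker="o", label="items mastered")
        plt.axhline(total_items, linestyle="--", label="all material")  # the "very top"
        plt.xlabel("test date")
        plt.ylabel("items mastered")
        plt.legend()
        plt.show()

    # Hypothetical results from tests taken every two weeks.
    plot_progress([("2006-01-01", 120), ("2006-01-15", 160), ("2006-01-29", 210)],
                  total_items=500)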
