Wednesday, December 29, 2010


A body mass index of 23.7 is normal weight, so I have never been on a serious diet before. However, it is too much stomach fat for pole dancing, so next year I'll diet from 63kg to 55kg, resulting in a BMI of 20.7.

The problem with a merely physics-based diet of eating less of everything is that it increases hunger, devastates the ability to concentrate and produces relapses. Cutting out just starch (pasta, potatoes, rice) seems to alleviate these side effects. A full ketogenic diet would require too much cooking. Keeping weight stable is a much easier task: just weigh yourself every morning, and eat less for a day if your weight crosses your Alarm Barrier. The body has some internal weight-stabilization mechanism, which makes this easier by controlling appetite.

Actually I already started on December 1st, first losing 1.5kg by dieting. Then a puke disease over the Christmas days, with vomiting and diarrhea, took another 1 kg away, apparently permanently, despite me eating as much as I could keep inside, and sometimes a bit more, triggering The Symptom. Although the figure on the scale decreases, this is bad news for dieting. First of all, bed rest wastes muscles, so relative strength doesn't increase. Secondly, the bacterial ecosystem in the gut still hasn't recovered to the point where stool doesn't resume liquid form after slightly heavier exercise.

Wednesday, December 22, 2010

Merry Christmas, your last one

Twas the night before Yuletide and all through the hole
Not a creature was stirring, not even a Dhole
Aldebaran hung at the right place at nine
In the hopes that Great Cthulhu would come out this time

The Fungi from Yuggoth, all snug in their caves
Were plotting to turn all the people to slaves
The Deep Ones in Rlyeh, the Ghouls in their graves
Were dancing and singing and acting depraved

When what to my wondering eyes should appear
But a mouldering sleigh and eight corpselike reindeer
With a horrible driver so leprous and reeking
I knew right away that my fear was unspeaking

The reindeer were gross, as they flew up from hell
And It hoarsely whispered and chanted a spell
Ia Shub Niggurath! Cthulhu ftagn!
Nyarlathotep! I summon you on!

As decomposed flesh before the charnel stench rise
And meet with the open air polluting the skies
Up to the housetop the horror it rose
And the gangrenous odors assailed my nose

And then in a slopping noise heard on the roof
The lumbering clomping of octopoid hoofs
As I drew in my head and was turning around
The horror lurched into my room with a bound

Its eyes how they pulsate
So bulbous and gory
This blasphemous creature
So noxious and hoary

I was frozen by fear, my feet wouldn't run
I threw up my cookies, this wasn't much fun
It whispered my name and said "You come with I"
I tried to refuse and it said "Then you die."

It came at my throat with its grim claws extended
But a miracle saved its victim intended
I had three Elder Signs in a slot in the floor
It screamed with a fiendish sound and went out the door

It sprang to its sleigh, and its team gave a surge
And away they all flew to the sound of a dirge
I heard it exclaim as it flew out of sight
"You're lucky this time, for the stars weren't right."

This joke gets old quickly, but there's always the first time you hear it...

Monday, December 20, 2010

Follow-up on the goals for 2010

Summary: What's the point of setting goals, if you don't track them?

  • Cleaning up this blog. Completed except for a few removed posts and the status signaling post, which I didn't know to be provocative. The burst of negative feedback in the summer was probably just a sign that the posts were relevant. Status and sexuality are still a bit mysterious, so unavoidably some posts miss the mark. The best option is to write so clearly and understand the subject so well that everyone agrees with the posts. However, if some rabid psychobitch chooses to keep on insisting that I'm too young to walk 10 km without being accompanied by my mother, no amount of rational explanation can possibly ever resolve that, and that's why the comments are nowadays moderated.

  • Year of bodyweight training. Completed. Got that push-up from headstand to handstand in October. Not planning to continue bodyweight training: it neglects feet and gets monotonous.

  • Get Chinese to the point where active study can be replaced by reading news and blogs in a relaxed way. Completed for newspaper text only. Also, Chinese Language Group revealed that writing and speaking skills need a lot of work. However, there won't be Chinese goals for 2011. I'll integrate writing practise into other activities, like writing my journal partly in Chinese.

  • More active social life. Completed. Been participating in events of The Club, pole dancing events and in organizing Go Congress in the summer.

Last year I wrote that physical instincts give valid raw data, but social instincts give only bad advice. This has changed. Now social instincts also give correct raw data, and the bottleneck is acting on those instinctual situational assessments.

In Pole Debut 2010, I had a 15-minute chat about pole dancing with a previously unknown girl. Overcoming bitch shields used to be impossible for me, but now it happens once every 3 months when strong context is available. Further progress will hopefully enable that more reliably with less context (for example in concerts). That is still a far cry from Roissy's or Finndistan's level, where they can talk with pretty much any girl without any context in nightclubs. However, even modest progress shows that the approach works and the direction is right and that there is no reason for despair despite me being 10 years behind my peers in social skills. Advanced methods like negs are still far, but the few written negs I've tried have produced good results. The option of just magically getting great social skills overnight is not on the table. Gradual progress is, and it is happening.

Wednesday, December 15, 2010

Simo the Crank

This quote is from Emanuel Derman's My Life as a Quant:

When I was a graduate student at Columbia in the 1970s, physics was the great attractor for the aspiring scientists of the world. Bearing witness to this was the large box of documents kept near the entrance to the physics department library. We referred to it as the "crank file." The box contained the unsolicited typewritten letters, manuscripts, and appeals that poured steadily into the mailbox of the department's chairman. Eccentric though the documents were, they made fascinating reading. There were eager speculations on the nature of space and time, elaborately detailed papers refuting relativity and quantum mechanics, grandiose claims to have unified them, and farfetched meditations that combined physics with more metaphysical topics. I remember one note that tried to deduce the existence of God from the approximate equality of the solid angles subtended by the sun and the moon when observed from the earth, a remarkable circumstance without which there would be no solar eclipse.

None of these papers had much chance of getting past a journal referee. Few of the writers had much hope of even getting into graduate school. They may not have wanted to. The letters were mostly a cri de coeur from isolated and solitary physicist manqués all over the world.

Most of my classmates laughed at the naivete of the letter writers, but as I skimmed through the crank file I found it hard to feel superior. Instead, peering into the box of manuscripts, I always saw my pale reflection. Out there, beyond academia and industry, were people like us, similarly in thrall of the same sense of mystery and power that lay behind the attempts to understand and master the universe with only imagination and symbols. ...

Old madness took a grip of me for 2 months. But this dark side needs to be processed somehow. Merely denying it, as in 2009, is not the right answer.

Friday, December 03, 2010

Slow maybe

So I found a contact person from Tampere University and met her on Tuesday. She's a practising teacher, which means that she has visibility into the frontline trenches of computer-aided language learning. The 30min discussion that we had already clarified many things.

The primary difference between the educational establishment and the Chinese/Japanese self-study scene is that the educational establishment is content-rich (they have paid, experienced and talented professionals writing content) and algorithm-poor. By contrast, the Chinese/Japanese self-study scene is content-poor (they have a few good databases like EDICT/CEDICT for vocabulary) and they squeeze the last ounce of juice from those by applying high-quality algorithmics.

AFAIK the flashcard sentence deck which I used was created by simply searching Juku for example sentences for each HSK character. Also, the MDBG annotator works by looking up the characters in the one extensive free Chinese vocabulary that exists, namely CEDICT, despite the fact that it doesn't even break the words into several meanings. This way, algorithmics make full use of the few available databases.

By contrast, in the educational establishment we have professionally collected dictionaries where each entry gives plenty of information about phrases and alternative meanings, but where the algorithmics are controlled by companies who have no interest in making Firefox extensions, flashcard programs or annotators. The difference is even bigger in textbooks.

It seems to me that the teachers in the trenches don't need another slow and difficult site like the translation sentence site. Instead, they get far with customizations to existing tools. Let's hope that my contact person has the patience to meet week after week and month after month to discuss the needs and uses of technology in language teaching. It will be like peeling an onion: I need to peel their needs layer after layer through small customizations (at most 1 week each) which give them what they want.

In the long run, to get a thesis topic I need to find an area where a longer computer science effort is needed to implement useful features in CALL tools. However, getting from small customizations to that level will take months. Let's hope that the teaching/language contact person has the patience to go through the technology mapping discussions instead of getting impatient with the apparent lack of concrete results.

I'd rather spend months in analysis paralysis, considering various research topics, than pick a research topic only to find out years afterwards that my research output is completely useless because I failed at due diligence in the beginning. That nightmare scenario actually happened with Finnish Annotator. I'd rather spend months doing due diligence, only to find out that there is no CALL research topic, than start and waste effort prematurely.

Tuesday, November 23, 2010

Check your assumptions at the door

Finishing graduate studies is hard. At the computing science department of Tampere University, there are about 100 graduate students. Every year about 5 of them graduate. This means that every PhD thesis requires about 20 student enrollment years. For every student who makes it, about 2 others fail. What makes me think I can do it despite not working on it full time?

What are my strengths and weaknesses?

The worst weakness is that I plan to work while doing graduate studies. This restricts available time.

My strength is that industry experience has given me a solid programming routine. Any research plan should rely on this strength in order to be realistic. A competitive disadvantage has to be balanced out by a competitive advantage. Otherwise risks grow.

How will I find time and energy?

A few years ago starting regular exercise increased my energy levels permanently. Thus far I have poured this extra concentration only into Chinese. However, Chinese is moving from the active study phase to the slow and steady vocabulary build-up phase. This liberates the time for graduate studies. Without a clear plan how to use this time, I'll just waste it, getting nothing in return.

Minimizing the amount of work is the second path. Both computer-aided language learning and bioinformatics are fields where it is important to cross the cultural gap between two disciplines. Relatively modest skill in programming and mathematics is enough compared to pure computing science topics like model checking. This means that a smaller number of hours is enough to produce new discoveries.

How will I benefit from graduate studies?

First of all, I don't expect to get paychecks from university. Applying for a university position would be a bad choice since industry salaries are bigger than researcher salaries. Getting a good salary from research requires a teaching position. You have to prepare for that already while studying by working as a teaching assistant. When I studied, I prepared for industrial work.

Getting a PhD degree will make it possible to apply to new kinds of jobs with higher pay. If I am able to come up with a popular CALL website, it will continue to mill small amounts of advertisement income for years. Setting up such a site requires a big initial effort but little maintenance effort after that. It creates an economic incentive to make the site high-quality from the start.

Monday, November 22, 2010

Chinese character exercise for N900

Summary: This post introduces Bezca, a CALL software prototype for training Chinese characters. It showcases that (1) the technology in Finnish Annotator can be adapted to many different purposes, (2) having a day job does not prevent me from writing CALL software with realistic goals, and (3) context is fairly easy to integrate into any CALL software, as long as you take an uncompromising attitude towards the need for context.

In Bezca, you train Chinese characters by drawing the strokes with a stylus, finger or plectrum. Correctly drawn strokes appear on the screen as you draw them:

If you don't remember the character, you can look at the hint which also tells the next stroke:

Clicking "Show Examples" displays dictionary words and example sentences which use the character in question.

In the example browser, if you don't know some word in an example sentence, you can just click on it and enjoy further examples about that word. This way, you can browse examples in an endless chain, Wikipedia style.

Bezca also contains a spaced repetition system. Pressing "This was easy" shows the exercise again in a week. Pressing "This was hard" shows it again in a day. After that, it uses exponential time gaps to decrease or increase the period based on user responses.
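The scheduling rule above can be sketched in a few lines. This is only an illustration, not Bezca's actual code: the first intervals (a week and a day) come from the description, while the doubling/halving factor, the one-day floor and all names (`Card`, `review`, `GROWTH`) are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional

FIRST_EASY_DAYS = 7.0  # from the post: "This was easy" shows it again in a week
FIRST_HARD_DAYS = 1.0  # from the post: "This was hard" shows it again in a day
GROWTH = 2.0           # assumption: the post only says the gaps are exponential
MIN_DAYS = 1.0         # assumption: never schedule sooner than one day


@dataclass
class Card:
    character: str
    interval_days: Optional[float] = None  # None until the first review


def review(card: Card, was_easy: bool) -> float:
    """Update the card's interval after one review and return it in days."""
    if card.interval_days is None:
        # First answer picks the starting interval.
        card.interval_days = FIRST_EASY_DAYS if was_easy else FIRST_HARD_DAYS
    elif was_easy:
        # Later "easy" answers stretch the gap exponentially.
        card.interval_days *= GROWTH
    else:
        # Later "hard" answers shrink it, but not below the floor.
        card.interval_days = max(MIN_DAYS, card.interval_days / GROWTH)
    return card.interval_days
```

With these assumed constants, a card answered easy, easy, hard would come back after 7, 14 and then 7 days.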

In the beginning, the program calibrates difficulty to suit the student's skill level. It shows some characters and asks the user to say if they are suitably difficult or too easy. This way, students can go directly to material which is new for them.

This is a prototype and not yet mature enough to be distributed. It contains only 180 characters. Installing requires a memory card. I don't currently have any plans to take it further, because demand is likely to be small, the use of CEDICT would make it a copyright violation to ask for a price, and it would take a lot of effort to input 1000 characters. However, I'm happy to demonstrate it face-to-face to anyone, especially CALL researchers.

Friday, November 05, 2010

Action points and plan B

1. I've been working on an N900 port of the character drawing exercise.
It demonstrates that FA technology is still valid and can be reused. Spend 2 weeks finalizing the proof-of-concept prototype.

2. Post an introduction to the N900 port.

3. Write a plea for CALL research partners into this blog.

4. Make a list of CALL researchers by browsing the researcher lists in Finnish university web pages.

5. Write cold call mails for CALL researchers, tell them about my track record in CALL, and ask if they want to talk about research. Add links to my thesis and blog.

If step 5 produces no contacts, it means that cold calling doesn't work (what a surprise.) In that case I have to use personal contacts to get a research topic for graduate studies in 2011. Usually people don't talk about their work, including researchers. This leaves me with few options.

6. Contact Yoe to ask for a research topic in bioinformatics. This post hints that she might have a suitable topic for someone trained in math and programming. I don't know Yoe, so to avoid cold calling, I would ask for recommendations from Vera and Janka. ("We have tracked Simo for years and he is what he claims to be. If you have a suitable research topic in bioinformatics, you'd probably benefit a lot from assigning it to Simo.")

Good: Bioinformatics has a reputation as a down-to-earth and useful branch of applied research. I'll learn new things because of the 'bio' part.

Bad: No earlier track record. Not enough background to evaluate if the research topic would pass the scrutiny in Hamming's advice.

If 6. fails:

7. Ask The Scientist for an applied research topic in model checking based on this post.

Good: I know The Scientist personally. Also FA already made me familiar with finite languages and state machines. For example, I implemented deterministic state machine minimization to make some vocabulary state machines faster to handle, while The Scientist is working with algorithms to simplify nondeterministic state machines.

Bad: When studying, we had a course about DisCo and temporal logic of actions. While the theory part was a fun trip to a different worldview, the DisCo toolset left a really bad taste in my mouth. It had an "ivory towery" feel: it could never become useful for solving practical problems, no matter how well the researchers reach their research goals.

I'm aware that The Scientist doesn't work with DisCo, and it's just my stupidity that I don't understand the field. After all, he writes more often, for more readers, about a wider range of topics and so on. But that does not change the fact that it would be insane for me to do research in an area where I don't understand the big picture.

Asking The Scientist for a research topic may lead to a very embarrassing situation where I have to say no even if he gives me everything I ask for.

If 7 fails:

8. Write a plan for 2011 which does not include graduate studies.

Thursday, November 04, 2010

Blueprints for a translation sentence website

Summary: The previous post pointed out a huge gap in CALL: there are no tools to practise writing. This post addresses it by proposing a translation sentence website.

Why translation sentences?

When people start writing in a foreign language, they first form sentences in their fluent language and then translate them piece by piece. This component behavior can be conditioned. This lowers the barrier to write as the student is already fluent in syntactic structures and only needs to slot in the phrases from his domain area.

Aretae says that constructivism is the most overlooked aspect of teaching:
Take martial arts for a couple years, and watch how much the Sensei DOESN'T explain...but rather makes you practice until you have an internal representation of the system...then adds 3 words to clarify your mistakes.

Translation sentences are constructivist in the sense that if you have problems with syntax, they teach you syntax. If you have problems with prepositions, they teach you their correct use. If you have problems with word inflection, they improve that area.

Checking correctness

Translation sentences have multiple correct answers. It takes someone who is fluent in the language to classify answers as correct or wrong. Simple string matching is out of the question. Finnish Annotator had an option to answer flashcards by writing, and string matching was insufficient even for single words. In the end, the site normalized away noun and verb articles (a, an, the, to) and used edit distance to ignore typos in known-language words.
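The word-level check described above (strip articles, then tolerate typos by edit distance) could look roughly like the sketch below. It is not Finnish Annotator's code: the article list, the one-typo threshold and the function names are assumptions chosen for illustration, and the distance is the standard Levenshtein distance.

```python
ARTICLES = {"a", "an", "the", "to"}  # assumed list of words to normalize away


def normalize(answer: str) -> str:
    """Lowercase, collapse whitespace and drop leading articles."""
    words = answer.lower().split()
    while len(words) > 1 and words[0] in ARTICLES:
        words.pop(0)
    return " ".join(words)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions and substitutions cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or keep
        prev = cur
    return prev[-1]


def is_accepted(given: str, expected: str, max_typos: int = 1) -> bool:
    """Accept an answer that matches after normalization, up to a few typos."""
    return edit_distance(normalize(given), normalize(expected)) <= max_typos
```

Under these assumptions `is_accepted("the hourse", "horse")` passes (article dropped, one typo), while `is_accepted("to run", "running")` does not. A fixed threshold is crude for short words, where a single edit can turn one valid word into another.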

First of all, instead of listing correct answers there should be word-level regular expression notation to denote different options. This avoids combinatorial explosion when, say, one slot has 3 correct phrases and another has 5.

Since many phrases have synonyms, the site can make sentences more complete from the first try by supporting them. For example, the sentences "There is plenty of snow" and "There is a lot of snow" are both correct. This could be denoted with "synonym:'plenty of'".
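To make the slot idea concrete, here is a small sketch that turns a list of slots into one word-level pattern. The list-of-lists template format and the function names are hypothetical, invented for this example, and the sample phrases are only illustrative. It uses Python's standard `re` module.

```python
import re

# Hypothetical answer template: each slot lists its acceptable phrases.
TEMPLATE = [
    ["there is", "there's", "we have"],
    ["plenty of", "a lot of", "lots of", "a great deal of", "much"],
    ["snow"],
]


def compile_template(slots):
    """Build one regular expression that accepts any combination of slot phrases."""
    parts = ["(?:" + "|".join(re.escape(p) for p in slot) + ")" for slot in slots]
    return re.compile(r"\s+".join(parts), re.IGNORECASE)


def count_variants(slots):
    """Number of full sentences that would otherwise have to be listed one by one."""
    total = 1
    for slot in slots:
        total *= len(slot)
    return total


def is_correct(answer, pattern):
    """Match the whole answer after collapsing repeated whitespace."""
    return pattern.fullmatch(" ".join(answer.split())) is not None
```

Here three slots with 3, 5 and 1 options stand for 15 spelled-out sentences, which is the combinatorial explosion the notation avoids. A synonym table would simply expand one phrase into a slot of alternatives before compiling.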

There should also be a list of wrong answers, which contain common errors. This enables high-quality constructivist feedback.

Preparing for unknown

Nobody can guess all correct and interestingly wrong answers to a translation sentence. The site should deal with it instead of denying it.

First of all, it needs a social web interface for fine-tuning the sentences. This means adding new acceptable answers and new explanations for wrong answers. This interface is open for authorized users whose language skill has been verified. The site saves all unclassified answers to translation sentences and sorts them by frequency. If many students give similar wrong answers, an explanation should be added.
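Sorting unclassified answers by frequency is simple to sketch with the standard library. The function name, the normalization (lowercasing and collapsing whitespace) and the `min_count` cutoff are assumptions for illustration, not part of any existing site.

```python
from collections import Counter


def review_queue(unclassified_answers, min_count=2):
    """Return (answer, count) pairs, most frequent first, for reviewers to classify."""
    counts = Counter(" ".join(a.lower().split()) for a in unclassified_answers)
    return [(answer, n) for answer, n in counts.most_common() if n >= min_count]
```

Reviewers would then work from the top of the queue, so the wrong answers that many students share get explanations first.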

This social aspect also means that it is not enough to solve the technical challenge of building the website. Before a single line of code is written, there must be confirmed support from a steering group of CALL researchers. Writing such a site is a big effort and makes the programmer blind to simple but obvious shortcomings in it. The research group would review and criticize away such defects. The group would also kickstart sentence writing and introduce it to students until it reaches a critical mass of having enough sentences to be useful. The same people would be adding correct and wrong answers.

Automatic ways to prepare for unknown

Finnish Annotator used edit distance to ignore typos. This can be applied to translation sentences in two ways. Firstly, correct sentence structure can be verified even if single words are mistyped when edit distance is used to compare words.

Secondly, sentence level edit distance can point out added or missing words after identifying the closest correct or wrong answer. Missing words are always errors, since problem authors should mark optional words as optional.
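A rough sketch of the sentence-level comparison follows. Note the swap: instead of a hand-written word-level edit distance it uses `difflib.SequenceMatcher` from the Python standard library as a stand-in, which aligns word sequences in a similar spirit but is not a true minimum-edit alignment. The function names are made up for the example.

```python
from difflib import SequenceMatcher


def closest_answer(student, known_answers):
    """Return the known answer whose word sequence best matches the student's."""
    words = student.lower().split()
    return max(
        known_answers,
        key=lambda known: SequenceMatcher(None, known.lower().split(), words).ratio(),
    )


def word_diff(known, student):
    """List the words missing from, and the words added to, the student's answer."""
    a, b = known.lower().split(), student.lower().split()
    missing, added = [], []
    for tag, i1, i2, j1, j2 in SequenceMatcher(None, a, b).get_opcodes():
        if tag in ("delete", "replace"):
            missing.extend(a[i1:i2])  # words only in the known answer
        if tag in ("insert", "replace"):
            added.extend(b[j1:j2])    # words only in the student's answer
    return missing, added
```

For the student answer "there is lot of snow today" against the known answer "there is a lot of snow", this reports "a" as missing and "today" as added. The missing word would count as an error, while the added word would be checked against the list of usually innocent additions.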

Added words may be errors, but they may also be just a symptom that the sentence is new and not enough variations have been added. The site should prepare for it by making a list of words, which are usually innocent when added. This list may need context restriction, so that certain added words are known to be innocent in certain contexts.

Replaced words benefit from similarity comparison. The error is likely to be small if the word is a verb both in a correct answer and the student's answer, especially if the verb is inflected in the same way.

In 2006 I browsed some books about syntactic parsing, where a sentence is converted into a parse tree. The methods looked very difficult to implement, because syntax is much more complex than word inflection. For example the following sentences have the same meaning, but sentence-level parsing is necessary to identify it automatically: "If it rains tomorrow, the party is held inside." and "The party is held inside if it rains tomorrow." Sentence parsing is also useful for rating wrong but unclassified answers: answers which can be parsed into a parse tree are less wrong than syntax-violating sentences.

What kinds of sentences to train?

Simple sentences, which deal with one topic at a time. The topic may be some syntactic structure, preposition, time phrase etc. Simplicity leaves less room for unexpected variation, therefore giving better feedback.

Prior art and differences to flashcard programs

I'm not aware of any significant academic prior art since PLATO, a groundbreaking CALL system from the 70s. Unfortunately the details of PLATO's translation sentence engine are not available. Their technology may be obsolete, but the people making it faced the same challenges as we face today. They were not stupid, and most importantly they established a feedback loop where they improved based on experience, while I'm just theoretically speculating.

The main difference to flashcard programs is that a single translation sentence is "difficult and slow" while a single flashcard is "easy and fast". Therefore a site for training tesuji skills in a board game is better prior art than Anki. For that site too, you hear anecdotes where people voluntarily bang away at it for hours, stopping only when the remaining problems are too easy or too difficult. The site has an automatic rating system for both users and problems. Problems are rated based on how many people get them right. Users are rated based on how many problems they get right. The rating scale is given in kyu/dan levels so that problems and users should have similar ratings.

Item response theory gives the mathematical formulas for implementing such ratings, although I don't know which exact method the site uses. The previous sections have dealt with methods to distinguish grave errors from small typos and from errors in the problems themselves. A rating system based on item response theory benefits from having more information than just pass or fail.
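As one possible reading of the above, the sketch below uses the one-parameter logistic (Rasch) model from item response theory with a simple Elo-style online update. This is an assumption for illustration only, not the method of any particular site. The step size `K` and the function names are invented, and `score` may be fractional to carry more information than pass or fail.

```python
import math

K = 0.1  # assumed step size for the online update


def p_correct(ability: float, difficulty: float) -> float:
    """Rasch model: probability that a user of given ability solves the problem."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def update(ability: float, difficulty: float, score: float, k: float = K):
    """Move both ratings by how surprising the result was.

    score is 1.0 for a correct answer, 0.0 for a wrong one, and may be in
    between for a near miss such as a small typo.
    """
    surprise = score - p_correct(ability, difficulty)
    return ability + k * surprise, difficulty - k * surprise
```

A user who solves a problem rated above them gains more than one who solves an easy problem, and the problem's rating moves the opposite way. Mapping the resulting numbers to a kyu/dan scale would be a separate presentation step.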

Chatbots and communicative language teaching

How could each translation sentence also be meaningful communication? One way is to give the student a communicative task, for example "order a flight ticket to Melbourne", and to write a chatbot to hold the other end of the conversation.

This would explode the number of correct reactions. The user could start by greeting. An order for "ticket to Melbourne" could be formulated in tens of different ways.

The details about time window, passenger class etc. could be given right away or the chatbot would have to ask for them. It is no longer enough to just parse inflection and syntax; we need to pay attention to meaning. A hard-coded ontology would be needed for each chatbot.

In the last post I said that CLT is nice to have but comes with an expensive price tag. Chatbots are a good example. Each one would take a long time to write. Their feedback would have inferior quality compared to translation sentences. Most importantly, they would not scale to cover large amounts of material.

They would not be enough to train students to write.

Sunday, October 31, 2010

Current state of Chinese CALL

Summary: This very dry post summarizes which areas of Chinese learning are adequately covered by CALL tools, and which areas still need better tools and content.

Hamming's essay for choosing research topics describes CALL spot on, if you replace "research paper read by thousands" with "learning system used by thousands". Finnish Annotator would have passed Hamming's scrutiny, since annotators had already proven useful in Japanese and Chinese, but none was available for Finnish. It aimed at the core of Finnish reading comprehension.

The downfall of FA was partly due to inadequate openness, feedback and networking. This was also predicted in Hamming's essay:

Some people work with their doors open in clear view of those who pass by, while others carefully protect themselves from interruptions. Those with the door open get less work done each day, but those with their door closed tend not to know what to work on, nor are they apt to hear the clues to the missing piece to one of their "list" problems. I cannot prove that the open door produces the open mind, or the other way around. I only can observe the correlation. I suspect that each reinforces the other, that an open door will more likely lead you to important problems than will a closed door.

Let's take Hamming's advice to the conclusion and make a list of important problems in computer-aided language learning. This list only covers Chinese, which has the special challenge of learning the characters. It also ignores collaborative learning methods and concentrates on single-user teaching machines.

Why ignore collaborative learning?

The currently dominant learning theory is Communicative Language Teaching (CLT). People use language to achieve communication goals like buying a ticket or describing a problem. CLT claims that also in teaching, each sentence should be part of a speech act with a communicative aim. Modern first-year language textbooks achieve communicative context by describing situations, where tourists achieve communication goals.

CLT is trivially true in the sense that sooner or later you have to move from isolated sentences to communication, for example talking, email exchange or searching for information (and not just reading for the sake of language). However, you have a long way to go before you can read books or write blogs. Before CLT forces itself through the door, you have to bootstrap the language skill somehow. I'm not at all convinced that CLT is necessary in the initial phase. The situation I see on the ground is that the Japanese/Chinese self-study scene is blithely unaware of CLT and still achieves good results.

Don't get me wrong: communicative context is nice, and the best kind of context you can have. But it is hard work to achieve communicative context. You have to make compromises in other areas. CALL scene is nowhere near the level where the presence or absence of communicative context would make a difference.

Finally, a word of warning if you try to achieve communicative context by collaborative learning. B. F. Skinner, the father of behaviorism, described the problems of collaborative learning methods as early as 1953. What's the point of making CALL tools at all, if you just digitize the same old problems?

Skinner's children were growing up. When the younger was in fourth grade, on November 11, 1953, Skinner attended her math class for Father's Day. The visit altered his life. As he sat at the back of that typical fourth grade math class, what he saw suddenly hit him with the force of an inspiration. As he put it, "through no fault of her own the teacher was violating almost everything we knew about the learning process." In shaping, you adapt what you ask of an animal to the animal's current performance level. But in the math class, clearly some of the students had no idea of how to solve the problems, while others whipped through the exercise sheet, learning nothing new. In shaping, each best response is immediately reinforced. Skinner had researched delay of reinforcement and knew how it hampered performance. But in the math class, the children did not find out if one problem was correct before doing the next. They had to answer a whole page before getting any feedback, and then probably not until the next day. But how could one teacher with 20 or 30 children possibly shape mathematical behavior in each one? Clearly teachers needed help. That afternoon, Skinner constructed his first teaching machine.


Area | Status | Method
Reading, 0 - 1000 characters | Jury is still out on correct approach | Mixed
Reading, 1000 - 3000 characters | Solution is known but not implemented | Spaced repetition systems with immersive sentence decks
Reading, 3000+ characters | Solution implemented, room for improvement | Reading natural texts through an annotator and using example sentence search for new characters and phrases
Writing | Not even started | Translation sentences, chatbots (neither exists)
Listening | Solved | Listening to internet radio or simplified podcasts
Speaking | Solved | Talking face to face or through Skype

Beginner phase: 0 - 1000 characters

First of all, beginners and advanced students should use very different methods. When advanced students learn a phrase, it integrates naturally with their existing knowledge. They can immediately use the word in different contexts. Beginners are only forming those knowledge structures.

For valid historical reasons, current CALL tools are not very good for beginners. In many cities elementary courses are available for Chinese and Japanese, but courses stop after that. The beginner phase also lasts for a shorter period. Therefore there is less demand and less tool development for elementary tools. In the intermediate and advanced phase, it is important that the tools scale and can teach large amounts of phrases and accommodate different skill levels. This also means that a software package only needs to implement one scalable method well, for example dictionary search or flashcards.

For beginners, my unjustified gut instinct is that learning games like Slime Forest Adventure are the way to go. (1) Beginners forget things more quickly, since their knowledge structures are just forming. Therefore intensive teaching methods are good, and immersive approaches which give little time to forget are preferable. (2) Beginners need to look at the language from several different perspectives (sentence comprehension, syntax, word inflection, communication), all of which are completely new to them. Game programming has the tradition of subgames, which have their own sets of rules. I don't see such a tradition of variability in other types of software.

Reading comprehension in 1000 - 3000 characters

The software is there, but content has plenty of room for improvement. Annotators enable reading easy texts, and spaced repetition systems with sentence decks are good for learning characters. Regarding content, I haven't seen any easy reader texts except in Chinesepod. The sentences in my HSK deck were pretty random: they were ripped from the example sentence collection in an online dictionary and then automatically classified by difficulty.

3000+ characters

At this point you can read natural texts and start to read for content. An annotator and example sentence search are all you need. They already exist.


Writing

I haven't met any CALL tools for training writing skill. The only method is to "jump into the water and swim" by just starting to write emails and blog posts. This is comparable to practising reading comprehension by just taking a dictionary and a foreign-language book. Sure, you can do that, but it requires a lot of motivation and willpower.


Listening

There are many free radio stations available, and Chinesepod offers easier dialogs. You can listen to them while you clean or cook. There is nothing to improve, since we are already at zero time commitment. This is the ultimate in efficiency.


Speaking

Speaking is the only way to learn to speak. I don't see how CALL tools could play any role in this. Skype already works.

At the Alakuppila cafe at Tampere University there are regular meetings where Chinese exchange students talk with Finnish language students. For those who live in less fortunate places, there are various commercial services, some of which offer free samples.

Sunday, October 24, 2010

Dying embers of lost passion: Post-mortem of Finnish Annotator

What Finnish Annotator?

Finnish Annotator was my CALL website, developed around 2005-2006. In those years, I was finishing my studies and spent summers writing the website. The site featured an annotator for Finnish and Chinese, a flashcard program and a character-drawing exercise. I took it down in 2008 as it had no users.

Annotator is a "text dictionary", which decodes the inflection and searches explanations for all words in a copy-pasted text. While Google Translate is free, annotators are more useful for language-learners. You can read the text as long as you completely understand it, resorting to hovering your mouse over annotations only when you have to.

The entry page shows how it annotated Chinese text. It also describes how you could turn a copy-pasted text into a flashcard deck. The post about the fundamental problem of flashcards mentioned that my website tried to solve it by taking example sentences from the annotated text. Indeed, after you pressed "show answer", it showed an annotated example sentence where "kun" was used.

The Finnish vocabulary was a 1000-word test vocabulary. The demo page used to work on all browsers, but currently crashes Firefox. Because I was acutely aware of the need for context, the word definitions contain well-split meanings and example phrases, and sometimes even comparison and contrast to related words.

At the bottom of the Chinese entry page there is a screenshot of the character-drawing exercise, where you move the brush with your mouse and the stroke appears if you are moving the brush correctly. This may seem similar to Skritter, since both programs took influence from WriteChinese, a piece of prior art from the nineties.

Morphology engine and master's thesis

About half of the code in the website deinflects Finnish words. Finnish inflection is very complex: for example substantives can have 4 different types of postfixes. The site used two-level morphology and state machines to decode the words. These were a bit obsolete methods to handle morphology, but they were provably successful for Finnish and clearly described in Koskenniemi's book. Modern methods would have required access to commercial state machine libraries, which I didn't have.
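To make the idea of decoding stacked suffixes concrete, here is a toy sketch in Python. It is not two-level morphology and not the code from the site: it only peels suffixes off the end of a word, one slot at a time, and ignores consonant gradation, stem changes and vowel harmony. The slot names, the tiny suffix lists and the function name deinflect are illustrative assumptions.

```python
# Toy suffix stripper for Finnish nouns (illustration only).
# Assumption: at most one suffix per slot, slots ordered from the end of
# the word inwards. The suffix lists are small samples, not a full grammar.
SUFFIX_SLOTS = [
    ("clitic", ["kin", "kaan", "ko"]),
    ("possessive", ["mme", "nne", "ni", "si"]),
    ("case", ["ssa", "sta", "lla", "lta", "lle"]),
    ("plural", ["i"]),
]


def deinflect(word):
    """Return (stem, [(slot, suffix), ...]) by stripping one suffix per slot."""
    found = []
    for slot, suffixes in SUFFIX_SLOTS:
        for suffix in suffixes:
            # Require at least two letters of stem to remain.
            if word.endswith(suffix) and len(word) - len(suffix) >= 2:
                word = word[: -len(suffix)]
                found.append((slot, suffix))
                break
    return word, found


if __name__ == "__main__":
    print(deinflect("taloissammekin"))
    print(deinflect("talossani"))
```

For example, deinflect("taloissammekin") peels off -kin, -mme, -ssa and the plural -i and leaves the stem "talo". The sketch happily strips a final "i" from words where it belongs to the stem, which is one reason a real engine needs proper rules rather than plain string matching.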

My thesis described the algorithms in the morphology engine. It used mathematical notation and also contained a few proofs. When I submitted it, it got full points.

The algorithm for compiling two-level inflection rules contained a minor simplification. The thesis inspector said that it was actually publishable research, but I didn't follow up on that, since I was not planning to return to school. Anyway, it kind of demonstrates that I already know how to do research; I just don't know how to identify it and wrap it into a form which can be sent to conferences and journals.

How it failed

Since I consider myself economically rational and didn't work for two summers, I had to rationalize away the cognitive dissonance somehow. My feeble excuse was that I was building a semi-commercial system, which would continue to mill extra income after the initial setup effort. In practice, what I did was closer to a mild form of hikikomori.

Firstly, I didn't tell many people about the system, thinking that I would publish the product when it was ready. Therefore not a single person became interested enough in it to give feedback and criticize away obvious weaknesses, which were easy to correct but to which I was blind, having spent too much time on it. For example, the need to log in first was such a weakness.

Also, in those days I had not yet discovered the Game of Talking and I kept getting bad outcomes in human relationships without really understanding what the hell went wrong. When I wrote last year "Most people develop these surfacial skills as young adults. Unfortunately, you can't skip the development of social skills. If you fail to complete this developmental task as a young person, it will continue to haunt you and drag you down until you solve it.", I also meant Finnish Annotator. This severely limited my ability to get feedback on the system.

The system was quite close in function not just to MDBG annotator, but also to Lukutulkki, a commercial system for annotating English text to Finnish speakers. Had I presented it right, some CALL researchers should have become interested in it.

The most damaging hit from the commercial mindset was my reluctance to use gray copyright vocabularies. It was also a question of quality, as dict vocabularies didn't have split meanings or example phrases. I actually started to collect my own Finnish vocabulary. In the end, it had inflections for about 5000 words and meanings and example phrases for somewhat over 1000 words. At that point, Google Translate published Finnish translation, so I thought that no way in hell am I going to get the vocabulary collected before free services offer better than what I have. Since the system had no users, I took it down. It was a really idiotic move to start collecting vocabulary from scratch. I believe now that Takkirauta's talk about Manstein's matrix has a seed of truth, and if you notice that you are doing a lot of repetitive informational work (like vocabulary collection), you are probably doing something wrong and should stop to ponder different options. Don't just do something, stand there!

The main lessons I learned from it are the importance of social skills and awareness that I am prone to obsessive-compulsive tunnel vision which makes me exert a lot of effort when the right solution would be to look at different options.

What parts of it are still useful

Before I can apply for graduate studies, I need to find a research group. Finnish Annotator is my main merit for persuading others to include me in their work and publications. Next, I'll list examples of how the technologies and components in FA could contribute to CALL research.

The character-drawing engine can be modified to train students to write Russian or Arabic characters. In the first Arabic course I took, learning to read the script was a huge part of the course. Speeding it up with a spaced repetition system, which gradually introduces new material after ensuring that the student has mastered its dependencies, could make a big enough difference for a publishable paper.

Since the two-level morphology can handle Finnish inflection, it can deal with almost any language. Annotation works best when embedded in other services. FA didn't just annotate copy-pasted text, it also annotated any example sentences in the flashcards. Annotation can be integrated to boost any existing research ambitions in CALL.

Tuesday, October 19, 2010

Free beats commercial in CALL (computer-aided language learning)

ChinesePod is the only commercial language learning service which I have used. In 2009 I subscribed for one year. I was quite satisfied with it. Their service consists of textbook-style lessons. Each lesson is independent and covers one theme. The easiest lessons are targeted at beginners; the hardest ones take their text from outside sources and assume that the student can read it without aids. They publish several lessons a week.

Textbook chapters are annotated by hand. This way, annotations are correct even when words have several meanings or meaning depends on the context. In addition, there is spoken dialog for each chapter.

In autumn 2009 I discovered Anki and the 20000-word HSK sentence deck, and just stopped using Chinesepod despite having a paid subscription. At the time, character recognition was the main obstacle preventing me from reading natural texts, and free tools addressed this problem better. The spaced repetition system was superior to the lessons of Chinesepod.

Service | Free or commercial | Rating
Chinesepod | Commercial | Good, but not as good as Anki + MDBG
Skritter | Commercial | Inferior to pencil and paper
Slime Forest Adventure | Semi-commercial | Good for the very limited purpose of learning hiragana and katakana
Anki | Free | Great way to increase character recognition count
MDBG | Free | Great way to make sense of sentence deck sentences and increase reading comprehension after you know enough characters

Companies can put more resources into finalizing their CALL tools. Therefore they have higher quality content. Free CALL tools have two advantages. Firstly, they can use "grey copyright" databases, which are de facto free, although license prohibits commercial use and sometimes also other use.

Secondly, two unrelated individuals can contribute to free tools. In both Anki and MDBG this plays a crucial role. In MDBG, Paul Denisowski initiated the CEDICT vocabulary collection and then disappeared. Someone who prefers to stay anonymous maintains MDBG. Anki was written by Damien Elmes, while the 20000-sentence HSK deck was written by Brian Vaughan.

The semi-commercial tool, Slime Forest Adventure, would become better if it was open-source - sooner or later, someone would address the fundamental problem of flashcards and turn it into another great tool. But it possibly wouldn't exist without the profit motive.

Sunday, October 17, 2010

Slime Forest Adventure

Summary: This post points out what is novel and good in Slime Forest Adventure, a teaching game for learning Japanese. It also compares spaced repetition systems to games.

Teaching games have a somewhat bad reputation. Typically, an enthusiastic teacher chirps "Children spend hours and hours playing WOW. We should make games which harness this to advance learning! Kids learn best when they are motivated and have fun!"

The result is something like Memory Cards or Hangman, which are neither good for learning nor really games, and the teachers are surprised to see that kids continue with WOW. To see why they are not good for learning, imagine you have to learn 50 words for a test in 2 days. You have them prepared both as a word list and as a Hangman game. Would you use Hangman to memorize them? Hell no, it would take way too much time without any benefit to learning. To see why they are not really games, we need to ponder the very definition of "game".

What does it mean for a software to be a game?

Patterns in Game Design by Staffan Björk and Jussi Holopainen lists 200 patterns which are commonly used in games. The patterns deal with the subjective experience of playing rather than the structuring of game code. Patterns have names like Game World, Levels, Boss Monster, Score, Lives, Resource Investments, Combat, Ability Losses, Storytelling, Alliances, etc.

Hangman and Memory Cards implement just 3 of these patterns. They have Score, namely, how many guesses you have to make before all cards are paired or the whole word is visible. Hiding the words is an example of Asymmetric Information. The games also have a Goal.

Spaced repetition systems as games

Reports where people bang thousands of cards with spaced repetition systems are quite common. In some people, flashcard programs create gamelike ability to maintain attention. This is exactly the feature of games which teachers envy, so let's use our new yardstick to measure how gamelike spaced repetition systems are.

Firstly, they implement Score (how many cards you have mastered) and Asymmetric Information (it shows the card only after the player has tried to guess). They implement Goal twice: there is the Goal of remembering a single flashcard, and also the Committed Goal of flashing a certain number of flashcards each day or week. The players set that goal themselves. This makes flashcard programs at least as good as Memory Cards or Hangman.

Chapter 12 in Patterns in Game Design deals with balancing. Spaced repetition systems implement Right Level of Difficulty and Smooth Learning Curve, since spaced repetition algorithms are all about showing difficult cards more often than easy ones and taking controlled doses of new difficulty.

Right Level of Difficulty

That the level of difficulty experienced by player is one intended by game design.

For the challenges in games to be interesting to players, they need to have the Right Level of Difficulty. If the challenges are too easy, players may be bored while if they are too difficult, players may give up playing game.

Example: Adventures that (sic) can be bought for many types of tabletop roleplaying games are categorized after which levels the players' characters should have. Although a Game Master may use any adventure for any group of characters, the Right Level of Difficulty will most probably only occur if the players have the right levels.

Using the Pattern: Although the difficulty of a game is individual to each player, games can be designed so that players can progress according to their own learning curve. Setting the Right Level of Difficulty in games can either be done by making challenges easier, by making challenges more difficult, or by controlling which challenges players have to meet.

Challenges can be made easier, either by providing information about how to solve the challenge or by making the actions of overcoming the challenge easier to perform, for example, by the presence of Achilles' Heels. Information can be given by Clues, Traces, Extra-Game Information, or by letting players discover it themselves through Experimenting. Making challenges easier usually requires some form of Tradeoff for players and can be done through Selectable Sets of Goals or Supporting Goals. Having to choose one goal from Selectable Set of Goals where the different goals have Varied Gameplay allows the player to choose the goal with the perceived Right Level of Difficulty but makes the other goals impossible to complete. The Right Level of Difficulty in a game can also be created by Varied Gameplay to require the players to use different competences. Supporting Goals, for example, trying to find Easter Eggs, do not have to make other goals impossible but take extra time to perform and may deplete Resources for the player.

Making challenges more difficult can be done by introducing opposition or by making the required player actions more difficult to perform. Opposition can take the form of Enemies or Preventing Goals of Agents or other players in Multiplayer Games.
... (goes on and on) ...

Consequences: Providing the Right Level of Difficulty in games allows players to feel Tension as there is a risk that they may fail, while giving the Empowerment since they have a Perceived Chance to Succeed and Illusion of Influence. If the Right Level of Difficulty is continuously provided for players, it gives them a Smooth Learning Curve and increases the likelihood that players progress to having Game Mastery. If this Right Level of Difficulty is due to Competition, the learning is enforced by a Red Queen Dilemma.

Moreover, people who use good flashcard programs notice the difference in their reading skill. This introduces Game Mastery. In learning games, Game Mastery is all about scalability. Players notice a boost in language skill if learning is quick and there is enough content to make a difference.

Now we have concluded that flashcard programs beat Hangman and Memory Cards 7 - 3 in gamelikeness. It is debatable if Hangman and Memory Cards are games at all when they lose so easily to programs which are nothing like games.

Making flashcards more immersive

Patterns in Game Design is not just a yardstick for measuring gamelikeness but also a cookbook for increasing gamelikeness. Sentence decks could be made more gamelike by making sentences form a terse story full of sex, drugs, violence and cliffhangers. That would add Storytelling and Narrative Structure. Cliffhangers would be Hovering Closures (events which are about to occur and can be clearly observed by players.) Desire to see progress in plot would add Anticipation (The feeling of being able to predict future game events in the games to which one has emotional attachment) and Surprises.

Slime Forest Adventure

Since spaced repetition systems are already gamelike, why not integrate one into a game? Slime Forest Adventure (SFA) does this by using an SRS as a combat system.

In the combat, you hit slime enemies by typing the correct hiragana, katakana or kanji. SRS ensures that combat is always suitably difficult. When you learn to consistently remember a group of characters, you can move to new areas as your skill is sufficient to fend off the slimes. This way, plot advances.
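How an SRS could drive combat can be sketched in a few lines of Python. This is a guess at the general idea, not SFA's actual implementation: the class CombatSrs, the streak-based selection rule, the hit points and the mastery threshold are all assumptions made for illustration.

```python
# Hypothetical sketch: a flashcard scheduler used as a combat system.
KANA = {"あ": "a", "い": "i", "う": "u", "え": "e", "お": "o"}


class CombatSrs:
    def __init__(self, readings):
        self.readings = dict(readings)
        # streak[c] = number of consecutive correct answers for character c
        self.streak = {c: 0 for c in self.readings}

    def next_character(self):
        """Pick the weakest character (lowest streak, first one on ties)."""
        return min(self.streak, key=self.streak.get)

    def attack(self, character, typed):
        """Return True if the typed reading is correct, and update the streak."""
        hit = typed == self.readings[character]
        self.streak[character] = self.streak[character] + 1 if hit else 0
        return hit

    def mastered(self, threshold=3):
        """A new area could unlock once every character reaches the threshold."""
        return all(s >= threshold for s in self.streak.values())


def fight(srs, answer_fn, slime_hp=3, player_hp=3):
    """Run one fight; answer_fn(character) stands in for the player's typing."""
    while slime_hp > 0 and player_hp > 0:
        character = srs.next_character()
        if srs.attack(character, answer_fn(character)):
            slime_hp -= 1   # correct reading damages the slime
        else:
            player_hp -= 1  # wrong reading lets the slime hit back
    return slime_hp == 0


if __name__ == "__main__":
    srs = CombatSrs(KANA)
    print(fight(srs, lambda c: KANA[c]))  # a player who knows every reading
```

Because the weakest character is always asked next, a player who keeps missing one character keeps meeting it in combat until it sticks, which is the flashcard logic wearing a game costume.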

I was going to write that Slime Forest Adventure fails to address the fundamental problem of flashcards, making it a factlet memorization game rather than a language learning game. However, the author has added word recognition tasks for hiragana and katakana, offering very limited contextual integration. SFA could and should offer much more context to become a real language game.

Anyway, SRS integration makes it the best-of-breed learning game, since competitors don't even try. SRS combat is an innovation which unfortunately has not been copied elsewhere. Slime Forest Adventure is copyrighted from 2003, so this isn't even new.


The current hegemonic paradigm in teaching games is utterly flawed and provides neither immersion nor learning. This post introduced two ways to attack the problem: (1) to use SRS as a combat system as in Slime Forest Adventure, or (2) to make flashcard programs more gamelike by adding elements from Patterns in Game Design. Properly done, these approaches achieve both immersion and better learning. These approaches are old but remain unexplored.

Tuesday, October 12, 2010

Supernerd warning!

The posts about computer-aided language learning may cause anxiety in sensitive people who are used to following unwritten rules on which topics you are and are not allowed to discuss. If their nerdiness makes you anxious, you can simply close the tab in your web browser and the monsters will disappear.

Recent progress in computer-aided language learning

Summary: This post tells why Anki and sentence mining are important steps forward in the computer-aided language learning scene. Both steps have happened during the last 3.5 years.

Background: The fundamental problem of flashcard programs

When studying languages, flashcard programs show you a word and ask you to give the translation. In a recognition task the program shows the foreign word and asks for the English meaning. A production task tests your ability to spell out the foreign word. Flashcard programs are called spaced repetition systems because they contain timing algorithms which ask easy questions rarely and difficult questions often until they become easy. This ensures that the material is on average suitably difficult.
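As a rough illustration of such a timing algorithm, here is a minimal Leitner-style scheduler in Python. It is a sketch under simple assumptions, not the algorithm of Anki or any other program: the interval table INTERVALS, the Card fields and the function names are made up for this example.

```python
from dataclasses import dataclass

# Days to wait before the next review, indexed by box number.
# The concrete values are illustrative assumptions.
INTERVALS = [1, 2, 4, 8, 16]


@dataclass
class Card:
    front: str
    back: str
    box: int = 0   # higher box = easier card = asked more rarely
    due: int = 0   # day number on which the card is shown next


def review(card, correct, today):
    """Move the card up one box after a correct answer, back to box 0 otherwise."""
    if correct:
        card.box = min(card.box + 1, len(INTERVALS) - 1)
    else:
        card.box = 0
    card.due = today + INTERVALS[card.box]


def due_cards(deck, today):
    """Return the cards that should be asked today."""
    return [card for card in deck if card.due <= today]


if __name__ == "__main__":
    card = Card("adivaramu", "Sunday")
    review(card, correct=True, today=0)
    print(card)
```

A card that is answered correctly drifts towards longer intervals, while a single miss sends it back to daily review, so difficult cards are asked often and easy ones rarely.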

The fundamental problem is that you don't learn the word by remembering its translation. If you now memorize that the Telugu word "adivaramu" means Sunday, you'll just forget it in a few weeks. Spaced repetition systems can delay this to months by reminding you about the word. But to permanently learn a word in the sense that Finnish English speakers know that "Sunday" means "sunnuntai", you need context. You need to see the foreign word in tens or hundreds of sentences, so that it integrates with larger data structures in your head and is no longer just a factlet like "the circumference of earth is 40000km".

This problem is specific to spaced repetition systems, because it is already solved in the analog world. Language textbooks provide the context in the text chapters. Philologists who train to be interpreters and translators mainly read books to expand their vocabulary. In that situation all words are in context.

I first realized this problem after I banged through 1000 Lojban words with Logflash only to forget them all in 3 months.

My first conclusion was that you should only flash cards for which you have text. This worked great with Practical Chinese Reader I & II. First I flashed the words and then I read the text. Thanks to spaced repetition system I could go through chapters much faster.

When I started to build my own language-learning website, I fully realized the importance of tackling this problem. I was using MDBG annotator. It can turn any text into decent study material, unless the text is much above your level. My first approach was to grab the context from the same source as the words. My website had a feature which turned a copy-pasted Chinese text into a flashcard deck, which contained all words in the text. It also had an easy interface for removing familiar words. The word flashcards had context attached: After you gave your answer, it showed the sentences where the word appeared. It also annotated the sentence MDBG-style: When your mouse hovered over any unknown word in the sentences, the meaning of the word appeared.
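The core of that feature can be sketched roughly as follows. This is a reconstruction of the idea in Python, not the site's original code. It assumes the text is already split into words by spaces and punctuation, which holds for Finnish or English but not for Chinese, where a word segmenter would be needed first. The function name build_deck and the example sentences are assumptions.

```python
import re


def build_deck(text, known_words=()):
    """Map each unfamiliar word to the sentences in which it appears."""
    known = {word.lower() for word in known_words}
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    deck = {}
    for sentence in sentences:
        # Runs of letters count as words (digits and punctuation are skipped).
        for word in re.findall(r"[^\W\d_]+", sentence):
            key = word.lower()
            if key in known:
                continue  # the "remove familiar words" step
            contexts = deck.setdefault(key, [])
            if sentence not in contexts:
                contexts.append(sentence)
    return deck


if __name__ == "__main__":
    text = "Kun tulin kotiin, satoi. Soita kun ehdit!"
    for word, contexts in build_deck(text, known_words=["satoi"]).items():
        print(word, "->", contexts)
```

Each key is the front of a word card, and the list of sentences is what would be shown after the answer is revealed.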

This solution had a shortcoming: The sentences were too long and difficult, and having just one sentence of context was not enough. I also realized that the real learning happened when studying the sentences, and that they were at least as important as the words being flashed.

My second solution was to collect a database of translated easy sentences and to automatically match them to flashcards. I never properly implemented this, because it required a HUGE amount of database collection. Anyone who has ever written example sentences knows how slow it is. The best I achieved was to type enough sentences for an elementary course in Chinese. The material contained Skritter-style character drawing exercises for 200 characters and simple, clear, translated example sentences for them all. This produced adequate quality but it didn't scale. This lack of scalability made it a toy site. Shortly after that, I graduated and stopped developing the site.

Sentence-based flashcards

During the last 3.5 years, an ingenious solution to the Fundamental Problem surfaced: Sentence mining. The idea is that sentences are the basic unit of flashing, not words. Just like gymnasts train whole-body movements and just trust that individual muscles get stronger, in sentence flashcards you just trust that you also learn words while flashing sentences.

This is a new development, as Xamuel's article was written in September 2009 and the Chinese sentence deck I now use was written in 2008. I stopped working on SRS in 2007. This idea is so simple that it makes me ashamed that I didn't notice it. I had already diagnosed the problem and was trying different solutions to it, but somehow failed to take the last step of imagination and to fully move to sentence-based cards.

My own experience confirms that it works like a dream. During my Chinese study, I've periodically benchmarked my character count with Clavis Sinica's character test. During the first 4 years, I reached the weekly average score of 2200. During 10 months with the sentence deck, the character count exploded to 3000. I could have reached the current skill level a full year earlier, had I known about this method. Now I no longer use the sentence deck, because it has been so efficient that the bottleneck has moved away from single Chinese characters, and more context-heavy methods like reading texts with MDBG are more appropriate.


The rise of Anki is the second big step forward in the computer-aided language learning (CALL) scene. Anki does not contain anything revolutionary, but it combines all good features from all previous flashcard programs into one consistent and easy package. It is so good that if I entered the CALL scene again for the purpose of doing research for graduate studies, I would scrap my old website, which included a spaced repetition system, and use the superior, refined and open-source Anki as a basis instead.


Although my own CALL efforts failed, recent developments in the CALL field demonstrate that I was tackling the right questions: How to get context for words in flashcards, and how to construct a good spaced repetition system. Progress happened when these problems were addressed. I've witnessed the superiority of the result myself with Anki and the 20000-card HSK sentence deck.

Wednesday, October 06, 2010

MAOA and reconvictions

Helsinki University recommends that the decision to free a murderer or keep him in jail should use genetic information among other data. If this is implemented, it will be the first time that personality estimates based on genetic tests determine a person's future.

When a man is sentenced to life in prison in Finland, he can only be released by pardon. Estimates of his danger to society are used when deciding about the pardon. The existing method is to use the PCL-R scale to estimate how psychopathic the person is. The gist of the new research is that the MAOA gene and the PCL-R score together provide an even better estimate.

The MAOA gene comes in high-activity and low-activity variants. Among convicts with low-activity MAOA, there is no link between PCL-R score and reconviction rate. Among high-activity MAOA convicts, each extra point in PCL-R increases reconviction rate by about 7%.

In many studies MAOA has been linked to depression and psychopathy, but the results are full of "ifs" at best and mutually contradictory at worst. If a person with low-activity MAOA is exposed to childhood violence, it increases the risk of becoming a psycho. Links between MAOA and depression are contradictory. Individual studies have linked MAOA to economic risk taking and voting: People with high-activity MAOA prefer to take risks and use their vote, while low-activity MAOA carriers prefer to take insurance and vote less often.


Should I start graduate studies next year, I should prepare for it already this year. The first step is to find a research topic. Bioinformatics seems like a good source of research topics. There are new results and new types of data coming out every year, so it should be possible to make solid research by merely applying standard computing science methods to some new problem. In this kind of applied research, the strong programming routine from industry background should be an advantage. This way I could avoid the need to catch up with 30 years of algorithm development history, a burden which handicaps for example state machine or graph theory research. Instead of developing those algorithms, my task would be to pick and combine algorithms and adjust them to the problem at hand. It is not easier, but it is more skill oriented and less memory oriented. Another advantage of applied bioinformatics research is that it has concrete goals to strive at. This does much to avoid buzzword-heavy, bullshitty basic research from which you can see straight away that it is never going to produce anything useful, which makes it extremely demoralizing for people working on it.

Thursday, September 30, 2010

"Do you visit this planet often?"

Recently a new planet, Gliese 581g, was found. Of all known planets it is the most earthlike thus far. It has a gravity around 1.1g - 1.7g.

Until it is confirmed whether it contains ice and water or not, we can have wet fantasies about this planet next door. The distance is only 20 light years, but thanks to time dilation, once we develop near-light-speed travel, people will be able to travel there and back without dying of old age in between. Only their relatives on Earth will all be gone.
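For a feel of the numbers, here is a small calculation with the standard special-relativity formulas. It assumes a constant cruising speed (0.99c is an arbitrary choice) and ignores acceleration and braking; the function name trip_times is hypothetical.

```python
import math


def trip_times(distance_ly, speed_fraction_of_c):
    """Return (years elapsed on Earth, years elapsed on board) for a one-way trip."""
    # Lorentz factor: gamma = 1 / sqrt(1 - v^2 / c^2)
    gamma = 1.0 / math.sqrt(1.0 - speed_fraction_of_c ** 2)
    earth_years = distance_ly / speed_fraction_of_c
    traveller_years = earth_years / gamma
    return earth_years, traveller_years


if __name__ == "__main__":
    earth, ship = trip_times(20, 0.99)
    print(f"one way: {earth:.1f} years on Earth, {ship:.1f} years on board")
```

At 0.99c the one-way trip takes a bit over 20 years as measured on Earth but slightly under 3 years for the travellers.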

So here's my suggestion: let's put TEKES funding into a big technology program which aims at manned space flight to Gliese 581g. That should produce more tangible results than the current projects.

As FuturePundit points out, this is also a question of global security: When will the invasion space ships from Gliese 581g arrive overhead and begin planetary bombardment? Robert Heinlein taught us in The Moon Is a Harsh Mistress that the side which can drop stuff from planetary orbit has a huge advantage. In this case, attack is the best defence. It is either us or them.

One side of the planet is always in the sunlight, while the other is always in the dark. This is because the planet orbits its star in the same period as it rotates around its own axis. The most habitable zone is guessed to be somewhere between scorching sunlight and eternal darkness, in the boundary region called the terminator.

Who should we send there? Let's offer Arnold Schwarzenegger a promotion from governator to gliesenator and ask him once more to step into the terminator. When he is back on Earth, his muscles will once more be in prime shape after living 24 hours a day in heavier gravity and saying "hasta la vista, baby!" to any pretty green femmes fatales who try to seduce him to the dark side, controlling the horizontal and the vertical in the outer limits.

Thursday, September 09, 2010

Pole dancing criticism

Pole dancing is frivolous waste of time

Regarding age-appropriate activities, men in their 30s should be doing graduate studies, starting families and companies, learning martial arts and investing, but not dancing. Dancing in the nightclubs is appropriate for people in their early twenties.

Therefore I emphasized the fitness angle: Since you have to put in the hours anyway to stay healthy and fit, pole dancing is as good a way to do it as any.

Male pole dancing is for homosexuals

Let's face it: most male pole dancing clips on YouTube are damn gay. I prepared for probing questions about my sexual orientation by using opportunities to emphasize heterosexuality, the last time being 5 posts ago. Fortunately pole dancing is so unknown that this never came up. It is much easier to convince people that I am not a 20-year-old stripper girl than to explain "No, I am not gay despite having an abysmal sexual track record and being interested in dancing."

Nobody forces you to send stripper vibes by doing deep squats, rubbing yourself against the pole and removing your clothes. Also nobody forces you to send gay vibes by dancing topless and spreading legs clothed in Latex pants while high-pitch disco music plays in the background. There are plenty of other styles. It's all about what you want to express.

Pole dancing is what strippers do

Miley Cyrus pole dancing
The most common counterargument goes something like this: "Once I saw a movie where a stripper was leaning against the pole for support while doing deep squats and shaking her ass, with thick wads of cash protruding from her postage-stamp-sized strings and fuck-me boots on her feet. What are YOU doing among THEM!?"

First of all, the single message which pole dancers try to get across to broadcast media reporters is that pole dancing is not striptease. This is visible in various What Is Pole Dancing pages, which invariably emphasize its roots in Chinese circus acrobatics.

I wonder if these people have never gone to a strip club and had the glamorous and sexy images which movies attach to striptease demolished by seeing how grim and frivolous the places are in practise. Or have they patronized porn bars often enough to see performances which are actually good? Anyway, my experience is that you are lucky if you see a few elementary-level pole dancing moves. Strippers end up looking sexy in the same way as Anssi Kela is touching: they are so blatantly manipulative that the counterreaction exceeds the intended reaction.

What you won't see there are the gymnastics and acrobatics which are typical of pole dancing:

Sunday, September 05, 2010

Visual style of parapara

Summary: This post lists the rules which give parapara its distinctive look. The rules are described on a level of abstraction where they can be applied to pole dancing.

Parapara is a Japanese dance style which emphasizes hands. Various organizations publish official parapara routines for dance songs popular in Japan. Everyone dances the same way. Some dance clubs hold parapara evenings, where people gather to dance the song routines, which they have practised among friends or at home from official videos.

In parapara, the primary goal is to synchronize moves with music. Dancers do similar moves when similar musical patterns are repeated in the song. The moves closely reflect the rhythmic structure of the song.

This maximal synchronization has been achieved by emphasizing choreography for the whole song. Each move is precisely timed, for example "the next full sweep will last for 4 beats".

The moves are very simple. Simplicity is a prioritization choice. It enables rhythmic synchronization, since simple moves can be done quickly. It also enables emphasis on the routine for the whole song, as dancers don't need to practise individual moves. The moves don't need strength or flexibility. Speed requires some practise. You may not be immediately able to reproduce all moves from a video, but when someone shows them in slow motion face-to-face, not a single move poses any challenge.

Parapara moves have lots of straight lines, round swirls and symmetry. Usually the moves are either symmetric about a vertical or horizontal axis, or they are followed by 'mirror moves' which complete the symmetry.

Finally, parapara moves have the same clarity as a single beat of a drum. Often you can see whether a move is correct or wrong, just like you can hear extra beats in a repeating drum pattern.

In parapara, the human body becomes a visual rhythmic instrument which complements the musical rhythmic instruments.

It is a mystery to me why most dancers ignore the rhythmic structure of their songs. It seems to be a kind of 'dog whistle' which is hard to see for some, while for others it is an elephant in the room.

For example in SubTV's Dance program, Turo Kankaanpää's dance moves are well synced to the rhythm, and one commenter notices it. However, the judges didn't comment on it at all, even though one commented positively about the immersion and appeal it created. In Dance, the first round can result in 3 outcomes: rejection, pass, or a second try in the group choreography round to get more data. Turo went to the choreography round.

In some dance traditions, sync is so secondary that the song can be just switched to another at will. This would be unthinkable in parapara. Also when talking about parapara, for example Paula doesn't get it:
Parapara is usually danced to eurobeat, trance or other thumping music, but really it can be danced to any rhythmic music. In parapara the point is not to shake your butt and look sexy - on the contrary! The point is really just to sway from side to side and wave your hands according to the routine.

What would it mean to apply the rules of parapara for pole dancing?
  • Search for positions like the half flag or pole sit, where hands and feet are free to do parapara-style patterns.
  • Practise how to do easy moves with accurate timing (for example a fireman spin for 4 beats).
  • Practise how to do a series of quick transitions between easy positions.
  • DON'T practise sexy & lusty moves from the striptease tradition, as they are not in line with parapara's abstract & flirty body language.
  • DON'T waste time in practising individual difficult tricks.

Saturday, August 21, 2010

Fireman spins

Went to my first pole dancing class. There the sharp-witted Iina taught various spins, including the fireman spin. At home I had only tried climbing and static holds, so spinning was new to me.

The poles in the studio are static chrome poles. I have a titanium pole, which has both a static and a rotating mode. The friction differs between poles. Chrome has linear friction: the harder you grip, the better your hold. This makes spinning quite easy from the very first try. In contrast, the friction in titanium is binary: either the grip holds or you slide quickly to the ground. I just can't do any spins on the static titanium pole at home. Luckily the rotating titanium pole is the easiest option of all.

Sunday, August 01, 2010

Status 4/4: When status signaling utterly and completely fails

A child is not considered competent to evaluate himself if he is thirsty or hungry. He should obey orders from any familiar adult. This true story tells how a group of people used invisible status signals to reduce me to a child's subject position.

Two months ago The Club organized a short forest walk (vaellus). It lasted 4 hours. There were 6 of us. We walked to a sleeping place, grilled some sausage and returned. I'll tell the events from the status perspective. Here DHV means demonstrating high value (superior status) and DLV means demonstrating low value (low status).

The only participant whom I knew beforehand was Mikko. I had seen him in The Club's monthly bar evening. There he told me that Pulse would hold a concert the following day. I joined him there, thus DHVing (obeying) him.

Another participant was Veikko. Since neither Mikko nor I had a car, Veikko took us from the bus station to the starting spot of the forest walk, thus DHVing himself and DLVing me and Mikko.

Earlier, I got drunk at a metal concert. I was hung over and had slept for only 4 hours, which made me very grumpy. This DLV:d me, since I was in no position to play the Game of Talking competently. Already in the car I yapped way less than 25% of the chit-chat.

The first warning sign happened when we left the car and were about to set off. Mikko told me that I should put my jacket in the bag, because we wouldn't stop during the journey. I DLV:d by putting my jacket in the bag.

The second red flag emerged when there was a pool of water in the path. Mikko told me to cross the pool from the right side. This confused me: he acted as if I needed advice in crossing a pool of water! I rejected his offered subject position firmly but politely by using the other side. I thought that this represented only his opinion, to be met with zen-like calm and indifference, not realizing that vultures had already smelled blood and were circling in for a kill.

When we took a break, Mikko offered me water. Starting to see a pattern, I declined. Mikko no longer believed my "no" even after I had repeated it 10 times and emphasized that I had done comparable blueberry picking trips many times before. He insisted that I should take some water. After a few more repetitions he believed me, commenting that he can't force me to take water and it is my own fault if I dehydrate or get a sunstroke.

At the grilling place the same thing happened again with Veikko and sausages. It ended only when Veikko shouted in a semi-angry tone "Take the fucking sausages now!" and I complied. After 30 seconds, I put them back in his sausage packet and he finally understood that he can't force me to take them.

Somehow, Mikko and Veikko had conspired to put me into a child's subject position - of someone whose word is worthless, who should take orders from anyone on trivial matters like crossing a pool of water, and who can't evaluate if he is thirsty or hungry. They reached their consensus with signals which were completely invisible to me.

One reason why I want to learn status signaling is to avoid these kinds of total collapses of the social fabric, where all assumptions of normal conversation are cancelled. Earlier, I used to be scared shitless by these incidents and withdraw from social interaction for weeks, thinking that I did something bad to deserve them. After going to the gym, my personality changed so that now I am merely angry at people who conspire against me and motivated to learn how to execute the right game moves to avoid these kinds of accidents.

Monday, July 12, 2010

Status 2/3: To ignore or not to ignore, that is the question

Old opinion on status

Earlier I thought that mainly upper class persons have status. Marketing men take advantage of people's natural drive to improve their lot. Lower and middle classes buy status symbols because companies like to sell overpriced goods by associating them with high status. Status is a ruse to separate the gullible from their money.

There is an old saying that money does not bring happiness, but it's better to be unhappy in a Lexus than in a Lada. When we consider status as a ruse, it would be better for those people to sell their Lexus and save or invest the money, Ilkka style. If they keep the Lexus, they are bleeding money as its value depreciates month by month, while getting only hallucinations in return. The only exceptions are people who are so filthy rich that the cost of the car is negligible.

Seen this way, status matters only for persons who need to make a good impression on many others, like salesmen or celebrities. The 1% of people who get 50% of visibility benefit from status symbols. This strengthens the illusion that status is more important than it really is.

New opinion on status

Reading Roissy and paying attention to the Game of Talking has made me realize that there is a special kind of status, psychosocial dominance, which is tightly woven into everyday interaction and orthogonal to economic status. If I want to have a sex life and settle down at around age 35, I must learn the ropes of these status games.

This does not make the old opinion obsolete. Rather, I need to complement it so that I can express status enough to come across as a desirable sexual partner, while still being able to avoid getting stiffed by marketers.

Sunday, July 11, 2010

Status 1/3: Economics and status in programming work, rabbit-duck edition

Summary: Programming work can be seen from two angles. The "economic" angle is that programming aims to produce software. The "status" angle is that in any big bureaucracy, the internal dynamics make status the primary aim.

Economic view

Alistair Cockburn defined software development as a co-operative game of invention and communication. The primary goal is to deliver useful, working software. The secondary goal is to prepare for the next game.

In this naive view, programming is just one type of work which aims to deliver customers something they are ready to pay for in order to keep the organization afloat.

A good software developer should know technology and economics and have earlier experience with similar software.

This view is in line with transhumanist ideology, where technological progress is a driving force which enables new and better things, when people get more tools to implement their will and can afford more and more slack.

Status view

The status view emphasizes hierarchy and influence inside a bureaucracy. The primary game is how to gain influence through office politics.

From the status view, programmers have two millstones around their necks. First, they are at the lowest level of their hierarchy. Second, they have a bad reputation for not being experts in gaining Roissy-style psychosocial dominance by talking.

In status view, shipping working software is irrelevant, unless failure to do so threatens the very existence of the organization. The only relevant question is if your back is covered if something goes wrong.

A good developer should know marketing, be extroverted and slightly narcissistic.

Technical knowledge is important only to the extent that you don't lose face. Technical knowledge has a half-life while social skills don't. Therefore technical ways to solve problems are inferior to other ways. People who know too much technology are losers who play the wrong game.

The four post-Marxist social classes

Half Sigma divided people into 4 social classes. The two higher classes are the college graduate class and the value transference class. Here, the economic view uses the values of the college graduate class and the status view uses the ideals of the value transference class.

This post was about the conflict between economic and status-based worldviews. The video is about the conflict between scientific and religious worldviews. In both cases, reality can be interpreted in two equally justified ways.

Saturday, July 03, 2010

Tuesday, June 29, 2010

The difference between social democrats and transhumanists

Genetic testing tells some people that they have a high risk of disease. They can prepare for it if they hear about the illness years before its onset.

FuturePundit predicted in 2007 what actions people will take when genetic tests reveal information about their future:

But which risks will be worth testing for? Those you'll be able to do something about. Suppose a genetic variation makes Alzheimer's inevitable at middle age and that diet has little influence on when you'll get it. Well, I guess you could decide to avoid taking on family responsibilities that you won't be around to fulfill. But initially the biggest potential for doing something about a risk will involve risks that can be influenced by diet or exercise.


What you should do when you discover 5 or 10 years hence that you have high genetic risk of a disease: Write your elected officials and argue for more research on the disease you are on course to get. Lobby for cures for diseases that will otherwise kill you and your loved ones.

This is exactly what Sergey Brin did when he discovered by genetic testing that he has a hereditary risk for Parkinson's disease.

The article also contains a description of the social democratic attitude, where people are considered helpless victims rather than players who execute moves to improve their position on the go board of life:

In the study, a team of researchers led by Robert Green, a neurologist and geneticist at Boston University, contacted adults who had a parent with Alzheimer’s and asked them to be tested for a variation in a gene known as ApoE. Depending on the variation, an ApoE mutation can increase a person’s risk for Alzheimer’s from three to 15 times the average. One hundred sixty-two adults agreed; 53 were told they had the mutation.

The results were delivered to the participants with great care: A genetic counselor walked each individual through the data, and all the subjects had follow-up appointments with the counselor. Therapists were also on call. “People were predicting catastrophic reactions,” Green recalls. “Depression, suicide, quitting their jobs, abandoning their families. They were anticipating the worst.”

Thursday, June 24, 2010

This is the wrong place to debate Nokia's technology policy, since NDAs make it impossible to cite evidence.

The previous post got a lot of attention, but unfortunately you won't see similar topics in the future. This blog will continue to be about my private life and interests.

Sunday, June 20, 2010

Nokia, the next Geoworks?

Summary: Nokia gave a profit warning, because their high-end phones do not sell well. This post compares the problems of Symbian, Nokia's high-end phone OS, to Steve Yegge's analysis of why Geoworks went bankrupt.

What Geoworks?

Geoworks was a software company which wrote a windowing system and applications in assembler. It went bankrupt at the end of the nineties. When Steve Yegge wrote about the benefits of high-level languages, he used Geoworks as an example of how using low-level languages takes a toll on business.

His argument is that low-level languages make optimization impossible and implementing features slow. After the system reaches a critical point in complexity, usability starts to suffer. Users see sluggishness, bugs and a lack of features.

...But it's because we wrote fifteen million lines of 8086 assembly language. We had really good tools, world class tools: trust me, you need 'em. But at some point, man...

The problem is, picture an ant walking across your garage floor, trying to make a straight line of it. It ain't gonna make a straight line. And you know this because you have perspective. You can see the ant walking around, going hee hee hee, look at him locally optimize for that rock, and now he's going off this way, right?

This is what we were, when we were writing this giant assembly-language system. Because what happened was, Microsoft eventually released a platform for mobile devices that was much faster than ours. OK? And I started going in with my debugger, going, what? What is up with this? This rendering is just really slow, it's like sluggish, you know. And I went in and found out that some title bar was getting rendered 140 times every time you refreshed the screen. It wasn't just the title bar. Everything was getting called multiple times.

Because we couldn't see how the system worked anymore!

Which is higher-level language, C or C++?

One benchmark of language level is how many lines of code are needed to implement a feature. In high-level languages, the compiler does more work. The programmer has to write less code. This means that implementation is faster. There are also fewer bugs, since the lines of code which were not needed don't contain bugs, and because debugging is easier in a small haystack.
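The lines-of-code benchmark can be made concrete with a small counter. This is only a rough sketch: the name count_code_lines and both sample snippets are made up for illustration, and it assumes that blank lines and lines holding only a line comment don't count as code. Block comments are not handled.

```ruby
# Count lines that contain code: blank lines and lines holding
# only a line comment are skipped. comment_prefix is "//" for
# C-like languages and "#" for Ruby. Block comments are not handled.
def count_code_lines(source, comment_prefix)
  source.each_line.count do |line|
    stripped = line.strip
    !stripped.empty? && !stripped.start_with?(comment_prefix)
  end
end

# Hypothetical sample snippets, used only to exercise the counter.
c_snippet = <<~C
  // Open file for reading.
  FILE* file = fopen(file_name, "r");

  if (file) {
      fclose(file);
  }
C

ruby_snippet = <<~RUBY
  # Print every line of the file.
  File.foreach(file_name) { |line| puts line }
RUBY

puts count_code_lines(c_snippet, "//")    # prints 4
puts count_code_lines(ruby_snippet, "#")  # prints 1
```

A counter like this says nothing about how hard each line was to write, so it is a crude yardstick at best.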

The C language is infamous for being low-level. Therefore it's paradoxical that Symbian Open C is advertised as a productivity tool. But sadly it really is a productivity tool compared to Symbian C++.

The examples below demonstrate why. The snippets read a configuration variable from a file. The scenario is that we want to run automatic system tests on a communication protocol and to automate the selection of an access point. It is stored in the format "accesspoint=Winsock". The important thing here is the length of the listing, not the exact content.

// Read a configuration variable with 35 lines of code.
_LIT8(KAccessPointId, "accesspoint=");
TBool ReadAccessPointNameL(const TDesC& aFileName, TDes& aResult)
    {
    RFs fs; // File session
    RFile file; // File handle
    TBool apNameFound = EFalse;

    // Connect to file server
    User::LeaveIfError(fs.Connect());
    CleanupClosePushL(fs);

    // Open file for reading
    if (file.Open(fs, aFileName, EFileRead) == KErrNone)
        {
        CleanupClosePushL(file);

        // Read the file to memory (we can't use line-by-line
        // reading with TTextFile, since it can't handle 8-bit text)
        TInt size = 0;
        User::LeaveIfError(file.Size(size));
        HBufC8* content = HBufC8::NewLC(size);
        TPtr8 contentPtr = content->Des();
        User::LeaveIfError(file.Read(contentPtr));

        // Find the start and end of the access point name.
        TInt start = content->Find(KAccessPointId());
        if (start != KErrNotFound)
            {
            start = start + KAccessPointId().Length();
            TInt end = start;

            // Find the next newline.
            while (end < content->Length() &&
                   (*content)[end] != '\r' &&
                   (*content)[end] != '\n')
                {
                end++;
                }

            // Save the result.
            aResult.Copy(content->Mid(start, end - start));
            apNameFound = ETrue;
            }

        // Free the buffer and close the file.
        CleanupStack::PopAndDestroy(2); // content, file
        }

    // Close the file server session.
    CleanupStack::PopAndDestroy(&fs);
    return apNameFound;
    }

The same in C:

// Read a configuration variable with 20 lines of code.
// Needs <stdio.h> and <string.h>; strdup is a POSIX function.
const char* access_point_id = "accesspoint=";
char* read_access_point_name(const char* file_name)
{
    char* result = NULL;
    // Open file for reading.
    FILE* file = fopen(file_name, "r");
    if (file) {
        char line[200];

        // Read line by line and search for access point variable.
        while (!result && fgets(line, sizeof line, file)) {
            if (strstr(line, access_point_id) == line) {

                // Remove newline.
                size_t len = strlen(line);
                while (len > 0 && (line[len - 1] == '\r' || line[len - 1] == '\n')) {
                    line[--len] = 0;
                }

                // Get the value of the variable. The caller frees it.
                result = strdup(line + strlen(access_point_id));
            }
        }
        fclose(file);
    }
    return result;
}

Symbian C++ was written before people really knew how to do object-oriented programming. They completely botched all the APIs. The horrible descriptors were designed to counter memory overflows, which C functions don't check. Nowadays they just clutter the code. Symbian takes pride in being a microkernel OS, so they require the programmer to connect to servers to start sessions. This adds further lines. The exception handling with cleanup stacks vomits more useless lines. And if you think this is ugly, you haven't seen anything yet: look at the use of active objects in the socket interface.

One C++ selling point is the syntax for classes. Well, Symbian has a 68-page coding conventions document which gives very explicit rules on how to name classes and which functions they should at least have. This nitpicking makes classes heavy structures, and decimates any advantage from syntactic sugar. Virtual functions are the only part of C++ which wasn't assaulted. Even templates were banned as too error-prone.

So Posix C really is a higher-level language. Just for comparison, here is the same in Ruby.

# Read a configuration variable in 9 lines of code.
def readAccessPointName(fileName)
  file = File.open(fileName, "r")
  file.each_line do |line|
    if (line =~ /accesspoint=(.*)/)
      # The (.*) in regular expression caught
      # the access point name to $1.
      return $1.chomp
    end
  end
  raise 'No access point name in file ' + fileName
end
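For anyone who wants to actually run the Ruby version, here is a self-contained variant. It is a sketch, not the listing above verbatim: the name read_access_point_name, the temporary file and its contents are made up for the demonstration, and File.foreach is used so that the file gets closed even on the early return.

```ruby
require "tempfile"

# Return the value of the "accesspoint=" variable in the given file.
# File.foreach closes the file when it is done, also on early return.
def read_access_point_name(file_name)
  File.foreach(file_name) do |line|
    # (.*) captures the rest of the line into $1; chomp drops a
    # trailing "\r" left over from Windows-style line endings.
    return $1.chomp if line =~ /\Aaccesspoint=(.*)/
  end
  raise "No access point name in file #{file_name}"
end

# Hypothetical test configuration, written to a temporary file.
Tempfile.create("config") do |config|
  config.write("protocol=tcp\r\naccesspoint=Winsock\r\n")
  config.flush
  puts read_access_point_name(config.path)  # prints Winsock
end
```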

So it is unsurprising that the App Store contains 225,000 applications while the Ovi Store contains just thousands.

GeoWorks attempted to get third party developers but was unable to get much support due to expense of the developer kit — which ran $1,000 just for the manuals — and the difficult programming environment, which required a second PC networked via serial port in order to run the debugger. (source)

But it's the user experience that counts...

Symbian phones are famous for having equal features but lower usability than the iPhone. To demonstrate how difficult programming shows up in usability, I'll tell you about using the file browser to read log files. The plain text viewer has several defects. If you open a large log, it announces an out of memory error and shuts down. It randomly underlines some content which it thinks might be a link. It can't choose a small font to show lots of content, so you only see a few lines at a time. Luckily, there has been some progress in the plain text viewer. Earlier, it used to crash with medium-sized files. Now it either shows the file or announces an error.
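The out-of-memory failure is not inherent in showing a big log. Here is a minimal sketch, in Ruby for brevity and not related to how the Symbian viewer is actually implemented, of reading only the end of a file. The names tail_lines and chunk_size and the generated log are my own assumptions for illustration.

```ruby
require "tempfile"

# Return the last `count` lines of a file while holding only the
# tail of the file in memory. chunk_size is an arbitrary choice.
def tail_lines(path, count, chunk_size = 4096)
  File.open(path, "rb") do |file|
    position = file.size
    buffer = "".b
    # Walk backwards chunk by chunk until the buffer holds more
    # than `count` newlines, so the wanted lines are complete.
    while position > 0 && buffer.count("\n") <= count
      step = [chunk_size, position].min
      position -= step
      file.seek(position)
      buffer = file.read(step) + buffer
    end
    buffer.lines.last(count).map(&:chomp)
  end
end

# Hypothetical log file with 10,000 lines.
Tempfile.create("log") do |log|
  1.upto(10_000) { |i| log.puts("line #{i}") }
  log.flush
  puts tail_lines(log.path, 3)
end
# prints "line 9998", "line 9999" and "line 10000" on separate lines
```

A viewer built this way would load more chunks as the user scrolls, instead of trying to fit the whole log in memory at once.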

The way I see it, these defects reflect the difficulty of the programming platform. Usually programmers have some professional pride, which makes them fix errors and usability defects with time. What could be stopping it? We can only speculate.
  • Customizing the UI component which shows text would require too much work, since the platform doesn't support dynamic loading and presentation.
  • A low-level language necessitates big project sizes. This dilutes responsibility so that no one is responsible for the plain text viewer in the "buck stops here" sense.
  • There is a culture of fixing only showstopper bugs and leaving the others in, since there isn't time to fix all bugs when fixing a single bug is slow.

This way, we get "multimedia computers" which can't display plain text.

It doesn't have to be this way

Nokia does fine in low-end phones, which use the closed Series40 OS. The Maemo/Meego platform is also promising, but the phone application in the N900 is still fresh software, which creates issues in sound quality and usability. They haven't had time to finalize it. In compatibility with major desktop operating systems, Maemo's Linux kernel runs circles around Symbian. This will show up in usability sooner or later. You can always strip down the user interface to produce a simpler phone which is easier to use, but you can't put the solid Linux infrastructure into a Symbian phone.

In the long run, I'm optimistic about Nokia's future. Once they finalize the phone on Maemo and scrap the Symbian platform, they'll be fine. If you want to capitalize on this, the right time to buy Nokia shares is just before they start selling their next Meego phone. However, make sure that the press agrees that Meego has a good phone application, battery life and usability - if they botch those on Meego, they won't recover. That said, I'm not putting my money where my mouth is, because I have enough economic Nokia risk in my life already.