Tuesday, February 20, 2007

Get a Haircut And Get a Real Job

Big brother Simo got a haircut and is having 3 job interviews this week. Meanwhile, my brother is doing fractal art.


[Image: fractal art, "Frozen turboflowers"]


However, the difference is not as big as in the George Thorogood song, since I have my own semi-creative project of questionable utility.

Monday, February 19, 2007

One Encoding to Rule Them All

In Java, all strings are Unicode strings. The Swing GUI components also support more advanced features of Unicode, such as right-to-left writing for Arabic. It is almost trivial to make a GUI application that supports all possible scripts, from Katakana, Cyrillic alphabets and Korean Hangul to thousands of Chinese characters and various right-to-left scripts. The only thing you need to take care of is that whenever you access external interfaces, the data is passed in Unicode.

In traditional GUI applications, this means reading and writing text files in Unicode instead of the default encoding. Here's how you open a file for writing in UTF-8 encoding:


File file = new File( fileName );
FileOutputStream fos = new FileOutputStream( file );
OutputStreamWriter osw = new OutputStreamWriter( fos, "UTF-8" );
BufferedWriter writer = new BufferedWriter( osw );



The difference from opening a file in the default encoding is really small:



File file = new File( fileName );
FileWriter fw = new FileWriter( file );
BufferedWriter writer = new BufferedWriter( fw );



The difference in reading a file is equally small. Not too difficult, huh? It turns out that it is for those who haven't confronted foreign character sets before. Before starting to roll my own flashcard program, I tried some free ones, some of them written in Java. It turned out that many of them didn't support Unicode. Well, that's one way of making sure that communists don't use your program.
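For the reading side, a minimal sketch might look like the following. The class name Utf8FileDemo and its helper methods are made up for illustration, and the try-with-resources syntax and UncheckedIOException assume Java 8 or later, which is newer than the snippets above.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;

// Hypothetical helper class, not part of the flashcard program.
class Utf8FileDemo {

    // Writes text as UTF-8, using the same stream chain as the writer above.
    static void writeUtf8(File file, String text) {
        try (BufferedWriter writer = new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream(file), "UTF-8"))) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the whole file as UTF-8; only the stream classes change
    // compared to the writing side.
    static String readUtf8(File file) {
        StringBuilder text = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), "UTF-8"))) {
            int c;
            while ((c = reader.read()) != -1) {
                text.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    // Writes the text to a temporary file and reads it back.
    static String roundTrip(String text) {
        try {
            File file = File.createTempFile("utf8demo", ".txt");
            file.deleteOnExit();
            writeUtf8(file, text);
            return readUtf8(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

If both sides name the same encoding, roundTrip should hand back exactly the string it was given, whatever script it is written in.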


[Image: don't support Unicode]


You may also want to write unit tests to ensure that the Unicode support really works. To embed Unicode characters in the source files of the test cases, you need to tell the compiler that the source files are encoded in UTF-8.



C:\>javac -encoding UTF8 MySource.java



Ditto with the editor (in this case, Eclipse):

[Image: Eclipse and Unicode]
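A check of the kind such a unit test might make can be sketched as below. This is only an illustration: EncodingCheck and its methods are hypothetical names, and StandardCharsets assumes Java 7 or later. The sample string is written with \u escapes, so this particular file compiles the same way with or without the -encoding switch; literal characters in the source would need it.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical example class: compares a Unicode-safe encoding with a lossy one.
class EncodingCheck {

    // Four Katakana characters, a space, and six Cyrillic characters.
    static final String SAMPLE =
            "\u30AB\u30BF\u30AB\u30CA \u043F\u0440\u0438\u0432\u0435\u0442";

    // True if the text is unchanged after encoding to UTF-8 and decoding again.
    static boolean survivesUtf8(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8).equals(text);
    }

    // Encodes to ISO-8859-1 and back; characters outside Latin-1
    // are replaced with '?' by String.getBytes.
    static String throughLatin1(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(survivesUtf8(SAMPLE));   // prints true
        System.out.println(throughLatin1(SAMPLE));  // prints ???? ??????
    }
}
```

The row of question marks is what a program stuck in a single-byte default encoding would typically save to disk.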

With web services, it's a somewhat different matter, since there are so many external interfaces. First of all, the source file UTF8 switch must be added to the Ant build file (assuming that you use Ant):


<javac destdir="${build.dir}"
       target="1.5"
       debug="true"
       deprecation="false"
       optimize="false"
       failonerror="true"
       encoding="UTF8">



Now, let's look at the generated XHTML page. First, to make it valid XML, you have to declare the encoding on the first line, the XML declaration that identifies the document as XML.



<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="FI" lang="FI">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>



The second encoding declaration, the <meta> element, deserves more explanation. When a web page is delivered from a server to a browser, it is delivered over HTTP. The HTTP protocol includes headers that describe the payload. Usually, the default values of the headers are just fine and you can forget them, except for the character encoding. The <meta> element is trying to say that the document IS encoded in UTF-8, no matter what the HTTP headers say.
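To make the header side concrete, here is a small sketch of pulling the charset parameter out of a Content-Type value such as "text/html; charset=UTF-8". ContentTypeCharset and charsetOf are hypothetical names, and the parsing is deliberately simplified; it is not what any particular browser does.

```java
import java.util.Locale;

// Hypothetical helper: simplified parsing of a Content-Type header value.
class ContentTypeCharset {

    // Returns the charset parameter, or null if the header does not name one.
    static String charsetOf(String contentType) {
        String[] parts = contentType.split(";");
        // parts[0] is the media type (for example text/html); parameters follow.
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = param.substring("charset=".length()).trim();
                // Strip optional double quotes around the value.
                if (value.length() >= 2
                        && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value;
            }
        }
        return null;
    }
}
```

A header of plain "text/html" yields null here, which is the case where the receiving end has to guess or fall back on a default.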

Unfortunately, browsers tend to believe HTTP headers rather than the <meta> tag. This code snippet is from the Perl exercise work where I first ran into this problem:


# The final part of this line (charset=UTF-8) is
# absolutely essential. Without that line,
# the UTA webserver somehow convinces the
# browser that the encoding isn't UTF8,
# even if the .html says otherwise.
print "Content-type: text/html; charset=UTF-8\n\n";



In servlets, the equivalent code is:



public void doPost(
        HttpServletRequest request,
        HttpServletResponse response)
        throws ServletException, IOException
{
    // Set the encoding before calling getWriter(); the writer picks it up from here.
    response.setContentType("text/html; charset=UTF-8");

    PrintWriter out = response.getWriter();
    out.println( "<html>" );
    out.println( "<head>" );
    out.println( "<title>Hello World</title>" );
    out.println( "</head>" );
    out.println( "<body><h1>Hello World</h1></body>" );
    out.println( "</html>" );
}



In addition to writing data in UTF-8, we also need to read the characters typed by the user. This is done by setting the encoding property of the request object (the object that contains the form data from the user):



request.setCharacterEncoding( "UTF-8" );
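Why this matters can be shown without a servlet container. The sketch below decodes the same URL-encoded form value twice, once as UTF-8 and once as ISO-8859-1, which is the fallback the servlet specification prescribes when no encoding is set. FormDecodingDemo is a hypothetical class name, and the sample value is the Finnish word "äiti" as a browser would send it from a UTF-8 page.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class: the same bytes, decoded with two different encodings.
class FormDecodingDemo {

    // Decodes a URL-encoded form value using the named encoding.
    static String decode(String raw, String encoding) {
        try {
            return URLDecoder.decode(raw, encoding);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown encoding: " + encoding, e);
        }
    }

    public static void main(String[] args) {
        String raw = "%C3%A4iti"; // two bytes, C3 A4, followed by "iti"

        // Decoded as UTF-8, the two bytes form one character (U+00E4).
        System.out.println(decode(raw, "UTF-8"));

        // Decoded as ISO-8859-1, each byte becomes its own character
        // (U+00C3 and U+00A4), so the word comes out garbled.
        System.out.println(decode(raw, "ISO-8859-1"));
    }
}
```

The second result, "Ã¤iti", is the kind of garbage that tends to show up in a form handler when the request encoding is left unset.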



The Java documentation says that the encoding of the request object must be set before you pull any data that the user just typed. Sometimes the servlet is provided by some third party, and you can't set the encoding just in time before reading data. For example, if you use Spring's SimpleFormController to structure and validate forms, Spring automatically reads the form data into a more convenient structure before giving it to you. In these cases, you have to configure a filter that is run before the servlet. Filters are classes that modify the input or the output of a servlet. This snippet goes in the deployment descriptor, web.xml.



<filter>
    <filter-name>CharacterEncodingFilter</filter-name>
    <filter-class>org.springframework.web.filter.CharacterEncodingFilter</filter-class>
    <init-param>
        <param-name>encoding</param-name>
        <param-value>UTF-8</param-value>
    </init-param>
    <init-param>
        <param-name>forceEncoding</param-name>
        <param-value>true</param-value>
    </init-param>
</filter>

<filter-mapping>
    <filter-name>CharacterEncodingFilter</filter-name>
    <servlet-name>springflash</servlet-name>
</filter-mapping>



Finally, if you use JSP to generate the final pages, you need to put the following page directives at the top of the JSP files.



<%@ page contentType="text/html; charset=UTF-8" %>
<%@ page pageEncoding="UTF-8" %>



Now, after shouting 11 times "USE THE DAMN UNICODE!!!" the system finally believes. We're still waiting for the one encoding switch to rule them all.


The encodings not so blest as thee,
Shall in their turns to limitations fall;
While thou shalt flourish great and free,
The dread and envy of them all.

Rule, Unicode! Unicode, rule the scripts:
Britons never shall need more glyphs.

Friday, February 16, 2007

Configuration Considered Harmful

Half a year ago I downloaded my first Java 2 Enterprise Edition, also known as J2EE, the version of Java you use to build websites. It uses a very complex build system named Java Blueprints, spanning 26 files and 2000 lines of XML and settings. Coming from a Symbian background, I thought that build processes simply are incomprehensibly complex: in Symbian, the build process uses about 100 Perl modules, and you are not supposed to modify them. To give you an idea of how much 2000 lines is, note that this applet, which packs some real functionality, takes ~1500 lines.

In fact, the J2EE build process is really simple. First, you build a jar file with a specific directory structure (a WAR file). Then you deploy it to some application server. Knowing this is useful when you switch from one application server to another (in my case, from Sun's server to Tomcat). Therefore, it's really stupid to use a huge build system.
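To show how little there is to the packaging step, here is a sketch that assembles such an archive in memory with nothing but the standard library. MiniWar and its methods are hypothetical names; compiling the class is a separate step not shown here, and whether a given server accepts an archive without a manifest is an assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper: packs a descriptor and one class file into WAR bytes.
class MiniWar {

    // Builds the archive: web.xml under WEB-INF, classes under WEB-INF/classes.
    static byte[] build(String webXml, String className, byte[] classBytes) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (JarOutputStream war = new JarOutputStream(buffer)) {
            put(war, "WEB-INF/web.xml", webXml.getBytes(StandardCharsets.UTF_8));
            put(war, "WEB-INF/classes/" + className.replace('.', '/') + ".class",
                    classBytes);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Adds one file to the archive under the given path.
    private static void put(JarOutputStream war, String name, byte[] data)
            throws IOException {
        war.putNextEntry(new JarEntry(name));
        war.write(data, 0, data.length);
        war.closeEntry();
    }

    // Lists the entry names of an archive, in the order they were written.
    static List<String> entryNames(byte[] warBytes) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream in = new ZipInputStream(
                new ByteArrayInputStream(warBytes))) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }
}
```

Writing the resulting bytes to a file ending in .war and copying it to the server's deployment directory would be the "deploy" half of the process.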

Complex systems are OK when they really work and make things easier. Anyone who has converted a medium-sized application from manual building to Make or Ant knows this. However, when they become complex enough without being nicely encapsulated, they introduce an extra moving part. Useless moving parts merely make things more difficult.

Java is famous for giving good and accurate error reports in the form of exceptions and stack traces. The compiler checks the code extensively before you can execute it. There is also a standard way to produce good documentation for Java modules that are delivered to third parties (the Javadoc tool). Configuration systems tend to be the exact opposite: they use custom formats, often XML, which are often poorly documented. Their error messages leave you wondering what the hell went wrong, if they give any in the first place. Often, you can't be sure whether the error is in the configuration files or in the code.

Next, I'll present a truly minimal hello world J2EE application. Here is the list of files:



build.xml - the Ant build file
src\HelloServlet.java - the source code
web\WEB-INF\web.xml - the deployment descriptor



Here is the source code. When you type the address of the web service (usually http://localhost/hello or something similar), the server calls the doGet or doPost method of this servlet. The request object contains the things that the user sent to you: for example, if the user filled in a form and pressed submit, the contents of the form are in the request. The response represents the HTML page that you return to the user.



/* Minimal Hello World servlet */
import java.io.*;
import javax.servlet.*;
import javax.servlet.http.*;

public class HelloServlet extends HttpServlet {

    // GET requests are handled exactly like POST requests.
    public void doGet(
            HttpServletRequest request,
            HttpServletResponse response)
            throws ServletException, IOException
    {
        doPost( request, response );
    }

    public void doPost(
            HttpServletRequest request,
            HttpServletResponse response)
            throws ServletException, IOException
    {
        // Set the encoding before calling getWriter().
        response.setContentType("text/html; charset=UTF-8");

        PrintWriter out = response.getWriter();
        out.println( "<html>" );
        out.println( "<head><title>Hello World</title></head>" );
        out.println( "<body><h1>Hello World</h1></body>" );
        out.println( "</html>" );
    }
}



The deployment descriptor names the application "hello" (served at http://localhost/hello) and says that HelloServlet generates the page (there may be several servlets).



<?xml version="1.0" encoding="UTF-8"?>

<web-app>
    <display-name>hello</display-name>

    <servlet>
        <display-name>HelloServlet</display-name>
        <servlet-name>HelloServlet</servlet-name>
        <servlet-class>HelloServlet</servlet-class>
    </servlet>

    <servlet-mapping>
        <servlet-name>HelloServlet</servlet-name>
        <url-pattern>/</url-pattern>
    </servlet-mapping>

</web-app>



The Ant build file build.xml is still somewhat more complex than needed, but a far cry from 2000 lines. There's nothing worth reading happening there; it is here merely for completeness. It has been simplified from the Spring tutorial build file, which strikes a very good balance between simplicity and functionality.



<?xml version="1.0"?>

<project name="flash1" basedir="." default="build">

    <property name="src.dir" value="src"/>
    <property name="web.dir" value="web"/>
    <property name="build.dir" value="${web.dir}/WEB-INF/classes"/>
    <property name="name" value="hello"/>
    <property name="tomcat.dir" value="/tools/tomcat5.5"/>
    <property name="deploy.path" value="${tomcat.dir}/webapps"/>

    <path id="master-classpath">
        <fileset dir="${web.dir}/WEB-INF/lib">
            <include name="*.jar"/>
        </fileset>
        <!-- We need the servlet API classes: -->
        <!-- for Tomcat 4.1 use servlet.jar -->
        <!-- for Tomcat 5.0 use servlet-api.jar -->
        <!-- for Other app server - check the docs -->
        <fileset dir="${tomcat.dir}/common/lib">
            <include name="*.jar"/>
        </fileset>
        <pathelement path="${build.dir}"/>
    </path>

    <target name="usage">
        <echo message=""/>
        <echo message="${name} build file"/>
        <echo message="-----------------------------------"/>
        <echo message=""/>
        <echo message="Available targets are:"/>
        <echo message=""/>
        <echo message="build --> Build the application"/>
        <echo message="deploywar --> Deploy application as a WAR file"/>
        <echo message=""/>
    </target>

    <target name="build" description="Compile main source tree java files">
        <mkdir dir="${build.dir}"/>
        <javac destdir="${build.dir}"
               target="1.5"
               debug="true"
               deprecation="false"
               optimize="false"
               failonerror="true"
               encoding="UTF8">
            <src path="${src.dir}"/>
            <classpath refid="master-classpath"/>
        </javac>
    </target>

    <target name="deploywar" depends="build" description="Deploy application as a WAR file">
        <war destfile="${name}.war"
             webxml="${web.dir}/WEB-INF/web.xml">
            <fileset dir="${web.dir}">
                <include name="**/*.*"/>
            </fileset>
        </war>
        <copy todir="${deploy.path}" preservelastmodified="true">
            <fileset dir=".">
                <include name="*.war"/>
            </fileset>
        </copy>
    </target>

</project>



This example gives us another case of configuration and its discontents: the deployment descriptor. When you build traditional graphical user interfaces with Swing, you create the elements in code. Imagine that instead of writing an XML deployment descriptor you configured the application in code, applet style, demolishing the second big instance of configuration in J2EE. What would it look like?



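import javax.servlet.*;

// Weblet is a hypothetical base class invented for this thought experiment;
// it is not part of the servlet API.
public class HelloApp extends Weblet {

    public void init() {
        setName( "hello" );
        addServlet( "HelloServlet", HelloServlet.class );
        mapServlet( "HelloServlet", "/" );
    }
}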



In what way is this inferior to web.xml?

Thursday, February 15, 2007

Time stopped?

In the good old days of the old Republic, machines doubled their speed yearly. My three-year-old 2GHz laptop is getting slow, so I checked the new machines. To my surprise, the low end is almost the same. Something fundamental has changed while I wasn't looking.

This post is being written on an AMD Athlon 2600+, and here they sell the AMD Athlon 3000+. For the first time in my life, a memory upgrade may be enough to transform an old machine into a modern one.

Eclipse takes 100MB, Firefox takes another 100MB and the app I run on Tomcat takes 100MB. And then you should compile. Not something that fits into 500MB.