Tuesday, December 29, 2009

Capture and Analyse Network Packets

tcpdump is the standard command-line packet capturing facility available on most Linux systems. Wireshark, formerly called Ethereal, is another popular packet capturing facility, free and GUI-based. Both tcpdump and Wireshark are based on pcap, so it is possible to combine them when capturing and analysing network packets, to take advantage of both.

For instance, I use the following tcpdump command to capture the traffic to and from www.google.com using the HTTP protocol:

sudo tcpdump -i wlan0 -w td.dat -nnvvXSs 1514 host www.google.com

Note:
sudo: capturing packets may require root privilege.
-i wlan0: by default tcpdump captures packets on the eth0 interface. Since I am using wireless, I need to specify the wireless interface wlan0. When using a VPN, the interface should usually be ppp0 instead.
-w td.dat: write all captured packets to the file td.dat.
-nn: do not resolve hostnames or port names.
-vv: very verbose.
-X: print packet contents in both hex and ASCII.
-S: print absolute TCP sequence numbers.
-s 1514: by default tcpdump captures only the first 68 bytes of each packet. Here the first 1514 bytes are captured.
host www.google.com: the filter expression, which says to capture packets whose destination or source host is www.google.com.

See this tcpdump tutorial for more info about tcpdump usage.

Now Wireshark can be used to analyse the packets captured by tcpdump, taking advantage of Wireshark's GUI.

Use Wireshark to open td.dat and apply the preset http filter. The HTTP traffic can then be browsed easily.

Saturday, December 12, 2009

Buzzwords in Job Descriptions

These days I am looking at job descriptions for senior Java developer positions. Here is the list of buzzwords appearing in them. The list will grow as I come across more buzzwords.

JMeter: a Java framework for measuring server performance. Server types include Web, Web service, database (via JDBC), LDAP, JMS, mail (POP3 ...)

Selenium: a Firefox extension that allows composing web tests inside Firefox, replaying tests, and generating tests in many different programming languages such as Java, C#, Ruby, Groovy ... This is a good example of the extra functionality a Firefox extension can bring to the browser.

Saturday, December 05, 2009

Memory Overhead of Java Objects

First, each Java object carries two implicit references in its header: one to its monitor (lock), the other to its method dispatch table. Each reference occupies 4 bytes, so that is 8 bytes of overhead.

Second, byte alignment needs to be taken into consideration. On a 32-bit machine, objects need to be aligned at a 4-byte boundary. On a 64-bit machine, objects need to be aligned at an 8-byte boundary. Nowadays, 64-bit machines have become popular. For instance, my laptop has a 64-bit Intel Core 2 Duo CPU T7500.

So on my laptop, if I create an object that has only one byte field, the actual size of the object will be 16 bytes. That is 93.75% overhead. If I create an object that has three int fields, the actual size of the object will be 24 bytes. That is 50% overhead.

So be very careful when creating a huge number of small objects, because significant extra memory will be required for object overhead.

How to measure the size of an object

Write a simple program consisting of an infinite loop. Inside the loop, create the object whose size is to be measured. Then use "jmap -histo pid" to measure the size, where pid is the process id of the Java program. Because the loop is infinite, it gives jmap plenty of time to connect to the Java process.
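
A minimal sketch of such a probe (the class Small, its single byte field, and the batch size are just illustrative):

public class SizeProbe {

    static class Small {
        byte b; // a single byte field; the object header and alignment dominate the actual size
    }

    public static void main(String[] args) throws InterruptedException {
        Small[] batch = new Small[1000000];
        while (true) {
            for (int i = 0; i < batch.length; i++) {
                batch[i] = new Small(); // keep one million live instances so they show up clearly in the histogram
            }
            Thread.sleep(1000); // the endless loop leaves plenty of time to run "jmap -histo <pid>" from a shell
        }
    }
}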

Friday, October 23, 2009

Adjustable Timer Task

Java provides TimerTask for tasks that can be scheduled for one-time or repeated execution by a Timer. But it has some serious drawbacks.

First, a TimerTask can only be scheduled either for one-time execution or for repeated execution at a roughly fixed period. It cannot be scheduled with changing periods, for example, at the following moments after its start: 0, 5, 6, 12, 100, 101, 102 seconds ... If it is scheduled twice, for example, first scheduled after 1 second and then 3 seconds after the first execution, then no matter which thread requests the second schedule, inside or outside the Timer thread, a java.lang.IllegalStateException: Task already scheduled or cancelled will be thrown.

Second, each Timer is backed by its own background thread, so creating one Timer per task does not scale well when a lot of tasks are needed but each of them is fairly lightweight.

Since 1.5, Java provides the java.util.concurrent package, which includes the Executor framework. The basic idea of the Executor framework is to separate the concern of defining tasks from the mechanism that executes them. Programmers define tasks and leave them to be executed by the Executor framework, which can be configured to use a single thread or a thread pool to execute the tasks.

The following Java code illustrates how to use ScheduledExecutorService to implement an adjustable timer task.


import java.util.Random;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class Test {

    private static final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    // Reschedules itself with a fixed 1-second delay after each run.
    private static final Runnable pig = new Runnable() {
        public void run() {
            System.out.println("This is pig");
            scheduler.schedule(this, 1, TimeUnit.SECONDS);
        }
    };

    // Reschedules itself with a random delay of 2 to 4 seconds after each run.
    private static final Runnable bear = new Runnable() {
        Random random = new Random();
        public void run() {
            System.out.println("This is bear");
            scheduler.schedule(this, 2 + random.nextInt(3), TimeUnit.SECONDS);
        }
    };

    public static void main(String[] args) {
        scheduler.schedule(pig, 0, TimeUnit.SECONDS);
        scheduler.schedule(bear, 0, TimeUnit.SECONDS);
    }
}

Wednesday, October 21, 2009

Globus Toolkit's Future

Today Ian Foster sent out an email about Globus' future plans, which contains many interesting points.

First, GRAM5 (for job submission) will come out as a replacement for both GRAM2 (old-fashioned) and GRAM4 (web service based). It confirms my feeling that web-service-based job submission such as GridSAM has not picked up enough users. The old-fashioned job submission just cannot be retired at this moment.

Second, the Java web service core will be re-implemented to leverage state-of-the-art technologies. Apache CXF has been selected as the web service development kit. See here for a very informative comparison of several popular web service development kits, including Apache CXF, Apache Axis2 and Metro (the JAX-WS RI). Apache CXF is favored over Apache Axis2 because Axis2 uses a proprietary deployment model and lacks support for IoC containers such as the Spring Framework.

Third, globus.org will provide an RFT (reliable file transfer) online service, which offers reliable, high-performance, end-to-end, fire-and-forget data transfer. globus.org has moved into the era of cloud computing!

Fourth, MDS (monitoring and discovery service) will be refactored to separate monitoring from service/resource discovery. Monitoring will be left to mature, dedicated monitoring software. MDS itself will focus on acting as a service registry. I certainly agree with this decision since it favors the idea of separation of concerns. The recently published GLUE2 spec still mixes up monitoring data and service metadata for discovery. It will be interesting to see whether the future development of GLUE reflects the decision made on MDS.

Thursday, October 15, 2009

Jar Service Provider

The Jar File Spec provides a simple service provider mechanism: a provider configuration file is placed inside META-INF/services/, named after the service interface or abstract class, and contains the list of implementing class names. Java 1.6 has a ServiceLoader class for looking up service providers.

It is handy to define a well-known interface and use the provider configuration file to list all of its implementations. At runtime, ServiceLoader can then be used to iterate over all available service providers.
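
A minimal sketch of the lookup, assuming a hypothetical service interface called Codec; the provider configuration file is named after the fully qualified interface name and contains one implementing class name per line:

import java.util.ServiceLoader;

// Hypothetical service interface; providers implementing it are listed in
// META-INF/services/<fully qualified name of Codec>, one class name per line.
interface Codec {
    byte[] encode(byte[] data);
}

public class CodecLookup {
    public static void main(String[] args) {
        // Iterate over every provider found in the provider configuration files on the classpath.
        for (Codec codec : ServiceLoader.load(Codec.class)) {
            System.out.println("Found provider: " + codec.getClass().getName());
        }
    }
}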

Monday, September 14, 2009

Set up HTTP Cookie in Java

In my Java programming, I need to set a pre-defined HTTP cookie so that I can tell the website what content I prefer when using, for example, HttpURLConnection. This is what I do:

// The URI of the web site the cookie applies to (an example value)
URI uri = URI.create("http://www.example.com/");

// Create my cookie
HttpCookie cookie = new HttpCookie("cookie_name", "cookie_value");
// So that the cookie applies to all pages
cookie.setPath("/");

CookieManager cookieManager = new CookieManager();
cookieManager.setCookiePolicy(CookiePolicy.ACCEPT_ALL);

CookieStore cookieStore = cookieManager.getCookieStore();
cookieStore.add(uri, cookie);

// Set my cookie manager, which contains my cookie, to be used system-wide
CookieHandler.setDefault(cookieManager);

// From now on my cookie will be used for all connections to the web site denoted by uri.

Wednesday, August 26, 2009

Web Scraping in Java

There are at least three ways to do web scraping in Java.

First, "manually" use string matching and regular expression to extract information from downloaded HTML.

Second, use JTidy to transform the HTML to XHTML, and then use XQuery (e.g., Saxon) over the XHTML to extract the required information.

Third, which is what I prefer:
  1. Create a TagSoup HTML parser, which provides a SAX interface;
  2. Use XOM to build a DOM from the HTML using the TagSoup SAX parser;
  3. Use the built-in XPath query facility inside XOM (i.e., Jaxen) to query the XOM DOM document.
A sample code skeleton looks like:

// Create a TagSoup SAX parser.
XMLReader parser = new org.ccil.cowan.tagsoup.Parser();

// Use the TagSoup parser to build an XOM document from HTML.
Document doc = new Builder(parser).build(new File("index.html"));

// Do some XPath query: find all "table" elements.
Nodes nodes = doc.query("//*[local-name()='table']");


Wednesday, July 08, 2009

Tomcat Admin Web App and JMX

Tomcat provides an Admin webapp, which sits inside $CATALINA_HOME/server/webapps (so that it can access classes contained in Tomcat's jars) and makes it easy to configure webapps, for instance, to add a DataSource to a webapp.

As can be seen from its source code, the Admin webapp simply creates JMX MBeans (managed beans) and registers them with an MBean server. Tomcat then rewrites server.xml and webapps/webapp/META-INF/context.xml.

The JMX Remote API specification details how an LDAP server can be used to store and retrieve information about JMX connectors exposed by JMX agents. JNDI is used to talk to an LDAP server.

MBeans can be viewed in the MBeans tab of jconsole.

Further digging on MBeans in Tomcat
  • All key constructs in Tomcat, such as Server, Engine, Host and Context, are implemented as MBeans; see package org.apache.catalina.core.
  • Since Tomcat makes use of Apache Commons Modeler to deliver Model MBean support, an mbeans-descriptors.xml file (read by Apache Commons Modeler) appears in many packages that contain MBeans.
  • Tomcat uses the MBean server implementation provided by the JVM.
  • The Server and Context MBeans support operations to store their configurations; these are delegated to the StoreConfig MBean, which implements the logic of rewriting the various Tomcat configuration files, see package org.apache.catalina.storeconfig in container/modules/storeconfig.
  • org.apache.catalina.storeconfig also provides a StoreConfigLifecycleListener that registers the StoreConfig MBean right after Tomcat is started. StoreConfigLifecycleListener is configured in Tomcat's server.xml.
The information above is based on Tomcat 5.5.27.
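
As a small illustration (not part of the Admin webapp itself), these MBeans can also be browsed programmatically over JMX, assuming Tomcat was started with the com.sun.management.jmxremote.* options; the host and port below are placeholders:

import java.util.Set;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class ListCatalinaMBeans {
    public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbsc = connector.getMBeanServerConnection();
            // "Catalina" is the JMX domain Tomcat registers its MBeans under.
            Set<ObjectName> names = mbsc.queryNames(new ObjectName("Catalina:*"), null);
            for (ObjectName name : names) {
                System.out.println(name);
            }
        } finally {
            connector.close();
        }
    }
}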

Tuesday, July 07, 2009

Troubleshooting Remote Java Application

Previously, I was using SSH+VPN to reduce the task of troubleshooting a remote Java application to that of troubleshooting a local one. This is a generic approach, but it has its own disadvantages. One of them is that sometimes the proper debugging/profiling/monitoring tools cannot be run on the remote machine due to constraints in the deployment environment. In that case, we have to run the tools locally and perform genuine remote troubleshooting.

Remote jconsole

Starting a Java application that supports remote jconsole is easy: just add the following to the Java command line:

-Dcom.sun.management.jmxremote.port=portNum -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false

Then we can simply point jconsole at hostname:portNum, provided that no firewall is set up on the remote machine.

jconsole uses JMX, which is built on RMI. The portNum we specify is the port number used by the RMI registry. The actual RMI connection will be opened on another port that cannot be specified as a Java command line argument. When the firewall on the remote machine is disabled and the local jconsole has successfully connected to the remote Java application, we can use "lsof -p pid | grep TCP" to check which port is used by RMI.

See this and that for a programmatic approach to getting jconsole through the firewall. It basically starts up a customised RMI registry that opens an RMI channel on a pre-defined port, implemented as a Java agent. Here is an almost "official" tutorial on achieving that.

Remote jvisualvm

First of all, according to VisualVM's documentation, VisualVM can retrieve monitoring information on remote applications but it cannot profile them. VisualVM requires jstatd running on the remote machine. Since jstatd is based on RMI, VisualVM suffers from the same issue as jconsole when facing a firewall.

Remote YourKit

It may be easier to set up the YourKit Java profiler to troubleshoot a remote Java application. And YourKit is claimed to be so lightweight that it can be used while the application is running in production.

YourKit provides some means to integrate with a remote JEE/servlet container, such as Tomcat, JBoss, WebSphere, WebLogic ... For Tomcat, the integration creates a startup_with_yjp.sh based on startup.sh, which simply adds the following magic Java option:

"-agentpath:YJP_HOME/bin/linux-x86-32/libyjpagent.so=disablestacktelemetry,
disableexceptiontelemetry,delay=10000,
port=16666,sessionname=Tomcat"

"port=portNum" can be used to specified a number instead of the default 10001. To connect the remote Java application with YourKit agent turned on, simply ask to connect to serverName:portNum in the local YourKit UI. Since all profiling data go in the specified port number, it is easy to set up an SSH tunnel if that port is blocked by firewall.

More investigation needs to be done to understand the profiling overhead with this setting.


Monday, June 29, 2009

More on Java Troubleshooting

The JDK has been providing better and better troubleshooting facilities, among other tools. Obviously they are extremely important for developing serious Java applications, which usually have to deliver a specified level of performance under resource constraints.

These tools include
  • jinfo pid : check Java command line options and system properties.
  • jstack pid : print each thread's stack trace and status (e.g., blocking on some object). This is great for knowing what's going on inside the Java application at runtime. And there is no need to set up the JVM specially for this purpose.
  • jvisualvm : profile and monitor Java processes. It seems the JDK on Linux has jvisualvm by default, but you need to install jvisualvm separately on Windows.
as well as some other tools already mentioned before:
  • jconsole
  • jmap
  • jhat

Thursday, June 25, 2009

Clean up Harddisk before Dispose of Computer

After 3 years working as a heater, my Sony VAIO laptop is finally retiring. There is one critical task that must be done before it can be disposed of: cleaning up its harddisk. Since I graduated from a computer lab that researched and developed one of the very first harddisks in China, I have long known that not only deleted files but also overwritten files can be recovered. Nevertheless, this blog as well as its comments still gave me a lot of information. There are some useful points:
  • On Linux, shred can be used to clean up a disk. By default it overwrites the harddisk 25 times. Call it like this: shred -vz /dev/hda (-z: finally overwrite with zeros to disguise the shredding.)
  • Or, use DBAN, which is used by many governments. DBAN stands for "Darik's Boot And Nuke". The software is a boot CD/DVD image that is used to boot the computer and then do the cleaning up.
  • Here is the seminal paper describing the theory behind securely deleting harddisk data.
It will be useful to know how long it takes for shred or DBAN to process a harddisk of a certain size.

On my Sony VAIO VGN-SZ3XWP/C, shred takes about 3.5 to 4 hours to run a random pass over a 39 GiB disk partition! A random pass overwrites the disk with random bits (I am not sure how random they are). The 1st and the 13th (and possibly the last, 25th) passes are random passes. The other passes use different short fixed patterns to overwrite; each takes about half an hour.

A further update from running "shred -vz /dev/sda" on a 94 GiB harddisk: the 1st, 13th and 25th passes use random data generated from /dev/urandom; the other passes write fixed but different bit patterns. It took 49h33m26s to finish.

Monday, June 22, 2009

IDE for C++

I may work on a C++ project in the future, so I am looking for an IDE for C++. I know a good IDE can double programming productivity.

Unlike Java, C++ is not cross-platform and has many variants, even though it does have a standard. Thus which IDE to choose depends on which C++ is used.

On Windows, Microsoft Visual Studio is the choice for Visual C++ :)

I would choose Eclipse CDT for GNU C++. CDT is only an IDE; to build and debug C++ code, CDT requires an external toolchain, e.g., the GNU toolchain. CDT on Linux will automatically pick up the GNU toolchain, which is usually there. But to make CDT on Windows work, we need to install MinGW (Minimalist GNU for Windows). Here is a good tutorial on installing MinGW. Once MinGW is properly installed and added to PATH, CDT can pick it up as a toolchain automatically.

Friday, June 19, 2009

Troubleshooting OutOfMemoryError

When a large running Java application throws an OutOfMemoryError, it indicates either the existence of a memory leak bug or simply the fact that the maximum heap size has been reached. Don't panic; the troubleshooting is straightforward.

Most importantly, DO NOT SHUT DOWN the problematic JVM. Keep the crime scene intact.
  1. Use jmap to dump the heap. Under JDK 1.5, the dumped heap, in hprof format, is always put in the home directory under the name "heap.bin". In addition, jmap can also show the heap object histogram, a quick way to see which classes occupy the most space.
  2. Use jhat to analyze the heap dumped by jmap. If the heap was dumped by a JDK 1.5 jmap, invoke jhat (which is only available in JDK 6+) in this way: jhat -J-mxNm -stack false heap_dump_file. "-stack false" turns off tracking of object allocation call stacks, because allocation site information is not available in the heap dump. N should be larger than the maximum heap size used when running the problematic Java process. jhat takes quite some time to analyze the heap dump.
  3. jhat starts up a web server after it finishes analyzing the heap dump. Now we can point a browser at the jhat web server and find out who used up all the memory.
  4. At the very end of the front page, click the link "Show heap histogram". It takes quite some time to generate the histogram.
  5. In the histogram, the classes whose instances occupy most of the heap can be easily identified. Obviously they are the suspects of the crime.
  6. Clicking one of the suspects brings us to a page showing the referrers of that suspect. In this way, we can track down which part of our code holds references to the objects that use up the heap. Now, by using our knowledge of the program logic, the cause of the problem can finally be located.
See, the method is straightforward. All we need is patience and a good understanding of the source code in order to find the cause of the problem.
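
Here is a toy program (not from any real application) that leaks memory on purpose; run it with a small -Xmx, wait for the OutOfMemoryError, then dump and inspect the heap as described above:

import java.util.ArrayList;
import java.util.List;

public class LeakDemo {

    // The static list is the "referrer" that keeps every allocated array alive.
    private static final List<byte[]> leak = new ArrayList<byte[]>();

    public static void main(String[] args) throws InterruptedException {
        while (true) {
            leak.add(new byte[1024 * 1024]); // leak 1 MiB per iteration
            Thread.sleep(10);                // slow enough to attach jmap before the crash
        }
    }
}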

Friday, June 12, 2009

Direction+

I am working on a web app, Direction+, running on Google App Engine for Java. Yes, it is another direction service that suggests a driving route based on your chosen source and destination. But it has some add-on features as its name suggests: Direction Plus.

First, you can personalise Direction+ by providing your car's information, in particular the fuel consumption figures of your car for the urban cycle, extra-urban cycle and combined cycle. Thus, when calculating the route, Direction+ is also able to calculate how much fuel you will use and how much it costs (based on the fuel price you set). The fuel consumption is calculated for each individual step of the route; for instance, driving on a motorway will be much more fuel-efficient than driving in the city centre. Isn't it good to know the cost of the drive beforehand? In addition, the calculation is done in your own browser and your personalisation is stored in your browser as a cookie, so there is no privacy concern at all.

Second, Direction+ is a mashup that combines the Google Maps service and the BBC travel news service. When a driving route is calculated, the travel news along the route, e.g., accident reports, is displayed on the map as well. Isn't it good to know the traffic situation before driving? Currently too many travel news items are returned, especially when the route is very long; efforts are being made to improve the algorithm for filtering out the not-so-relevant news.

How quickly is Direction+ updated with BBC travel news? Please see its status page. A complete update takes about 30 minutes. For instance, at the time of writing, the last update took 26 minutes to finish. It read 2297 news items from the BBC, among which only 50 happened after the previous update, and among those 50 newly happened items, 12 had cached coordinates in the data store. So although the number of news items is large, Direction+ manages to keep the communication small.

One of the tricky things about GAE for Java is that every servlet request must be served within 30 seconds. So the update of news is implemented in an incremental and on-the-fly way:
  • Incremental: the complete update is divided into many small steps, and each step can be finished in a controlled time slot;
  • On-the-fly: the news updated in each small step is available immediately, long before the complete update is done.
Direction+ is designed for UK users because I live in UK. Enjoy it and let me know what you think!

Monday, June 08, 2009

Monitoring/Profiling/Debugging Java Process Remotely

Usually there are two difficulties involved:
  1. How to communicate between the remote (target) machine and the local (console) machine? This varies per monitoring/profiling/debugging method. E.g., the built-in JMX console (jconsole) uses RMI.
  2. How to penetrate the firewall, if one exists?
The generic approach is:
  1. Know how to do it locally;
  2. Run a vncserver on the remote machine (vncserver is lightweight and easy to install in case there is none on the remote machine);
  3. Use ssh to set up a tunnel between the remote machine and the local machine, e.g., ssh -L5901:remote_machine:5901 userid@remote_machine (ssh is very friendly w.r.t. firewalls);
  4. Start a VNC viewer locally, pointing at localhost:1, and follow the procedure established in step 1.
In this way, remote monitoring/profiling/debugging tasks are doable following a step-by-step procedure.

Of course, sometimes there is no need to set up a vncserver, as long as the communication between the remote machine and the local machine is done over TCP and the port number is known.

Friday, May 29, 2009

Logging in Java

Yes, this is a very basic issue in Java programming, but sometimes it can be really confusing to find where the logs are and how to declaratively configure what to log.

Log4j is widely used and is configured via log4j.properties. But some programmers choose to use Apache commons logging instead of using the log4j API directly. Commons logging is a thin wrapper for other pluggable logging tools, such as log4j and the Sun logging facility. The configuration guide of commons logging describes a five-step procedure for finding the underlying logging mechanism, in which step 3 says:

"If the Log4J logging system is available in the application class path, use the corresponding wrapper class (Log4JLogger)."

In other words, usually even when the commons logging API is used in the program, the actual logging service is provided by log4j. Thus log4j.properties is used to configure how logging should be done.
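
For instance, here is a minimal sketch of logging through the commons logging API (the class name is illustrative); at runtime the calls are routed to log4j if it is on the class path, so the output format and destinations come from log4j.properties:

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class PaymentService {

    private static final Log log = LogFactory.getLog(PaymentService.class);

    public void charge(String account) {
        log.debug("Charging account " + account); // emitted only if the underlying logger enables debug
        if (account == null) {
            log.error("Account must not be null");
        }
    }
}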

Another logging configuration file is logging.properties, which is used by java.util.logging.

Thursday, May 28, 2009

Best Java Decompiler

Recently I have been working on a UK e-Science project which uses BPEL to orchestrate web services and submits computational jobs to a Condor pool. The source code of some of those web services is missing (great!), so I have to decompile the class files in order to make necessary changes.

In my pursuit of the "best" Java decompiler, I first tried JD. It is free, but in my use case it generated some Java code that could not be compiled. Then I tried DJ. It is good, but it only allows 10 free trials; each invocation of DJ counts as one trial, and after that you have to purchase it.

I wasn't very happy until I found Jad. Jad did the job. And it turns out that Jad is behind many Java decompiler GUIs such as DJ, Cavaj, and JadClipse, an Eclipse plugin. I am quite happy with Jad even without a GUI to drive it.

It is interesting to see that JD and Jad are implemented in C++. Wouldn't it be better for Java programmers to have a Java decompiler written in Java?

Monday, May 04, 2009

Subversive Problem

I am using Subversive - the Eclipse subversion plugin - 0.7.7 plus its subversion connector 2.1.0.

Today an exception jumped out of nowhere, making all my subversion-based projects unable to synchronise with the server.

java.lang.NoSuchMethodError: org.eclipse.team.svn.core.connector.SVNChangeStatus.(Ljava/lang/String;Ljava/lang/String;IJJJLjava/lang/String;IIIIZZLjava/lang/String;Ljava/lang/String;Ljava/lang/String;Ljava/lang/String;JZLjava/lang/String;Ljava/lang/String;Ljava/lang/String;JLorg/eclipse/team/svn/core/connector/SVNLock;JJILjava/lang/String;ZZLorg/eclipse/team/svn/core/connector/SVNConflictDescriptor;)V
at org.tigris.subversion.javahl.ConversionUtility.convert(ConversionUtility.java:287)
at org.tigris.subversion.javahl.ConversionUtility$4.doStatus(ConversionUtility.java:146)
at org.tigris.subversion.javahl.SVNClient.status(Native Method)
at org.polarion.team.svn.connector.javahl.JavaHLConnector.status(JavaHLConnector.java:406)
at org.eclipse.team.svn.core.extension.factory.ThreadNameModifier.status(ThreadNameModifier.java:606)
at org.eclipse.team.svn.core.utility.SVNUtility.status(SVNUtility.java:330)
at org.eclipse.team.svn.core.utility.SVNUtility.getSVNInfoForNotConnected(SVNUtility.java:803)
at org.eclipse.team.svn.core.SVNTeamProvider.uploadRepositoryResource(SVNTeamProvider.java:241)
at org.eclipse.team.svn.core.SVNTeamProvider.connectToProject(SVNTeamProvider.java:172)
at org.eclipse.team.svn.core.SVNTeamProvider.getRepositoryResource(SVNTeamProvider.java:71)
at org.eclipse.team.svn.core.svnstorage.SVNRemoteStorage.loadLocalResourcesSubTreeSVNImpl(SVNRemoteStorage.java:628)
at org.eclipse.team.svn.core.svnstorage.SVNRemoteStorage.loadLocalResourcesSubTree(SVNRemoteStorage.java:521)
at org.eclipse.team.svn.core.svnstorage.SVNRemoteStorage.getRegisteredChildren(SVNRemoteStorage.java:273)
at org.eclipse.team.svn.core.synchronize.AbstractSVNSubscriber.resourcesStateChangedImpl(AbstractSVNSubscriber.java:212)
at org.eclipse.team.svn.core.synchronize.AbstractSVNSubscriber.resourcesStateChanged(AbstractSVNSubscriber.java:169)
at org.eclipse.team.svn.core.svnstorage.SVNRemoteStorage$3.runImpl(SVNRemoteStorage.java:152)
at org.eclipse.team.svn.core.operation.AbstractActionOperation.run(AbstractActionOperation.java:77)
at org.eclipse.team.svn.core.operation.LoggedOperation.run(LoggedOperation.java:38)
at org.eclipse.team.svn.core.utility.ProgressMonitorUtility.doTask(ProgressMonitorUtility.java:104)
at org.eclipse.team.svn.core.utility.ProgressMonitorUtility.doTaskExternal(ProgressMonitorUtility.java:90)
at org.eclipse.team.svn.core.utility.ProgressMonitorUtility$1$1.run(ProgressMonitorUtility.java:60)
at org.eclipse.core.internal.resources.Workspace.run(Workspace.java:1800)
at org.eclipse.team.svn.core.utility.ProgressMonitorUtility$1.run(ProgressMonitorUtility.java:58)
at org.eclipse.core.internal.jobs.Worker.run(Worker.java:55)

The only possible reason I can think of is that I accidentally copied some ".svn" directories into a non-subversion-based project.

On the eclipse.technology.subversive newsgroup, there is a discussion about the same exception, but happening on Subversive 0.7.8.

I am very lucky: after I changed the subversion connector in use from Native JavaHL to SVN Kit, the problem seems to be gone! Otherwise, all my development work would have had to stop!

Monday, April 27, 2009

MVC Practice in Browser Application

I am working on a browser application called Direction+. I am trying to enforce the MVC pattern in order to achieve better code manageability. Basically, it is all about separating concerns.

View

All presentation goes into HTML code; ideally no HTML is generated in JavaScript. Furthermore, style is separated into CSS. For performance reasons, CSS could be embedded into the HTML to save a request/response round trip.

Model

State should be kept in JavaScript structs, objects or variables. HTML markup SHOULD NOT be used to store any state other than user input. Using HTML markup for non-user-input state, for instance a hidden form field, is a necessary communication mechanism between browser and web server, but it is not necessary in a browser application.

By keeping all state in JavaScript, we ensure there is only a single copy of the state in a centralised place, and the HTML view is purely a presentation of that state.

Controller

Business logic is implemented in JavaScript. A minimal amount of code, basically event handlers, is embedded into the HTML as entry points into the business logic.

JavaScript should be written in an object-oriented style.

Wednesday, April 22, 2009

Override Tomcat Session Cookie

Tomcat uses an HTTP cookie to track browser sessions. By default, Tomcat 5.5 generates session cookies without an expiration date (Expires=...), like:

Set-Cookie: JSESSIONID=A39F8F3623D20EF9E66D309E298E87E0; Path=/

using Cookie.setMaxAge(-1).

Without an expiration date, this cookie should be deleted by the browser when it is closed, which is what IE7 does. Thus, even if the session has a lifetime of, say, 12 hours on the Tomcat side, if the browser is restarted the session is lost.

Firefox, on the other hand, keeps the session cookie across restarts.

I use the following code to override this behavior:

// after users log in
// HttpServletResponse response
// getCookieExpiresFormat().formatByAge(age) is a helper (not shown) that formats
// an expiration date "age" seconds from now in the cookie Expires format
response.setHeader("Set-Cookie", "JSESSIONID=" + request.getSession().getId()
        + "; Expires=" + getCookieExpiresFormat().formatByAge(age)
        + "; Path=/");

It generates something like

Set-Cookie: JSESSIONID=A39F8F3623D20EF9E66D309E298E87E0; Expires=Thu, 22-Apr-2010 20:07:56 GMT; Path=/

Thus the session can be kept alive for any period of time, even when the browser is restarted.

Tuesday, April 21, 2009

JavaScript and Multi-Threading

Several internet sources, such as this one, suggest that JavaScript in the browser runs in a single thread that is also responsible for updating the browser UI, and that JavaScript code is triggered by events. If this is true, it is not a surprise that AJAX has to be asynchronous; otherwise network IO would block the browser UI, i.e., make the UI unresponsive.

There have been several efforts to make JavaScript multi-threaded. For instance, the popular JavaScript toolkit Dojo has some support for multi-threading. And there is even a paper about multi-threading in JavaScript.

JavaScript has now really become a serious programming language. Not only are there a lot of libraries/frameworks available so that programmers do not need to write everything from scratch, but there is also very good programming support, such as profiling (YUI), unit testing (YUI), logging (YUI), and debugging (Venkman for Firefox and Microsoft Script Editor for IE).

Monday, April 20, 2009

Browser Application

I am working on a browser application, on which I have already made good progress; I will put it online after it is refactored into object-oriented JavaScript.

What I called browser application should have the following three characteristics.
  1. A browser application's runtime environment consists of a web browser and the internet only. So if you can surf the net, you can run it. Of course nothing prevents it from being deployed on a web server for people to access. Note: in that case, the web server is not required to provide anything beyond static HTTP hosting.
  2. A browser application uses AJAX to provide functionalities. With so many powerful AJAX APIs around on the internet, a browser application can have some amazing functionalities.
  3. A browser application can easily be a rich internet application, given that JavaScript, as well as other client-side technologies, has already become so rich in terms of presentation capability.
In summary, a browser application is a browser (only) based rich internet application using AJAX.

Monday, April 06, 2009

Check Java Thread CPU Usage

Today I came across a question: if an application occupies 100% CPU and its source code is so large that reading it to find out what is going on may not be an option, what should we do to find the problem?

Let's assume the application is written in Java.

First, we can use jstack combined with jps to print out the threads' stack traces, which gives us a good idea of which methods are being executed.

We can also use jconsole with a plugin to display the threads' CPU usage. The jconsole plugin is based on JTop (/demo/management/JTop).
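
The sketch below (not JTop's actual source) shows the underlying idea: query per-thread CPU time through the ThreadMXBean; for a remote process the same bean can be obtained over a JMX connection. The formatting is illustrative:

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ThreadCpu {
    public static void main(String[] args) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        for (long id : threads.getAllThreadIds()) {
            ThreadInfo info = threads.getThreadInfo(id);
            long cpuNanos = threads.getThreadCpuTime(id); // -1 if CPU time measurement is unsupported or disabled
            if (info != null && cpuNanos >= 0) {
                System.out.printf("%-40s %10d ms%n", info.getThreadName(), cpuNanos / 1000000L);
            }
        }
    }
}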

Add to Version Control before Commit

I am using Eclipse 3.4.2 Ganymede plus Subversive 0.7.7. If I try to commit some newly added files, i.e., files that are not under subversion control yet, Eclipse simply becomes unresponsive.

New files MUST be added into version control first, before they can be committed.

Monday, March 09, 2009

Invalid Virtual Machine

Today I tried to add a VMware VM created with VMware Workstation to my VMware Server 2 on Windows, but failed because the VM was considered invalid. VMware Server told me "either the vmx file is corrupted or the VM was created using a newer version of VMware product." Neither seemed to be the case.

However, it turns out the vmx file really could be "corrupted". In the vmx file, the first line is
.encoding = "windows-1252"
After changing it to
.encoding = "UTF-8"
the VM is no longer considered invalid.

Then, after I changed "numvcpus" from 2 to 1, the VM was able to run on my VMware Server, which unfortunately has only a single-core CPU.

Interestingly, the original VM can run on VMware Server 2 on Linux.

I also needed to change the VM's network to NAT instead of Bridged, and restart the VM, to enable networking.

The guideline is that the virtual hardware for a particular guest OS, such as the number of CPUs and the network type, has to be compatible with the hardware spec of the VMware Server installation. For instance, you cannot run a 2-CPU VM on a VMware Server supporting only one CPU; and if there is no DHCP on the bridged network, then not surprisingly the VM cannot acquire an IP when using the bridged network.

Wednesday, March 04, 2009

JMeter


Apache JMeter is a GUI that allows you to compose a performance test without writing a single line of code! Basically, all that needs to be done to implement a performance test is to properly insert test components, such as "thread group", "loop control", "http request" and "performance data graph", into the test plan. Of course the parameters of these test components should be specified, for instance how many iterations of the HTTP request are wanted. Then simply start it, and a curve as well as the data is presented!

With the help of JMeter, I was able to tell from a very simple test that, with 27 urlrewrite rules enabled, the OMII-UK website has a throughput of 3030.915 requests per minute, i.e., 19.796 ms per request; with the rules disabled, the throughput is 3005.259 per minute, i.e., 19.965 ms per request. In other words, in this setting, where urlrewrite 2.6 is used, the urlrewrite rules pose a 0.9% slowdown, though the credibility of this particular test result still needs to be established.

A Portlet Class Loading Scheme

According to JVM Spec 5.3 (Creation and Loading), there are two situations in which a class D loads or initiates loading of another class C: either there are references to C in D's constant pool, or D creates C using reflection. In either case, if D was defined by a user-defined class loader, then that same user-defined class loader initiates the loading of C.

Let's assume the portal framework is a web application, and it has the following directory structure:
WEB-INF/
  classes/
  lib/
  portlets/
    portlet1/
      classes/
      lib/
    portlet2/
      classes/
      lib/
The web application class loader W will create a portlet class loader P for each portlet; P has W as its parent class loader and is responsible for loading classes from portletx/classes and portletx/lib. W will use P to instantiate a portlet entry point using reflection. From the portlet entry point, all required classes will then be loaded by P, still following the delegation model: W will try to load a class first, and only after that fails will P load it.
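
A minimal sketch of this scheme, assuming each portlet lives under portlets/<name>/classes and portlets/<name>/lib inside the web application; the factory and the entry point class name are hypothetical:

import java.io.File;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

public class PortletLoaderFactory {

    // W is the web application class loader; the returned loader P delegates to it first.
    public static ClassLoader createPortletLoader(File portletDir, ClassLoader webAppLoader)
            throws Exception {
        List<URL> urls = new ArrayList<URL>();
        urls.add(new File(portletDir, "classes").toURI().toURL());
        File[] jars = new File(portletDir, "lib").listFiles();
        if (jars != null) {
            for (File jar : jars) {
                if (jar.getName().endsWith(".jar")) {
                    urls.add(jar.toURI().toURL());
                }
            }
        }
        // URLClassLoader follows the standard delegation model: it asks the parent (W)
        // first, and only loads from the portlet directories when W fails.
        return new URLClassLoader(urls.toArray(new URL[0]), webAppLoader);
    }

    // Instantiate the portlet entry point via reflection, so that every class it
    // references is subsequently loaded through the same portlet class loader P.
    public static Object createEntryPoint(ClassLoader portletLoader, String entryPointClass)
            throws Exception {
        return portletLoader.loadClass(entryPointClass).newInstance();
    }
}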

This scheme does not violate Servlet Spec 2.4, which states: "The Web application class loader must load classes from the WEB-INF/classes directory first, and then from library JARs in the WEB-INF/lib directory. Also, any requests from the client to access the resources in WEB-INF/ directory must be returned with a SC_NOT_FOUND(404) response." It does not prevent loading classes from elsewhere.

The advantages of such a scheme are:
  1. It takes the view that the portal framework is a web application; its portlets are just its components instead of web applications themselves.
  2. It respects the class loading scheme of the web application server. The portal framework jars can be kept inside the web application, and the portlets share those jars naturally.
  3. It also facilitates deploying and undeploying portlets on the fly, because the control is purely inside the portal web application.

Friday, February 27, 2009

Embed OSGi to Monolithic Application as a Dynamic Plugin Mechanism

Eclipse Equinox is used as the OSGi framework.

Run Equinox as a standalone application

Read the Equinox tutorial.
C:\eclipse\plugins>java -jar org.eclipse.osgi_3.3.0.v20070530.jar -console

osgi> install file:d:\documents\temp\plugins\org.silentsquare.osgi.test.bundle.helloworld_1.0.0.jar
Bundle id is 1

osgi> ss

Framework is launched.

id State Bundle
0 ACTIVE org.eclipse.osgi_3.3.0.v20070530
1 INSTALLED org.silentsquare.osgi.test.bundle.helloworld_1.0.0

osgi> start 1
Hello World!

osgi> ss

Framework is launched.

id State Bundle
0 ACTIVE org.eclipse.osgi_3.3.0.v20070530
1 ACTIVE org.silentsquare.osgi.test.bundle.helloworld_1.0.0

osgi> close

Goodbye World!

Start up Equinox programmatically

See this blog article: "Starting Equinox from a Java application".

With org.eclipse.osgi_3.x.x.jar on the classpath (which is just an ordinary jar file plus extra OSGi information), the code looks like:

import org.eclipse.core.runtime.adaptor.EclipseStarter;
import org.osgi.framework.Bundle;
import org.osgi.framework.BundleContext;

public class App {
    public static void main(String[] args) throws Exception {
        String[] equinoxArgs = {"-console"};
        // Start the Equinox framework and get its system bundle context.
        BundleContext context = EclipseStarter.startup(equinoxArgs, null);
        Bundle bundle = context.installBundle(
                "http://www.eclipsezone.com/files/jsig/bundles/HelloWorld.jar");
        bundle.start();
        for (Bundle b : context.getBundles()) {
            System.out.println(b);
        }
    }
}

The next step would be to build Grimoires' XMLView over OSGi and implement each pair of translators as an OSGi bundle. But class loading could be tricky: the translators need to use classes in Grimoires. How?

What Can OSGi Do for Me?

Two interesting usages of OSGi could be explored in my development context.

One is to use OSGi to componentize an existing complex application. Consider the OMII-UK website. As a web application, it has 3 logical functional units: the old repository that holds projects and releases and is almost obsolete, the new web front end that also reads information about projects and releases from the database, and the wiki based on JSPWiki. It would be nice to break them into 3 OSGi bundles to take advantage of the dynamics of OSGi bundles.

The other is to use OSGi to implement a dynamic extension framework for an originally monolithic application. In Grimoires, there is an XMLView interface that allows publishing any application-domain-specific service description as long as it is in XML. Inside the implementation of XMLView, there is a pair of translators for each domain-specific description, translating to and from Grimoires' UDDI+WSDL+Metadata service description model. It would be good to build XMLView over the OSGi framework, so that each pair of translators becomes an OSGi bundle. Then if we want to support a new type of domain-specific service description, we add a corresponding bundle without bringing down Grimoires; if later this support is no longer needed, we remove the bundle, again without bringing down Grimoires.

These can be done because OSGi is claimed to be humble: it does not take over the whole JVM, and one Java application can even host multiple OSGi frameworks.

According to OSGi,
If you are developing software in Java then OSGi technology should be a logical next step because it solves many problems that you might not even be aware can be solved. The advantages of OSGi technology are so numerous that if you are using Java, then OSGi should be in your tool chest.
This seems reasonable, because componentization looks like a natural approach to tackling software complexity.

Thursday, February 12, 2009

Google Protocol Buffer

Google Protocol Buffer can be used to serialise and deserialise structured data, either to/from a file or to/from the network. Protocol Buffer acts as a (network) protocol parser generator: it generates a parser that understands a specific protocol format. Of course this could also be implemented using a general-purpose parser generator such as ANTLR.

Several existing solutions are available to address the same problem.
  • The problematic Java Serialization.
  • The DIY approach. It is surely painful, in particular for complex data structures.
  • Serialise data to XML. In Java, this is supported by JAXB. The advantages of using XML as the transmission format are that XML is language independent, platform independent, and human readable. The disadvantage is the associated performance cost.
According to the Protocol Buffer tutorial for Java, it has several selling points:
  • It has bindings for Java, C++, and Python. Thus the data can be transmitted between Java, C++, and Python applications.
  • Its performance is expected to be good, because not only is it not based on XML, but it can also boost performance by not using Java reflection.
  • As long as certain rules are followed, when the protocol is updated the new code remains compatible with the old code.

Tuesday, February 03, 2009

java.lang.NullPointerException

In one of my previous blog posts, I mentioned a map of downloads based on the Google Maps API. To create such a map, first I collect the IP addresses from which the downloads were initiated, then I use GeoLite City to translate each IP address to a geographic location on the Google map. GeoLite City is a free library with a Java API, but of lower precision than its commercial counterpart.

It was working fine for many months until recently it threw the infamous java.lang.NullPointerException. NullPointerException might be the No.1 runtime exception. It certainly is for me.

A little investigation explains why the NullPointerException happened. When I passed an IP address to the GeoLite City library, I naively assumed that GeoLite must be able to translate it into a geographic location, so I did not check whether the Location object returned by GeoLite was null. So I was punished. In fact, IP address allocation is dynamic, and an outdated GeoLite City library may not contain information for all IP addresses.

It is not rare for me to make such a naive mistake. The fundamental principle for avoiding NullPointerException is: when given an object reference, no matter whether it comes from my own code or from outside, think defensively and reason about the possibility that it may be null. To play safe, always check for null.

An object reference could be given as the return value of a method invocation, as a method argument inside a method, or as a field of a non-null object.
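
A minimal illustration of the defensive check (the lookup interface and Location type below only stand in for the GeoLite City API; the names are illustrative, not the library's actual ones):

public class SafeLookup {

    static class Location {
        double latitude;
        double longitude;
    }

    interface GeoLookup {
        /** May return null when the IP address is not in the database. */
        Location getLocation(String ipAddress);
    }

    static void plot(GeoLookup lookup, String ipAddress) {
        Location location = lookup.getLocation(ipAddress);
        if (location == null) {
            // The (possibly outdated) database has no entry for this address; skip it.
            System.out.println("No location for " + ipAddress);
            return;
        }
        System.out.println(ipAddress + " -> " + location.latitude + "," + location.longitude);
    }
}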