Wednesday, December 17, 2008

Parallel Array Expected in Java SE7

Two Year in Review articles on JavaWorld, "Java in 2008" and "What to expect in Java SE7", are worth reading.

Among the new features expected in SE 7, which is due in early 2010, the most surprising one to me is the parallel processing support. It is also the one I am most looking forward to, since I got my PhD working on parallel computing and cluster computing in Java.

The parallel processing support not only provides a fork/join computing paradigm that might be turned into a MapReduce mode, but also supports parallel arrays. The parallel array is a long-existing feature in parallel computing languages such as High Performance Fortran. Inspired by HPF, HPJava supports parallel arrays as well.

The idea of a parallel array is quite simple. It represents a large data array that is partitioned into many parts, with each part allocated to a single process. That way, each process can work on its own partition of the large array in parallel with the others, thus achieving speedup.
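The same idea can be sketched with nothing but java.util.concurrent (a hand-rolled sketch of the chunking only; the class name is made up, and ParallelArray adds fork/join work stealing on top of this):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PartitionSum {
    // Split the array into one contiguous chunk per worker; each task
    // sums its own partition, and the partial results are combined.
    static long parallelSum(final int[] data, int workers) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            int chunk = (data.length + workers - 1) / workers;
            List<Future<Long>> parts = new ArrayList<Future<Long>>();
            for (int w = 0; w < workers; w++) {
                final int from = w * chunk;
                final int to = Math.min(data.length, from + chunk);
                parts.add(pool.submit(new Callable<Long>() {
                    public Long call() {
                        long s = 0;
                        for (int i = from; i < to; i++) s += data[i];
                        return s;
                    }
                }));
            }
            long total = 0;
            for (Future<Long> f : parts) total += f.get();
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        int[] data = new int[1000];
        for (int i = 0; i < data.length; i++) data[i] = i + 1;
        System.out.println(parallelSum(data, 4)); // 500500
    }
}
```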

However, it is still under consideration whether the parallel array will be included as part of the JDK in Java SE 7 or whether it will be released as an external library.

A parallel array code example looks like:

// Instantiate the ForkJoinPool with a concurrency level
ForkJoinPool fj = new ForkJoinPool(16);
Donut[] data = ...
ParallelArray donuts = new ParallelArray(fj, data);

// Filter
Ops.Predicate hasSprinkles = new Ops.Predicate() {
    public boolean op(Donut donut) {
        return donut.hasSprinkles();
    }
};

// Map from Donut -> Integer
Ops.ObjectToInt daysOld = new Ops.ObjectToInt() {
    public int op(Donut donut) {
        return donut.age();
    }
};

SummaryStatistics summary =
    donuts.withFilter(hasSprinkles)
          .withMapping(daysOld)
          .summary();

System.out.println("with sprinkles: " + summary.size());
System.out.println("avg age: " + summary.average());

Friday, December 12, 2008

Use Eclipse RAP to Build Rich Internet Application

It is a pity that I had not heard of Eclipse RAP (Rich Ajax Platform) when I developed Grimoires Browser, an Eclipse RCP based client for the Grimoires Service Registry, last year.

RAP has joined the family of Eclipse UI toolkits, alongside existing ones such as SWT for Windows, SWT for Linux, and SWT for Mac. In some sense, RAP can be considered SWT for the Web.

RAP is used to develop Rich Internet Applications (RIA). To develop a RAP-based web application, programmers still work in Java, in SWT, and in JFace, and RAP makes sure the UI is built on top of HTML, JavaScript, and Ajax. Does that remind you of something? Yes, GWT. There are some very good introductory articles [1, 2] about RAP on IBM developerWorks.

It is claimed that with minimal effort we can transform an existing RCP application into a RAP one; we can even build an application that is largely independent of RAP or RCP, so it can run as both a web application and a desktop application.

No doubt, RAP is the way to go to make Grimoires Browser a web application. It will totally relieve users of the pain of installing software on their machines just to access Grimoires.

Thursday, December 11, 2008

My Safe

My Safe is an Eclipse RCP based application I developed last year. As its name suggests, it is a secure file manager.

It is easy to use. You must input the correct password to enter My Safe. Inside My Safe, you can see all directories and files as they originally are, but in the file system those directories and files are encrypted; they are decrypted on demand in My Safe's interface. The password you use to enter My Safe is the key for encryption and decryption, and AES is used as the cipher.
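The password-to-key scheme can be sketched like this (a minimal sketch, not My Safe's actual code; the salt, iteration count, and key size are illustrative assumptions):

```java
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;
import javax.crypto.spec.SecretKeySpec;

public class PasswordCrypto {
    // Derive a 128-bit AES key from a password; the salt and iteration
    // count here are illustrative values only
    static SecretKeySpec deriveKey(char[] password, byte[] salt) throws Exception {
        SecretKeyFactory f = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA1");
        PBEKeySpec spec = new PBEKeySpec(password, salt, 1024, 128);
        return new SecretKeySpec(f.generateSecret(spec).getEncoded(), "AES");
    }

    // One method for both directions: pass Cipher.ENCRYPT_MODE or DECRYPT_MODE
    // (default AES transformation for brevity; a real file store would use
    // an explicit mode such as CBC with a random IV)
    static byte[] crypt(int mode, SecretKeySpec key, byte[] data) throws Exception {
        Cipher c = Cipher.getInstance("AES");
        c.init(mode, key);
        return c.doFinal(data);
    }

    public static void main(String[] args) throws Exception {
        SecretKeySpec key = deriveKey("secret".toCharArray(), "salt1234".getBytes());
        byte[] cipherText = crypt(Cipher.ENCRYPT_MODE, key, "my data".getBytes());
        byte[] plainText = crypt(Cipher.DECRYPT_MODE, key, cipherText);
        System.out.println(Arrays.equals(plainText, "my data".getBytes())); // true
    }
}
```

The same derived key decrypts the data on any machine with a standard JRE, which is exactly the portability the Sony tool below lacks.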

I want to argue that software like My Safe is necessary even on our personal desktops and laptops, so that other people may possess our computers but cannot possess our data.

My Sony laptop does have a secure drive called My Safe as well, but it is not perfect: it is a proprietary technology, which means the data is tightly bound to the Sony laptop. If the software breaks, there is no way for me to recover the encrypted data, and I cannot decrypt the encrypted data on another computer.

Therefore I developed my very own My Safe. And it turns out to be very useful!

Yahoo User Interface

Programming in JavaScript used to be very painful: you had to make sure your code was compatible with all major browsers, such as IE and Firefox. Now, thanks to well-developed JavaScript libraries, JavaScript programming is much more pleasant.

JavaScript libraries, such as Yahoo User Interface (YUI), jQuery, Dynamic Drive, and Dojo, not only deal with cross browser compatibility issues, but also make common tasks easy.

I have used several components of YUI in developing the OMII-UK website.
BTW, browsershots.org lets you see how your website looks in tens of browsers.

Classloading of GridSphere

JSR-168 portlet applications developed for GridSphere 3.1 have a $CATALINA_HOME/webapps/$PORTLET_WEBAPP/WEB-INF/lib/gridsphere-portletservlet-3.1.jar, which contains a single class, org.gridsphere.provider.portlet.jsr.PortletServlet. Originally I thought this class was intended by some "clever" Java programmer to override the class with the same name in $CATALINA_HOME/shared/lib, according to the rules of Tomcat class loading. Later I found out it is exactly the same class as the one in $CATALINA_HOME/shared/lib. So although it replaces the one in $CATALINA_HOME/shared/lib, it changes nothing about the code. It does, however, change one thing: the defining class loader.

If you remove $CATALINA_HOME/webapps/$PORTLET_WEBAPP/WEB-INF/lib/gridsphere-portletservlet-3.1.jar, running GridSphere raises an exception like this:

16825:ERROR:(PortletServlet.java:loadJSRPortletWebapp:102)
Unable to create jsr portlet instance: org.gridsphere.gsexamples.portlets.HelloWorld

java.lang.ClassNotFoundException: org.gridsphere.gsexamples.portlets.HelloWorld
at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:319)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:164)
at org.gridsphere.provider.portlet.jsr.PortletServlet.loadJSRPortletWebapp(PortletServlet.java:87)
at org.gridsphere.provider.portlet.jsr.PortletServlet.service(PortletServlet.java:182)

So basically it complains that it cannot find the portlet class. Let me explain why.

First, org.gridsphere.provider.portlet.jsr.PortletServlet is defined in the portlet's web.xml as the generic servlet. Not surprisingly, a portlet application is just a servlet web application in GridSphere.

Second, on line 87 of PortletServlet, there is
Portlet portletInstance = (Portlet) Class.forName(portletClass).newInstance();

This uses the defining class loader of the current class, i.e., PortletServlet, to load the portlet class. So if PortletServlet lives in $CATALINA_HOME/shared/lib, there is no way for it to find the portlet class; PortletServlet has to be put inside $CATALINA_HOME/webapps/$PORTLET_WEBAPP/WEB-INF/lib/.
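The caller-sensitivity of the one-argument Class.forName can be seen in a tiny standalone demo (a minimal sketch, unrelated to GridSphere's code; the class name is made up):

```java
public class ForNameDemo {
    public static void main(String[] args) throws Exception {
        // The one-argument Class.forName is specified to use the defining
        // class loader of the calling class, so these two calls resolve
        // the name in the same namespace
        ClassLoader callerLoader = ForNameDemo.class.getClassLoader();
        Class<?> a = Class.forName("java.util.ArrayList");
        Class<?> b = Class.forName("java.util.ArrayList", true, callerLoader);
        System.out.println(a == b); // true
    }
}
```

Which loader defined the calling class therefore decides which classes Class.forName can see, and that is the whole point of shipping the duplicate jar.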

I am not convinced that this is an elegant solution to portlet classloading. I have a feeling it might be better if the portal and the portlets stayed in the same web application, with the webapp class loader acting as the parent of the portlet class loaders. Then, at least, classloading of portlets would not be affected by Tomcat's classloading rules.

Wednesday, December 10, 2008

ABC of P2P

In 2007, the two largest P2P networks (protocols) were BitTorrent and Gnutella, with BitTorrent growing steadily.


The majority of Gnutella users are using LimeWire. Implemented in Java, LimeWire has a basic version which is open source. It supports BitTorrent as well.

Vuze, originally named Azureus, is a BitTorrent client. Also implemented in Java, Vuze (Azureus) is the No. 2 most popular download in SourceForge.

It is worth mentioning TorrentRelay, a purely web-based BitTorrent client. It does not require any software installed on your PC; the BitTorrent downloading is done at the TorrentRelay website. But as a free user, you cannot download any file larger than 800MB.

Implement Principles of Modern OS inside Java Application

There are some architectural similarities between a modern OS and an OSGi-based Java application.

For instance, in an OS there is a kernel, which provides services to multitasking processes. A shell is there after logging in.

In an OSGi-based Java application, OSGi acts as the kernel providing fundamental services to bundles. A shell is there after starting up OSGi. Of course, multitasking is available in the form of multithreading.

In an OS, for safety and security reasons, processes run in their own memory spaces.

In OSGi, bundles run in their own code (class) spaces.

In an OS, SELinux can be used to enforce security policies.

In Java, the SecurityManager can be used to enforce security policies.

It seems to me that as a trend, principles of modern OS are being implemented in complicated, component-based Java software.

Tuesday, December 02, 2008

Access UK NGS

UK NGS provides a nationwide grid service. The following is what I did in order to access NGS facilities.
  1. Apply for a UK e-Science certificate.
  2. Apply for an NGS account.
  3. Install Globus Toolkit 4.2.1: export GLOBUS_LOCATION=/usr/local/globus-4.2.1; ./configure --prefix=$GLOBUS_LOCATION; make; make install. Because my Linux box runs RHEL5, I chose to install from source to make use of OpenSSL 0.9.8. Before installing GT, I also installed the perl-XML-Parser package, which had been missing. Compiling GT from source took a couple of hours, so be patient.
  4. Install my UK e-Science certificate: see the section below the heading "Installing your e-Science Certificate and Private Key".
  5. Install UK e-Science CA certificate and signing policy. On the NGS CA certificate page, download CA certificate files 367b75c3.0 and 98ef0ee5.0, and signing policy files 367b75c3.signing_policy and 98ef0ee5.signing_policy. Save them into $HOME/.globus/certificates.
  6. Set up running environment. source $GLOBUS_LOCATION/etc/globus-user-env.sh
  7. Create a proxy certificate: grid-proxy-init -verify -debug. This should not display any error message. To display the current proxy certificate: grid-proxy-info. To destroy the current proxy certificate: grid-proxy-destroy.
  8. Use gsissh to connect to NGS nodes: gsissh -p 2222 ngs.rl.ac.uk. It uses the proxy certificate, so it does not prompt for a username and password.
  9. Upload the proxy certificate to the MyProxy server: myproxy-init -s myproxy.grid-support.ac.uk -l wjfang. Use "-l" to specify a unique username in the MyProxy server. During execution, you will be prompted for a MyProxy pass phrase. Together with the username, the pass phrase allows access to the proxy certificate stored in the MyProxy server.
  10. Now I can log into NGS Application Repository even on a different machine using my MyProxy username and pass phrase.

Monday, November 24, 2008

OSGi: a Dynamic Component System for Java

I have known about OSGi since I finally adopted Eclipse as my IDE for Java four years ago. Recently, two things have re-ignited my strong interest in OSGi.

One is GridSphere. GridSphere is a portal framework implementing JSR-168, running on Tomcat. I am evaluating some portlets running in GridSphere, and I found that deploying either GridSphere itself or portlets into GridSphere requires restarting the JVM. It is clear to me that this is not a desirable feature for enterprise software. The selling point of a portal framework is that the portlets (i.e., the components) can be developed and distributed independently while working in harmony in one portal. Let's say we have a portal that contains 25 portlets, and each portlet releases two major updates and two security patches every year. If we need to restart the portal every time we update a portlet, then we need to restart the portal 100 times a year. This is not good news for the administrator. Clearly we should have better support for components when architecting server-side software. At the least, we should allow individual components to be deployed and undeployed dynamically without affecting the rest of the system. By the way, 10 years ago in an interview with Huawei, I was asked how to patch software while it is still running.

The other is the SpringSource dm Server, which uses OSGi. It seems to me OSGi is the answer to the architectural challenges in any complex, server-side, enterprise, or component-based software.

A further look at the OSGi website reveals that OSGi is behind many web application servers and J2EE servers: IBM WebSphere, SpringSource Application Server, Oracle (formerly BEA) WebLogic, Sun's GlassFish, and Red Hat's JBoss. And not surprisingly, the top two showcases for adopting OSGi are Eclipse and Spring.

Adopting OSGi is claimed to have many benefits, including dynamic update of bundles, security, support for dependencies and versioning, a simple API, and a small footprint. To achieve these, OSGi has the module and service concepts deep in its design.

In OSGi, sharing between modules, such as importing packages from other modules and exporting your own packages for other modules to use, must be declared explicitly. By default, nothing is shared.
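For illustration, such declarations live in a bundle's MANIFEST.MF; the bundle and package names below are made up:

```text
Bundle-ManifestVersion: 2
Bundle-SymbolicName: org.example.greeter
Bundle-Version: 1.0.0
Export-Package: org.example.greeter.api;version="1.0.0"
Import-Package: org.osgi.framework;version="1.4.0"
```

Anything not listed in Export-Package stays private to the bundle, even if it is on the bundle's internal classpath.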

There is a service registry in OSGi, and a service can be implemented using any POJO. The OSGi Alliance also publishes the Compendium specifications, which define a large number of standard services, from a Log Service to a Measurement and State specification.

In terms of implementations, the two most popular are Apache Felix and Eclipse Equinox. My next step will be doing some hands-on exercises with one of them.

Friday, November 21, 2008

Tweak the Delegation Model of Java Class Loading

The Java class loader architecture [1, 2] affords a Java developer a tremendous amount of flexibility in the way an application is assembled and extended. Basically, class loaders form a hierarchy whose root is the bootstrap class loader, which is the parent of sun.misc.Launcher$ExtClassLoader, which in turn is the parent of sun.misc.Launcher$AppClassLoader. When a class loader is asked to load a class, it first delegates the request to its parent. This is why it is called the delegation model.

But occasionally we do need to tweak this delegation model in some way more suitable to our requirements. A good example is in implementations of the Java Servlet Specification: in Tomcat 6, the web application class loader attempts to load classes itself before delegating the request to its parent, the common class loader.

The delegation model is implemented in java.lang.ClassLoader's protected method loadClass(String name, boolean resolve):
protected synchronized Class loadClass(String name, boolean resolve)
        throws ClassNotFoundException {
    // First, check if the class has already been loaded
    Class c = findLoadedClass(name);
    if (c == null) {
        try {
            if (parent != null) {
                c = parent.loadClass(name, false);
            } else {
                c = findBootstrapClass0(name);
            }
        } catch (ClassNotFoundException e) {
            // If still not found, then invoke findClass
            // in order to find the class.
            c = findClass(name);
        }
    }
    if (resolve) {
        resolveClass(c);
    }
    return c;
}
The following program shows how to override the protected method loadClass to tweak the delegation model. In this program, classes in the package "somewhere" are loaded from somewhere else, even if classes with the same name exist on the application classpath. For all other classes, the delegation model is still respected.
import java.io.FileInputStream;
import java.io.IOException;
import java.lang.reflect.Method;

import somewhere.Greet;

public class ClassLoaderTest {
    public static class MyClassLoader extends ClassLoader {
        @Override
        protected synchronized Class loadClass(String name, boolean resolve)
                throws ClassNotFoundException {
            Class c = findLoadedClass(name);
            if (c == null) {
                if (name != null && name.startsWith("somewhere."))
                    c = findClass(name);
                else
                    c = this.getParent().loadClass(name);
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }

        @Override
        protected Class findClass(String name)
                throws ClassNotFoundException {
            byte[] buf = loadFromSomewhere(name);
            return defineClass(name, buf, 0, buf.length);
        }

        private byte[] loadFromSomewhere(String name)
                throws ClassNotFoundException {
            String className = name.substring("somewhere.".length());
            FileInputStream fis = null;
            try {
                fis = new FileInputStream("./class/somewhere/" + className
                        + ".class");
                byte[] buffer = new byte[8192];
                int n = 0;
                int c;
                // read until EOF (assumes the class file fits in the buffer)
                while (n < buffer.length
                        && (c = fis.read(buffer, n, buffer.length - n)) != -1) {
                    n += c;
                }
                byte[] rv = new byte[n];
                System.arraycopy(buffer, 0, rv, 0, n);
                return rv;
            } catch (IOException e) {
                throw new ClassNotFoundException(e.getMessage());
            } finally {
                if (fis != null) {
                    try {
                        fis.close();
                    } catch (IOException ignored) {
                    }
                }
            }
        }
    }

    /**
     * @param args
     */
    public static void main(String[] args) throws Exception {
        MyClassLoader myClassLoader = new MyClassLoader();
        Class clazz = myClassLoader.loadClass("somewhere.Greet");
        Object object = clazz.newInstance();
        Method method = clazz.getMethod("hello");
        // Greet from somewhere, prints "hello world"
        method.invoke(object);

        Greet greet = new Greet();
        // local Greet, prints "good morning"
        greet.hello();
        // java.lang.ClassCastException:
        // somewhere.Greet cannot be cast to somewhere.Greet
        greet = (Greet) object;
    }

}
In fact, a loaded class in a JVM is identified by its fully qualified name and its defining class loader. Consequently, each class loader in the JVM can be said to define its own namespace.

Therefore, leveraging the Java class loader architecture, components inside a Java process can have their own, separated code (class) spaces. This partly forms the foundation that allows individual components to be plugged and played, as well as hot-swapped, without affecting the other components in the same process.

Wednesday, November 19, 2008

Let Java SSL Trust All Certificates without Violating Security Manager

Java SSL by default does not trust self-signed certificates. Wikibooks:Programming reveals a way to allow connections to a secure HTTP server that uses a self-signed certificate. The magic looks like:

// Create a trust manager that does not validate certificate chains
TrustManager[] trustAllCerts = new TrustManager[] {
    new X509TrustManager() {
        public java.security.cert.X509Certificate[] getAcceptedIssuers() {
            return null;
        }

        public void checkClientTrusted(
                java.security.cert.X509Certificate[] certs, String authType) {
            // do nothing
        }

        public void checkServerTrusted(
                java.security.cert.X509Certificate[] certs, String authType) {
            // do nothing
        }
    }
};

// Install the all-trusting trust manager
SSLContext sc = null;
try {
    sc = SSLContext.getInstance("SSL");
    sc.init(null, trustAllCerts, new java.security.SecureRandom());
} catch (GeneralSecurityException gse) {
    throw new IllegalStateException(gse.getMessage());
}
HttpsURLConnection.setDefaultSSLSocketFactory(sc.getSocketFactory());

However, HttpsURLConnection.setDefaultSSLSocketFactory(...) will throw a SecurityException (a RuntimeException) if a security manager exists and its checkSetFactory method does not allow a socket factory to be specified. The thrown SecurityException looks like

Exception in thread "main" java.security.AccessControlException: access denied (java.lang.RuntimePermission setFactory)
at java.security.AccessControlContext.checkPermission(AccessControlContext.java:323)
at java.security.AccessController.checkPermission(AccessController.java:546)
at java.lang.SecurityManager.checkPermission(SecurityManager.java:532)
at java.lang.SecurityManager.checkSetFactory(SecurityManager.java:1612)
at javax.net.ssl.HttpsURLConnection.setDefaultSSLSocketFactory(HttpsURLConnection.java:308)
at SecurityManagerTest.main(SecurityManagerTest.java:50)

A workaround to avoid such a SecurityException is as below:

URL url = new URL("https://engage.ac.uk");
HttpsURLConnection conn = (HttpsURLConnection) url.openConnection();
conn.setSSLSocketFactory(sc.getSocketFactory());
conn.getInputStream();

The trick is to use the instance method setSSLSocketFactory instead of the static method setDefaultSSLSocketFactory. The former does not throw a SecurityException.

Note: you need to use conn.getInputStream() instead of url.openStream(); otherwise the customised SocketFactory won't be used.

Of course, to allow connecting to the secure web site, the following permission should be added to the Java security policy file:

permission java.net.SocketPermission "engage.ac.uk:443", "connect";

Use Security Manager to Control Java Web Application Behavior

In a Java web application server like Tomcat, the Tomcat container and multiple web applications run in the same JVM. It is thus very important to cut off unexpected interactions between applications, and between applications and Tomcat. Ideally, the behavior of one web application should not adversely affect the behavior of other applications or of Tomcat itself.

One such unexpected interaction is caused by modifying shared classes and objects, for instance, changing system properties or changing system classes' behavior.

Tomcat 5.5's class loaders are organised as:

      Bootstrap
          |
        System
          |
        Common
        /    \
  Catalina  Shared
             /   \
       Webapp1  Webapp2 ...

Tomcat 6.0's class loaders are organised as:

      Bootstrap
          |
        System
          |
        Common
        /    \
  Webapp1  Webapp2 ...

In both cases, web applications and Tomcat share the classes managed by class loaders Bootstrap, System and Common.

Let's say one web application uses the following code to tell the Java runtime to use the XSLT implementation shipped with the JDK.

System.setProperty("javax.xml.transform.TransformerFactory", "com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl");

Other web applications may choose other XSLT implementations in a similar way. If a certain web application's correct behavior depends on a particular XSLT implementation, then we may get a problem, because System.setProperty("javax.xml.transform.TransformerFactory", ...) changes the JVM-wide XSLT implementation.

Therefore it is advisable to use Java security manager to control the permissions granted to web application code.

Note that $CATALINA_HOME/bin/startup.sh does not start Tomcat with the security manager. To enable the security manager, use $CATALINA_HOME/bin/startup.sh -security. This appends "-Djava.security.manager -Djava.security.policy=..." to the JVM arguments.

A simple Java program for testing the security manager and policy file:

public class SecurityManagerTest {
    public static void main(String[] args) throws Exception {
        System.setProperty("greeting", "hello world!");
        System.out.println(System.getProperty("greeting"));
    }
}

Run this program and you will see "hello world!" printed out.

Prepare a policy file called lab.policy:

grant {
    permission java.util.PropertyPermission "*", "read";
};

It says that any code can only read system properties, not write them.

By the way, the policy file can be created using policytool.

Then run the program like this:

java -Djava.security.manager -Djava.security.policy=lab.policy SecurityManagerTest

You will see an exception pop up:

Exception in thread "main" java.security.AccessControlException: access denied (java.util.PropertyPermission greeting write)
at java.security.AccessControlContext.checkPermission(AccessControlContext.java:323)
at java.security.AccessController.checkPermission(AccessController.java:546)
at java.lang.SecurityManager.checkPermission(SecurityManager.java:532)
at java.lang.System.setProperty(System.java:727)
at SecurityManagerTest.main(SecurityManagerTest.java:11)

So the security manager is working. It is important to give the right location of the policy file: when the security manager is enabled, no permission is granted by default, so any permission will be denied except those defined in the policy file. In case the policy file cannot be found, the code is given no permissions at all.

Friday, November 14, 2008

Update RHEL5

See here (within intranet, may require authentication) for how to register RHEL4 for up2date and RHEL5 for yum.

After that, you can use yum update for a complete update.

To install a certain package: yum install package_name

To see what a package does: yum info package_name. To find which package provides a given file or feature: yum provides name

To find packages containing some keyword: yum list | grep keyword

To check already installed packages that contain some keyword: rpm -qa | grep keyword

Tuesday, November 11, 2008

Make a Java-enabled Virtual Machine Using Ubuntu JeOS

As suggested in VMware's best practice for building a virtual appliance, Ubuntu JeOS is used as the operating system. At around 100MB, it provides a just-enough OS. After installation, its VMware virtual disk file is about 380MB.

By default, Ubuntu does not provide a root password. The root privilege is carried out using the "sudo" command. To enable root login: sudo passwd root.

Also, see how to prepare a virtual appliance from the Ubuntu community.

Adding a new virtual disk
  • To add a new virtual disk, use the "Add Hardware" command in the VMware web interface.
  • You need to restart the virtual machine to detect the new virtual disk.
  • Let's say the newly added hard drive is /dev/sdb. Use the following command to partition it: fdisk /dev/sdb. To create a single new partition covering the entire size of /dev/sdb: n ENTER p ENTER 1 ENTER (default) ENTER (default) ENTER w ENTER.
  • To format the new partition: mkfs.ext3 /dev/sdb1.
  • To mount it on an existing location, say /data: mount /dev/sdb1 /data.
See how to format a new hard disk on ubuntu forums.

Making a Java appliance

The next step is to make it a Java appliance, with JDK, Ant, and Tomcat installed, running some Java application like Grimoires.

Ubuntu JeOS does not pre-install the OpenSSH server. Use the following command to install it: apt-get install openssh-server.

Then install
  • JDK 1.6.0_10
  • Apache Ant 1.7.1
  • Apache Tomcat 5.5.27
  • Grimoires 2.0.0
The used disk space shown by "df" inside the virtual machine is about 675MB.

Setting up NAT

Assume the virtual machine has been configured to support NAT. Add the following line to /etc/vmware/vmnet8/nat/nat.conf, under the [incomingtcp] section:

6660 = 172.16.59.132:8080

where 172.16.59.132 is the IP address of the virtual machine.

According to the VMware Server user guide, clicking "Refresh Network List" in the Virtual Infrastructure web interface should bring up the modified network configuration, but that did not work in my case. I had to restart VMware to enforce the new NAT configuration: service vmware restart. (Better to shut down the VM before restarting VMware Server!)

After restarting VMware, and after making sure port 6660 is open in the host machine's firewall, access http://hostname_of_the_host:6660/grimoires from another machine.

It seems that VMware Server runs a separate daemon to handle the address translation when the VM is set to use NAT, so this blocks iptables' NAT configuration.

Cloning the Virtual Machine

This is something a little bit tricky.

I used the conventional way to clone it: copy the vmdk and vmx files, then add the cloned virtual machine to the inventory ("Add Virtual Machine to Inventory"); when asked whether you copied it or moved it, answer that you copied it.

The clone was able to start up, but there was no eth0, and the routing table was empty (shown by route). There was an eth1 with the correct MAC address but without IP information (shown by ifconfig). Not surprisingly, TCP/IP was not working.

My judgement was that there was no reason for the virtual network card to go wrong, so it should be due to some configuration issue.

Hinted by a post about adding a second network card to Ubuntu, I used dmesg | grep eth to check the kernel messages from booting, and found an interesting message: "udev: rename eth0 to eth1". After a little googling and investigation, I solved the problem!

udev is the device manager for the Linux 2.6 kernel series. In /etc/udev/rules.d/70-persistent-net.rules, there were two lines:
SUBSYSTEM=="net", ...... ATTR{address}=="00:0c:29:07:6f:37", ...... NAME="eth0"
SUBSYSTEM=="net", ...... ATTR{address}=="00:0c:29:cf:c8:64", ...... NAME="eth1"

The MAC address in the first line is that of the original VM, while the one in the second line is the correct MAC address of the clone. Clearly, the first line has no effect, because no such device exists in the system, and the second line adds the network card as eth1.

Looking at /etc/network/interfaces, it only defines lo and eth0. eth0 is defined like this:
auto eth0
iface eth0 inet dhcp

No wonder only eth1 was visible yet did not support TCP/IP. After commenting out the first line, changing eth1 to eth0 in the second line of /etc/udev/rules.d/70-persistent-net.rules, and restarting the system, TCP/IP worked!

Summary

This is a convenient way to prepare a small-footprint virtual machine for running Java. It can be used to deliver Java software, or as a test or evaluation environment for a Java product.

Thursday, November 06, 2008

OpenSSL

Recently I used the openssl utility to generate key pairs and certificates. For example,
  • To generate a private key: openssl genrsa -out ca.key 2048
  • To create a self-signed certificate: openssl req -new -key ca.key -x509 -days 365 -out ca.crt
  • To create a certificate signing request: openssl req -new -key temp.key -out temp.csr
  • To create a certificate from a certificate signing request: openssl x509 -req -in temp.csr -CA ca.crt -CAkey ca.key -CAcreateserial -out temp.crt
  • To display a certificate: openssl x509 -text -in temp.crt
  • To display the content of a pkcs12 formatted certificate (the displayed private key and certificate are in PEM format, which can be used in the above commands): openssl pkcs12 -in old_uk_escience.p12 -out old.txt
  • To convert from pkcs12 format to PEM format: openssl pkcs12 -in cred.p12 -out cert.pem -nodes -clcerts -nokeys, openssl pkcs12 -in cred.p12 -out key.pem -nodes -nocerts
  • To create pkcs12 format certificate using PEM format private key and certificate: openssl pkcs12 -in temp.crt -inkey temp.key -out temp.p12 -export
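Putting the bullet commands together, a toy end-to-end session looks like this (file names and subjects are illustrative; -subj avoids the interactive prompts):

```shell
# Create a CA key and a self-signed CA certificate
openssl genrsa -out ca.key 2048
openssl req -new -key ca.key -x509 -days 365 -subj "/CN=Toy CA" -out ca.crt

# Create a server key and a certificate signing request
openssl genrsa -out server.key 2048
openssl req -new -key server.key -subj "/CN=localhost" -out server.csr

# Sign the request with the CA
openssl x509 -req -in server.csr -CA ca.crt -CAkey ca.key \
    -CAcreateserial -out server.crt

# Check that the signed certificate chains back to the CA
openssl verify -CAfile ca.crt server.crt
```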

Self-Decrypting HTML Page

This is something worth doing. Let's call it the self-decrypting HTML page.

It is a publicly accessible, encrypted HTML page that can be hosted on any website, but can only be decrypted by authorized persons who know the key. JavaScript can be used to perform encryption and decryption locally in the browser. Thus what is transmitted on the net and stored at the public website is only the encrypted information, which guarantees its confidentiality. A symmetric key algorithm is preferred.

A self-decrypting HTML page generator is needed to convert the to-be-protected information to a self-decrypting HTML page.

It is noted that there is a United States Patent 7003800 on "Self-decrypting web site pages". It seems to describe a method pertinent to a web site.

A self-decrypting email utility performs a similar task; it uses RC4 as the cipher. Here is the JavaScript code for RC4.

Compared with another web-based confidential information scheme, i.e., hosting the information on an HTTPS server protected by some authentication method, the self-decrypting HTML page requires no authentication method, and thus no user management. All that is needed to access the confidential information is the URL of the self-decrypting HTML page and the key.

Will Google App Engine be a good platform for this? Will it be possible to use Google AdSense to generate revenue from this application?

Tuesday, November 04, 2008

Handle Chinese in Java

(This is one of my old Google Notes.)

In Java, Reader and Writer are used to handle character (char) streams, and InputStream and OutputStream are used to handle byte streams. InputStreamReader and OutputStreamWriter are bridges between byte streams and character streams.

When represented in bytes, characters can have many different coding schemes, such as ASCII, GB2312, and UTF-8 (Unicode Transformation Format), while characters in Java, char or String, are always Unicode.

A Charset is a named mapping between sequences of sixteen-bit Unicode characters, which is the character representation in Java, and sequences of bytes. A Charset knows how to convert a byte sequence to a (Unicode) char sequence, and vice versa, following the standard it implements.

Not surprisingly, both InputStreamReader and OutputStreamWriter can be configured to use a specified Charset, but there is no concept of Charset in Reader and Writer.
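The effect of the Charset is easiest to see by encoding the same String two ways. A small sketch (assuming the JRE includes the GB2312 charset, which standard distributions do):

```java
public class CharsetDemo {
    public static void main(String[] args) throws Exception {
        // The same Unicode string maps to different byte sequences
        // depending on the Charset used for encoding.
        String s = "\u4eca\u65e5"; // 今日
        byte[] gb = s.getBytes("GB2312");
        byte[] utf8 = s.getBytes("UTF-8");
        System.out.println("GB2312 bytes: " + gb.length);  // two bytes per character
        System.out.println("UTF-8 bytes: " + utf8.length); // three bytes per character
        // Decoding with the matching Charset recovers the original string.
        System.out.println(new String(gb, "GB2312").equals(s));
    }
}
```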

The following Java program demonstrates how to handle Chinese in Java.

import java.io.BufferedWriter;
import java.io.OutputStreamWriter;

public class ChineseTest {
    public static void main(String[] args) throws Exception {
        String chinese = "\u4eca\u65e5\u83dc\u6839\u8c2d"; // 1
        System.out.println(chinese); // 2
        BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(System.out, "GB2312")); // 3
        bw.write(chinese); // 4
        bw.close(); // 5
    }
}

Code comments:
  1. This is the Unicode for 今日菜根谭, generated by "native2ascii -encoding GB2312 c.txt", where the content of c.txt is 今日菜根谭 encoded in GB2312. The utility native2ascii converts a file with native-encoded characters (characters which are non-Latin 1 and non-Unicode) to one with Unicode-encoded characters.
  2. This can't print 今日菜根谭, because System.out uses the platform's default Charset.
  3. Create a Writer using the GB2312 Charset.
  4. Now we can print out 今日菜根谭.
  5. Closing the writer flushes the output. Required.

Virtual DOM

(This is one of my old Google Notes. Virtual DOM is a small piece of software I implemented.)

Complying with the W3C DOM interface, Virtual DOM is capable of representing in memory large XML data that cannot be represented using other DOM implementations such as Xerces. Virtual DOM aims to allow off-the-shelf XPath engines such as Jaxen to process XML DOM representations that would otherwise be too large to hold in memory. To support this goal, Virtual DOM must be told how to load a DOM element, and it then uses SoftReference to cache the loaded element. Put simply, Virtual DOM elements can be garbage collected when Java heap space becomes scarce, and later reloaded on demand.

The problem with the Xerces DOM implementation

Each node has references to its parent, children, and siblings. So as long as there is a single reference to any node of the DOM tree, no part of the tree can be garbage collected.

Implementation of Virtual DOM

Virtual DOM adopts a forest model, with a virtual document and a virtual root element. The virtual root element has a number of child elements, each implemented as a CollectableElement (which extends CollectableNode) and each representing a different DOM document.

Each Virtual DOM node, called CollectableNode, has a reference to DocumentCache, which in turn has a SoftReference to the cached owning document. DocumentCache also has enough information to load and reload the owning document whenever necessary.

Each Virtual DOM node has information specifying how to locate this node from the root node of the owning document.

Each Virtual DOM node also has a SoftReference to its corresponding concrete DOM node, for convenience.

Each CollectableElement that is a direct child of the virtual root element has an index indicating which child it is relative to the virtual root element. Thus it is able to retrieve its next sibling.

All exported DOM references should be Collectable* ones instead of the default DOM ones, to prevent DOM node references from being held externally.
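The DocumentCache idea above boils down to holding the expensive object behind a SoftReference and reloading it on demand after the GC clears it. A minimal generic sketch (class and method names here are illustrative, not the actual Virtual DOM classes):

```java
import java.lang.ref.SoftReference;
import java.util.function.Supplier;

public class SoftCache<T> {
    private final Supplier<T> loader;          // knows how to (re)load the document
    private SoftReference<T> ref = new SoftReference<>(null);

    public SoftCache(Supplier<T> loader) {
        this.loader = loader;
    }

    public synchronized T get() {
        T value = ref.get();
        if (value == null) {                   // cleared by the GC, or never loaded
            value = loader.get();              // reload on demand
            ref = new SoftReference<>(value);  // cache it softly again
        }
        return value;
    }

    public static void main(String[] args) {
        SoftCache<String> cache = new SoftCache<>(() -> "loaded");
        System.out.println(cache.get());
    }
}
```

Because the JVM only clears SoftReferences under memory pressure, cached documents survive as long as heap allows, which is exactly the behaviour the test below exploits.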

Test

With a 64 MB heap, using a revised Jaxen 1.1 (one of Jaxen 1.1's methods unnecessarily retains object references), Virtual DOM is able to deal with at least 250,000 instances of a certain XML document. In comparison, Xerces DOM runs out of memory at 39,000 instances.

(It is called VirtualDOM in my Eclipse projects.)

JSPWiki ACL Filter

I am working on the OMII-UK website. JSPWiki has been adopted as the web content management system as well as a wiki in the website. JSPWiki has been customised with an OMII-UK template and authentication and authorization modules. There are many users, so in-page ACLs are adopted to protect some pages from unauthorised editing. For instance, the following ACL says only members of the staff group can view and edit the page containing it:

[{ALLOW edit StaffGroup}]

Because members of the staff group can edit this page, they can also edit the ACL, which is nothing more than JSPWiki markup. This is a potential security flaw: any member of the staff group can edit the ACL, e.g., by mistake, and thus violate the intended access control of the page. Ideally the ACL, though residing in a page, should be treated differently from the rest of the page source.

Thus an ACL Filter is introduced so that only users with AllPermission can create, edit, or delete in-page ACLs. In the above example, even though any member of the staff group can edit the page, only users with AllPermission can change the ACL to something other than the above ACL.

Essentially, the ACL Filter separates two concerns, content and access control over content, which are originally mixed up in the wiki markup. With the ACL Filter they are treated differently: anyone authorised by the ACLs can edit content, but only certain super users can edit access control.
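The core check the filter performs can be sketched without any JSPWiki dependency: extract the in-page ACL markup from the old and new page source, and flag a save where they differ (so it can be rejected unless the user holds AllPermission). Class and method names here are illustrative, not JSPWiki's actual filter API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AclCheck {
    // Matches in-page ACL markup such as [{ALLOW edit StaffGroup}]
    private static final Pattern ACL = Pattern.compile("\\[\\{(ALLOW|DENY)[^}]*\\}\\]");

    static List<String> extractAcls(String pageSource) {
        List<String> acls = new ArrayList<>();
        Matcher m = ACL.matcher(pageSource);
        while (m.find()) acls.add(m.group());
        return acls;
    }

    static boolean aclChanged(String oldSource, String newSource) {
        return !extractAcls(oldSource).equals(extractAcls(newSource));
    }

    public static void main(String[] args) {
        String oldPage  = "[{ALLOW edit StaffGroup}]\nSome content.";
        String edited   = "[{ALLOW edit StaffGroup}]\nSome content, revised.";
        String tampered = "[{ALLOW edit Anonymous}]\nSome content.";
        System.out.println(aclChanged(oldPage, edited));   // content edit only
        System.out.println(aclChanged(oldPage, tampered)); // ACL was modified
    }
}
```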

This work has been contributed back to the JSPWiki community. See here.

Spring JUnit Test and Rollback DB Transaction

In Spring, doing unit test with JUnit4 can be as simple as this:

@RunWith(SpringJUnit4ClassRunner.class)
@ContextConfiguration
public class HibernateDaoTest {
    @Autowired
    protected RepositoryHibernateDao repositoryHibernateDao;

    @Test
    @Transactional
    public void submitProject() {
        String name = "Grimoires";
        String owner = "omii";
        Project newp = new Project();
        newp.setName(name);
        newp.setSubmitUser(owner);
        newp.setStatus("PENDING");
        int pid = repositoryHibernateDao.save(newp);
        assertTrue(pid > 0);
        Project p = repositoryHibernateDao.getProjectDetail(pid);
        assertEquals(name, p.getName());
        assertEquals(owner, p.getSubmitUser());
        repositoryHibernateDao.delete(p);
    }
}

With the annotation @Transactional, when the test is finished, the database is supposed to roll back to the state before the test. But this does not happen in my Spring/Hibernate/MySQL setting.

It turns out that in order to support transactions, the MySQL table must be an InnoDB table.

Usually the tables are MyISAM ones, which are non-transactional. The MyISAM table provides high-speed storage and retrieval, as well as fulltext searching capabilities. It is supported in all MySQL configurations, and is the default storage engine unless you have configured MySQL to use a different one by default.

On the other hand, the InnoDB and BDB storage engines provide transaction-safe tables. InnoDB is included by default in all MySQL 5.0 binary distributions. Here is how to convert a MyISAM table to InnoDB. Basically, what needs to be done is: ALTER TABLE ... ENGINE=InnoDB

Monday, November 03, 2008

Map of Downloads


I have used the Google Maps API to display on the map where the downloads of OMII-UK software are from. See the downloads for Grimoires 2.0.0. By clicking on a balloon, more geographic information about the download pops up. For instance, I know there is a guy in Changsha, Hunan, China who has downloaded Grimoires 2.0.0. All the other downloads are from the UK, France, and Germany.

Hibernate Performance

(This is one of my old Google Notes.)

I have used YourKit to benchmark the performance of Hibernate some time ago.

In my database, I have two tables: Project and Release. Project is associated with Release in a one-to-many relationship. In my setting, a SQL statement using JDBC costs from several milliseconds to ~30 milliseconds; serving a JSP costs from 1 to 3 seconds on the first run, then from tens of milliseconds to ~250 milliseconds on later runs.

In Hibernate,
  • Getting a single project (as well as its releases) costs 234ms in the first run, and 14ms in average (excluding the first run);
  • Getting all projects costs 547ms in the first run, and 96ms in average;
  • Getting some partial information of all projects, costs 282ms in the first run, and 15ms in average.
The first run is much slower because Hibernate does bytecode instrumentation to generate proxy objects on the first touch of the ORM, which is quite expensive. The later runs, however, approximate JDBC's performance.

Some Hibernate tips:
  • Chapter 19 of the Hibernate reference describes how to improve its performance.
  • Use set instead of list. A list uses the id as its index, thus creating a large data structure with lots of empty cells.
  • Associated objects are initialized lazily. They need to be touched to be brought into memory.
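The set-versus-list point is easy to illustrate without Hibernate: when a list is indexed by a database id column and the ids are sparse, the collection must be padded with empty cells up to the highest index. A toy simulation (the ids and sizes are made up for illustration):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

public class SparseListDemo {
    public static void main(String[] args) {
        // Suppose a project's releases have index-column values 0, 5 and 9.
        // A list mapped by that index needs 10 cells, 7 of them null padding;
        // a set would hold just the 3 elements.
        List<String> releases = new ArrayList<>(Collections.<String>nCopies(10, null));
        releases.set(0, "r0");
        releases.set(5, "r5");
        releases.set(9, "r9");
        long empty = releases.stream().filter(Objects::isNull).count();
        System.out.println("cells=" + releases.size() + " empty=" + empty);
    }
}
```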

Sunday, November 02, 2008

Dependency Resolution

(This is one of my old Google Notes.)

In a software architecture, one component (the caller) relies on services provided by another component (the callee). Before the caller can use the callee's services, it must hold an instance of the callee. There are several ways to obtain one.

First, the caller can create an instance of the callee itself. Sometimes only one instance of the callee should exist in the whole system, because it is unique or expensive to create, e.g., the database manager. In that case the singleton pattern can be used, but it has shortcomings. First, because the callee has to implement the singleton pattern and the callers have to invoke it properly, the dependency-resolving code is scattered across the whole system. Second, it is inflexible to replace a callee component with another implementation of the same interface.

Second, the caller can go to a "service locator", for instance a JNDI directory, to locate callees. Though the service locator provides a central place for managing all service providers (callees), the caller still needs to know how to talk to the service locator.

Third, the state of the art: dependency injection, implemented in many frameworks such as Spring and Google Guice. Compared to the above two approaches, its advantages are obvious. Both caller and callee can be POJOs (Plain Old Java Objects) with no special control logic, and there is a single control point; in Spring, that is the Spring XML configuration file. The dependency is thus resolved in a declarative way! Nowadays we always prefer a declarative way of doing things to a programmatic one.
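The contrast can be sketched in a few lines of plain Java. The caller below is a POJO that simply declares what it needs; the wiring happens in one place (here the main method stands in for the Spring XML configuration file; all names are illustrative):

```java
interface Repository {
    String find(int id);
}

class InMemoryRepository implements Repository {
    public String find(int id) {
        return "project-" + id;
    }
}

// The caller: no singleton, no locator. Its dependency is injected
// through the constructor, so it never knows which implementation it gets.
class ProjectService {
    private final Repository repo;

    ProjectService(Repository repo) {
        this.repo = repo;
    }

    String describe(int id) {
        return "Found " + repo.find(id);
    }
}

public class DiDemo {
    public static void main(String[] args) {
        // The single control point: swap implementations here without
        // touching ProjectService.
        ProjectService service = new ProjectService(new InMemoryRepository());
        System.out.println(service.describe(42));
    }
}
```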

In a service-oriented architecture, a service workflow is a multi-component system: any component service can be a caller, a callee, or both. There is some similarity between dependency injection and how a workflow is composed: both have a central control point for resolving dependencies declaratively. In a workflow, that is the workflow description file, e.g., written in BPEL. Likewise, the service registry in a service-oriented architecture shares a similar idea with the service locator pattern.

Friday, October 31, 2008

Tomcat Troubleshooting

The OMII-UK website is a Java web application running in Tomcat, which sits behind an Apache web server. Several methods are there for monitoring and troubleshooting it.

Monitoring

First, a simple web monitor is running on another machine, checking the contents of several important web pages every 15 minutes. If anything goes wrong, I get an email notification. Of course, this task could also be done by Nagios. The advantage of using my own web monitor instead of a complicated monitoring system like Nagios is that I have full control over what to inspect. For instance, I am able to detect that Tomcat is still OK but the database connection is down.

Second, in the Java web application, which is based on the Spring Framework, Spring AOP (Aspect Oriented Programming) is leveraged to monitor the performance of all servlets. If any servlet takes more than a pre-defined threshold to serve a request, I get an email notification. The threshold, currently set to 200ms, is defined in the Spring XML configuration file.
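That AOP monitor is essentially "around advice" that times each call and alerts when the threshold is exceeded. A plain JDK dynamic-proxy sketch of the same idea (the interface, threshold, and the System.err alert are illustrative stand-ins for the real Spring advice and email notification):

```java
import java.lang.reflect.Proxy;

public class TimingProxyDemo {
    interface PageService {
        String render();
    }

    // Wrap the target so every call through the proxy is timed.
    static PageService timed(PageService target, long thresholdMillis) {
        return (PageService) Proxy.newProxyInstance(
            PageService.class.getClassLoader(),
            new Class<?>[] { PageService.class },
            (proxy, method, methodArgs) -> {
                long start = System.nanoTime();
                Object result = method.invoke(target, methodArgs);
                long millis = (System.nanoTime() - start) / 1_000_000;
                if (millis > thresholdMillis) {
                    // In the real setup this sends an email notification.
                    System.err.println(method.getName() + " took " + millis + "ms");
                }
                return result;
            });
    }

    public static void main(String[] args) {
        PageService service = timed(() -> "ok", 200);
        System.out.println(service.render());
    }
}
```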

Third, Tomcat is started with "-Dcom.sun.management.jmxremote", which enables local jconsole connection. JDK 1.5 is used. In JDK 1.6, jconsole support is enabled by default, so there is no need to set a JVM option for it.

What if something goes wrong ...
  1. Check the Apache web server only: http://hostname/server-status. You may need to modify httpd.conf to enable this.
  2. Check Tomcat only. http://hostname:8080.
  3. Check jconsole->Memory. See if JVM runs out of memory.
  4. Check Tomcat logs, Apache logs and database log.
  5. Last but not least, use Chainsaw to check log4j.xml. Chainsaw is a GUI that makes it easy to browse log4j's logs. The log4j configuration may look like:
log4j.appender.FileLog = org.apache.log4j.RollingFileAppender
log4j.appender.FileLog.MaxFileSize = 10MB
log4j.appender.FileLog.MaxBackupIndex = 14
log4j.appender.FileLog.File=${catalina.home}/logs/log4j.xml
log4j.appender.FileLog.layout = org.apache.log4j.xml.XMLLayout
log4j.rootCategory=WARN,FileLog

Let's say there is a memory leak or something inside the Java web application is wrong ...

There are various JVM options that allow us to debug or profile a Java web application:
  • Frequently used server VM options: -server -Xms512m -Xmx512m
  • jconsole support: -Dcom.sun.management.jmxremote
  • Dump heap on running out of memory: -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/root/heapdump
  • Enable debugging: -Xdebug -Xrunjdwp:transport=dt_socket,server=y,address=8000
  • Enable YourKit profiling: -agentlib:yjpagent
YourKit is a very good Java profiling tool, and it has a free licence for open source projects, which is what I am using. To enable profiling manually, see here.
  • On Windows, 32-bit, add \bin\win32 to the (SYSTEM) PATH.
  • On Linux x86, 32-bit, add /bin/linux-x86-32 to the LD_LIBRARY_PATH.
  • Check whether it is working: java -agentlib:yjpagent=help
  • Profiling: java -agentlib:yjpagent
  • Do not forget to force garbage collection before capturing a memory snapshot.
I also have a stress test tool that replays saved Apache/Tomcat access logs.

Thursday, October 30, 2008

Pass Apache Authentication to Tomcat

It cost me several hours to figure this out, so I think it is worth writing down.

I have an Apache web server (2.2.3) sitting in front of Tomcat (5.5.27). In Apache configuration, I have:

<Location /secure/>
AuthType Basic
AuthName "Secure Service"
AuthUserFile /etc/httpd/conf/user.db
require valid-user
</Location>

If the authentication succeeds, the HTTP request is passed to Tomcat by mod_proxy_ajp:

ProxyPass /secure/ ajp://localhost:8009/secure/

In Tomcat's server.xml, I disable Tomcat authentication in the AJP connector (both tomcatAuthentication and request.tomcatAuthentication work):

<Connector port="8009" enableLookups="false" redirectPort="8443" protocol="AJP/1.3" address="127.0.0.1" tomcatAuthentication="false"/>

If the authentication succeeds, Apache will create an HTTP header:

REMOTE_USER = omii

But in Tomcat, I do not see the REMOTE_USER header. Instead, I see

authorization = Basic b3ip9kd9dkekd9

It turns out that Tomcat puts the Apache authentication information in the form of a user principal, which can be accessed by the following code inside a JSP page:

java.security.Principal pr = request.getUserPrincipal();
String r = (pr != null) ? pr.getName() : null; // r.equals("omii")

Shibboleth: Federated Trust

Having been in development for around 3 years, Shibboleth provides web single sign-on and attribute exchange, which together build up a trust federation.

Like many ideas in computer science, Shibboleth is also about separation of concerns. Two pairs of concerns have been identified and separated in architecting Shibboleth: Service Provider (SP) has been separated from Identity Provider (IdP), and identity has been separated from attributes.

By separating IdP from SP, an SP, i.e., a Shibboleth-protected web application, is freed from maintaining a user database and performing authentication, which are delegated to an appropriate IdP (though the SP may still need to enforce local authorization decisions). The WAYF (Where Are You From) service is leveraged as a means to locate a suitable IdP that is able to authenticate a user who wants to access the SP.

By separating attributes from identity, flexibility is achieved in how to represent an authenticated user. Say I try to access an SP and use the University of Southampton's IdP to authenticate myself. As a result of the authentication, I get two attributes: one is eduPersonPrincipalName, i.e., my name; the other is eduPersonRole, in my case research staff. My IdP does not need to give out my eduPersonPrincipalName to the SP; giving out the eduPersonRole attribute alone might sufficiently entitle me to the service. In this way, my privacy is protected to some extent even when I am granted access.

Shibboleth is built on SAML. If you look at the Shibboleth technical specs, the SAML specs add up to several hundred pages, while the Shibboleth architecture specification has only 19 pages. SAML defines the core (XML schema for SAML assertions and protocol message elements), protocols (what is transmitted), bindings (how the protocol messages are transmitted), and profiles (a concrete manifestation of a defined use case using a particular combination of assertions, protocols, and bindings).

Shibboleth promotes a trust federation. The federation decides which SPs and IdPs can join it, so the trust between a user and an SP is maintained by the federation. Peace of mind for users.

OpenID is another technology for web single sign-on. Unlike Shibboleth, there is no federation in the OpenID architecture; it is down to users to decide whether they should trust a service provider and use their OpenIDs on its website. I am not happy with that. I also do not think it is a very good idea to use a single OpenID to access websites with different information confidentiality levels, such as online banking, webmail, and a simple web-based game site. Clearly online banking has the highest level of information confidentiality, the game site the lowest, and webmail sits between them.

In the UK, the UK Federation manages a Shibboleth federation for education and research. Here is the current membership, i.e., who provides SPs and IdPs. And here is the list of all available services that support UK Federation managed Shibboleth authentication.

Edsger W. Dijkstra, in his 1974 paper "On the role of scientific thought", explains why separation of concerns is so important:

Let me try to explain to you, what to my taste is characteristic for all intelligent thinking. It is, that one is willing to study in depth an aspect of one's subject matter in isolation for the sake of its own consistency, all the time knowing that one is occupying oneself only with one of the aspects. We know that a program must be correct and we can study it from that viewpoint only; we also know that it should be efficient and we can study its efficiency on another day, so to speak. In another mood we may ask ourselves whether, and if so: why, the program is desirable. But nothing is gained --on the contrary!-- by tackling these various aspects simultaneously. It is what I sometimes have called "the separation of concerns", which, even if not perfectly possible, is yet the only available technique for effective ordering of one's thoughts, that I know of. This is what I mean by "focusing one's attention upon some aspect": it does not mean ignoring the other aspects, it is just doing justice to the fact that from this aspect's point of view, the other is irrelevant. It is being one- and multiple-track minded simultaneously.

Wednesday, October 29, 2008

Use Virtual Machine to Evaluate Software

Here at OMII-UK we need to evaluate software generated in commissioned software projects from time to time. For instance, it is now my duty to evaluate SPAM-GP, a set of JSR-168 portlets for security stuff. I am using VMware Server to do this job, and I think it is great. Basically, I build a virtual machine in which I install, configure, and test the software to be evaluated. The benefits of doing so are obvious:
  • It is a VM where the software is running, so it has all the advantages of dedicating a physical machine to the software. This is a working instance, and we can keep it as long as we wish. It is also easy for someone else to see a demo or play with it.
  • Compared to a physical machine, the VM only costs around 3GB of disk space. That sounds quite large, but it is little in my 200GB hard disk.
VMware Server supports snapshots. A snapshot is a persistent state of the whole VM. It can be used to save a state of the VM that we are happy with; we can then carry on manipulating the VM and come back to the snapshot whenever we want. Unfortunately, in VMware Server only one snapshot can be taken per VM: a newly taken snapshot simply replaces the previous one.

My machine has 1GB of memory, so sometimes it is slow to run VMware Server, particularly when two VMs are running at the same time.