Friday, October 31, 2008

Tomcat Troubleshooting

The OMII-UK website is a Java web application running in Tomcat, which sits behind an Apache web server. There are several ways to monitor and troubleshoot it.

Monitoring

First, a simple web monitor runs on another machine and checks the contents of several important web pages every 15 minutes. If anything goes wrong, I get an email notification. Of course, this task could also be done by Nagios. The advantage of using my own web monitor instead of a complex monitoring system like Nagios is that I have full control over what to inspect. For instance, I can detect that the database connection is down even though Tomcat itself is still OK.
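Below is a minimal sketch of such a content-checking monitor in Java, assuming JDK 11 or later. The class names `PageCheck` and `WebMonitor`, the marker-string idea and the `alert` callback are hypothetical illustrations, not the actual monitor. Email delivery is left out. The monitor checks for a marker string rather than just an HTTP 200. That way, a page Tomcat still serves but renders without its database content counts as a failure.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** One content check: the page is healthy if it returns 200 and contains a marker string. */
class PageCheck {
    final String url;
    final String expectedText; // hypothetical marker, e.g. text only rendered after a successful DB query

    PageCheck(String url, String expectedText) {
        this.url = url;
        this.expectedText = expectedText;
    }

    /** Pure check on an already-fetched response, so it can be tested without a network. */
    static boolean isHealthy(int status, String body, String expectedText) {
        return status == 200 && body != null && body.contains(expectedText);
    }

    /** Fetches the page and applies isHealthy; any I/O or URL error counts as a failure. */
    boolean run() {
        try {
            HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setConnectTimeout(10_000);
            conn.setReadTimeout(10_000);
            int status = conn.getResponseCode();
            // Error responses (4xx/5xx) have their body on the error stream, which may be null.
            try (InputStream in = status < 400 ? conn.getInputStream() : conn.getErrorStream()) {
                String body = in == null ? "" : new String(in.readAllBytes(), StandardCharsets.UTF_8);
                return isHealthy(status, body, expectedText);
            }
        } catch (IOException | IllegalArgumentException e) {
            return false;
        }
    }
}

class WebMonitor {
    /** Runs every check and returns the URLs that failed. */
    static List<String> failures(List<PageCheck> checks) {
        List<String> failed = new ArrayList<>();
        for (PageCheck c : checks) {
            if (!c.run()) failed.add(c.url);
        }
        return failed;
    }

    /** Runs the checks every 15 minutes; alert is a hypothetical hook (e.g. send an email). */
    static ScheduledExecutorService start(List<PageCheck> checks, Consumer<List<String>> alert) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            List<String> failed = failures(checks);
            if (!failed.isEmpty()) alert.accept(failed);
        }, 0, 15, TimeUnit.MINUTES);
        return scheduler;
    }
}
```

Keeping `isHealthy` separate from the HTTP fetch makes the inspection rule easy to change. For example, a check can require text that only appears after a database query.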

Second, in the Java web application, which is based on the Spring Framework, Spring AOP (Aspect-Oriented Programming) is used to monitor the performance of all servlets. If any servlet takes longer than a predefined threshold to serve a request, I get an email notification. The threshold, currently 200 ms, is defined in the Spring XML configuration file.
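Spring AOP applies advice to interface-based beans through JDK dynamic proxies. So the timing idea can be sketched with plain `java.lang.reflect.Proxy` and no Spring at all. This sketch makes that substitution; it is not the Spring configuration used on the site. `PageService` is a hypothetical interface, and `alert` stands in for the email notification. The threshold is passed in as a parameter, much as the real 200 ms value lives in the Spring XML file.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.function.Consumer;

/** Hypothetical interface standing in for a request-handling bean. */
interface PageService {
    String render(String page);
}

class SlowCallMonitor {
    /**
     * Returns a proxy for target that times every call ("around" style)
     * and reports calls slower than thresholdMillis through alert.
     */
    static <T> T wrap(T target, Class<T> iface, long thresholdMillis, Consumer<String> alert) {
        InvocationHandler handler = (proxy, method, args) -> {
            long start = System.nanoTime();
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the target's own exception unchanged
            } finally {
                long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
                if (elapsedMillis > thresholdMillis) {
                    alert.accept(method.getName() + " took " + elapsedMillis + " ms");
                }
            }
        };
        return iface.cast(Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] {iface}, handler));
    }
}
```

For example, `SlowCallMonitor.wrap(service, PageService.class, 200, System.err::println)` returns an object that behaves like `service` but reports any call slower than 200 ms. Using a proxy rather than edits in each servlet is the point of the AOP approach: the timing concern stays in one place.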

Third, Tomcat is started with "-Dcom.sun.management.jmxremote", which enables local jconsole connections. We use JDK 1.5. In JDK 1.6, jconsole can connect to a local JVM by default, so there is no need to set this JVM option.
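jconsole's Memory tab gets its heap figures from the platform `MemoryMXBean`, one of the MBeans exposed over JMX. The sketch below queries the same MBean in-process, for instance from a diagnostics page. It assumes JDK 1.5 or later, and the `HeapCheck` class is a hypothetical name.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

class HeapCheck {
    /** Fraction of the maximum heap currently in use, or -1 if the maximum is undefined. */
    static double heapUsedFraction(MemoryUsage heap) {
        long max = heap.getMax(); // -1 when the JVM reports no defined maximum
        return max <= 0 ? -1.0 : (double) heap.getUsed() / max;
    }

    public static void main(String[] args) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = memory.getHeapMemoryUsage();
        // >> 20 converts bytes to megabytes
        System.out.printf("heap used=%d MB committed=%d MB max=%d MB (%.0f%%)%n",
                heap.getUsed() >> 20, heap.getCommitted() >> 20, heap.getMax() >> 20,
                heapUsedFraction(heap) * 100);
    }
}
```

A used fraction that stays close to 1 even after garbage collection is the same warning sign one would look for in jconsole.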

What if something goes wrong ...
  1. Check the Apache web server only: http://hostname/server-status. You may need to modify httpd.conf to enable this.
  2. Check Tomcat only. http://hostname:8080.
  3. Check jconsole->Memory to see whether the JVM is running out of memory.
  4. Check Tomcat logs, Apache logs and database log.
  5. Last but not least, use Chainsaw to browse log4j.xml. Chainsaw is a GUI log viewer that makes it easy to browse log4j output written with XMLLayout. The log4j configuration may look like:
log4j.appender.FileLog = org.apache.log4j.RollingFileAppender
log4j.appender.FileLog.MaxFileSize = 10MB
log4j.appender.FileLog.MaxBackupIndex = 14
log4j.appender.FileLog.File = ${catalina.home}/logs/log4j.xml
log4j.appender.FileLog.layout = org.apache.log4j.xml.XMLLayout
log4j.rootCategory=WARN,FileLog
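log4j's `XMLLayout` writes a sequence of `log4j:event` elements without a root element, so the file is not a well-formed XML document on its own. When Chainsaw is not at hand, a quick triage can wrap the events in a `log4j:eventSet` root that declares the log4j namespace, then count events per level with the JDK's DOM parser. The sketch assumes the file fits in memory, and `Log4jXmlSummary` is a hypothetical name.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class Log4jXmlSummary {
    static final String LOG4J_NS = "http://jakarta.apache.org/log4j/";

    /** Counts log4j:event elements per level in raw XMLLayout output. */
    static Map<String, Integer> countByLevel(String events) {
        // XMLLayout output has no root element, so supply one that declares the log4j prefix.
        String wrapped = "<log4j:eventSet xmlns:log4j=\"" + LOG4J_NS + "\">" + events + "</log4j:eventSet>";
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(wrapped)));
            NodeList list = doc.getElementsByTagNameNS(LOG4J_NS, "event");
            Map<String, Integer> counts = new TreeMap<>(); // sorted by level name
            for (int i = 0; i < list.getLength(); i++) {
                counts.merge(((Element) list.item(i)).getAttribute("level"), 1, Integer::sum);
            }
            return counts;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("not valid XMLLayout output", e);
        }
    }
}
```

On JDK 11 or later, the current file can be read with `Files.readString(Path.of(...))` and passed in. Rolled-over backups (log4j.xml.1 and so on) would need to be read separately.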

Let's say there is a memory leak, or something else is wrong inside the Java web application ...

There are various JVM options that allow us to debug or profile a Java web application:
  • Frequently used server VM options: -server -Xms512m -Xmx512m
  • jconsole support: -Dcom.sun.management.jmxremote
  • Dump heap on running out of memory: -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/root/heapdump
  • Enable debugging: -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8000 (suspend=n lets Tomcat start without waiting for a debugger to attach)
  • Enable YourKit profiling: -agentlib:yjpagent
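It helps to confirm which of these options a running Tomcat was actually started with, for example that `CATALINA_OPTS` was picked up. The JVM reports its own arguments through `RuntimeMXBean.getInputArguments()`. Below is a minimal sketch; `JvmOptions` and the list of wanted prefixes are illustrative.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

class JvmOptions {
    /** True if any JVM argument starts with the given prefix (e.g. "-Xmx"). */
    static boolean hasOption(List<String> args, String prefix) {
        for (String a : args) {
            if (a.startsWith(prefix)) return true;
        }
        return false;
    }

    public static void main(String[] unused) {
        // Arguments passed to the JVM itself, not to the main class
        List<String> args = ManagementFactory.getRuntimeMXBean().getInputArguments();
        String[] wanted = {"-Xmx", "-Dcom.sun.management.jmxremote",
                "-XX:+HeapDumpOnOutOfMemoryError", "-agentlib:yjpagent"};
        for (String w : wanted) {
            System.out.println(w + (hasOption(args, w) ? " : set" : " : MISSING"));
        }
    }
}
```

The same two lines in `main` could run inside the web application (for instance from an admin-only page) to report on Tomcat's own JVM.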
YourKit is a very good Java profiling tool, and it has a free licence for open source projects, which is what I use. To enable profiling manually, see here.
  • On Windows, 32-bit, add \bin\win32 to the (SYSTEM) PATH.
  • On Linux x86, 32-bit, add /bin/linux-x86-32 to the LD_LIBRARY_PATH.
  • Check whether it is working: java -agentlib:yjpagent=help
  • Profiling: java -agentlib:yjpagent
  • Do not forget to force garbage collection before capturing a memory snapshot.
I also have a stress test tool that replays saved Apache/Tomcat access logs.
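The replay tool itself is not shown here. What follows is only a sketch of the idea, assuming the logs use Apache's Common or Combined Log Format. It extracts the path of each GET request and replays it, one at a time, against a base URL. That URL should point at a test server rather than production. `AccessLogReplay` and its methods are hypothetical. A real stress test would also run requests concurrently (e.g. with an `ExecutorService`) and might follow the original timing.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AccessLogReplay {
    // host ident user [time] "METHOD path PROTOCOL" status ... (Combined format fields are ignored)
    private static final Pattern LINE =
            Pattern.compile("^\\S+ \\S+ \\S+ \\[[^\\]]+\\] \"(\\S+) (\\S+)[^\"]*\" (\\d{3}) ");

    /** Returns the path of a GET request line; empty for other methods or unparseable lines. */
    static Optional<String> getPath(String logLine) {
        Matcher m = LINE.matcher(logLine);
        if (m.lookingAt() && m.group(1).equals("GET")) return Optional.of(m.group(2));
        return Optional.empty();
    }

    /** Sends one GET to baseUrl + path and returns the HTTP status code. */
    static int replay(String baseUrl, String path) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(baseUrl + path).toURL().openConnection();
        conn.setReadTimeout(30_000);
        int status = conn.getResponseCode();
        try (InputStream in = status < 400 ? conn.getInputStream() : conn.getErrorStream()) {
            if (in != null) in.readAllBytes(); // drain the body so the connection can be reused
        }
        return status;
    }

    /** Replays every GET in the log file sequentially, printing status and path. */
    static void replayFile(Path log, String baseUrl) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(log)) {
            String line;
            while ((line = reader.readLine()) != null) {
                Optional<String> path = getPath(line);
                if (path.isPresent()) {
                    System.out.println(replay(baseUrl, path.get()) + " " + path.get());
                }
            }
        }
    }
}
```

Only GET requests are replayed, because POSTs are not safe to repeat and their bodies are not in the access log anyway.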

Thursday, October 30, 2008

Pass Apache Authentication to Tomcat

It cost me several hours to figure this out, so I think it is worth writing down.

I have an Apache web server (2.2.3) sitting in front of Tomcat (5.5.27). In the Apache configuration, I have:

<location>
AuthType Basic
AuthName "Secure Service"
AuthUserFile /etc/httpd/conf/user.db
require valid-user
</location>

If the authentication succeeds, the HTTP request is passed to Tomcat by mod_proxy_ajp:

ProxyPass /secure/ ajp://localhost:8009/secure/

In Tomcat's server.xml, I disable Tomcat authentication on the AJP connector (both tomcatAuthentication and request.tomcatAuthentication work):

<Connector port="8009" enableLookups="false" redirectPort="8443" protocol="AJP/1.3" address="127.0.0.1" tomcatAuthentication="false" />

If the authentication succeeds, Apache sets the REMOTE_USER variable:

REMOTE_USER = omii

But in Tomcat, I do not see a REMOTE_USER header. Instead, I see the Authorization header:

authorization = Basic b3ip9kd9dkekd9

It turns out that Tomcat puts the Apache authentication information in the form of a user principal, which can be accessed by the following code inside a JSP page:

java.security.Principal pr = request.getUserPrincipal();
String r = (pr != null) ? pr.getName() : null; // r.equals("omii")
// request.getRemoteUser() returns the same name
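The `authorization` value seen in Tomcat is HTTP Basic authentication: the word `Basic` followed by the base64 encoding of `user:password`. Decoding it shows that the password travels with every request. That is one reason to keep the AJP connector bound to `127.0.0.1`, as in the connector configuration above. Below is a sketch using `java.util.Base64` (JDK 8+). `BasicAuth.decode` is a hypothetical helper, and it assumes the credentials are UTF-8.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class BasicAuth {
    /** Decodes "Basic base64(user:password)" into {user, password}; null if not Basic auth. */
    static String[] decode(String authorizationHeader) {
        if (authorizationHeader == null
                || !authorizationHeader.regionMatches(true, 0, "Basic ", 0, 6)) {
            return null;
        }
        byte[] raw = Base64.getDecoder().decode(authorizationHeader.substring(6).trim());
        String pair = new String(raw, StandardCharsets.UTF_8);
        int colon = pair.indexOf(':'); // the user name cannot contain ':', the password may
        if (colon < 0) return null;
        return new String[] {pair.substring(0, colon), pair.substring(colon + 1)};
    }
}
```

With `tomcatAuthentication="false"`, the application does not need to do this itself: Tomcat already exposes the user name through `getUserPrincipal()`.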

Shibboleth: Federated Trust

Having been developed for around 3 years, Shibboleth provides web single sign on and attribute exchange, which together build up a trust federation.

Like many ideas in computer science, Shibboleth is also about separation of concerns. Two pairs of concerns have been identified and separated in architecting Shibboleth: Service Provider (SP) has been separated from Identity Provider (IdP), and identity has been separated from attributes.

By separating the IdP from the SP, an SP (i.e., a Shibboleth-protected web application) does not need to maintain a user database or perform authentication. Both are delegated to an appropriate IdP, though the SP may still need to enforce local authorization decisions. The WAYF (Where Are You From) service is used to locate a suitable IdP that can authenticate a user who wants to access the SP.

By separating attributes from identity, flexibility is gained in how an authenticated user is represented. Let's say I try to access an SP and use the University of Southampton's IdP to authenticate myself. As a result of the authentication, I get two attributes: one is eduPersonPrincipalName, which identifies me personally; the other is eduPersonRole, which in my case is research staff. My IdP does not need to give my eduPersonPrincipalName to the SP. Instead, releasing only the eduPersonRole attribute might be sufficient to entitle me to the service. In this way, my privacy is protected to some extent even when I am granted access.
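The attribute release idea can be illustrated with a toy filter. The IdP knows several attributes about a user but hands each SP only those that SP is allowed to see. This is a hypothetical sketch of the concept, not Shibboleth's API or its attribute release policy format. The attribute values in the usage note are made up.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

class AttributeRelease {
    /** Returns only the attributes whose names the SP is allowed to receive. */
    static Map<String, String> release(Map<String, String> userAttributes, Set<String> allowedForSp) {
        Map<String, String> released = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : userAttributes.entrySet()) {
            if (allowedForSp.contains(e.getKey())) {
                released.put(e.getKey(), e.getValue());
            }
        }
        return released;
    }
}
```

Consider a user with `eduPersonPrincipalName=jdoe@soton.ac.uk` and `eduPersonRole=research staff`, and an SP allowed only `eduPersonRole`. In that case `release` returns just the role, so the SP can make its access decision without learning who the user is.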

Shibboleth is built on SAML. Among the Shibboleth technical specs, the SAML specs add up to several hundred pages, while the Shibboleth architecture specification has only 19 pages. SAML defines the core (XML schema for SAML assertions and protocol message elements), protocols (what is transmitted), bindings (how protocol messages are transmitted) and profiles (a concrete manifestation of a defined use case using a particular combination of assertions, protocols, and bindings).

Shibboleth promotes a trust federation. It is the federation that decides which SPs and IdPs can join it. Thus the trust between a user and an SP is maintained by the federation, which gives users peace of mind.

OpenID is another technology for web single sign-on. Unlike Shibboleth, the OpenID architecture has no such federation. Thus it is down to users to decide whether they should trust a service provider and use their OpenIDs on its website. I am not happy with that. I also do not think it is a very good idea to use a single OpenID to access websites with different levels of information confidentiality, such as online banking, webmail and a simple web-based game site. Clearly online banking has the highest confidentiality level, the game site has the lowest, and webmail sits between them.

In the UK, the UK Federation manages a Shibboleth federation for education and research. Here is the current membership, i.e., who provides SPs and IdPs. And here is the list of all available services that support Shibboleth authentication managed by the UK Federation.

Edsger W. Dijkstra in his 1974 paper "On the role of scientific thought" explains why separation of concerns is so important:

Let me try to explain to you, what to my taste is characteristic for all intelligent thinking. It is, that one is willing to study in depth an aspect of one's subject matter in isolation for the sake of its own consistency, all the time knowing that one is occupying oneself only with one of the aspects. We know that a program must be correct and we can study it from that viewpoint only; we also know that it should be efficient and we can study its efficiency on another day, so to speak. In another mood we may ask ourselves whether, and if so: why, the program is desirable. But nothing is gained --on the contrary!-- by tackling these various aspects simultaneously. It is what I sometimes have called "the separation of concerns", which, even if not perfectly possible, is yet the only available technique for effective ordering of one's thoughts, that I know of. This is what I mean by "focusing one's attention upon some aspect": it does not mean ignoring the other aspects, it is just doing justice to the fact that from this aspect's point of view, the other is irrelevant. It is being one- and multiple-track minded simultaneously.

Wednesday, October 29, 2008

Use Virtual Machine to Evaluate Software

Here at OMII-UK we need to evaluate software produced by the commissioned software projects from time to time. For instance, it is currently my job to evaluate SPAM-GP, a set of JSR-168 portlets for security. I am using VMware Server to do this, and I think it is great! Basically, I build a virtual machine, in which I install, configure and test the software to be evaluated. The benefits of doing so are obvious:
  • The software runs in its own VM, so we get all the advantages of dedicating a physical machine to it. This is a working instance, and we can keep it as long as we wish. It is also easy for anyone else who wants to see a demo or play with it.
  • Compared to a physical machine, the VM only costs around 3GB of disk space. That sounds quite large, but it is little on my 200GB hard disk.
VMware Server supports snapshots. A snapshot is a persistent state of the whole VM. It can be used to save a state of the VM that we are happy with. We can then carry on manipulating the VM and come back to the snapshot whenever we want. Unfortunately, VMware Server allows only one snapshot per VM: a newly taken snapshot simply replaces the previous one.

My machine has 1GB of memory, so VMware Server is sometimes slow, particularly when two VMs are running at the same time.