Monday, June 29, 2009

More on Java Troubleshooting

The JDK keeps adding better troubleshooting facilities alongside its other tools. They matter a great deal when developing serious Java applications, which usually have to deliver a specified level of performance under resource constraints.

These tools include
  • jinfo pid : shows the Java command-line options and system properties of the process.
  • jstack pid : prints each thread's stack trace and its status (e.g., blocked on some object). This is great for finding out what is going on inside a Java application at runtime, and the JVM needs no special setup for this purpose.
  • jvisualvm : profiles and monitors Java processes. It seems the JDK on Linux includes jvisualvm by default, but on Windows you need to install it separately.
as well as some other tools already mentioned before:
  • jconsole
  • jmap
  • jhat
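
Some of what jinfo and jstack report can also be read from inside a running application through the standard java.lang.management API. The following is a minimal sketch, not a replacement for the tools themselves; the class name SelfInspect and its method names are made up for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.RuntimeMXBean;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper class: an in-process look at what jinfo and jstack show.
public class SelfInspect {

    /** Roughly the jinfo view: the JVM options this process was started with. */
    public static String jvmOptions() {
        RuntimeMXBean runtime = ManagementFactory.getRuntimeMXBean();
        return String.join(" ", runtime.getInputArguments());
    }

    /** Roughly the jstack view: every live thread with its state and stack. */
    public static String threadDump() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();
        // false, false: do not collect locked monitors or synchronizers.
        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            out.append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println("JVM options: " + jvmOptions());
        System.out.print(threadDump());
    }
}
```

Unlike jstack, this only works from code that is already running inside the target JVM, so it suits a diagnostics page or a log statement rather than ad hoc inspection.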

Thursday, June 25, 2009

Clean up the Harddisk before Disposing of a Computer

After three years of working as a heater, my Sony VAIO laptop is finally retiring. There is one critical task that must be done before it can be disposed of: cleaning up its harddisk. Since I graduated from a computer lab that researched and developed one of the very first harddisks in China, I have long known that not only deleted files but also overwritten files can be recovered. Nevertheless, this blog and its comments still gave me a lot of information. Some useful points:
  • On Linux, shred can be used to wipe a disk. By default it overwrites the harddisk 25 times. Call it like this: shred -vz /dev/hda (-z: add a final overwrite with all zeros to disguise the shredding.)
  • Or, use DBAN, which is used by many governments. DBAN stands for "Darik's Boot And Nuke". The software is a boot CD/DVD image that is used to boot the computer and then do the cleaning up.
  • Here is the seminal paper describing the theory behind securely deleting harddisk data.
It is useful to know how long shred or DBAN takes to process a harddisk of a given size.

On my Sony VAIO VGN-SZ3XWP/C, shred takes about 3.5 to 4 hours to run one random pass over a 39 GiB disk partition! A random pass overwrites the disk with random bits; I am not sure how random they are. The 1st and the 13th passes (and possibly the last one, the 25th) are random passes. Each of the other passes overwrites the disk with a different short fixed pattern and takes about half an hour.

A further update, from running "shred -vz /dev/sda" on a 94 GiB harddisk: the 1st, 13th and 25th passes use random data generated from /dev/urandom. The other passes write fixed but different bit patterns. The whole run took 49h33m26s to finish.
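
The measured run gives a rough way to estimate other disk sizes. The sketch below simply scales the 94 GiB figure linearly; that assumes the same hardware and the same shred options, which will not hold in general, and the class name ShredEstimate is made up for illustration.

```java
// Hypothetical helper: scales the one measured shred run to other disk sizes.
public class ShredEstimate {
    // Measured in the post: "shred -vz" on a 94 GiB disk took 49h33m26s.
    public static final double MEASURED_GIB = 94.0;
    public static final long MEASURED_SECONDS = 49L * 3600 + 33 * 60 + 26;

    /** Average seconds spent per GiB over the whole run, all passes included. */
    public static double secondsPerGiB() {
        return MEASURED_SECONDS / MEASURED_GIB;
    }

    /** Linear estimate in hours (assumption: same machine, same options). */
    public static double estimateHours(double diskGiB) {
        return diskGiB * secondsPerGiB() / 3600.0;
    }

    public static void main(String[] args) {
        System.out.printf("about %.0f seconds per GiB%n", secondsPerGiB());
        System.out.printf("250 GiB would take roughly %.1f hours%n", estimateHours(250));
    }
}
```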

Monday, June 22, 2009

IDE for C++

I may work on a C++ project in the future, so I am looking for an IDE for C++. I know a good IDE can double programming productivity.

Unlike Java, C++ is not cross-platform and has many variants, though it does have a standard. Thus the choice of IDE depends on which C++ you use.

On Windows, Microsoft Visual Studio is the choice for Visual C++ :)

I would choose Eclipse CDT for GNU C++. CDT is only an IDE: to build and debug C++ code, it requires an external toolchain, e.g., the GNU toolchain. CDT on Linux automatically picks up the GNU toolchain, which is usually already installed. To make CDT work on Windows, however, we need to install MinGW (Minimalist GNU for Windows). Here is a good tutorial on installing MinGW. Once MinGW is properly installed and added to PATH, CDT picks it up as a toolchain automatically.

Friday, June 19, 2009

Troubleshooting OutOfMemoryError

When a large running Java application throws an OutOfMemoryError, it indicates either a memory leak or simply that the maximum heap size has been reached. Don't panic; troubleshooting is straightforward.

Most importantly, DO NOT SHUT DOWN the problematic JVM. Preserve the crime scene.
  1. Use jmap to dump the heap. Under JDK 1.5, the dumped heap, in hprof format, is always put in the home directory under the name "heap.bin". In addition, jmap can show the heap object histogram, a quick way to see which classes occupy the most space.
  2. Use jhat to analyze the heap dumped by jmap. If the heap was dumped by a JDK 1.5 jmap, invoke jhat (which is only available in JDK 6+) in this way: jhat -J-mxNm -stack false heap_dump_file. "-stack false" turns off tracking of object allocation call stacks, because allocation site information is not available in the heap dump. N should be larger than the maximum heap size used when running the problematic Java process. jhat takes quite some time to analyze the heap dump.
  3. jhat starts a web server after it finishes analyzing the heap dump. Now we can point a browser at the jhat web server and find out what is using up all the memory.
  4. At the very end of the front page, click the link "Show heap histogram". It takes quite some time to generate the histogram.
  5. In the histogram, the classes whose instances occupy most of the heap can be easily identified. Obviously they are the suspects of the crime.
  6. Clicking one of the suspects brings us to a page showing its referrers. In this way, we can track down which part of our code holds references to the objects that use up the heap. Using our knowledge of the program logic, we can then locate the cause of the problem.
See, the method is straightforward. All we need is patience and a good understanding of the source code in order to find the cause of the problem.
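
On HotSpot JVMs from JDK 6 onwards, a heap dump can also be written from inside the application through the com.sun.management.HotSpotDiagnosticMXBean interface. This is a hedged sketch of that alternative, not the jmap procedure above; the class name HeapDumper is made up for illustration, and the bean is HotSpot-specific, so it may be missing on other JVMs.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;

// Hypothetical helper: writes an hprof dump of the JVM it runs in.
public class HeapDumper {

    /**
     * Dumps the heap to the given file. The file must not exist yet, and
     * newer JDKs expect the name to end in ".hprof".
     */
    public static void dumpHeap(File target, boolean liveObjectsOnly) {
        HotSpotDiagnosticMXBean bean =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            bean.dumpHeap(target.getAbsolutePath(), liveObjectsOnly);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Use a fresh file name on every run, because dumpHeap refuses to overwrite.
        File target = new File("heap-" + System.currentTimeMillis() + ".hprof");
        dumpHeap(target, true); // true: dump only objects that are still reachable
        System.out.println("Wrote " + target + " (" + target.length() + " bytes)");
    }
}
```

The resulting file is in the same hprof format, so it can be analyzed with the same steps as a dump taken by jmap.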

Friday, June 12, 2009

Direction+

I am working on a web app, Direction+, running on Google App Engine for Java. Yes, it is another direction service that suggests a driving route based on your chosen source and destination. But it has some add-on features as its name suggests: Direction Plus.

First, you can personalise Direction+ by providing your car's information, in particular its fuel consumption figures for the urban cycle, the extra-urban cycle and the combined cycle. Thus, when calculating the route, Direction+ is also able to calculate how much fuel you will use and how much it will cost (based on the fuel price you set). The fuel consumption is calculated for each individual step of the route. For instance, driving on a motorway is much more fuel-efficient than driving in the city centre. Isn't it good to know the cost of the drive beforehand? In addition, the calculation is done in your own browser and your personalisation is stored in your browser as a cookie. So there is no privacy concern at all.
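
The per-step calculation can be illustrated as follows. This is only a sketch of the arithmetic, written in Java for consistency with the rest of this blog, whereas Direction+ does the calculation in the browser. All names (FuelCost, Step, Car, Cycle) are made up, and how a step is assigned to a driving cycle is assumed to be known already.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of summing fuel use over the steps of a route.
public class FuelCost {
    /** The three consumption figures a car profile provides. */
    public enum Cycle { URBAN, EXTRA_URBAN, COMBINED }

    /** One step of a route: its length and the cycle it is driven under. */
    public static final class Step {
        final double km;
        final Cycle cycle;
        public Step(double km, Cycle cycle) { this.km = km; this.cycle = cycle; }
    }

    /** Car profile: consumption in litres per 100 km for each cycle. */
    public static final class Car {
        final double urban, extraUrban, combined;
        public Car(double urban, double extraUrban, double combined) {
            this.urban = urban;
            this.extraUrban = extraUrban;
            this.combined = combined;
        }
        double litresPer100Km(Cycle cycle) {
            switch (cycle) {
                case URBAN: return urban;
                case EXTRA_URBAN: return extraUrban;
                default: return combined;
            }
        }
    }

    /** Total litres: each step contributes km * (litres per 100 km) / 100. */
    public static double litres(List<Step> route, Car car) {
        double total = 0.0;
        for (Step step : route) {
            total += step.km * car.litresPer100Km(step.cycle) / 100.0;
        }
        return total;
    }

    /** Total cost in whatever currency the fuel price is given in. */
    public static double cost(List<Step> route, Car car, double pricePerLitre) {
        return litres(route, car) * pricePerLitre;
    }

    public static void main(String[] args) {
        Car car = new Car(8.0, 5.0, 6.5); // made-up consumption figures
        List<Step> route = Arrays.asList(
            new Step(10, Cycle.URBAN), new Step(50, Cycle.EXTRA_URBAN));
        System.out.printf("%.2f litres, cost %.2f%n",
            litres(route, car), cost(route, car, 1.0));
    }
}
```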

Second, Direction+ is a mashup that combines the Google Maps service and the BBC travel news service. When a driving route is calculated, the travel news along the route, e.g., accident reports, is displayed on the map as well. Isn't it good to know the traffic situation before driving? Currently too many news items are returned, especially when the route is very long. Efforts are being made to improve the algorithm that filters out the not-so-relevant news.

How quickly is Direction+ updated with BBC travel news? Please see its status page. A complete update takes about 30 minutes. For instance, at the time of writing, the last update took 26 minutes to finish. It read 2297 news items from the BBC, of which only 50 had happened after the previous update, and of those 50 new items, 12 already had cached coordinates in the data store. So although the number of news items is large, Direction+ manages to reduce the communication to a small amount.

One of the tricky things about GAE for Java is that every servlet request must be served within 30 seconds. So the news update is implemented in an incremental and on-the-fly way:
  • Incremental: the complete update is divided into many small steps, and each step can be finished in a controlled time slot;
  • On-the-fly: the news updated in each small step is available immediately, long before the complete update is done.
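
The two properties above can be sketched in plain Java as follows. This is not the Direction+ code: the class IncrementalUpdater and its process method are hypothetical stand-ins, and in a real App Engine application each step would be triggered by its own request.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Queue;

// Hypothetical sketch: a long update split into time-boxed steps.
public class IncrementalUpdater {
    private final Queue<String> pending = new ArrayDeque<>();
    private final List<String> published = new ArrayList<>();

    public IncrementalUpdater(Collection<String> items) {
        pending.addAll(items);
    }

    /**
     * Runs one step within the given time budget and returns true while
     * work remains. At least one item is handled per step, so repeated
     * calls always make progress.
     */
    public boolean step(long budgetMillis) {
        long start = System.nanoTime();
        long budgetNanos = budgetMillis * 1_000_000L;
        while (!pending.isEmpty()) {
            // On-the-fly: each result is visible as soon as it is produced.
            published.add(process(pending.poll()));
            if (System.nanoTime() - start >= budgetNanos) {
                break; // Incremental: stop here and resume in the next step.
            }
        }
        return !pending.isEmpty();
    }

    /** Placeholder for the real per-item work (e.g., looking up coordinates). */
    private String process(String item) {
        return "processed:" + item;
    }

    /** Read-only view of everything published so far. */
    public List<String> published() {
        return Collections.unmodifiableList(published);
    }
}
```

A caller keeps invoking step with a budget well below the 30-second limit until it returns false, and can read published() at any point in between.
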
Direction+ is designed for UK users because I live in the UK. Enjoy it and let me know what you think!

Monday, June 08, 2009

Monitoring/Profiling/Debugging Java Process Remotely

Usually there are two difficulties involved:
  1. How do the remote (target) machine and the local (console) machine communicate? This varies with the monitoring/profiling/debugging method. E.g., the built-in JMX console (jconsole) uses RMI for remote connections.
  2. How to get through the firewall, if there is one?
The generic approach is:
  1. Know how to do it locally;
  2. Run a vncserver on the remote machine (in fact, vncserver is lightweight and easy to install if the remote machine does not have one yet);
  3. Use ssh to set up a tunnel between the remote machine and the local machine, e.g., ssh -L5901:remote_machine:5901 userid@remote_machine (ssh is very firewall-friendly);
  4. Start a VNC viewer locally, point it to localhost:1, and follow the procedure established in step 1.
In this way, remote monitoring/profiling/debugging tasks are doable following a step-by-step procedure.

Of course, sometimes there is no need to set up a vncserver at all, as long as the communication between the remote machine and the local machine uses TCP and the port number is known.
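
As an example of the TCP-only case, a JMX client can connect through a forwarded port directly. The sketch below assumes the target JVM exposes JMX on port 9999 (for instance via the com.sun.management.jmxremote.port system property) and that this port has been forwarded to localhost with ssh -L; the class name TunnelledJmxClient and the port number are made up. One caveat: JMX over RMI may use a second port for the actual RMI traffic, so forwarding a single port is not always enough.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Hypothetical client: reads heap usage from a JVM reachable on a forwarded port.
public class TunnelledJmxClient {

    /** Standard JMX-over-RMI service URL for the given host and registry port. */
    public static String serviceUrl(String host, int port) {
        return "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi";
    }

    /** Connects, reads the used heap size of the remote JVM, and disconnects. */
    public static long usedHeapBytes(String host, int port) throws IOException {
        JMXServiceURL url = new JMXServiceURL(serviceUrl(host, port));
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection connection = connector.getMBeanServerConnection();
            MemoryMXBean memory = ManagementFactory.newPlatformMXBeanProxy(
                connection, ManagementFactory.MEMORY_MXBEAN_NAME, MemoryMXBean.class);
            return memory.getHeapMemoryUsage().getUsed();
        }
    }

    public static void main(String[] args) {
        int port = args.length > 0 ? Integer.parseInt(args[0]) : 9999; // assumed port
        try {
            System.out.println("Remote heap in use: " + usedHeapBytes("localhost", port) + " bytes");
        } catch (IOException e) {
            System.err.println("Could not connect through the tunnel: " + e.getMessage());
        }
    }
}
```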