Thursday, November 30, 2006

Google File System

The Google File System is a scalable distributed file system for large, distributed, data-intensive applications, and it is in use within Google. The paper, published at the 19th ACM Symposium on Operating Systems Principles, is a well-written computer engineering paper. System builders should read it.

Quoted from its conclusion:

"We started by reexamining traditional file system assumptions in light of our current and anticipated application workloads and technological environment. Our observations have led to radically different points in the design space. We treat component failures as the norm rather than the exception, optimize for huge files that are mostly appended to (perhaps concurrently) and then read (usually sequentially), and both extend and relax the standard file system interface to improve the overall system."

Lessons learnt:
  • Application driven.
  • Base the design on the technological environment and on measured performance characteristics. Performance benchmarks are the foundation of system design.
  • Application/file system (or OS, or underlying infrastructure, etc.) co-design.
  • Optimize for what should be optimized.

"Our system provides fault tolerance ..."

Lesson learnt:
  • Reliability is a must for any system. A system must work. Never forget reliability, persistence, and fault tolerance during system design.

"... This makes possible a simple, centralized master that does not become a bottleneck. ..."

Lesson learnt:
  • Down to earth.
  • ...I had already come to the conclusion that in the practice of computing, where we have so much latitude for making a mess of it, mathematical elegance is not a dispensable luxury, but a matter of life and death. - Edsger Wybe Dijkstra ["My hopes of computing science" (EWD 709)]
elegant: [...]ingeniously simple and effective. - Concise Oxford Dictionary

This notion of elegance makes perfect sense in the domain of system design. We may say that in system design, too, elegance is not a dispensable luxury, but a matter of life and death.

Friday, November 17, 2006

A Korean Artificial Beauty

An authentic dream face based on popular Korean stars.

Thursday, November 16, 2006

A Very Interesting Story

Copied from Ian Foster's blog.


Hadoop on EC2

Here's something neat (and details here).

Hadoop, an open source clone of Google FS and MapReduce, can be run on top of Amazon EC2, a hosting service that allows leasing servers on an hourly basis.

As Greg Linden goes on to say:

Developers may now be able to rapidly bring up hundreds of servers, run a massive parallel computation on them using Hadoop's MapReduce implementation, and then shut down all the instances, all with low effort and at low cost. Very cool.

My colleague Tim Freeman points out that you can run those same VMs on your own resources using the Globus Workspace service.


I have a feeling that parallel computing is becoming more and more widely available, and easier and easier to program.

Provenance of Life

We are moving. Our current landlord claims the carpet is "new". It does not look new at all, and I remember it was not very clean when I moved in a year ago, but we do not have any evidence to support our point. The landlord also blames us for not reporting problems promptly. Having consulted the university accommodation officer, we have now learnt that we should report problems in writing and keep a photocopy of the letter.

Evidence matters. The evidence could be pictures, receipts, letters, or anything concrete: not transient, but retrievable. Furthermore, the pieces of evidence should not be isolated. They should be logically linked together and finally lead to some conclusion. Put another way, when we see an event in life, we would like to "see" the complete process that led to this event. By "see" we really mean to reconstruct it in a convincing way.

My boss is investigating a "provenance" project, which defines provenance as follows: "the provenance of a piece of data is the process that led to that piece of data". The aim of the project is "to conceive a computer-based representation of provenance that allows us to perform useful analysis and reasoning ...".

I argue we also need provenance support in our lives. Important facts should be documented in a retrievable and searchable way, for instance, in a computer-based way. We should keep recording provenance of life.

Google advocates searching instead of organizing. Ideally, as long as we record the provenance of life in our computer or on the internet, there should be a way to query and retrieve it.

I am interested in all techniques that improve productivity. It would be great if some software could help us record the provenance of life and query over it, which would improve our productivity in life.

Saturday, November 11, 2006

Wednesday, November 08, 2006

The Outcome of MPT

Here is a visualization of the outcome of MPT. The x-axis is the number of processors used in the throughput test (1 - 32). The y-axis is the number of threads per processor (1 - 16). The z-axis is the measured throughput. More precisely, it is the number of service discovery (by name) requests served in 30 minutes when there are 5000 service descriptions registered in GRIMOIRES. When 32 processors and 16 threads per processor are used, GRIMOIRES can serve 62,877 service discovery requests in 30 minutes, i.e., 34.9 per second. In other words, one request is completed every 28.6 milliseconds on average.
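The arithmetic behind these figures can be checked in a few lines. This is only a sketch using the numbers quoted above; the variable names are mine, not anything from GRIMOIRES or the test harness.

```python
# Figures quoted in the post; the variable names are illustrative only.
requests_served = 62_877        # service discovery requests completed
test_duration_s = 30 * 60       # the 30-minute test window, in seconds
processors = 32
threads_per_processor = 16

# Throughput is completed requests divided by elapsed time.
throughput_per_s = requests_served / test_duration_s

# The inverse of throughput is the average time between completed requests.
interval_ms = 1000.0 / throughput_per_s

# Total number of threads used in this configuration of the test.
total_threads = processors * threads_per_processor

print(f"throughput: {throughput_per_s:.1f} requests/s")           # 34.9
print(f"one request completed every {interval_ms:.1f} ms")        # 28.6
print(f"threads used in the test: {total_threads}")               # 512
```

Note that the 28.6 ms figure is the inverse of the throughput. With 512 threads issuing requests at the same time, the latency seen by a single request is not necessarily the same number.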

Tuesday, November 07, 2006


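I got up early on Saturday morning. The sun was bright and everything was coming back to life. I took the opportunity to lay down the grand travel plan for the next N years, so as to know where I stand and to have clear goals.


(1) Looking back at the past: great achievements.

Hong Kong, the Pearl of the Orient: lived there for 5 years, very good;
London, capital of a bygone empire: very good;
Paris, capital of culture and art: very good;
Switzerland, beautiful mountains and lakes: very good;
Chicago, the mile of magnificence: not bad;
British Columbia, Canada (Vancouver and Victoria): not bad; the water route from Vancouver to Victoria is beautiful;
Los Angeles: I heard that public safety was poor, so I did not dare to go downtown;
San Diego: pretty good, and I even saw an aircraft carrier.

(2) Facing reality: great pressure.

There are two points to discuss here: first, how to create opportunities in terms of time; second, how to increase income and cut spending. I will discuss them in detail later when I have time.

(3) Looking to the future: great hope.

Egypt: I very much want to see the greatness of ancient Egypt;
Japan: both loved and hated, I have to go and walk around;
Taiwan: a different kind of China, I have to go and see it;
New York: I have to go;
Las Vegas: I have to go;
Europe: a place to visit often; Italy, Spain, Germany, Northern Europe, ..., and France and Switzerland again;
Africa, South America, Oceania: whenever there is a chance, I want to leave a few footprints there.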

The Reasons to Have a Blog

According to Wikipedia, the term "blog" is a contraction of "web log". From its original name, we can infer two characteristics of a blog, which correspond to my two reasons for blogging.
  • A blog is a log, organized chronologically. I can use a blog to track events in my life and to record my thoughts. In this sense, the blog is just like a diary.
  • A blog is web-based, so it is intended to be read by people. A blog can be a good communication tool for presenting the blogger. It is in fact a two-way communication tool, because it allows others to comment.

Thursday, November 02, 2006

Monitor the Blog and Benchmark the Program

I am using Google Analytics to monitor visits to my blog. See the picture, in which the dots indicate the places from which somebody has visited my blog. There is even one returning visitor. Not bad!

It is good practice to start monitoring as soon as a website is established, just as it is good practice to start benchmarking as soon as a program is prototyped.

Monitoring is an inherent requirement of a website, so it should become part of the website infrastructure. That is, when you establish a website, it should be monitored automatically, with no effort from the webmaster.

Likewise, benchmarking is an inherent requirement of a program, so it should become part of the development environment. When you prototype a program, benchmarking it should cost you little or no effort. Some benchmark-specific code should be added to your program automatically. Then, based on the benchmark data, you can decide whether a certain refactoring is worth doing. Will Aspect-Oriented Programming help with this?
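One lightweight way to approximate "benchmark code added automatically" is a decorator that wraps a function with timing logic, similar in spirit to the "around" advice of Aspect-Oriented Programming. The sketch below is only an illustration under my own assumptions: `benchmarked`, `TIMINGS`, and `lookup_by_name` are hypothetical names, and the dictionary lookup merely stands in for real code under test.

```python
import functools
import time
from collections import defaultdict

# Hypothetical in-memory store: function name -> list of elapsed seconds.
TIMINGS = defaultdict(list)


def benchmarked(func):
    """Wrap func so that every call records its wall-clock duration."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Record the timing even if func raises an exception.
            TIMINGS[func.__name__].append(time.perf_counter() - start)
    return wrapper


@benchmarked
def lookup_by_name(registry, name):
    # Stand-in for the code under test: a plain dictionary lookup.
    return registry.get(name)


# Exercise the function; the timing is collected as a side effect.
registry = {f"service-{i}": i for i in range(5000)}
for i in range(1000):
    lookup_by_name(registry, f"service-{i}")

samples = TIMINGS["lookup_by_name"]
mean_us = sum(samples) / len(samples) * 1e6
print(f"{len(samples)} calls, mean {mean_us:.2f} microseconds per call")
```

The business code stays free of timing logic; only the one-line `@benchmarked` annotation is added, and removing it switches the measurement off again.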

Wednesday, November 01, 2006

Use VMware to Release Server-Side Software

I am using VMware's free VMware Server to create a GRIMOIRES virtual appliance. I admit that GRIMOIRES is not very easy to install. I believe no server-side software is easy to install, probably because of the tedious and error-prone procedure for configuring, for instance, the backend database and security.

VMware is able to relieve this pain. We create a VMware virtual appliance, which includes the OS and our software. Our customers simply download it and run it, thus avoiding the painful installation procedure. The VMware virtual appliance could be much bigger than our software alone, but who cares? The computer spends more time downloading, but we spend much less time on installation.