Wednesday, December 17, 2008

Parallel Array Expected in Java SE7

Two "Year in Review" articles on JavaWorld, "Java in 2008" and "What to expect in Java SE7", are worth reading.

Among the new features expected in SE7, which is due in early 2010, the most surprising one to me is the parallel processing support. It is also the one I am most looking forward to, since I got my PhD working on parallel computing and cluster computing in Java.

The parallel processing support not only provides a fork/join computing paradigm that might be adapted to a MapReduce style, but also supports parallel arrays. Parallel arrays are a long-established feature of parallel computing languages such as High Performance Fortran (HPF). Inspired by HPF, HPJava supports parallel arrays as well.

The idea of a parallel array is quite simple. It represents a large data array that is partitioned into many parts, with each part allocated to a single process. Each process can then work on its own partition of the array in parallel with the others, thus achieving a speedup.
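To make the partitioning idea concrete, here is a minimal sketch of my own, not taken from the JavaWorld articles. It uses the fork/join classes as they exist in java.util.concurrent (ForkJoinPool and RecursiveTask), splits an array into halves until each part is small, and sums the parts in parallel. Note that it uses threads inside one JVM rather than separate processes, and the class name PartitionedSum, the THRESHOLD cutoff, and the pool size of 4 are all assumptions chosen for illustration.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical example class: sums a long[] by recursively partitioning it.
class PartitionedSum extends RecursiveTask<Long> {
    // Assumed cutoff: partitions at or below this size are summed sequentially.
    private static final int THRESHOLD = 1_000;

    private final long[] data;
    private final int lo; // inclusive start of this partition
    private final int hi; // exclusive end of this partition

    PartitionedSum(long[] data, int lo, int hi) {
        this.data = data;
        this.lo = lo;
        this.hi = hi;
    }

    @Override
    protected Long compute() {
        if (hi - lo <= THRESHOLD) {
            // Small enough: this worker handles its own partition directly.
            long sum = 0;
            for (int i = lo; i < hi; i++) {
                sum += data[i];
            }
            return sum;
        }
        // Otherwise split the partition in two and process the halves in parallel.
        int mid = lo + (hi - lo) / 2;
        PartitionedSum left = new PartitionedSum(data, lo, mid);
        PartitionedSum right = new PartitionedSum(data, mid, hi);
        left.fork();                      // schedule the left half asynchronously
        long rightSum = right.compute();  // compute the right half in this thread
        return left.join() + rightSum;    // wait for the left half and combine
    }

    // Convenience entry point; the parallelism level of 4 is an arbitrary assumption.
    static long sum(long[] data) {
        ForkJoinPool pool = new ForkJoinPool(4);
        try {
            return pool.invoke(new PartitionedSum(data, 0, data.length));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long[] data = new long[100_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        System.out.println(sum(data)); // prints 5000050000
    }
}
```

A parallel array hides this splitting and joining behind operations on the array itself, so the caller only describes what to do with each element.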

However, it is still under consideration whether the parallel array will be included as part of the JDK in Java SE 7 or whether it will be released as an external library.

A parallel array code example looks like:
// Instantiate the ForkJoinPool with a concurrency level
ForkJoinPool fj = new ForkJoinPool(16);
Donut[] data = ...
ParallelArray<Donut> donuts = new ParallelArray<Donut>(fj, data);

// Filter: keep only the donuts that have sprinkles
Ops.Predicate<Donut> hasSprinkles = new Ops.Predicate<Donut>() {
    public boolean op(Donut donut) {
        return donut.hasSprinkles();
    }
};

// Map from Donut -> int (age in days)
Ops.ObjectToInt<Donut> daysOld = new Ops.ObjectToInt<Donut>() {
    public int op(Donut donut) {
        return donut.age();
    }
};

// Apply the filter and the mapping, then summarize the results
SummaryStatistics summary =
    donuts.withFilter(hasSprinkles)
          .withMapping(daysOld)
          .summary();

System.out.println("with sprinkles: " + summary.size());
System.out.println("avg age: " + summary.average());
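The snippet above depends on the proposed ParallelArray classes, which may not be on your classpath. As a hedged sketch only, the same filter, map, and summarize shape can be written with parallel streams from java.util.stream (Java 8 and later). This is a different API from the one discussed in this post, and the Donut class, its fields, and the DonutStats wrapper below are hypothetical stand-ins I made up so that the example compiles on its own.

```java
import java.util.Arrays;
import java.util.IntSummaryStatistics;

// Hypothetical wrapper class for the example.
class DonutStats {

    // Hypothetical stand-in for the Donut type used in the post.
    static final class Donut {
        private final boolean sprinkles;
        private final int age; // age in days

        Donut(boolean sprinkles, int age) {
            this.sprinkles = sprinkles;
            this.age = age;
        }

        boolean hasSprinkles() { return sprinkles; }
        int age() { return age; }
    }

    // Filter donuts with sprinkles, map each one to its age, and summarize in parallel.
    static IntSummaryStatistics sprinkledAges(Donut[] data) {
        return Arrays.stream(data)
                .parallel()
                .filter(Donut::hasSprinkles)
                .mapToInt(Donut::age)
                .summaryStatistics();
    }

    public static void main(String[] args) {
        Donut[] data = {
            new Donut(true, 1),
            new Donut(false, 5),
            new Donut(true, 3)
        };
        IntSummaryStatistics summary = sprinkledAges(data);
        System.out.println("with sprinkles: " + summary.getCount()); // with sprinkles: 2
        System.out.println("avg age: " + summary.getAverage());      // avg age: 2.0
    }
}
```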
