Tuesday, November 04, 2008

Virtual DOM

(This is one of my old Google Notes. Virtual DOM is a small piece of software I implemented.)

Complying with the W3C DOM interface, Virtual DOM is capable of representing in memory a large XML data which can not be represented using other DOM implementations such as Xerces. Virtual DOM is aiming at allowing off-the-shelf XPath engines such as Jaxen to process XML DOM representation that is too large to be held in memory otherwise. To support such a goal, Virtual DOM must be told how to load DOM element, and then it uses SoftReference to cache the loaded element. To put it in a simple way, the Virtual DOM elements can be garbage collected when Java heap space becomes scarce, and later be reloaded on demand.

The problem of Xerces DOM implementation

Each node has references to its parent, children, and siblings. So as long as there is a single reference to any node of the DOM tree, any parts of the tree can not be garbage collected.

Implementation of Virtual DOM

Virtual DOM adopts a forest mode, where there is a virtual document and a virtual root element. The virtual root element has a number of child elements, which is implemented as CollectableElement extended from CollectableNode, each representing a different DOM document.

Each Virtual DOM node, called CollectableNode, has a reference to DocumentCache, which in turn has a SoftReference to the cached owning document. DocumentCache also has enough information to load and reload the owning document whenever necessary.

Each Virtual DOM node has information specifying how to locate this node from the root node of the owning document.

Each Virtual DOM node also has a SoftReference to its corresponding concrete DOM node for convenience purpose.

The CollectableElement that is the direct child of the virtual root element has an index indicating which child it is in terms of the virtual root element. Thus it is able to retrieve its next sibling.

All exported DOM references should be of Collectable* instead of the default DOM ones to prevent DOM node references from being held externally.

Test

With 64 MB heap, using a revised Jaxen 1.1 (one of Jaxen 1.1's methods unnecessarily retains object references.), Virtual DOM is able to deal with at least 250,000 instances of a certain XML document. In a comparison, Xerces DOM runs out of memory at 39,000 instances.

(It is called VirtualDOM in my Eclipse projects.)

No comments: