Wednesday, March 31, 2010

Two Useful Eclipse Plugins

FindBugs uses static analysis to look for bugs in Java code.

ANTLR IDE provides support for the ANTLR parser generator.

Saturday, February 06, 2010

1<<32 = ?

Let 1 be a 32-bit integer (an int) in Java. What should 1<<32 be?

0? Because the "1" bit will be shifted to the far left?

In fact, it is 1. The Java Language Specification says: "If the promoted type of the left-hand operand is int, only the five lowest-order bits of the right-hand operand are used as the shift distance. It is as if the right-hand operand were subjected to a bitwise logical AND operator & with the mask value 0x1f. The shift distance actually used is therefore always in the range 0 to 31, inclusive." So the shift distance 32 is reduced to 32 & 0x1f = 0, and 1<<32 is the same as 1<<0, which is 1. Note that this does not make << a cyclic shift: only the shift distance wraps around modulo 32. Bits shifted out of the left end are still discarded, so 0x80000000<<1 is 0. For a true cyclic shift, use Integer.rotateLeft.
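The masking rule can be checked with a few lines of Java. This is only a small sketch; the class and method names (ShiftDistanceDemo, shiftInt, shiftLong) are made up for illustration.

```java
// Demonstrates how Java masks the shift distance for int and long operands.
class ShiftDistanceDemo {

    // Shifts an int left; Java uses only the low 5 bits of the distance.
    static int shiftInt(int value, int distance) {
        return value << distance;
    }

    // Shifts a long left; Java uses only the low 6 bits of the distance.
    static long shiftLong(long value, int distance) {
        return value << distance;
    }

    public static void main(String[] args) {
        System.out.println(shiftInt(1, 32));          // 1: 32 & 0x1f == 0, so no shift at all
        System.out.println(shiftInt(1, 33));          // 2: 33 & 0x1f == 1
        System.out.println(shiftInt(0x80000000, 1));  // 0: the top bit is discarded, not rotated
        System.out.println(Integer.rotateLeft(0x80000000, 1)); // 1: a true cyclic shift
        System.out.println(shiftLong(1L, 32));        // 4294967296: long uses the mask 0x3f
    }
}
```

If a shift by the full width has to produce 0, the distance needs an explicit check before shifting, or the left operand can be widened to long.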

Monday, January 18, 2010

Ways of Implementing DSL

A DSL (domain-specific language), or simply our very own language intended to solve our own specific problem, need not be a programming or scripting language; it can also be, for instance, a file format or a network protocol. In any case, the language has a vocabulary of legal words (tokens), and a grammar that defines the rules for composing those words into meaningful sentences and paragraphs. So in order to understand the language and take the right action based on it, we need a lexer that translates the character stream into a token stream, and a parser that matches the tokens against the grammar rules.

There are many ways of implementing a DSL.

First, if the language is really simple, we can just code the lexer and parser by hand, i.e., manually split the character stream into tokens and manually write an LL recursive descent parser. Obviously this method won't work well once the language becomes complex.
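As a rough sketch of the hand-written approach, here is a lexer and an LL(1) recursive descent parser for a hypothetical arithmetic mini-language. The language, the class name TinyCalc and all method names are assumptions made for this example only. Each grammar rule becomes one method, which is what makes the technique easy for small languages and tedious for large ones.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written lexer and recursive descent parser for a hypothetical grammar:
//   expr   := term (('+' | '-') term)*
//   term   := factor (('*' | '/') factor)*
//   factor := NUMBER | '(' expr ')'
class TinyCalc {
    private final List<String> tokens;
    private int pos = 0;

    private TinyCalc(List<String> tokens) {
        this.tokens = tokens;
    }

    // Lexer: splits the character stream into number and single-character tokens.
    static List<String> tokenize(String input) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < input.length() && Character.isDigit(input.charAt(i))) {
                    i++;
                }
                out.add(input.substring(start, i));
            } else if ("+-*/()".indexOf(c) >= 0) {
                out.add(String.valueOf(c));
                i++;
            } else {
                throw new IllegalArgumentException("Unexpected character: " + c);
            }
        }
        return out;
    }

    // Entry point: lexes the input, then parses and evaluates it in one pass.
    static int evaluate(String input) {
        TinyCalc parser = new TinyCalc(tokenize(input));
        int value = parser.expr();
        if (parser.pos != parser.tokens.size()) {
            throw new IllegalArgumentException("Unexpected token: " + parser.peek());
        }
        return value;
    }

    // Returns the current token, or "" at the end of input.
    private String peek() {
        return pos < tokens.size() ? tokens.get(pos) : "";
    }

    private int expr() {
        int value = term();
        while (peek().equals("+") || peek().equals("-")) {
            String op = tokens.get(pos++);
            int right = term();
            value = op.equals("+") ? value + right : value - right;
        }
        return value;
    }

    private int term() {
        int value = factor();
        while (peek().equals("*") || peek().equals("/")) {
            String op = tokens.get(pos++);
            int right = factor();
            value = op.equals("*") ? value * right : value / right;
        }
        return value;
    }

    private int factor() {
        String token = peek();
        if (token.equals("(")) {
            pos++;
            int value = expr();
            if (!peek().equals(")")) {
                throw new IllegalArgumentException("Expected ')'");
            }
            pos++;
            return value;
        }
        if (token.isEmpty() || !Character.isDigit(token.charAt(0))) {
            throw new IllegalArgumentException("Expected a number but found: '" + token + "'");
        }
        pos++;
        return Integer.parseInt(token);
    }

    public static void main(String[] args) {
        System.out.println(evaluate("2 + 3 * (4 - 1)")); // 11
    }
}
```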

Second, make the DSL an XML language. For instance, BPEL is a web service workflow language. Though BPEL is nothing more than an XML language, it can certainly fulfill complex tasks. The advantages of making a DSL an XML language are: first, we can use XML Schema (XSD) to design the language, and then use schema validation to check whether a document is valid according to the schema; second, there are mature and popular tools for validating and parsing XML in practically every programming language.
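To illustrate the schema validation step, the sketch below uses the javax.xml.validation API that ships with the JDK. The tiny workflow schema, the element names and the class name XmlDslValidation are all invented for this example and have nothing to do with BPEL itself.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Validates documents of a hypothetical XML DSL against an inline XSD.
class XmlDslValidation {

    // Hypothetical schema: a <workflow> holding one or more <step name="..."/>.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='workflow'>"
        + "<xs:complexType><xs:sequence>"
        + "<xs:element name='step' maxOccurs='unbounded'>"
        + "<xs:complexType>"
        + "<xs:attribute name='name' type='xs:string' use='required'/>"
        + "</xs:complexType>"
        + "</xs:element>"
        + "</xs:sequence></xs:complexType>"
        + "</xs:element>"
        + "</xs:schema>";

    // Compiles the schema; a failure here means the schema itself is broken.
    static Schema loadSchema() {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return factory.newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("Schema does not compile", e);
        }
    }

    // Returns true if the document is well-formed and valid against the schema.
    static boolean isValid(String document) {
        Validator validator = loadSchema().newValidator();
        try {
            validator.validate(new StreamSource(new StringReader(document)));
            return true;
        } catch (SAXException e) {
            return false; // malformed XML or a schema violation
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<workflow><step name='fetch'/></workflow>"));
        System.out.println(isValid("<workflow><step/></workflow>"));
    }
}
```

The first document should be accepted, while the second should be rejected because the required name attribute is missing.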

Third, we can take the conventional route for designing a complex language, i.e., use a parser generator such as ANTLR. ANTLR is implemented in Java, but it can generate code in many languages other than Java, such as C++, Python, C# ... One of the reasons ANTLR is quite popular is that it comes with sophisticated tooling for designing, visualising and even debugging grammars: ANTLRWorks, which is quite impressive.

We can even combine the second and third methods when designing a DSL. For instance, XSLT (XSL Transformations) is an XML-based language for transforming XML documents into other documents. XSLT makes use of XPath, an expression language that is not itself XML and whose parser can be implemented using ANTLR.

Fourth, modern dynamic languages like Ruby and Groovy can nowadays be used to implement many DSLs. Because such DSLs live inside the host language, e.g. Groovy, they have to obey Groovy's grammar rules. Compared to languages implemented using ANTLR, their flexibility is somewhat restricted. See here for an introduction to designing DSLs with Groovy.

Sunday, January 10, 2010

Protocol Verification Using Tcpdump and Wireshark

Tcpdump (to capture) and/or Wireshark (to analyse) are a must for verifying a protocol implementation when doing TCP/IP network programming.

First, they can verify whether the messages I send are as intended, and thus expose bugs in my protocol implementation. Today I found a bug exactly this way: I used Java's DataOutputStream.write(int) to send a short, so only the lowest byte of the short was sent. The right call is DataOutputStream.writeShort(int), which writes both bytes, high byte first.
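The difference between the two calls is easy to reproduce without a network. The following sketch writes into a ByteArrayOutputStream instead of a socket; the class name ShortWriteDemo and the sample value 0x1234 are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Compares DataOutputStream.write(int) with writeShort(int) for a short value.
class ShortWriteDemo {

    // Reproduces the bug: write(int) emits only the low-order byte.
    static byte[] withWrite(short value) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            out.write(value);
            out.flush();
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // The fix: writeShort(int) emits two bytes, high byte first (big-endian).
    static byte[] withWriteShort(short value) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            out.writeShort(value);
            out.flush();
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        short value = 0x1234;
        System.out.println(withWrite(value).length);      // 1 byte on the wire: 0x34
        System.out.println(withWriteShort(value).length); // 2 bytes on the wire: 0x12 0x34
    }
}
```

In a capture, the buggy version shows up as a message that is one byte too short, with every later field shifted.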

Second, they can detect inconsistencies between the protocol specification, i.e., what I think the messages should look like, and the actual messages sent by the other party, which are out of my control. Today, several inconsistencies were found in this way. For instance, the length of a message (a TCP packet) is different from that defined in the spec. Clearly the spec I have is out of date.

Without Tcpdump and Wireshark, I would be walking in the dark when implementing my protocol.