parboiled 0.9.8 released

Posted by Mathias in [parboiled]

12 Aug 2010

Just about a month after the last release, parboiled 0.9.8 hit its GitHub download page today.
It is once again a major step forward, with quite a few changes and additions, some of them significant enough to warrant a small series of blog posts accompanying the release. These posts should be of interest both to people already using previous parboiled releases and to developers looking into parboiled for the first time.

Apart from a few bug fixes and legacy cleanups (some of which are breaking changes!) that you can read about in the change log, parboiled 0.9.8 introduces three major novelties:

  • A Scala Facade
  • The Value Stack
  • The ProfilingParseRunner

The first point, the Scala facade, is such an important addition that it merits a post of its own (which I should get to in the next few days). From now on parboiled will be available in two editions, one for Java and one for Scala. Both contain the exact same parsing engine but differ in the “frontend”, the internal DSL you build your parser rules with.
For Java users the existing Java DSL hasn’t changed. You still define your parser rules via methods that return Rule instances and live in a BaseParser-derived class. In most cases a rule method consists of nothing more than a single rule-building expression, optionally containing boolean expressions, which are converted into parser actions during parser extension.
Scala developers, however, now have the option of leveraging the powers of their language of choice by constructing their parser rules with parboiled’s brand new “Scala facade”. More on this in the next post.

The second point is the most radical novelty, and the one that existing parboiled users have to migrate through should they decide to upgrade to v0.9.8. It constitutes a rather fundamental change in the programming model underlying parboiled parser actions.
In previous parboiled versions the construction of custom objects (e.g. AST nodes) during a parsing run was heavily centered around the parse tree. In version 0.9.6 and before you had to take the “parboiled way” of decorating parse tree nodes if you wanted to create your own custom object structure. Even though this worked and had certain advantages, it also had a number of drawbacks, the biggest one being the time and memory cost of building an otherwise unneeded parse tree structure, often containing many thousands of nodes.
In an attempt to reap additional performance gains, parboiled 0.9.7 introduced the option of “parse-tree-less parsing” as well as action variables. However, it did not go the whole way and properly embrace a completely new concept for handling custom value objects during a parsing run.
In version 0.9.8 this is now corrected with the introduction of the value stack. The value stack serves as a “work bench” for your parser actions that is independent of the parse tree. Building a parse tree has now become completely optional and is in fact not the default behavior of a parboiled parser anymore (in 0.9.7 and before you had to switch off parse tree building, now you have to switch it on).
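To make the idea of a “work bench” more concrete, here is a minimal, self-contained sketch of the general technique. It does not use parboiled at all: the class ValueStackSketch, its rule methods and the Node types are hypothetical names made up for this illustration, and the parser is a plain hand-written one. The point is only to show actions pushing and popping values on a stack to build an AST while no parse tree is ever created.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical illustration only: none of these names come from parboiled.
final class ValueStackSketch {

    // Minimal AST: a node is either a number literal or a binary operation.
    interface Node { int eval(); }

    static final class Num implements Node {
        final int value;
        Num(int value) { this.value = value; }
        public int eval() { return value; }
        @Override public String toString() { return Integer.toString(value); }
    }

    static final class BinOp implements Node {
        final char op;
        final Node left, right;
        BinOp(char op, Node left, Node right) { this.op = op; this.left = left; this.right = right; }
        public int eval() {
            switch (op) {
                case '+': return left.eval() + right.eval();
                case '-': return left.eval() - right.eval();
                case '*': return left.eval() * right.eval();
                default: throw new IllegalStateException("unknown operator " + op);
            }
        }
        @Override public String toString() { return "(" + left + " " + op + " " + right + ")"; }
    }

    private final String input;
    private int pos = 0;
    // The "work bench": actions push and pop values here; no parse tree is built.
    private final Deque<Node> valueStack = new ArrayDeque<>();

    private ValueStackSketch(String input) { this.input = input; }

    static Node parse(String input) {
        ValueStackSketch p = new ValueStackSketch(input);
        if (!p.expression() || p.pos != input.length()) {
            throw new IllegalArgumentException("parse error at index " + p.pos);
        }
        return p.valueStack.pop();
    }

    private char peek() { return input.charAt(pos); }

    // Expression <- Term (('+' / '-') Term)*
    private boolean expression() {
        if (!term()) return false;
        while (pos < input.length() && (peek() == '+' || peek() == '-')) {
            char op = input.charAt(pos++);
            if (!term()) return false;
            reduce(op);
        }
        return true;
    }

    // Term <- Number ('*' Number)*
    private boolean term() {
        if (!number()) return false;
        while (pos < input.length() && peek() == '*') {
            pos++;
            if (!number()) return false;
            reduce('*');
        }
        return true;
    }

    // Number <- [0-9]+   action: push a Num onto the value stack
    private boolean number() {
        int start = pos;
        while (pos < input.length() && Character.isDigit(input.charAt(pos))) pos++;
        if (pos == start) return false;
        valueStack.push(new Num(Integer.parseInt(input.substring(start, pos))));
        return true;
    }

    // Action: pop two operands and push the combined node in their place.
    private void reduce(char op) {
        Node right = valueStack.pop();
        Node left = valueStack.pop();
        valueStack.push(new BinOp(op, left, right));
    }

    public static void main(String[] args) {
        Node ast = parse("1+2*3");
        System.out.println(ast + " = " + ast.eval());
    }
}
```

In this sketch every rule method that succeeds leaves exactly one new value on the stack, so callers never need to know how deeply a rule is nested in order to reach its result.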
The introduction of the value stack comes with the following benefits:

  1. A unified action programming model.
    No separation between “parse-tree-based” and “parse-tree-less” actions anymore.
  2. Simplification and improved readability of action expressions.
    No more UP2(value().set(new Node(value(), DOWN(value("op"))))) constructs.
  3. Less coupling between parser actions and rule structure. Previously you had to know about “rule levels” to be able to properly address ancestor rules. Now you don’t. Among other things this enables less verbose rule construction. For example, you can now say
    Optional(A(), B(), C()) instead of Optional(Sequence(A(), B(), C())).
  4. Less need for action variables. Although these are still available and do serve a clear purpose, they are no longer needed just for temporarily storing the value objects of “unreachable” nodes, as they were in the former parse-tree-less setting.
  5. The foundation for value handling in the Scala facade. Even though this is not something a parboiled for Java user immediately benefits from, a few of the primary features of the Scala facade would not have been possible in the old, parse-tree-centric world.
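The shorthand from point 3 can be pictured with a small toy. The following sketch is not parboiled code: the Rule interface and the ch, sequence and optional helpers are hypothetical stand-ins written for this example, and parboiled's real implementation may work quite differently. It only shows how a varargs rule builder could wrap several sub-rules into an implicit sequence.

```java
// Hypothetical toy combinators; these are NOT parboiled's classes or signatures.
final class ImplicitSequenceSketch {

    // A rule tries to match at `pos` and returns the new position, or -1 on a mismatch.
    interface Rule { int match(String input, int pos); }

    // Matches exactly one given character.
    static Rule ch(char c) {
        return (input, pos) -> (pos < input.length() && input.charAt(pos) == c) ? pos + 1 : -1;
    }

    // Matches all sub-rules one after the other, or fails as a whole.
    static Rule sequence(Rule... rules) {
        Rule[] parts = rules.clone();
        return (input, pos) -> {
            int current = pos;
            for (Rule rule : parts) {
                current = rule.match(input, current);
                if (current < 0) return -1;
            }
            return current;
        };
    }

    // More than one argument is wrapped in an implicit sequence.
    static Rule optional(Rule... rules) {
        Rule inner = rules.length == 1 ? rules[0] : sequence(rules);
        return (input, pos) -> {
            int end = inner.match(input, pos);
            return end < 0 ? pos : end; // never fails; consumes nothing on a mismatch
        };
    }

    public static void main(String[] args) {
        Rule implicit = optional(ch('a'), ch('b'), ch('c'));
        Rule explicit = optional(sequence(ch('a'), ch('b'), ch('c')));
        System.out.println(implicit.match("abc", 0) == explicit.match("abc", 0));
    }
}
```

In this toy, both spellings behave identically; the shorter one simply saves the caller from writing the wrapping sequence by hand.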

Of course, like everything, the value stack also brings a few challenges of its own. In parboiled for Java it can now be somewhat harder to find the origin of an error in your parser if one of your rules does not adhere to the “stable behavior” principle, as discussed in the documentation chapter about Working with the Value Stack.
Overall, however, the introduction of the value stack hopefully moves parboiled further down the path towards becoming a serious contender for the crown of parsing frameworks on the JVM… :^)

To wrap up, I quickly want to add a bit more detail on the last point of my initial list of important novelties in parboiled 0.9.8: the ProfilingParseRunner. This new ParseRunner implementation delivers a detailed analysis of a parboiled parser’s runtime behavior and can provide important insights as to where a grammar’s “hot spots”, with regard to parsing performance, lie. Additionally, it allows for a good approximation of the expected benefits of “packrat” memoization, something that parboiled currently does not (yet) support. Some of my findings in this arena also deserve their own post, which I hope to get to in the not too distant future.
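As a rough illustration of what such an analysis can look like, the sketch below counts rule invocations per input position. It is an assumption about one plausible approach, not a description of how the ProfilingParseRunner works internally, and the class RuleProfileSketch with all of its methods is hypothetical. The idea is that a rule invoked more than once at the same input index is doing work that a memoizing (“packrat”) parser could answer from a cache, so the number of such repeated invocations gives a simple estimate of the potential savings.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; this is NOT the ProfilingParseRunner API.
final class RuleProfileSketch {

    // rule name -> (input index -> number of invocations at that index)
    private final Map<String, Map<Integer, Integer>> counts = new HashMap<>();

    // To be called by an instrumented parser each time a rule is tried at an index.
    void record(String rule, int index) {
        counts.computeIfAbsent(rule, r -> new HashMap<>()).merge(index, 1, Integer::sum);
    }

    // Total number of times the rule was tried, at any position.
    int invocations(String rule) {
        Map<Integer, Integer> byIndex = counts.getOrDefault(rule, Collections.<Integer, Integer>emptyMap());
        return byIndex.values().stream().mapToInt(Integer::intValue).sum();
    }

    // Repeated tries at an already visited position: the calls memoization could skip.
    int reinvocations(String rule) {
        Map<Integer, Integer> byIndex = counts.getOrDefault(rule, Collections.<Integer, Integer>emptyMap());
        return invocations(rule) - byIndex.size();
    }

    // The rule with the most invocations, or null if nothing was recorded.
    String hottestRule() {
        String best = null;
        int bestCount = -1;
        for (String rule : counts.keySet()) {
            int n = invocations(rule);
            if (n > bestCount) {
                best = rule;
                bestCount = n;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        RuleProfileSketch profile = new RuleProfileSketch();
        profile.record("Number", 0);
        profile.record("Number", 0); // same rule, same index: a memoization candidate
        profile.record("Number", 3);
        profile.record("Term", 0);
        System.out.println(profile.hottestRule() + " re-invoked " + profile.reinvocations("Number") + " time(s)");
    }
}
```

A count like this says nothing about how expensive each invocation is, so it should be read as a first hint rather than a measurement of actual time saved.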

So long, I hope you enjoy working with parboiled 0.9.8 …

