Abstract
Domain specific languages should support syntax that is comfortable for specialist users. We discuss the impact of the standard deterministic parsing techniques such as LALR(1) and LL(1) on the design of programming languages and the desirability of more flexible parsers in a development environment. We present a new bottom-up nondeterministic parsing algorithm (GRMLR) that combines a modified notion of reduction with a Tomita-style breadth-first search of parallel parsing stacks. We give experimental results for standard programming language grammars and LR(0), SLR(1) and LR(1) tables; the weaker tables generate significant amounts of nondeterminism. We show that GRMLR parsing corrects errors in the standard Tomita algorithm without incurring the performance overheads associated with other published solutions. We also demonstrate that the performance of GRMLR is upper-bounded by the performance of Tomita's algorithm, and that for one realistic language grammar GRMLR only needs to search around 74% of the nodes. Our heavily instrumented development version of the algorithm achieves parsing rates of around 4,000?10,000 tokens per second on a 400MHz Pentium II processor. Proof of correctness and details of our implementation are omitted here for space reasons but are available in an accompanying technical report.