Proceedings. 2005 Australian Software Engineering Conference

Abstract

The ability to parse source code in order to extract information from it is an essential element of software engineering research and practice. However, serious practical difficulties often arise from factors such as ambiguities in standard grammars and inflexible parsing tools. For example, substantial changes to a standard grammar may be needed to work within the limitations of a parsing engine, which threatens the accuracy, completeness and consistency of the information available. Good parsing on its own is not enough: an inadequate semantic model also limits the information available to tool builders. We have developed an approach to parsing and semantic modelling that addresses these issues. Our approach is based on a more flexible LR parser generator that supports Generalised LR (GLR) parsing to accommodate ambiguous grammars. This allows us to decouple syntactic and semantic analysis. In this paper, we present our parser generator, yakyacc, and our semantic model for Java, JST, and discuss the benefits of their use in software engineering research. The resulting parsers may be used in a variety of contexts, either as the basis for integrated applications or as components in a pipeline of tools. We illustrate the benefits of our approach with representative examples from two of our current software engineering research projects.
