Abstract
This paper is concerned with the transparent parallelization of declarative database queries, based on theoretical principles. We have designed an entire database architecture suitable for use on any general-purpose parallel machine. This architecture addresses the shortcomings in flexibility and scalability of commercial parallel databases. A substantial benefit is that the mathematical principles underlying our framework allow provably correct parallel evaluations and optimizations, using compile-time transformations. We address parallelism in a language-independent way through the choice of monoids as a formulation for expressing and modeling queries. Queries expressed in our declarative language are transformed into applications of a higher-order function, the monoid homomorphism. The evaluation of this function is partitioned at run-time, giving a tree-like processor topology, the depth and breadth of which can be varied with a declarative execution plan. Leaf nodes within the evaluation tree operate on their own data partitions and forward results to the appropriate interior nodes. Due to the nature of our language, the functions that are necessary to combine results from independent parallel evaluations are generated automatically at compile-time from a monoid definition dictionary, additions to which can be made to extend the system's data types. We have built a complete prototype of our system, which uses Swiss Radio Corporation's entire recorded music catalogue, on a general-purpose AP1000, 128-cell parallel computer at the IFPC.