Abstract
The instruction-level parallelism available in Java bytecode (Java-ILP) is not readily exploitable because of dependencies involving stack operands. The sequentialization caused by these stack dependencies can be overcome by identifying bytecode-traces: sequences of bytecode instructions that, when executed, leave the operand stack in the same state as it was at the beginning of the sequence. Instructions from different bytecode-traces have no stack-operand dependency and can therefore be executed in parallel on multiple operand stacks. We propose a simultaneous multi-trace instruction-issue (SMTI) architecture for a processor that can issue instructions from multiple bytecode-traces to exploit Java-ILP. Extraction of bytecode-traces and nested bytecode folding are done in software during the method verification stage. On the SPECjvm98, SciMark and Linpack benchmarks, SMTI combined with nested folding achieved an average ILP speedup of 54% over the base in-order single-issue Java processor.
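The definition of a bytecode-trace can be illustrated with a minimal sketch. It assumes straight-line code with no branches and a small hand-written table of stack effects; the table `STACK_EFFECT` and the function `split_into_traces` are hypothetical names introduced here for illustration, not the extraction procedure used in this work. A trace is closed whenever the operand-stack depth returns to its value at the start of the trace.

```python
# Assumed (pops, pushes) stack effects for a few JVM opcodes.
# Illustrative subset only, not a complete instruction table.
STACK_EFFECT = {
    "iload": (0, 1),   # push a local int variable
    "bipush": (0, 1),  # push a byte constant
    "iadd": (2, 1),    # pop two ints, push their sum
    "imul": (2, 1),    # pop two ints, push their product
    "istore": (1, 0),  # pop an int into a local variable
}


def split_into_traces(instructions):
    """Split straight-line bytecode into stack-balanced sequences.

    A sequence is closed as soon as the operand-stack depth returns
    to the depth it had when the sequence started (tracked here as 0).
    """
    traces, current, depth = [], [], 0
    for instr in instructions:
        opcode = instr.split()[0]
        pops, pushes = STACK_EFFECT[opcode]
        if pops > depth:
            raise ValueError(f"stack underflow at {instr!r}")
        depth += pushes - pops
        current.append(instr)
        if depth == 0:  # stack is back to its starting state
            traces.append(current)
            current = []
    if current:
        raise ValueError("sequence leaves operands on the stack")
    return traces


code = [
    "iload 1", "iload 2", "iadd", "istore 3",   # local3 = local1 + local2
    "iload 4", "bipush 5", "imul", "istore 6",  # local6 = local4 * 5
]
traces = split_into_traces(code)
for number, trace in enumerate(traces):
    print(number, trace)
```

In this example the two resulting sequences share no stack operands, so each could run on its own operand stack. The sketch checks stack balance only; dependencies through local variables would still have to be checked separately before issuing instructions in parallel.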