Abstract
This paper presents a technique that exploits close collaboration between the compiler and speculative multi-threaded hardware to enable aggressive optimization and parallelization of scalar programs. The compiler aggressively optimizes the frequently executed code in user programs by predicting an execution path or the values of long-latency instructions. Based on the predicted hot execution path, the compiler forms regions with greatly simplified data and control flow graphs and then performs aggressive optimizations on those regions. Thread-level speculation (TLS) helps expose program parallelism and guarantees program correctness when a prediction is incorrect. With this collaboration between the compiler and speculative multi-threaded support, program performance can be significantly improved. Preliminary results with simple trace regions show that the performance gain, measured in dynamic compiler schedule cycles, can be as high as 33% for some benchmarks and averages about 10% across all eight SpecInt95 benchmarks. For SpecInt2k, the performance gain is up to 23% under the conservative execution model. With a cycle-accurate simulator using the conservative execution model, the overall performance gain that accounts for runtime factors (e.g., cache misses and branch mispredictions) is 12% for vortex and 14.7% for m88ksim. The performance gain can be higher with more sophisticated region formation and region-based optimizations.