First International Symposium on Cyber Worlds, 2002. Proceedings.

Abstract

Within the current decade, process technology is promising one billion transistors on a single die, operating at frequencies from 6 to 10 GHz. As a direct result of the fundamental trends of increasing transistor density and switching speed, new technological and microarchitectural design constraints are introduced. Among them, wire delays will become critical. To take advantage of VLSI technology, we proposed the Trident processor, which emphasizes local communication. Like vector architectures, the Trident processor extends a scalar core with parallel lanes; each lane contains an execution datapath and a slice of the register file. However, the Trident processor uses ring and communication registers, which are based on local communication, to store and cyclically shift 1-D data within and across the lanes, respectively. By using parallel datapaths, ring registers, and communication registers, the Trident processor can effectively process not only vector but also matrix data. In this paper, the performance of the Trident processor on the singular value decomposition (SVD) algorithm is evaluated. On a 500×500 input matrix, a four-lane Trident processor significantly reduces the number of instructions (44 times fewer), loop overhead (30 times less), and load/store operations (3 times fewer) compared with scalar code. Moreover, the Trident processor is scalable, and scaling it requires only replicating lanes to process longer vectors or larger matrices (eight lanes speed up SVD by a factor of 2.5 over four lanes).
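
The idea of cyclically shifting 1-D data using only local communication can be illustrated with a small software model. The sketch below is a toy illustration, not the Trident design: the contiguous partitioning of the vector across lanes, the one-element shift, and the names `make_lanes`, `cyclic_shift`, and `flatten` are all assumptions made for this example. It shows that a rotation of the whole vector can be obtained when each lane only hands one element to its neighbor.

```python
from collections import deque


def make_lanes(data, num_lanes):
    """Split a 1-D vector into contiguous slices, one ring (deque) per lane.

    Assumption for this toy model: the length divides evenly by the lane count.
    """
    if len(data) % num_lanes != 0:
        raise ValueError("vector length must be a multiple of the lane count")
    size = len(data) // num_lanes
    return [deque(data[i * size:(i + 1) * size]) for i in range(num_lanes)]


def cyclic_shift(lanes):
    """Rotate the whole vector left by one element using neighbor transfers only."""
    # Every lane removes its head element at the same time.
    heads = [lane.popleft() for lane in lanes]
    # Each lane appends the head of the next lane (wrapping around at the end),
    # which models a single lane-to-lane transfer.
    for i, lane in enumerate(lanes):
        lane.append(heads[(i + 1) % len(lanes)])
    return lanes


def flatten(lanes):
    """Read the lanes back as one 1-D vector."""
    return [x for lane in lanes for x in lane]


if __name__ == "__main__":
    lanes = make_lanes(list(range(8)), num_lanes=4)
    print(flatten(cyclic_shift(lanes)))
```

In this model no element ever travels farther than one lane per shift, which is the property that makes such a scheme attractive when wire delays dominate.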