18th International Parallel and Distributed Processing Symposium (IPDPS'04) - Workshop 3
p. 149a
A High-Performance and Energy-Efficient Architecture for Floating-Point Based LU Decomposition on FPGAs
Gokul Govindu, University of Southern California
Seonil Choi, University of Southern California
Viktor Prasanna, University of Southern California
Vikash Daga, Satyam Computer Services Ltd.
Sridhar Gangadharpalli, Satyam Computer Services Ltd.
V. Sridhar, Satyam Computer Services Ltd.
DOI: http://doi.ieeecomputersociety.org/10.1109/IPDPS.2004.1303134
Abstract
In this paper, we first develop a novel architecture for fixed-point LU decomposition of streaming input matrices on FPGAs. Our architecture, based on a circular linear array, achieves the minimal latency and is resource-efficient. We then extend it, using a stacked matrices approach, to a floating-point architecture that achieves the minimal effective latency. Our design objective was to develop high-throughput, energy-efficient architectures for applications that require LU decomposition. We analyze (1) the impact of high-throughput, pipelined floating-point units (with different pipeline depths and different performance) on the architectures' performance, and (2) the impact of algorithm-level design on system-wide energy dissipation. We analyze energy dissipation by capturing the algorithmic and architectural details of the target FPGA device. We compare our architecture with a state-of-the-art architecture implemented on FPGAs with respect to latency, area and energy. Our designs achieve a 10% to 60% reduction in energy compared with the state-of-the-art architecture.
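As a reference point for what the hardware computes, here is a minimal Python sketch of LU decomposition, assuming Doolittle elimination without pivoting (the abstract does not say whether the architecture pivots). The second function, the hypothetical `lu_decompose_stacked`, interleaves the same elimination step across several independent matrices in round-robin order. It is only a software analogy for why stacking matrices can help keep a deeply pipelined floating-point unit busy, not the authors' schedule; all function names here are illustrative.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def _identity(n: int) -> Matrix:
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def lu_decompose(a: Matrix) -> Tuple[Matrix, Matrix]:
    """Doolittle LU without pivoting: A = L * U, with L unit lower-triangular.

    Assumption: no pivoting (not specified in the abstract).
    """
    n = len(a)
    u = [row[:] for row in a]  # working copy, reduced in place to U
    l = _identity(n)
    for k in range(n):
        pivot = u[k][k]
        if pivot == 0.0:
            raise ZeroDivisionError(f"zero pivot at step {k}; this sketch does not pivot")
        for i in range(k + 1, n):
            factor = u[i][k] / pivot
            l[i][k] = factor
            u[i][k] = 0.0  # eliminated entry, set exactly to avoid rounding residue
            for j in range(k + 1, n):
                u[i][j] -= factor * u[k][j]
    return l, u


def lu_decompose_stacked(mats: List[Matrix]) -> List[Tuple[Matrix, Matrix]]:
    """Hypothetical round-robin schedule over a stack of same-size matrices.

    For each (k, i) step, the same row update is issued for every matrix in turn,
    so consecutive operations belong to independent matrices and do not depend
    on each other's results. Per matrix, the arithmetic matches lu_decompose.
    """
    if not mats:
        return []
    n = len(mats[0])
    if any(len(m) != n for m in mats):
        raise ValueError("all matrices in the stack must have the same size")
    us = [[row[:] for row in m] for m in mats]
    ls = [_identity(n) for _ in mats]
    for k in range(n):
        for m in range(len(mats)):
            if us[m][k][k] == 0.0:
                raise ZeroDivisionError(f"zero pivot in matrix {m} at step {k}")
        for i in range(k + 1, n):
            for m in range(len(mats)):  # round-robin across independent matrices
                factor = us[m][i][k] / us[m][k][k]
                ls[m][i][k] = factor
                us[m][i][k] = 0.0
                for j in range(k + 1, n):
                    us[m][i][j] -= factor * us[m][k][j]
    return list(zip(ls, us))
```

Because each matrix sees the same floating-point operations in the same order, the stacked version returns identical results; only the issue order changes. That reordering is the property a pipelined unit could exploit, since an operation for one matrix no longer waits directly on the previous operation's result.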
Citation:
Gokul Govindu, Seonil Choi, Viktor Prasanna, Vikash Daga, Sridhar Gangadharpalli, V. Sridhar, "A High-Performance and Energy-Efficient Architecture for Floating-Point Based LU Decomposition on FPGAs," 18th International Parallel and Distributed Processing Symposium (IPDPS'04) - Workshop 3, p. 149a, 2004.