Abstract
The Grid Datafarm (Gfarm) architecture is designed for
global petascale data-intensive computing. It provides a
global parallel filesystem with online petascale storage,
scalable I/O bandwidth, and scalable parallel processing,
and it can exploit local I/O in a grid of clusters with tens
of thousands of nodes. Gfarm parallel I/O APIs and
commands provide a single filesystem image and manipulate
filesystem metadata consistently. Fault tolerance and load
balancing are automatically managed by file duplication or
recomputation using a command history log. Preliminary
performance evaluation has shown scalable disk I/O and
network bandwidth on 64 nodes of the Presto III Athlon
cluster. The Gfarm parallel I/O write and read operations
have achieved data transfer rates of 1.74 GB/s and 1.97
GB/s, respectively, using 64 cluster nodes. The Gfarm parallel
file copy reached 443 MB/s with 23 parallel streams on the
Myrinet 2000. The Gfarm architecture is expected to enable
petascale data-intensive Grid computing with I/O bandwidth
that scales to the TB/s range and scalable computational
power.
Additional Information
Citation:
Osamu Tatebe, Youhei Morita, Satoshi Matsuoka, Noriyuki Soda, Satoshi Sekiguchi,
"Grid Datafarm Architecture for Petascale Data Intensive Computing,"
2nd IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGRID'02),
p. 102, 2002