Abstract
Parallel computing has been in the spotlight since the advent of multi-core computers. The popular multithreading model does not scale well to hundreds or thousands of cores, since it can only exploit coarse-grained parallelism. A great deal of fine-grained parallelism remains to be exploited in the I/O tasks and memory accesses performed during the execution of a thread. Our Counter-Amdahl's Law states that, to maximize speedup, it is more effective to parallelize the serial fraction of a parallel algorithm than the parallelized fraction. In this paper, we propose a Virtual Aggregated Processor that aims to speed up the execution of a thread by exploiting the fine-grained parallelism in I/O tasks and memory accesses. We propose and implement two techniques, helper threads and I/O specialization, to demonstrate the potential effectiveness of the Virtual Aggregated Processor technology.