<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml-stylesheet href="/css/rss20.xsl" type="text/xsl"?>
<rss xmlns:pheedo="http://www.pheedo.com/namespace/pheedo" version="2.0">
	<channel>
		<title>IEEE Transactions on Computers</title>
		<link>http://www.computer.org/tc</link>
		<description>The IEEE Transactions on Computers is a monthly publication with a wide distribution to researchers, developers, technical managers, and educators in the computer field. It publishes papers, brief contributions, and comments on research in areas of current interest to the readers. These areas include, but are not limited to, the following: a) computer organizations and architectures; b) operating systems, software systems, and communication protocols; c) real-time systems and embedded systems; d) digital devices, computer components, and interconnection networks; e) specification, design, prototyping, and testing methods and tools; f) performance, fault tolerance, reliability, security, and testability; g) case studies and experimental and theoretical evaluations; and h) new and important applications and trends.</description>
		<language>en-us</language>
		<pubDate>Tue, 7 Oct 2008 10:00:03 GMT</pubDate>
		<image>
			<url>http://csdl.computer.org/common/images/logos/tc.gif</url>
			<title>IEEE Computer Society</title>
			<description>List of recently published journal articles</description>
			<link>http://www.computer.org/tc</link>
		</image>
		<item>
			<title>PrePrint: Modeling Detection Latency with Collaborative Mobile Sensing Architecture</title>
			<link>http://www.pheedo.com/click.phdo?i=777e1b0f3b6b43b9f86d75f341b70342</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.189</pheedo:origLink>
			<description>Detection latency, which is defined as the time from the target arrival to the time of the first detection, is an important metric for the performance of sensor networks carrying out target detection, especially when the target is malicious or hostile. It characterizes the efficiency of detecting the presence of a target in a region of interest. Traditionally, stationary sensor networks are used to perform such sensing tasks. Consequently, nearly all research literature for the target detection problem has focused on stationary sensor networks. This paper addresses the problem of detecting the presence/absence of a target using a mobile sensor network. An analytic method is proposed to model the detection latency based on a collaborative sensing architecture. Detection latencies for different node mobilities are presented. The accuracy of the analytic model is verified by simulations. The paper also compares the performance of mobile and stationary sensor networks. The comparison shows that if the target is present at the worst possible location in a given deployment, then the detection latency of mobile sensor networks is considerably shorter than that of stationary networks with the same number of nodes.</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.189</guid>
		</item>
		<item>
			<title>IEEE Transactions on Computers - November 2008 (Vol. 57, No. 11)</title>
			<link>http://www.pheedo.com/click.phdo?i=b9b0cbeb664e1cc2fdb4b8c7691d9398</link>
			<pheedo:origLink>http://opac.ieeecomputersociety.org/opac?year=2008&amp;volume=57&amp;issue=11&amp;acronym=tc</pheedo:origLink>
			<description>IEEE Transactions on Computers</description>
			<guid isPermaLink="false">http://www.computer.org/portal/site/tc/</guid>
		</item>
		<item>
			<title>PrePrint: An Improved Search Method for Accumulator-Based Test Set Embedding</title>
			<link>http://www.pheedo.com/click.phdo?i=23ae89f21b8be72a10ab92a800812572</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.182</pheedo:origLink>
			<description>In this paper we present a new search method for test set embedding using an accumulator driven with an additive constant C. We formulate the problem of finding the location of a test pattern in the generated sequence in terms of a linear Diophantine equation with two variables, which is known to be solved quickly in linear time. We show that only one Diophantine equation needs to be solved per test set irrespective of its size. Next we show how to find the starting state, for a given constant C and test set T, such that the generated sequence can reproduce T with minimum length. Finally, we show that the best constant Copt (in terms of shortest test length) for the embedding of T using an accumulator of size n can be found in O(2n+F|T|) steps, instead of O(n(2^n)|T|) steps of a previous approach, where F depends on the particular test set and can be significantly smaller than its worst case value of 2^(n-2). The value of F can also be further reduced while providing a guaranteed approximation bound of the shortest test length. Experimental results show the computational improvements.&lt;br style=&quot;clear: both;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.182</guid>
		</item>
		<item>
			<title>PrePrint: Efficient Multidimensional Packet Classification with Fast Updates</title>
			<link>http://www.pheedo.com/click.phdo?i=b7f2972758425f6ab3728f109e5dc98d</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.181</pheedo:origLink>
			<description>Packet classification has continued to be an important research topic for high-speed routers in recent years. In this paper, we propose a new packet classification scheme based on the binary range and prefix searches. The basic data structure of the proposed packet classification scheme for multi-dimensional rule tables is a hierarchical list of sorted ranges and prefixes that allows the binary search to be performed on the list at each level to find the best matched rule. We also propose a set of heuristics to further improve the performance of the proposed algorithm. We test our schemes by using rule tables of various sizes generated by ClassBench and compare them with the existing schemes, EGT, EGT-PC, and HyperCuts. The performance results show that in a test using a two-dimensional segmentation table, the proposed scheme not only performs better than the EGT, EGT-PC, and HyperCuts in classification speed and memory usage, but also achieves faster table update operations that are not supported in the existing schemes.&lt;br style=&quot;clear: both;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.181</guid>
		</item>
		<item>
			<title>PrePrint: A Homogeneous Architecture for Power Policy Integration in Operating Systems</title>
			<link>http://www.pheedo.com/click.phdo?i=350bb445033700134eba61619c68074f</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.180</pheedo:origLink>
			<description>A significant volume of research has concentrated on operating system-directed power management. The primary focus of previous research has been the development of better policies. In this paper, we provide evidence that one policy may outperform another under different conditions. Hence, it is difficult, or even impossible, to design the "best" policy for all computers. We explain how to select the best policies at run-time without user or administrator intervention by using a software framework called the Homogeneous Architecture for Power Policy Integration (HAPPI). This architecture is portable across different platforms running Linux. HAPPI specifies common requirements for policies and provides an interface to simplify the implementation of policies in a commodity OS. Our approach allows these policies to be compared simultaneously to select the best policy among a set of distinct policies at run-time. Experimental results indicate that HAPPI achieves energy savings within 4 percent of the best individual policy for each device in several computing systems without a priori knowledge of workloads.&lt;br style=&quot;clear: both;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.180</guid>
		</item>
		<item>
			<title>PrePrint: Error Correcting Codes for Ternary Content Addressable Memories</title>
			<link>http://www.pheedo.com/click.phdo?i=ab963a35ed9cc1cdc85276e6404eabd1</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.179</pheedo:origLink>
			<description>As VLSI silicon technology continues its relentless advance and memory densities increase, the problem of soft errors&amp;#x2014;bit upsets caused by alpha particles or neutron hits&amp;#x2014;demands solutions. Error Correcting Codes (ECCs) are routinely used on Random Access Memories (RAMs) to increase soft error tolerance&amp;#x2014;codewords (error correcting code bits concatenated to the data) are written to and read from memory, and the read codeword is decoded to correct errors. Content addressable memories (CAMs) also demand error mitigation measures. The method employed for RAMs is also applicable to CAMs: the match-line sense amplifier is modified to function as a comparator [1], and codewords are stored and searched for. We investigate the extension of this method to TCAMs. Ternary CAMs (TCAMs) cannot employ the efficient ECCs (known as linear block codes&amp;#x2014;LBCs) used with RAMs and CAMs. We develop the error correcting codes necessary to implement error-resilient TCAMs. We prove that the rate (ratio of data bits to total number of bits in the codeword) of the specialized ECCs necessary for TCAMs cannot exceed 1/t, where t is the number of bit errors the code can correct (in contrast, LBCs asymptotically have rate one); simple majority codes are the best. Keywords: ECCs, error, correcting, codes, TCAMs, LBCs.</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.179</guid>
		</item>
		<item>
			<title>PrePrint: Memory-MISER: Improving Main Memory Energy Efficiency in Servers</title>
			<link>http://www.pheedo.com/click.phdo?i=138f557380dbc10bd60c63709a248f22</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.177</pheedo:origLink>
			<description>Main memory power in volume and mid-range servers is growing as a fraction of total system power. The resulting energy consumption increases system cost and the heat produced reduces reliability. Emergent memory technology will provide systems with the ability to dynamically turn-on (online) and turn-off (offline) memory devices at runtime. This technology, coupled with slack in memory demand, offers the potential for significant energy savings in servers. However, to gain general acceptance in the server community, power-aware techniques must maintain performance and scale to thousands of memory devices. We propose a Memory Management Infra-Structure for Energy Reduction (Memory MISER) that is transparent, performance-neutral, and scalable. Memory MISER provides: 1) a prototype Linux kernel that manages memory at device granularity, and 2) a userspace daemon that tracks systemic memory demand and implements energy- and performance-constrained device controller policies. Experiments on an 8-node cluster of servers show our Memory MISER conserves memory energy up to 56.8% with no performance degradation for scientific codes that utilized the entire cluster. For multi-user workloads, we achieved memory energy savings of up to 67.94% with no performance degradation. Normalizing to total system energy consumption, our power-aware memory approach reduced energy between 18.81% and 39.02%.&lt;br style=&quot;clear: both;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.177</guid>
		</item>
		<item>
			<title>PrePrint: Fair Round Robin: A Low Complexity Packet Scheduler with Proportional and Worst-Case Fairness</title>
			<link>http://www.pheedo.com/click.phdo?i=a64af2e6825a085d7d807920b7226657</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.176</pheedo:origLink>
			<description>Round robin based packet schedulers generally have a low complexity and provide long-term fairness. The main limitation of such schemes is that they do not support short-term fairness. In this paper, we propose a new low complexity round robin scheduler, called Fair Round Robin (FRR), that overcomes this limitation. FRR has similar complexity and long-term fairness properties as the stratified round robin scheduler, a recently proposed scheme that arguably provides the best quality-of-service properties among all existing round robin based low complexity packet schedulers. FRR offers better short-term fairness than stratified round robin and other existing round robin schedulers.&lt;br style=&quot;clear: both;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.176</guid>
		</item>
		<item>
			<title>PrePrint: Generalized Elastic Scheduling for Real-Time Tasks</title>
			<link>http://www.pheedo.com/click.phdo?i=2d6ac4445f903834815dcd2742d84158</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.175</pheedo:origLink>
			<description>The elastic task model is a powerful model for adapting periodic real-time systems in the presence of uncertainty. This work generalizes the existing elastic scheduling approach in several directions. First, it presents a general framework, which formulates a trade-off between task schedulability and a specific performance metric as an optimization problem. Such a framework allows real-time systems under overloads to graciously adapt by adjusting their performance level. Second, it is shown in this work that the well-known task compression algorithm in fact solves a quadratic programming problem that seeks to minimize the sum of the squared deviation of a task's utilization from initial desired utilization. This finding indicates that the task compression algorithm may be applied to efficiently solve other similar types of problems that often arise in real-time applications. In particular, an iterative approach is proposed to solve the period selection problem for real-time tasks with deadlines less than respective periods. Further, the framework is adapted to solve the deadline selection problem, which is useful in some control systems with fixed periods.&lt;br style=&quot;clear: both;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.175</guid>
		</item>
		<item>
			<title>PrePrint: Reducing Area Overhead for Error-Protecting Large L2/L3 Caches</title>
			<link>http://www.pheedo.com/click.phdo?i=e46080483dcc7c84be402cab9f37f086</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.174</pheedo:origLink>
			<description>Due to increasing concern about various errors, current processors adopt error protection mechanisms for their on-chip components. In particular, protecting caches in current processors incurs as much as 12.5% area overhead due to error correcting codes. Considering the large L2/L3 caches employed in current high-performance processors, the area overhead is very high and consumes a large number of on-chip transistors. As an attempt to reduce that overhead, this paper proposes an area-efficient error protection architecture for large L2/L3 caches. First, it selectively applies ECC (Error Correcting Code) to only dirty cache lines, while clean cache lines are protected by simple parity check codes. Second, the dirty cache lines are periodically cleaned by exploiting the generational behavior of cache lines in order not to increase traffic to off-chip main memory. Experimental results show that the cleaning technique effectively reduces the number of dirty cache lines per cycle. The ECCs of the reduced dirty cache lines can be confined in a small ECC array or ECC cache. Our proposed error-protection architecture has been shown to reduce the area overhead of a 1MB L2 cache for error protection by 59% with less than 1% performance degradation, on average, using SPEC2000 benchmarks running on a typical four-issue superscalar processor.</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.174</guid>
		</item>
		<item>
			<title>PrePrint: Correction to: Reduced Length Checking Sequences</title>
			<link>http://www.pheedo.com/click.phdo?i=ac6e9e8f074eaa0b5ac2e4a66f299824</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.173</pheedo:origLink>
			<description>This paper describes corrections to a previous paper, Reduced Length Checking Sequences, that appeared in IEEE Transactions on Computers in 2002 (vol. 51, no. 9, pp. 1111-1117).</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.173</guid>
		</item>
		<item>
			<title>PrePrint: On the Design of Fault-Tolerant Scheduling Strategies Using Primary-Backup Approach for Computational Grids with Low Replication Costs</title>
			<link>http://www.pheedo.com/click.phdo?i=d4d542305b046fd30b3c6e55c5063b64</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.172</pheedo:origLink>
			<description>Fault-tolerant scheduling is an imperative step for large-scale computational Grid systems, as often geographically distributed nodes co-operate to execute a task. By and large, primary-backup approach is a common methodology used for fault tolerance wherein each task has a primary copy and a backup copy on two different processors. In this paper, we identify two cases that may happen when scheduling dependent tasks with primary-backup approach. We derive two important constraints that must be satisfied. Further, we show that these two constraints play a crucial role in limiting the schedulability and overloading efficiency of backups of dependent tasks. We then propose two strategies to improve schedulability and overloading efficiency, respectively. We propose two algorithms (MRC-ECT and MCT-LRC), to schedule backups of independent jobs and dependent jobs, respectively. MRC-ECT is shown to guarantee an optimal backup schedule in terms of replication cost for an independent task, while MCT-LRC can schedule a backup of a dependent task with minimum completion time and less replication cost. We conduct extensive simulation experiments to quantify the performance of the proposed algorithms.&lt;br style=&quot;clear: both;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.172</guid>
		</item>
		<item>
			<title>PrePrint: On Spatial Orders and Location Codes</title>
			<link>http://www.pheedo.com/click.phdo?i=a71a9bd547f23d8f70bce166473850a6</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.171</pheedo:origLink>
			<description>Spatial orders such as the Morton (Z) order, U-order, or X-order have applications in matrix manipulation, graphic rendering, and data encryption. It is shown that these spatial orders are single examples of entire classes of spatial orders which can be defined in arbitrary numbers of dimensions and base values. Secondly, an algorithm is proposed which can be used to transform between these spatial orders and Cartesian coordinates. It is shown that the efficiency of the algorithm improves with a larger base value. By choosing a base value that corresponds to the available memory page size, the computational effort required to perform operations such as matrix multiplication can be optimized.</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.171</guid>
		</item>
		<item>
			<title>PrePrint: DIA: A Complexity-Effective Decoding Architecture</title>
			<link>http://www.pheedo.com/click.phdo?i=3fb4f5d02c4bba2440be4edc4939c67b</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.170</pheedo:origLink>
			<description>Fast instruction decoding is a true challenge for the design of CISC microprocessors implementing variable length instructions. A well-known solution to overcome this problem is caching decoded instructions in a hardware buffer. Fetching already decoded instructions avoids the need for decoding them again, improving processor performance. However, introducing such special-purpose storage in the processor design involves an important increase in the fetch architecture complexity. In this paper, we propose a novel decoding architecture that reduces the fetch engine implementation cost. Instead of using a special-purpose hardware buffer, our proposal stores frequently decoded instructions in the memory hierarchy. The address where the decoded instructions are stored is kept in the branch prediction mechanism, enabling it to guide our decoding architecture. This makes it possible for the processor front-end to fetch already decoded instructions from memory instead of the original non-decoded instructions. Our results show that, using our decoding architecture, a state-of-the-art superscalar processor achieves competitive performance improvements, while requiring less chip area and energy consumption in the fetch architecture than a hardware code caching mechanism.&lt;br style=&quot;clear: both;&quot;/&gt;
</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.170</guid>
		</item>
		<item>
			<title>PrePrint: Testing of SOCs with Hierarchical Cores: Common Fallacies, Test-Access Optimization, and Test Scheduling</title>
			<link>http://www.pheedo.com/click.phdo?i=9e18ce66d8da0fb8715cc1bcf6161366</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.169</pheedo:origLink>
			<description>Many system-on-chip (SOC) integrated circuits today contain hierarchical (parent) cores that have multiple levels of design hierarchy involving "child cores". Hierarchy imposes a number of constraints on the manner in which tests must be applied to parent cores and their child cores. However, most prior work on wrapper design, test access mechanism (TAM) optimization, and test scheduling is hierarchy-oblivious, i.e., these techniques treat all cores in an SOC as being at the same level of hierarchy. We first show that wrappers, TAMs, and test schedules designed for non-hierarchical SOCs are not valid for SOCs with hierarchical cores. We next present two approaches for the efficient testing of SOCs with hierarchical cores. In the first approach, an existing wrapper design is modified such that all constraints imposed by the hierarchy are satisfied and full flexibility is provided for TAM optimization and test scheduling. The second approach is based on a hierarchy-aware wrapper architecture for parent cores that operates in two disjoint modes for the testing of parent and child cores. We show how an existing test-architecture design algorithm can be adapted for use with these two methods. Results for the ITC'02 SOC Test Benchmarks show that the first approach offers lower test application times while the second approach requires less area overhead.</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.169</guid>
		</item>
		<item>
			<title>PrePrint: Distributed Loop Controller for Multi-threading in Uni-threaded ILP Architectures</title>
			<link>http://www.pheedo.com/click.phdo?i=5ba4d06734bf03380777b76496314ae3</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.168</pheedo:origLink>
			<description>Reduced energy consumption is one of the most important design goals for embedded application domains like wireless communication, multimedia and biomedical applications. The instruction memory hierarchy has been proven to be one of the most power hungry parts of the system. This paper introduces an architectural enhancement for the instruction memory to reduce energy consumption and improve performance. The proposed distributed instruction memory organization requires minimal hardware overhead and supports the execution of multiple incompatible loops in parallel in a uni-processor system. We present different methods to implement the loop controller architecture, compare them, and show that distributing the instruction memory helps to reduce the interconnect cost as well. This architecture enhancement can reduce the energy consumed in the instruction memory hierarchy by 59% and improve the performance by 22% compared to hardware based enhanced SMT based architectures.&lt;br style=&quot;clear: both;&quot;/&gt;
			</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.168</guid>
		</item>
		<item>
			<title>PrePrint: A Response Time Bound in Fixed-Priority Scheduling with Arbitrary Deadlines</title>
			<link>http://www.pheedo.com/click.phdo?i=cc2a4ea03491f7d42621058da2f33086</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.167</pheedo:origLink>
			<description>Since worst-case response times must be determined repeatedly during the interactive design of real-time application systems, repeated exact computation of such response times would slow down the design process considerably. In this research, we identify three desirable properties of estimates of the exact response times: continuity with respect to system parameters; efficient computability; and approximability. We derive a technique possessing these properties for estimating the worst-case response time of sporadic task systems that are scheduled using fixed priorities upon a preemptive uniprocessor.&lt;br style=&quot;clear: both;&quot;/&gt;
			</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.167</guid>
		</item>
		<item>
			<title>PrePrint: A New Hierarchical Data Cache Architecture for iSCSI Storage Server</title>
			<link>http://www.pheedo.com/click.phdo?i=6056cf60ef3299b4ff6ce4fe63ca9398</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.166</pheedo:origLink>
			<description>With the emergence of data intensive applications, recent years have seen a fast growing volume of I/O traffic propagated through the local I/O interconnect bus. In this paper, we present a hierarchical Data Cache Architecture called DCA to effectively slash local interconnect traffic and thus boost the storage server performance. A popular iSCSI storage server architecture is chosen as an example. DCA is composed of a read cache on the NIC, called the NIC cache, and a unified read/write cache in host memory, called the Helper cache. The NIC cache services most portions of read requests without fetching data via the PCI bus, while the Helper cache 1) supplies some portions of read requests on a partial NIC cache hit; 2) directs cache placement for the NIC cache; and 3) absorbs most transient writes locally. We develop a novel State Locality Aware cache Placement algorithm called SLAP to improve the NIC cache hit ratio for mixed read and write workloads. To demonstrate the effectiveness of DCA, we develop a DCA prototype system and evaluate it with an open source iSCSI implementation under representative storage server workloads. Experimental results show that DCA can boost iSCSI storage server throughput by up to 121% and reduce the PCI traffic by up to 74% compared with an iSCSI target without DCA.&lt;br style=&quot;clear: both;&quot;/&gt;
			</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.166</guid>
		</item>
		<item>
			<title>PrePrint: A Recursive Paradigm To Solve Boolean Relations</title>
			<link>http://www.pheedo.com/click.phdo?i=ca7382164e9af9adba966cf842933135</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.165</pheedo:origLink>
			<description>A Boolean relation can specify some types of flexibility of a combinational circuit that cannot be expressed with don't cares. Several problems in logic synthesis, such as Boolean decomposition or multi-level minimization, can be modeled with Boolean relations. However, solving Boolean relations is a computationally expensive task. This paper presents a novel recursive algorithm for solving Boolean relations. The algorithm has several features: efficiency, wide exploration of solutions and customizable cost function. The experimental results show the applicability of the method in logic minimization problems and tangible improvements with regard to previous heuristic approaches.&lt;br style=&quot;clear: both;&quot;/&gt;
			</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.165</guid>
		</item>
		<item>
			<title>PrePrint: Comments on "Low Diameter Interconnections for Routing in High-Performance Parallel Systems," with Connections and Extensions to Arc Coloring of Coset Graphs</title>
			<link>http://www.pheedo.com/click.phdo?i=9a8e85aafa14ef799a3af0c5ef40562d</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.164</pheedo:origLink>
			<description>Recently, Melhem presented a "new" class of low-diameter interconnection (LDI) networks, (IEEE Trans. Computers, Vol. 56, No. 4, pp. 502-510). We note that LDI networks are the same as the previously known generalized de Bruijn graphs, point out an error in the decomposition of LDI networks into permutations, and find that the correct decomposition scheme is an instance of arc coloring for coset graphs. Hence, we pursue a number of general results on arc coloring of coset graphs that can be applied to this particular decomposition problem as well as within many other contexts, including complete arc coloring and normality of coset graphs.&lt;br style=&quot;clear: both;&quot;/&gt;
			</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.164</guid>
		</item>
		<item>
			<title>PrePrint: A Highly Accurate Method for Assessing Reliability of Redundant Arrays of Inexpensive Disks (RAID)</title>
			<link>http://www.pheedo.com/click.phdo?i=e2f890f8272c43c289b1624ead21752e</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.163</pheedo:origLink>
			<description>The statistical bases for current models of RAID reliability are reviewed and a highly accurate alternative is provided and justified. This new model corrects statistical errors associated with the pervasive assumption that system (RAID group) times to failure follow a homogeneous Poisson process, and corrects errors associated with assuming the time-to-failure and time-to-restore distributions are exponentially distributed. Statistical justification for the new model uses theory for reliability of repairable systems. Four critical component distributions are developed from field data. These distributions are for times to catastrophic failure, reconstruction and restoration, read errors, and disk data scrubs. Model results have been verified and predict between 2 and 1,500 times as many double disk failures as estimates made using the mean time to data loss method. Model results are compared to system level field data for a RAID group of 14 drives and show excellent correlation and greater accuracy than either MTTDL.&lt;br style=&quot;clear: both;&quot;/&gt;
			</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.163</guid>
		</item>
		<item>
			<title>PrePrint: Coordinated En-Route Web Caching in Multi-Server Networks</title>
			<link>http://www.pheedo.com/click.phdo?i=ca7601add34d8f6180fb428375bfb05f</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.162</pheedo:origLink>
			<description>With the emergence of various advanced networks that comprise a group of geographically distributed servers, such as Content Delivery Networks (CDNs) and Peer-to-Peer (P2P) systems, coordinated en-route web caching in multi-server networks becomes increasingly attractive but remains a great challenge, as solutions for single-server networks become invalid here. In this paper, we first establish a mathematical formulation for this problem that takes into account all requests (to any server) that pass through the intermediate nodes on a response path and caches the requested object optimally among these nodes so that the system's total benefit is maximized. Then we derive efficient dynamic-programming-based methods for finding optimal solutions to the problem for the unconstrained case and two QoS-constrained cases, respectively. For each case, we present a caching scheme to illustrate application of the corresponding method. Finally, we evaluate the proposed schemes on different performance metrics through extensive simulation experiments. The experimental results show that our proposed schemes can yield a steady performance improvement and achieve desired QoS in a multi-server network. To the best of our knowledge, solutions presented in this paper are the first for the problem of coordinated en-route web caching in multi-server networks.&lt;br style=&quot;clear: both;&quot;/&gt;
			</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.162</guid>
		</item>
		<item>
			<title>PrePrint: An Enhanced Universal NxN Fully Non-Blocking Quantum Switch</title>
			<link>http://www.pheedo.com/click.phdo?i=71ce644a5a719fd93ab3efde17bdcfe8</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.161</pheedo:origLink>
			<description>This study develops a quantum switching device with fully non-blocking properties. Although previous studies have also presented quantum-based solutions for the blocking problem, the proposed schemes are characterized by an increased packet loss, a large number of quantum SWAP gates and an increased propagation delay time complexity. The current study overcomes these drawbacks by designing an NxN fully non-blocking quantum switch, in which the packet payload is passed through quantum SWAP gates while the packet header is passed through quantum control gates designed by applying a modified quantum Karnaugh mapping method. The allocation of quantum SWAP gates to the different layers within the switch is solved using a Perfect Matching in Complete Graph (PMiCG) algorithm. A symmetry-based heuristic method is proposed to reduce the time complexity of the search process for all the perfect matching pairs to O(N^2). The performance of the proposed quantum switch is compared with that of a quantum self-routing packet switch and a quantum switching / quantum merge sorting scheme, respectively, in terms of the hardware complexity, the propagation delay time complexity, the auxiliary qubit complexity and the packet loss probability.&lt;br style=&quot;clear: both;&quot;/&gt;
			</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.161</guid>
		</item>
		<item>
			<title>PrePrint: Many-to-Many Disjoint Path Covers in the Presence of Faulty Elements</title>
			<link>http://www.pheedo.com/click.phdo?i=57fb42a9b7f64703c54bb2722464b026</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.160</pheedo:origLink>
			<description>A many-to-many k-disjoint path cover (k-DPC) of a graph G is a set of k disjoint paths joining k sources and k sinks in which each vertex of G is covered by a path. It is called a paired many-to-many disjoint path cover when each source should be joined to a specific sink, and it is called an unpaired many-to-many disjoint path cover when each source can be joined to an arbitrary sink. In this paper, we discuss paired and unpaired many-to-many disjoint path covers, including their relationships, application to strong hamiltonicity, and necessary conditions. We then give a construction scheme for paired many-to-many disjoint path covers in the graph H_0 + H_1 obtained by connecting two graphs H_0 and H_1 with |V(H_0)|=|V(H_1)| by |V(H_0)| pairwise nonadjacent edges joining vertices in H_0 and vertices in H_1, where H_0 = G_0 + G_1 and H_1 = G_2 + G_3 for some graphs G_j. Using the construction, we show that every m-dimensional restricted HL-graph and recursive circulant G(2^m, 4) with f or fewer faulty elements has a paired k-DPC for any f and k &amp;#x2265; 2 with f+2k &amp;#x2264; m.&lt;br style=&quot;clear: both;&quot;/&gt;
			</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.160</guid>
		</item>
		<item>
			<title>PrePrint: Wire-Speed TCAM-Based Architectures for Multi-Match Packet Classification</title>
			<link>http://www.pheedo.com/click.phdo?i=d39c92f0085e91921c1b67998aa759e0</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.159</pheedo:origLink>
			<description>Most conventional packet classifiers find only the highest priority filter that matches the arriving packet. However, new networking applications such as network intrusion detection systems and load balancers require all (or the first few) matching packets during classification. In this paper, two TCAM-based architectures for multi-match search are introduced. The first one is a renovated TCAM design that can find all or the first r matches in a packet filter set. The second architecture is a novel partitioning scheme based on filter intersection properties allowing us to use off-the-shelf TCAMs for multi-match packet classification. Our classifier engine finds all matches in exactly one conventional TCAM cycle while reducing the power consumption by at least two orders of magnitude, which is far better than the existing hardware based designs.&lt;br style=&quot;clear: both;&quot;/&gt;
			</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.159</guid>
		</item>
		<item>
			<title>PrePrint: Using Node Diagnosability to Determine t-diagnosability under the Comparison Diagnosis Model</title>
			<link>http://www.pheedo.com/click.phdo?i=d20b2d2ad7da8d502884d8351cd2d4de</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.158</pheedo:origLink>
			<description>Diagnosis is an essential subject for the reliability of a multiprocessor system. In this paper, we present a novel idea on system diagnosis called node diagnosability. The node diagnosability can be viewed as a local strategy toward system diagnosability. There is a strong relationship between the node diagnosability and the traditional diagnosability. In this local sense, we focus more on a single processor, and require only identifying the status of this particular processor correctly. Under the comparison diagnosis model, we propose a sufficient condition to determine the node diagnosability of a given processor. Furthermore, we propose a useful local structure called an extended star at a given processor to guarantee its node diagnosability, and provide an efficient algorithm to determine the faulty or fault-free status of each processor based on this structure. For a multiprocessor system with a total of $N$ processors, the time complexity of our algorithm to diagnose a given processor is $O(\log N)$ and to diagnose all the faulty processors is $O(N\log N)$ under the comparison model, provided that there is an extended star structure at each processor, and that the time for looking up the testing result of a comparator in the syndrome table is constant.&lt;br style=&quot;clear: both;&quot;/&gt;
			</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.158</guid>
		</item>
		<item>
			<title>PrePrint: Hardware Designs for Decimal Floating-Point Addition and Related Operations</title>
			<link>http://www.pheedo.com/click.phdo?i=a9da2d3246aec8721aaf0fc26cdfb8d5</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.147</pheedo:origLink>
			<description>Decimal arithmetic is often used in commercial, financial, and Internet-based applications. Due to the growing importance of decimal floating-point (DFP) arithmetic, the IEEE 754 Draft Standard for Floating-Point Arithmetic (IEEE P754) includes specifications for DFP arithmetic. This paper gives an overview of DFP arithmetic in IEEE P754 and discusses previous research on decimal fixed-point and floating-point addition. It also presents novel designs for a DFP adder and a DFP multifunction unit (DFP MFU) that comply with IEEE P754. To reduce their delay, the DFP adder and MFU both use decimal injection-based rounding, a new form of decimal operand alignment, and a fast flag-based method for rounding and overflow detection. Synthesis results indicate that the proposed DFP adder is roughly 21% faster and 1.6% smaller than a previous DFP adder design, when implemented in the same technology. Compared to the DFP adder, the DFP MFU provides six additional operations, yet only has 2.8% more delay and 9.7% more area. A pipelined version of the DFP MFU has a latency of six cycles, a throughput of one result per cycle, an estimated critical path delay of 12.9 fanout-of-four (FO4) inverter delays, and an estimated area of 0.2953 mm^2.&lt;br style=&quot;clear: both;&quot;/&gt;
			</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.147</guid>
		</item>
		<item>
			<title>PrePrint: Replacing Associative Load Queues: A Timing-Centric Approach</title>
			<link>http://www.pheedo.com/click.phdo?i=2917a12547660a2bff6f3d2238ec04d3</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.146</pheedo:origLink>
			<description>One of the main challenges of modern processor design is the implementation of a scalable and efficient mechanism to detect memory access order violations as a result of out-of-order execution. Traditional age-ordered associative load queues are complex, inefficient, and power-hungry. In this paper, we introduce two new dependence checking schemes with different design tradeoffs, but both explicitly rely on timing information as a primary instrument to rule out dependence violation. Our timing-centric designs operate at a fraction of the energy cost of an associative LQ and achieve the same functionality with an insignificant performance impact on average. Studies with parallel benchmarks also show that they are equally effective and efficient in a chip-multiprocessor environment.&lt;br style=&quot;clear: both;&quot;/&gt;
			</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.146</guid>
		</item>
		<item>
			<title>PrePrint: Hardware Architecture for High-Performance Regular Expression Matching</title>
			<link>http://www.pheedo.com/click.phdo?i=1121aa9ea4574a7fb9715a7167e20420</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.145</pheedo:origLink>
			<description>This paper presents a bitmap-based hardware architecture for the Glushkov non-deterministic finite automaton (G-NFA) which recognizes a given regular expression. We show that the inductions of the functions needed to construct the G-NFA can be generalized to include other special symbols commonly used in extended regular expressions such as the POSIX 1003.2 format. Our proposed implementation can detect the ending positions of all sub-strings of an input string T which start at arbitrary positions of T and belong to the language defined by the given regular expression. To achieve high performance, the implementation is generalized to the NFA which processes K symbols in each operation cycle. We provide an efficient solution for the boundary condition when the length of the input string is not an integral multiple of K. Compared with previous designs, our proposed architecture is more flexible and programmable because the pattern matching engine uses memory rather than logic.&lt;br style=&quot;clear: both;&quot;/&gt;
			</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.145</guid>
		</item>
		<item>
			<title>PrePrint: Networks-on-Chip in a Three-Dimensional Environment: A Performance Evaluation</title>
			<link>http://www.pheedo.com/click.phdo?i=4186a18bc89dd4b2fd936498a6ed91f7</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.142</pheedo:origLink>
			<description>The Network-on-Chip (NoC) paradigm has emerged as a revolutionary methodology for integrating a very high number of intellectual property (IP) blocks in a single die. The achievable performance benefit arising out of adopting NoCs is constrained by the performance limitation imposed by the metal wire, which is the physical realization of communication channels. With technology scaling, relying on material innovation alone will extend the lifetime of conventional interconnect systems by only a few technology generations. According to the International Technology Roadmap for Semiconductors (ITRS), new interconnect paradigms are needed for the longer term. The conventional two-dimensional (2D) integrated circuit (IC) has limited floor-planning choices, and consequently it limits the performance enhancements arising out of NoC architectures. Three-dimensional (3D) ICs are capable of achieving better performance, functionality, and packaging density compared to more traditional planar ICs. On the other hand, NoC is an enabling solution for integrating large numbers of embedded cores in a single die. 3D NoC architectures combine the benefits of these two new domains to offer an unprecedented performance gain. In this paper we evaluate the performance of 3D NoC architectures and demonstrate their superior functionality in terms of throughput, latency, energy dissipation and wiring area overhead compared to traditional 2D implementations.&lt;br style=&quot;clear: both;&quot;/&gt;
			</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.142</guid>
		</item>
		<item>
			<title>PrePrint: Enhancing Simulation Accuracy through Advanced Hazard Detection in Asynchronous Circuits</title>
			<link>http://www.pheedo.com/click.phdo?i=4616edce203b0ee35bafbb285a3eb4b2</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.141</pheedo:origLink>
			<description>A fast and accurate simulator with elaborate hazard detection capabilities is vital for asynchronous circuits, not only for the purpose of design validation through logic simulation, but even more importantly for the purpose of test validation through fault simulation. Towards this end, we developed SPIN-SIM, a logic and fault simulator built around Eichelberger&amp;#x2019;s classical hazard detection method, yet extended in various ways in order to overcome its limitations. More specifically, in order to improve simulation accuracy and hazard detection, SPIN-SIM i) employs a 13-valued algebra for which it adapts Eichelberger&amp;#x2019;s method, ii) maintains partial orders of causal signal transitions through relative time-stamps, and iii) unfolds time-frames judiciously to distinguish between hazards and actual transitions. Experimental results demonstrate that, at the cost of a negligible increase in computational time over Eichelberger&amp;#x2019;s method, if any at all, SPIN-SIM achieves significantly more accurate logic simulation and, by extension, drastically more efficient fault simulation. Furthermore, while the proposed method was developed and is presented for the class of Speed-Independent circuits, it is easily extendible to various other classes of asynchronous circuits.&lt;br style=&quot;clear: both;&quot;/&gt;
			</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.141</guid>
		</item>
		<item>
			<title>PrePrint: Localized Independent Packet Scheduling for Buffered Crossbar Switches</title>
			<link>http://www.pheedo.com/click.phdo?i=c381ec4f0816d31fc1c7fd5c810802fd</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.140</pheedo:origLink>
			<description>In a buffered crossbar switch, besides input/output queues, a small buffer is associated with each crosspoint. Due to the introduction of crosspoint buffers, output/input contention is eliminated, and the scheduling process is greatly simplified. Moreover, crosspoint buffers enable the switch to work in an asynchronous mode and easily schedule and transmit variable length packets. Compared with fixed length packet scheduling, variable length packet scheduling has some unique advantages: higher throughput, shorter packet latency and lower hardware cost. We present a fast scheduling scheme for buffered crossbar switches called Localized Independent Packet Scheduling (LIPS). With LIPS, an input/output port makes scheduling decisions based on the state of its local crosspoint buffers. This property makes LIPS suitable for a distributed implementation and thus highly scalable. Since no comparison operation is required in LIPS, scheduling arbiters can be implemented using priority encoders, which can make arbitration decisions quickly in hardware. Another advantage of LIPS is that each crosspoint needs only a small buffer space, which minimizes the hardware cost of switches. We also analyze the performance of LIPS, and prove that LIPS achieves 100% throughput for any admissible traffic with a speedup of two. Simulations are conducted to verify the analytical results and measure the performance of LIPS.&lt;br style=&quot;clear: both;&quot;/&gt;
  &lt;img alt=&quot;&quot; style=&quot;border: 0; height:1px; width:1px;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?i=c381ec4f0816d31fc1c7fd5c810802fd&quot; height=&quot;1&quot; width=&quot;1&quot;/&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=c381ec4f0816d31fc1c7fd5c810802fd&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.140</guid>
		</item>
		<item>
			<title>PrePrint: Draco: Efficient Resource Management for Resource-Constrained Control Tasks</title>
			<link>http://www.pheedo.com/click.phdo?i=21b84d54a88299b4c35aa389a7ada575</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.136</pheedo:origLink>
			<description>In many application areas, including control systems, careful management of system resources is key to providing the best application performance. Traditional control systems with multiple control loops statically allocate a fixed portion of the system resources to each controller based on its average or worst-case resource requirements. However, controllers' resource needs vary depending on the jobs they perform and the state of the systems they control. A controller of a plant operating close to its equilibrium requires fewer resources than a controller of a plant operating far from its equilibrium point. The Draco dynamic rate control system exploits this fact by dynamically allocating resources to control systems based on system state. Our research demonstrates that Draco provides significantly better overall control performance with far fewer resources than static controllers. Our experimental evaluation shows that, in the control scenarios we examined, Draco provides up to 25% better control performance with 30% fewer resources.&lt;br style=&quot;clear: both;&quot;/&gt;
      &lt;a href=&quot;http://www.pheedo.com/click.phdo?s=21b84d54a88299b4c35aa389a7ada575&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?s=21b84d54a88299b4c35aa389a7ada575&quot;/&gt;&lt;/a&gt;
  &lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=21b84d54a88299b4c35aa389a7ada575&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.136</guid>
		</item>
		<item>
			<title>PrePrint: Sensitivity-Based Optimization of Disk Architecture</title>
			<link>http://www.pheedo.com/click.phdo?i=5cc2bcb48c39156d34c21c1b6cca40fd</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.135</pheedo:origLink>
			<description>Storage plays a pivotal role in the performance of many applications. Many applications, especially those that run on servers, are I/O intensive and therefore require high performance storage systems. These high-end storage systems consume a large amount of power, the bulk of which is due to the disk drives. Optimizing disk architectures is a design time as well as a run time issue and requires balancing between performance and power. There are different figures of merit, such as performance and energy, and a large space of design and runtime "knobs" that can be used to optimize disk drive behavior. Given such a large space, it is desirable to have a systematic methodology to optimally set these knobs to satisfy our figures of merit as efficiently as possible. In this paper we present the sensitivity-based optimization methodology for disk architectures (SODA), which leverages results previously obtained in digital circuit design optimization scenarios. Using detailed models of the electro-mechanical behavior of disk drives and a suite of realistic workloads, we show how SODA can aid in design and runtime optimization of disk drive architectures.&lt;br style=&quot;clear: both;&quot;/&gt;
  &lt;img alt=&quot;&quot; style=&quot;border: 0; height:1px; width:1px;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?i=5cc2bcb48c39156d34c21c1b6cca40fd&quot; height=&quot;1&quot; width=&quot;1&quot;/&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=5cc2bcb48c39156d34c21c1b6cca40fd&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.135</guid>
		</item>
		<item>
			<title>PrePrint: Complexities of Graph-Based Representations for Elementary Functions</title>
			<link>http://www.pheedo.com/click.phdo?i=bb5f8c0f388effea07dd1ea22e08507f</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.134</pheedo:origLink>
			<description>This paper analyzes complexities of decision diagrams for elementary functions such as polynomial, trigonometric, logarithmic, square root, and reciprocal functions. These real functions are converted into integer-valued functions by using fixed-point representation. This paper presents the numbers of nodes in decision diagrams representing the integer-valued functions. First, complexities of decision diagrams for polynomial functions are analyzed, since elementary functions can be approximated by polynomial functions. A theoretical analysis shows that binary moment diagrams (BMDs) have low complexity for polynomial functions. Second, this paper analyzes complexity of edge-valued binary decision diagrams (EVBDDs) for monotone functions, since many common elementary functions are monotone. It introduces a new class of integer functions, Mp-monotone increasing function, and derives an upper bound on the number of nodes in an EVBDD for the Mp-monotone increasing function. A theoretical analysis shows that EVBDDs have low complexity for Mp-monotone increasing functions. This paper also presents the exact number of nodes in the smallest EVBDD for the n-bit multiplier function, and a variable order for the smallest EVBDD.&lt;br style=&quot;clear: both;&quot;/&gt;
  &lt;img alt=&quot;&quot; style=&quot;border: 0; height:1px; width:1px;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?i=bb5f8c0f388effea07dd1ea22e08507f&quot; height=&quot;1&quot; width=&quot;1&quot;/&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=bb5f8c0f388effea07dd1ea22e08507f&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.134</guid>
		</item>
		<item>
			<title>PrePrint: FPC: A High-Speed Compressor for Double-Precision Floating-Point Data</title>
			<link>http://www.pheedo.com/click.phdo?i=8b495fa2756b25a7b7b95a315b3e90ec</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.131</pheedo:origLink>
			<description>Many scientific programs exchange large quantities of double-precision data between processing nodes and with mass storage devices. Data compression can reduce the number of bytes that need to be transferred and stored. However, compression is only likely to be employed in high-end computing environments if it does not impede the throughput. This paper describes and evaluates FPC, a fast lossless compression algorithm for linear streams of 64-bit floating-point data. FPC works well on hard-to-compress scientific datasets and meets the throughput demands of high-performance systems. A comparison with five lossless compression schemes, BZIP2, DFCM, FSD, GZIP, and PLMI, on four architectures and thirteen datasets shows that FPC compresses and decompresses one to two orders of magnitude faster than the other algorithms at the same geometric-mean compression ratio. Moreover, FPC provides a guaranteed throughput as long as the prediction tables fit into the L1 data cache. For example, on a 1.6 GHz Itanium 2 server, the throughput is 670 megabytes per second regardless of what data are being compressed.&lt;br style=&quot;clear: both;&quot;/&gt;
      &lt;a href=&quot;http://www.pheedo.com/click.phdo?s=8b495fa2756b25a7b7b95a315b3e90ec&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?s=8b495fa2756b25a7b7b95a315b3e90ec&quot;/&gt;&lt;/a&gt;
  &lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=8b495fa2756b25a7b7b95a315b3e90ec&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.131</guid>
		</item>
		<item>
			<title>PrePrint: Limit on the Addressability of Fault-Tolerant Nanowire Decoders</title>
			<link>http://www.pheedo.com/click.phdo?i=1df73dcfd41e0e995bf0225709025639</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.130</pheedo:origLink>
			<description>Although prone to fabrication error, the nanowire crossbar is a promising candidate component for next-generation nanometer-scale circuits. In the nanowire crossbar architecture, nanowires are addressed by controlling voltages on the mesowires. For area efficiency, we are interested in the maximum number of nanowires $N(m,e)$ that can be addressed by $m$ mesowires, in the face of up to $e$ fabrication errors. Asymptotically tight bounds on $N(m,e)$ are established in this paper. In particular, it is shown that $N(m,e) = \Theta(2^m / m^{e+1/2})$. Interesting observations are made on the equivalence between this problem and the problem of constructing optimal EC/AUED codes, superimposed distance codes, pooling designs, and diffbounded set systems. Results in this paper also improve upon those in the EC/AUED codes literature.&lt;br style=&quot;clear: both;&quot;/&gt;
  &lt;img alt=&quot;&quot; style=&quot;border: 0; height:1px; width:1px;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?i=1df73dcfd41e0e995bf0225709025639&quot; height=&quot;1&quot; width=&quot;1&quot;/&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=1df73dcfd41e0e995bf0225709025639&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.130</guid>
		</item>
		<item>
			<title>PrePrint: Delay Constrained Multicast Routing Using the Noisy Chaotic Neural Networks</title>
			<link>http://www.pheedo.com/click.phdo?i=e6cbd6cc71e5a732e31f6775ae3b87c5</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.127</pheedo:origLink>
			<description>We present a method to compute the delay constrained multicast routing tree by employing chaotic neural networks. Experimental results show that the noisy chaotic neural network (NCNN) finds the optimal solution more often than the transiently chaotic neural network (TCNN) and the Hopfield neural network (HNN). Furthermore, compared with the bounded shortest multicast algorithm (BSMA), the noisy chaotic neural network is able to find multicast trees with lower cost.&lt;br style=&quot;clear: both;&quot;/&gt;
  &lt;img alt=&quot;&quot; style=&quot;border: 0; height:1px; width:1px;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?i=e6cbd6cc71e5a732e31f6775ae3b87c5&quot; height=&quot;1&quot; width=&quot;1&quot;/&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=e6cbd6cc71e5a732e31f6775ae3b87c5&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.127</guid>
		</item>
		<item>
			<title>PrePrint: The Mixed-Radix Chinese Remainder Theorem and Its Applications to Residue Comparison</title>
			<link>http://www.pheedo.com/click.phdo?i=482ae778a167982cb7ff7c936eb47a54</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.126</pheedo:origLink>
			<description>The Chinese remainder theorem (CRT) and mixed-radix conversion (MRC) are two classic theorems used to convert a residue number to its binary correspondence for a given moduli set {P_n, &#183; &#183; &#183; , P_2, P_1}. The MRC is a weighted number system and it requires operations modulo P_i only and hence magnitude comparison is easily performed. However, the calculation of the mixed-radix coefficients in the MRC is a strictly sequential process and involves complex divisions. Thus the residue-to-binary (R/B) conversions and residue comparisons based on the MRC require large delay. In contrast, the R/B conversion and residue comparison based on the CRT are fully parallel processes. However, the CRT requires large operations modulo M = P_n &#183; &#183; &#183; P_2P_1. In this paper, a new mixed-radix CRT is proposed which possesses both the advantages of the CRT and the MRC, which are parallel processing, small operations modulo P_i only, and the efficiency of making modulo comparison. Based on the proposed CRT, new residue comparators are developed for the three-moduli set {2^n &amp;#8722; 1, 2^n, 2^n + 1}. The FPGA implementation results show that the proposed modulo comparators are about 20% faster and smaller than one of the previous best designs.&lt;br style=&quot;clear: both;&quot;/&gt;
  &lt;img alt=&quot;&quot; style=&quot;border: 0; height:1px; width:1px;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?i=482ae778a167982cb7ff7c936eb47a54&quot; height=&quot;1&quot; width=&quot;1&quot;/&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=482ae778a167982cb7ff7c936eb47a54&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.126</guid>
		</item>
		<item>
			<title>PrePrint: Localized Minimum-Energy Broadcasting for Wireless Multihop Networks with Directional Antennas</title>
			<link>http://www.pheedo.com/click.phdo?i=e1fa02fedd4c8bccb6d5d7b7f0267b90</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.125</pheedo:origLink>
			<description>We propose several localized algorithms to achieve energy-efficient broadcasting in wireless multihop networks using directional antennas. Each node needs to know only the geographic positions of itself and its neighbors. Our first protocol is called DRBOP and it follows the one-to-one communication model to reach all nodes in the relative neighborhood graph (RNG). Each node that receives a message for the first time from one of its RNG neighbors will rebroadcast it to each of its remaining RNG neighbors separately. The transmission power is adjusted for each transmission to the minimum necessary for reaching the particular neighbor. Next, we describe DLBOP, where RNG is replaced by the localized minimum spanning tree (LMST) graph, which is a localized topology resembling the minimum spanning tree. We then observe that, for very dense networks, it is more energy-efficient to reach more than one neighbor at a time. A one-to-many protocol efficient for dense networks is proposed. We then describe an efficient localized protocol which adaptively switches (without any threshold) between one-to-one and one-to-many communication models and is efficient for both sparse and dense networks. Our simulation results show that, for different energy models, the adaptive protocol is able to achieve performance competitive with globalized algorithms while having a fully localized operation.&lt;br style=&quot;clear: both;&quot;/&gt;
      &lt;a href=&quot;http://www.pheedo.com/click.phdo?s=e1fa02fedd4c8bccb6d5d7b7f0267b90&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?s=e1fa02fedd4c8bccb6d5d7b7f0267b90&quot;/&gt;&lt;/a&gt;
  &lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=e1fa02fedd4c8bccb6d5d7b7f0267b90&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.125</guid>
		</item>
		<item>
			<title>PrePrint: Optimized Custom Precision Function Evaluation for Embedded Processors</title>
			<link>http://www.pheedo.com/click.phdo?i=8d107663ea201060a06f2bb9614b6a81</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.124</pheedo:origLink>
			<description>Fixed-point processors are utilized in an enormous variety of applications, often for tasks that require the evaluation of mathematical functions. We present an automated method for mapping functions to such processors via polynomials that explicitly targets the native word-length of the processor, thereby significantly reducing the execution time relative to commonly used floating-point emulation approaches based on traditional mathematical libraries. The methods presented here also contrast with hand-tuned processor-specific code, which has the potential to deliver efficient implementations but at the cost of significant design time. We describe an automated design flow utilizing multi-word arithmetic to provide overflow protection and precision accurate to one unit in the last place (ulp). Analytical approaches are used to minimize the number of fixed-width operands required for each operation and to ensure that precision requirements are met. This allows automated generation of processor-optimized code and characterization of a design space representing a rich range of tradeoffs among precision, latency, and memory cost.&lt;br style=&quot;clear: both;&quot;/&gt;
  &lt;img alt=&quot;&quot; style=&quot;border: 0; height:1px; width:1px;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?i=8d107663ea201060a06f2bb9614b6a81&quot; height=&quot;1&quot; width=&quot;1&quot;/&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=8d107663ea201060a06f2bb9614b6a81&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.124</guid>
		</item>
		<item>
			<title>PrePrint: Pipelining Saturated Accumulation</title>
			<link>http://www.pheedo.com/click.phdo?i=af20cd5955a26890d508b084c97e71aa</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.110</pheedo:origLink>
			<description>Aggressive pipelining and spatial parallelism allow integrated circuits (e.g., custom VLSI, ASICs, FPGAs) to achieve high throughput on many Digital Signal Processing applications. However, cyclic data dependencies in the computation can limit parallelism and reduce the efficiency and speed of an implementation. Saturated accumulation is an important example where such a cycle limits the throughput of signal processing applications. We show how to reformulate saturated addition as an associative operation so that we can use a parallel-prefix calculation to perform saturated accumulation at any data rate supported by the device. This allows us, for example, to design a 16-bit saturated accumulator which can operate at 280MHz on a Xilinx Spartan-3 (XC3S-5000-4) FPGA, the maximum frequency supported by the component's DCM.&lt;br style=&quot;clear: both;&quot;/&gt;
  &lt;img alt=&quot;&quot; style=&quot;border: 0; height:1px; width:1px;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?i=af20cd5955a26890d508b084c97e71aa&quot; height=&quot;1&quot; width=&quot;1&quot;/&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=af20cd5955a26890d508b084c97e71aa&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.110</guid>
		</item>
		<item>
			<title>PrePrint: The Synonym Lookaside Buffer: A Solution to the Synonym Problem in Virtual Caches</title>
			<link>http://www.pheedo.com/click.phdo?i=124d6d2583ff411bbad27cfe284164fb</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.108</pheedo:origLink>
			<description>To support dynamic address translation in today's microprocessors, the first-level cache is accessed in parallel with a Translation Lookaside Buffer (TLB). However, this current approach faces mounting problems. This paper introduces new ideas to enable the use of virtual addresses in the cache hierarchy. The major idea is the replacement of the on-chip TLB by a Synonym Lookaside Buffer (SLB). The SLB translates synonyms into a primary virtual address, which is a unique identifier resolving all ambiguities due to synonyms in the memory system. We introduce various system configurations with SLBs and discuss all functional issues associated with them. An SLB is much more scalable than a regular TLB. It scales with memory data set sizes, physical memory sizes, and the number of cores in a multiprocessor. Moreover, SLB entry flushes and shootdowns due to physical memory management are eliminated. We show performance data resulting from the simulation of several applications as diverse as scientific computing, database, and JAVA virtual machines. These evaluations target SLB miss rates and flushes as well as the impact of the SLB on cache miss rates. They show that small SLBs of 8-16 entries are sufficient to solve the synonym problem in virtual caches and that their performance overhead is negligible.&lt;br style=&quot;clear: both;&quot;/&gt;
  &lt;img alt=&quot;&quot; style=&quot;border: 0; height:1px; width:1px;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?i=124d6d2583ff411bbad27cfe284164fb&quot; height=&quot;1&quot; width=&quot;1&quot;/&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=124d6d2583ff411bbad27cfe284164fb&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.108</guid>
		</item>
		<item>
			<title>PrePrint: A Systematic Approach for Designing Redundant Arithmetic Adders Based on Counter Tree Diagrams</title>
			<link>http://www.pheedo.com/click.phdo?i=34d38a914ac79e6f8e1565d8d0c68183</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.106</pheedo:origLink>
			<description>This paper introduces a systematic approach to designing high-performance parallel adders based on Counter Tree Diagrams (CTDs). By using CTDs, we can describe addition algorithms at various levels of abstraction. A high-level CTD represents a network of coarse-grained components associated with word-level operands, whereas a low-level CTD represents a network of primitive components that can be directly mapped onto physical devices. The level of abstraction in circuit representation can be changed by decomposition of CTDs. We can derive possible variations of adder structures by decomposing a high-level CTD into low-level CTDs in a formal manner. In this paper, we focus on an application of CTDs to the design of redundant arithmetic adders with limited carry propagation. For any redundant number representation, we can obtain the optimal adder structure by trying every possible CTD decomposition and CTD-variable encoding. The potential of the proposed approach is demonstrated through an experimental synthesis of Redundant-Binary (RB) adders with CMOS standard cell libraries. We successfully obtain RB adders that achieve an improvement of about 30-40% in power-delay product compared with conventional designs.&lt;br style=&quot;clear: both;&quot;/&gt;
  &lt;img alt=&quot;&quot; style=&quot;border: 0; height:1px; width:1px;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?i=34d38a914ac79e6f8e1565d8d0c68183&quot; height=&quot;1&quot; width=&quot;1&quot;/&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=34d38a914ac79e6f8e1565d8d0c68183&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.106</guid>
		</item>
		<item>
			<title>PrePrint: March Test Generation Revealed</title>
			<link>http://www.pheedo.com/click.phdo?i=0b434b62091f3415902a5a33b1ca76d4</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.105</pheedo:origLink>
			<description>Memory testing commonly faces two issues: the characterization of detailed and realistic fault models, and the definition of time-efficient test algorithms to detect them. March tests have proven to be a fast, simple, and regularly structured class of memory test algorithms. This paper proposes a new polynomial algorithm to automatically generate march tests. The formal model adopted to represent memory faults allows the definition of a general methodology to deal with static, dynamic, and linked faults.&lt;br style=&quot;clear: both;&quot;/&gt;
      &lt;a href=&quot;http://www.pheedo.com/click.phdo?s=0b434b62091f3415902a5a33b1ca76d4&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?s=0b434b62091f3415902a5a33b1ca76d4&quot;/&gt;&lt;/a&gt;
  &lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=0b434b62091f3415902a5a33b1ca76d4&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.105</guid>
		</item>
		<item>
			<title>PrePrint: Strongly Diagnosable Systems under the Comparison Diagnosis Model</title>
			<link>http://www.pheedo.com/click.phdo?i=c969747fcfaebdbb1baed29c04238860</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.104</pheedo:origLink>
			<description>A system is $t$-diagnosable if all faulty nodes can be identified without replacement when the number of faults does not exceed $t$, where $t$ is some positive integer. Furthermore, a system is strongly $t$-diagnosable if it is $t$-diagnosable and is also $(t+1)$-diagnosable except for the case where a node's neighbors are all faulty. In this paper, we propose some conditions for verifying whether a class of interconnection networks, called Matching Composition Networks (MCNs), are strongly diagnosable under the comparison diagnosis model.&lt;br style=&quot;clear: both;&quot;/&gt;
  &lt;img alt=&quot;&quot; style=&quot;border: 0; height:1px; width:1px;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?i=c969747fcfaebdbb1baed29c04238860&quot; height=&quot;1&quot; width=&quot;1&quot;/&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=c969747fcfaebdbb1baed29c04238860&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.104</guid>
		</item>
		<item>
			<title>PrePrint: Automatic Generation of Modular Multipliers for FPGA Applications</title>
			<link>http://www.pheedo.com/click.phdo?i=2d9e27a7597bfec7043aea2a3581f6ef</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.102</pheedo:origLink>
			<description>Since redundant number systems allow constant time addition, they are often at the heart of modular multipliers designed for public key cryptography (PKC) applications. Indeed, PKC involves large operands (160 to 1024 bits) and several researchers have proposed carry-save or borrow-save algorithms. However, these number systems do not take advantage of the dedicated carry logic available in modern Field Programmable Gate Arrays (FPGAs). To overcome this problem, we suggest performing modular multiplication in a high-radix carry-save number system, where a sum bit of the carry-save representation is replaced by a sum word. Two digits are then added by means of a small Carry-Ripple Adder (CRA). Furthermore, we propose an algorithm which selects the best high-radix carry-save representation for a given modulus and generates a synthesizable VHDL description of the operator.&lt;br style=&quot;clear: both;&quot;/&gt;
  &lt;img alt=&quot;&quot; style=&quot;border: 0; height:1px; width:1px;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?i=2d9e27a7597bfec7043aea2a3581f6ef&quot; height=&quot;1&quot; width=&quot;1&quot;/&gt;
&lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=2d9e27a7597bfec7043aea2a3581f6ef&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.102</guid>
		</item>
		<item>
			<title>PrePrint: A Cost-Effective Latency-Aware Memory Bus for Symmetric Multiprocessor Systems</title>
			<link>http://www.pheedo.com/click.phdo?i=99c3cc528b5ac024b032aa3f5587c530</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.96</pheedo:origLink>
			<description>This paper presents how a multi-core system can benefit from the use of a latency-aware memory bus capable of dual-concurrent data transfers on a single wire line: Source synchronous CDMA interconnect (SSCDMA-I) has been adopted to implement the memory bus of a shared-memory multi-core system. Two types of bus-based homogeneous and heterogeneous multi-core systems are modeled and simulated by a cycle-accurate simulation platform. Unlike the conventional time-division multiplexing (TDM) bus-based multi-core system, which shows degradation in performance as the number of processing cores increases, the proposed SSCDMA bus-based multi-core system shows up to 23.1% higher performance for 4 cores. The maximum latency of a heterogeneous multi-core system with a mix of traffic loads has been reduced by up to 78%. These results demonstrate that the performance of multi-core systems can be improved with less cost and network complexity by reducing the bus contention interferences and by supporting higher concurrency in memory accesses, which brings shorter critical word access latency.&lt;br style=&quot;clear: both;&quot;/&gt;
      &lt;a href=&quot;http://www.pheedo.com/click.phdo?s=99c3cc528b5ac024b032aa3f5587c530&quot;&gt;&lt;img alt=&quot;&quot; style=&quot;border: 0;&quot; border=&quot;0&quot; src=&quot;http://www.pheedo.com/img.phdo?s=99c3cc528b5ac024b032aa3f5587c530&quot;/&gt;&lt;/a&gt;
  &lt;img src=&quot;http://www.pheedo.com/feeds/tracker.php?i=99c3cc528b5ac024b032aa3f5587c530&quot; style=&quot;display: none;&quot; border=&quot;0&quot; height=&quot;1&quot; width=&quot;1&quot; alt=&quot;&quot;/&gt;</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.96</guid>
		</item>
		<item>
			<title>PrePrint: Immunet: Dependable Routing for Interconnection Networks with Arbitrary Topology</title>
			<link>http://www.pheedo.com/click.phdo?i=6e2f314ac800f8a19082bf6ae9c957f9</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.95</pheedo:origLink>
			<description>A complete mechanism for tolerating multiple failures in parallel computer systems, denoted as Immunet, is described in this paper. Immunet can be applied to arbitrary topologies, either regular or irregular, exhibiting in both cases graceful performance degradation. Provided that the network remains connected, Immunet is able to deal with any number of failures regardless of their spatial and temporal distribution. Our mechanism operates on the basis of a dynamic network reconfiguration in response to failures. The network reconfiguration only employs local information recorded at the router nodes, which leads to a highly scalable system. In addition, its low cost and overhead permit a practicable hardware implementation. Finally, Immunet allows failures to be circumvented transparently to applications running on a parallel system because it does not require dropping in-flight traffic. Only packets stored in or traveling through a broken component should be recovered by higher system levels.&lt;br style=&quot;clear: both;&quot;/&gt;
			</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.95</guid>
		</item>
		<item>
			<title>PrePrint: Adaptive Fault Management of Parallel Applications for High-Performance Computing</title>
			<link>http://www.pheedo.com/click.phdo?i=2e0e612e47af14291a7e4912cc519718</link>
			<pheedo:origLink>http://doi.ieeecomputersociety.org/10.1109/TC.2008.90</pheedo:origLink>
			<description>As the scale of high-performance computing (HPC) grows, application fault resilience becomes increasingly important. In this paper, we propose FT-Pro, an adaptive fault management approach that combines the merits of reactive checkpointing and proactive migration. It enables parallel applications to avoid anticipated failures via preventive migration and, in the case of unforeseeable failures, to minimize their impact through selective checkpointing. An adaptation manager is designed to make runtime decisions in response to failure prediction. We evaluate FT-Pro through stochastic modeling and case studies with real applications under a wide range of settings. Preliminary results indicate that FT-Pro outperforms periodic checkpointing by up to 43%, in terms of both reducing application completion times and improving resource utilization.&lt;br style=&quot;clear: both;&quot;/&gt;
			</description>
			<guid isPermaLink="false">http://doi.ieeecomputersociety.org/10.1109/TC.2008.90</guid>
		</item>
	</channel>
</rss>