<?xml version="1.0" encoding="ISO-8859-1"?>
<rss version="2.0">
<channel>
<title>IEEE Transactions on Computers</title>
<link>http://www.computer.org/tc</link>
<description>The IEEE Transactions on Computers is a monthly publication with a wide distribution to researchers, developers, technical managers, and educators in the computer field. It publishes papers, brief contributions, and comments on research in areas of current interest to the readers. These areas include, but are not limited to, the following: a) computer organizations and architectures; b) operating systems, software systems, and communication protocols; c) real-time systems and embedded systems; d) digital devices, computer components, and interconnection networks; e) specification, design, prototyping, and testing methods and tools; f) performance, fault tolerance, reliability, security, and testability; g) case studies and experimental and theoretical evaluations; and h) new and important applications and trends.</description>
	<language>en-us</language>
	<pubDate>Wed, 4 Jan 2012 11:00:01 GMT</pubDate>
	<image>
		<url>http://csdl.computer.org/common/images/logos/tc.gif</url>
		<title>IEEE Computer Society</title>
		<description>List of recently published journal articles</description>
		<link>http://www.computer.org/tc</link>
	</image>
  <item>
     <title>PrePrint: An Efficient Denoising Architecture for Removal of Impulse Noise in Images</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2011.256</link>
     <description>Images are often corrupted by impulse noise during image acquisition and transmission. In this paper, we propose an efficient denoising scheme and its VLSI architecture for the removal of random-valued impulse noise. To keep cost low, we propose a low-complexity VLSI architecture. We employ a decision-tree-based impulse noise detector to detect noisy pixels and an edge-preserving filter to reconstruct their intensity values. Furthermore, an adaptive technique is used to improve the removal of impulse noise. Extensive experimental results demonstrate that the proposed technique achieves better performance than previous lower-complexity methods, in terms of both quantitative evaluation and visual quality, and performance comparable to higher-complexity methods. The VLSI implementation of our design achieves a processing rate of about 200 MHz in TSMC 0.18 &#x00B5;m technology. Compared with state-of-the-art techniques, this work reduces memory storage by more than 99%. The design requires only low computational complexity and two line memory buffers. Its low hardware cost makes it suitable for many real-time applications.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2011.256</guid>
  </item>
  <item>
     <title>PrePrint: Preserving Temporal Relationships of Events for Wireless Sensor Actor Networks</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2011.215</link>
     <description>In this paper, we present the performance evaluation of an algorithm for preserving temporal relationships of events in Wireless Sensor Actor Networks (WSANs). The algorithm consists of two modules, which deal with temporal event ordering and time synchronization. These two problems are approached as a whole because they complement each other: to temporally order events, the nodes must be synchronized. The goal of the proposed event ordering algorithm for WSANs is to reduce overhead in terms of energy dissipation and delay. We also propose a tunable time synchronization algorithm employing a hybrid synchronization scheme suited to clustered topologies. To reduce the communication cost of synchronization, the algorithm reuses the messages already exchanged for event ordering and routing, piggybacking synchronization pulses and replies on them. Simulation experiments showed that the event ordering algorithm reduces overhead compared with previously proposed algorithms. The synchronization results showed that the combination of synchronization techniques is well suited to the communication mode used in a clustered topology. Piggybacking synchronization pulses and replies yielded a considerable gain, as demonstrated by the number of messages that were piggybacked for synchronization purposes.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2011.215</guid>
  </item>
  <item>
     <title>PrePrint: Testing and Diagnosing Comparison Faults of TCAMs with Asymmetric Cells</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2011.196</link>
     <description>This paper presents several comparison fault models, based on electrical defects, for TCAMs with asymmetric cells. A march-like test algorithm, TAC-H, is also proposed to cover the defined comparison faults. TAC-H consists of 8N Write operations and (3N + 2B) Compare operations for an N&#215;B-bit TCAM with Hit output only. We also propose two march-like diagnosis algorithms to identify the defined comparison faults of TCAMs with asymmetric cells. The first diagnosis algorithm, DAC-H, requires 5N Write operations, 3N Erase operations, and (5N + 2B) Compare operations to distinguish 100% of the comparison faults for a TCAM with Hit output only. The second diagnosis algorithm, DAC-P, requires 3N Write operations, N Erase operations, and (5N + 2B) Compare operations to distinguish 100% of the comparison faults for a TCAM with both Hit and priority address encoder outputs.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2011.196</guid>
  </item>
  <item>
     <title>PrePrint: Power and Delay Aware Management of Packet Switches</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2011.191</link>
     <description>Due to increasing circuit densities and data throughput rates, power consumption has become a significant concern in the design and operation of high-performance packet switches. We extend the idea of Dynamic Power Management (DPM) to input-queued switches, allowing operators to trade off power and delay in a useful way. We frame the problem as a dynamic program and solve a relaxation using techniques from Linear Quadratic Regulation (LQR). The resulting optimal policy is combined with existing, non-power-aware switch controls to generate two novel scheduling algorithms: (a) LQR Power Aware Maximum Weight Matching (LQR PA MWM) and (b) LQR Power Aware Projective Cone Scheduling (LQR PA PCS). Simulation results suggest that our algorithms yield significant power savings compared with MWM and previous power control schemes, with little performance degradation.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2011.191</guid>
  </item>
  <item>
     <title>PrePrint: Variability-Aware Task Allocation for Energy-Efficient Quality of Service Provisioning in Embedded Streaming Multimedia Applications</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2011.127</link>
     <description>Multimedia streaming applications running on next-generation parallel multiprocessor arrays in sub-45nm technology face new challenges related to device and process variability, which leads to performance and power variations across the cores. In this context, both Quality of Service (QoS) and energy efficiency could be severely impacted by variability. In this work, we propose a run-time variability-aware workload distribution technique for enhancing real-time predictability and energy efficiency, based on an innovative Linear-Programming + Bin-Packing formulation that can be solved in linear time. We demonstrate our approach on the virtual prototype of a next-generation industrial multi-core platform running a multithreaded MPEG2 decoder. Experimental results confirm that our technique compensates for variability while improving energy efficiency and minimizing deadline violations in the presence of performance and power variations across the cores.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2011.127</guid>
  </item>
  <item>
     <title>PrePrint: vCUDA: GPU-Accelerated High-Performance Computing in Virtual Machines</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2011.112</link>
     <description>This paper describes vCUDA, a general-purpose graphics processing unit (GPGPU) computing solution for virtual machines (VMs). vCUDA allows applications executing within VMs to leverage hardware acceleration, which can benefit the performance of a class of high-performance computing (HPC) applications. The key insights in our design are API call interception and redirection and a dedicated RPC system for VMs. With API interception and redirection, Compute Unified Device Architecture (CUDA) applications in VMs can access a graphics hardware device and achieve high computing performance transparently. In the current study, vCUDA achieved near-native performance with the dedicated RPC system. We carried out a detailed analysis of the performance of our framework. Using a number of unmodified official examples from the CUDA SDK and third-party applications in the evaluation, we observed that CUDA applications running with vCUDA exhibited a very low performance penalty compared with the native environment, demonstrating the viability of the vCUDA architecture.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2011.112</guid>
  </item>
  <item>
     <title>PrePrint: Comments on "Provably Sublinear Point Multiplication on Koblitz Curves and Its Hardware Implementation"</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2011.109</link>
     <description>In 2008, Dimitrov et al. proposed a point multiplication algorithm on Koblitz curves using multiple-base expansions. They claimed that their algorithm is the first provably sublinear point multiplication algorithm on Koblitz curves. In this paper, we show that the well-known tau-adic NAF method is already sublinear and also guarantees better average performance.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2011.109</guid>
  </item>
  <item>
     <title>PrePrint: A Classified Multi-Suffix Trie for IP Lookup and Update</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2011.86</link>
     <description>In this paper, a new data structure called the classified multi-suffix trie (CMST) is proposed for designing dynamic router-tables. CMST achieves better performance than existing data structures because each node can store more than one prefix, and the longest matching prefix may be found in an internal node rather than at a leaf. Furthermore, with the classification in each node, dynamic router-table operations can be performed efficiently. To reduce the memory requirement, each CMST node stores each prefix's corresponding suffix instead of the full binary string. Experiments using real IPv4 routing databases demonstrate that the proposed data structure is memory efficient and performs well for lookup, insert, and delete operations. We compare the performance of the proposed data structure with that of other structures using the benchmark IPv4 prefix databases AS4637, AS6447, and AS65000, with 219,581, 296,552, and 226,847 prefixes, respectively.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2011.86</guid>
  </item>
  <item>
     <title>PrePrint: New Design For Testability Approach for Clock Fault Testing</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2011.59</link>
     <description>We propose a new design-for-testability approach for testing clock faults in next-generation high-performance microprocessors. It has been shown that conventional manufacturing tests cannot guarantee the detection of such faults, although these faults can compromise the effectiveness of delay fault testing as well as the correct operation of the microprocessor in the field. These conditions will worsen with technology scaling, as the likelihood of faults, including clock faults, is expected to increase. To deal with these problems, we propose a new design-for-testability approach that, by means of simple modifications to conventional clock buffers, allows clock faults to be detected through any conventional manufacturing test approach. This is achieved at the cost of a very low increase in the area and power consumption of clock buffers, with no additional test cost and no impact on microprocessor performance or in-field operation. We then introduce a possible further modification to clock buffers that, at limited additional cost, allows their calibration after fabrication to compensate for parameter variations that may occur during manufacturing, thus minimizing the likelihood of both false test fails and test misses. We show the application of our approach to the clock network of the Pentium 4 microprocessor.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2011.59</guid>
  </item>
  <item>
     <title>PrePrint: A Note on Diagnosability of Large Fault Sets on Star Graphs</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2010.234</link>
     <description>The diagnosability of a system is the maximum number of faulty vertices that the system can identify. Somani et al. (IEEE Trans. on Computers 45, 892-903 (1996)) proposed a generalized measure to increase the degree of diagnosability of hypercubes and star graphs. This paper provides counterexamples to their results on the diagnosability of star graphs.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2010.234</guid>
  </item>
  <item>
     <title>PrePrint: Minimizing Eavesdropping Risk by Transmission Power Control in Multihop Wireless Networks</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2007.1066</link>
     <description>To defend against reconnaissance activity in ad hoc wireless networks, we propose transmission power control as an effective mechanism for minimizing the eavesdropping risk. Our main contributions are as follows. First, we define the w-th order eavesdropping risk as the maximum probability of packets being eavesdropped when there are w adversarial nodes in the network. Second, we derive a closed-form solution for the first-order eavesdropping risk as a polynomial function of the normalized transmission radius. This derivation assumes a uniform distribution of user nodes. We then generalize the model to allow arbitrary user node distributions and prove that the uniform user distribution minimizes the first-order eavesdropping risk. This result plays an essential role in deriving analytical bounds on the eavesdropping risk for arbitrary user distributions. Our simulation results show that, for a wide range of non-uniform traffic patterns, the eavesdropping risk is within 3 dB of the corresponding lower bound.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2007.1066</guid>
  </item>
  <item>
     <title>PrePrint: Handauth: Efficient Handover Authentication with Conditional Privacy for Wireless Networks</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2011.258</link>
     <description>Existing mechanisms for handover authentication mainly focus on designing a secure authentication module; little attention has been paid to protecting users' privacy when they are authenticated by access points for data access. Furthermore, most existing approaches do not support user revocation. In this paper, we present a secure and efficient authentication protocol named Handauth. Like other mechanisms in this field, Handauth provides user authentication and session key establishment. However, compared with other well-known approaches, Handauth not only achieves both computation and communication efficiency, but also provides strong user anonymity and untraceability, forward-secure user revocation, conditional privacy preservation, AAA server anonymity, access service expiration management, access point authentication, easily scheduled revocation, dynamic user revocation, and attack resistance. Experimental results show that the proposed approach is feasible for real applications.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2011.258</guid>
  </item>
  <item>
     <title>IEEE Transactions on Computers - February 2012 (Vol. 61, No. 2)</title>
     <link>http://opac.ieeecomputersociety.org/opac?year=2012&amp;volume=61&amp;issue=02&amp;acronym=tc</link>
     <description>IEEE Transactions on Computers</description>
     <guid isPermaLink="true">http://www.computer.org/portal/site/tc/</guid>
  </item>
  <item>
     <title>PrePrint: A Distributed TCAM Coprocessor Architecture for Integrated Longest Prefix Matching, Policy Filtering and Content Filtering</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2011.255</link>
     <description>Longest Prefix Matching (LPM), Policy Filtering (PF), and Content Filtering (CF) are three important tasks for today's Internet. Developing integrated solutions for the effective execution of all three tasks is important both technologically and economically. To this end, in this paper, we propose a distributed Ternary Content Addressable Memory (TCAM) coprocessor architecture. The integrated solution exploits the complementary lookup load and storage load requirements of the three tasks to balance the lookup and storage loads among the TCAMs. A prefix-filtering-based CF algorithm is designed to reduce the lookup load, and a novel cache system is developed to dynamically handle lookups from overloaded TCAMs. Simulations based on real-world traffic traces show that, at a 10 Gbps line rate, the proposed solution can perform all three tasks using only the resources required to perform the CF task alone.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2011.255</guid>
  </item>
  <item>
     <title>PrePrint: On Resource Placement in Gaussian and EJ Interconnection Networks</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2011.254</link>
     <description>...</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2011.254</guid>
  </item>
  <item>
     <title>PrePrint: WLAN Location Service with TXOP</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2011.253</link>
     <description>Providing location-based services with high positional accuracy requires Time of Arrival (TOA) based techniques. However, existing TOA-based WLAN location service schemes are inefficient because of the individual query-and-response ranging method they employ. We present a highly efficient WLAN location service architecture that includes a modification to the Transmit Opportunity (TXOP) technique in the IEEE 802.11e standard. Our Location Service with TXOP (LSOP) scheme achieves high efficiency by minimizing the number of TOA transmissions and eliminating the contention overhead for TOA messages. Adapting the TXOP technique also improves location accuracy by protecting TOA messages from collisions and by grouping them into one compact burst. Our analysis shows that the LSOP scheme achieves the highest location update rate compared with previous schemes. Our simulation results show that the LSOP scheme has minimal impact on data traffic and achieves higher accuracy than previous schemes. Experimental results demonstrate the degradation in localization performance caused by packet collisions. These results validate that our LSOP scheme, which implements contention-free broadcast of TOA messages with a modified TXOP, provides the best combination of high location update rate, low network load, and high location accuracy compared with other schemes.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2011.253</guid>
  </item>
  <item>
     <title>PrePrint: Fault Models and Test Methods for Subthreshold SRAMs</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2011.252</link>
     <description>Due to the increasing demand for extra-low-power systems, a great deal of research effort has been spent developing effective and economical subthreshold-SRAM designs. However, test methods for these newly developed subthreshold-SRAM designs have not yet been fully discussed. In this paper, we first categorize subthreshold-SRAM designs into three types, study the faulty behavior of open defects and address decoder faults in each type of design, and then identify the faults that may not be covered by a traditional SRAM test method. We also discuss the impact of open defects and threshold-voltage mismatch on sense amplifiers under subthreshold operation. A discussion of the test temperature is also provided.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2011.252</guid>
  </item>
  <item>
     <title>PrePrint: Parallel AES Encryption Engines for Many-Core Processor Arrays</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2011.251</link>
     <description>By exploring different granularities of data-level and task-level parallelism, we map 16 implementations of an Advanced Encryption Standard (AES) encipher, with both online and offline key expansion, onto a fine-grained many-core system. The smallest design uses only 6 cores for offline key expansion and 8 cores for online key expansion, while the largest requires 107 and 137 cores, respectively. The throughput of each design is examined using both synchronous dataflow models and measurements from a fabricated chip. Compared with published AES encipher implementations on general-purpose processors, our design has 3.5-15.6 times higher throughput per area and 8.2-18.1 times higher energy efficiency. Moreover, the design shows 2.0 times higher throughput than the TI DSP C6201, and 3.3 times higher throughput per area and 2.9 times higher energy efficiency than the GeForce 8800 GTX.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2011.251</guid>
  </item>
  <item>
     <title>PrePrint: NoC-based FPGA Acceleration for Monte Carlo Simulations with Applications to SPECT Imaging</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2011.250</link>
     <description>As the number of transistors integrated onto a silicon die continues to increase, compute power is becoming a commodity. This has enabled a whole host of new applications that rely on high-throughput computation. Recently, the need for faster and cost-effective applications in form-factor-constrained environments has driven interest in on-chip acceleration of algorithms based on Monte Carlo simulations. Although Field Programmable Gate Arrays (FPGAs), with hundreds of on-chip arithmetic units, show significant promise for accelerating these embarrassingly parallel simulations, sharing access to simulation data among many concurrent experiments remains a challenge. This paper presents a compute architecture for accelerating Monte Carlo simulations based on the Network-on-Chip (NoC) paradigm for on-chip communication. Through a complete implementation of a Monte Carlo-based image reconstruction algorithm for Single-Photon Emission Computed Tomography (SPECT) imaging, we demonstrate that this complex problem can be accelerated by two orders of magnitude over a 2 GHz Intel Core 2 Duo processor, even on a modestly sized FPGA. The architecture and methodology presented in this paper are modular and hence scalable to problem instances of different sizes, with application to other domains that rely on Monte Carlo simulations.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2011.250</guid>
  </item>
  <item>
     <title>PrePrint: Dynamic Bit Encoding for Privacy Protection Against Correlation Attacks in RFID Backward Channel</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2011.248</link>
     <description>Radio Frequency Identification (RFID) technologies are now applied in many fields for a variety of applications. Although they bring great productivity gains, RFID systems may pose new security and privacy threats to individuals or organizations. It is therefore important to protect the security of RFID systems and the privacy of RFID tag owners. Unfortunately, none of the existing solutions provides a complete defense against eavesdroppers who could monitor the communication between RFID readers and tags and recover the contents of tags. We propose two novel RFID backward channel protection protocols, namely dynamic bit encoding and optimized dynamic bit encoding. Our schemes achieve high anonymity with limited communication overhead. Extensive simulations show that both proposed schemes provide much stronger backward channel protection than existing techniques. In addition, analytical models were created and validated through comparison with simulation results.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2011.248</guid>
  </item>
  <item>
     <title>PrePrint: A Novel Heuristic Method for Application Dependent Testing of a SRAM-Based FPGA Interconnect</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2011.247</link>
     <description>This paper presents a new method for generating configurations for application-dependent testing of an SRAM-based FPGA interconnect. The method connects an activating input to multiple nets, generating activating test vectors for detecting stuck-at, open, and bridging faults. This arrangement reduces the number of redundant configurations, and thus the test time for application-dependent testing at full fault coverage. Because an exact solution has exponential complexity, a greedy, polynomial-time heuristic algorithm (based on sorting) is used for net selection in the configuration generation process. This algorithm is proved to have an execution complexity of O(L^3), where L is the number of LUTs in the design. Because Walsh coding is employed, the proposed method requires at most log2(M+2) configurations, where M denotes the number of activating inputs. Moreover, it is scalable with respect to LUT inputs. Extensive logic-based simulation results are provided for ISCAS89 sequential benchmark designs implemented on Xilinx Virtex4 FPGAs; these results show that the proposed method achieves a considerable reduction in the number of test configurations compared with methods in the technical literature (on average, a reduction of 49.5%).</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2011.247</guid>
  </item>
  <item>
     <title>PrePrint: Low Cost NBTI Degradation Detection &amp; Masking Approaches</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2011.246</link>
     <description>Performance degradation of integrated circuits due to aging effects, such as Negative Bias Temperature Instability (NBTI), is becoming a great concern for current and future CMOS technology. In this paper, we propose two monitoring and masking approaches that detect late transitions due to NBTI degradation in the combinational part of critical data paths and guarantee the correctness of the output data by adapting the clock frequency. Compared with recently proposed alternative solutions, one of our approaches (the Low Area and Power (LAP) approach) requires lower area overhead and lower or comparable power consumption, with the same impact on system performance. The other approach (the High Performance (HP) approach) reduces the impact on system performance at the cost of some increase in area and power consumption.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2011.246</guid>
  </item>
  <item>
     <title>PrePrint: Privacy-Preserving Public Auditing for Secure Cloud Storage</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2011.245</link>
     <description>Using cloud storage, users can store their data remotely and enjoy on-demand, high-quality applications and services from a shared pool of configurable computing resources, without the burden of local data storage and maintenance. However, because users no longer physically possess the outsourced data, protecting data integrity in cloud computing is a formidable task, especially for users with constrained computing resources. Moreover, users should be able to use cloud storage as if it were local, without worrying about the need to verify its integrity. Enabling public auditability for cloud storage is therefore of critical importance, so that users can resort to a third-party auditor (TPA) to check the integrity of outsourced data and be worry-free. To securely introduce an effective TPA, the auditing process should bring in no new vulnerabilities to user data privacy and no additional online burden for the user. In this paper, we propose a secure cloud storage system supporting privacy-preserving public auditing. We further extend our result to enable the TPA to perform audits for multiple users simultaneously and efficiently. Extensive security and performance analyses show that the proposed schemes are provably secure and highly efficient.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2011.245</guid>
  </item>
  <item>
     <title>PrePrint: Minimizing Probing Cost and Achieving Identifiability in Probe Based Network Link Monitoring</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2011.244</link>
     <description>Continuous monitoring of link performance is important for network diagnosis. In this paper, we address the problem of minimizing the probing cost and achieving identifiability in probe-based network link monitoring. Given a set of links to monitor, our objective is to select the minimum number of probing paths that can uniquely determine all identifiable links and cover all unidentifiable links. We propose an algorithm based on a linear system model to find all irreducible sets of probing paths that can uniquely determine an identifiable link, and we extend the bipartite model to reflect the relationship between a set of probing paths and an identifiable link. Since our optimization problem is NP-hard, we propose a heuristic algorithm that selects probing paths greedily. Our method eliminates two types of redundant probing paths: those that can be replaced by others and those that contribute nothing to achieving identifiability. Simulations based on real network topologies show that our approach can achieve identifiability with very low probing cost. Compared with prior work, our method is more general and performs better.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2011.244</guid>
  </item>
  <item>
     <title>PrePrint: Integer Codes Correcting Burst Errors Within A Byte</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2011.243</link>
     <description>This paper presents a class of integer codes that can correct any burst of length &amp;#x2264; l within a b-bit byte. Their main advantages are the linear complexity of their encoding and decoding procedures and the fact that a look-up-table-based error control procedure requires relatively small memory resources.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2011.243</guid>
  </item>
  <item>
     <title>PrePrint: A General Framework of Side-Channel Atomicity for Elliptic Curve Scalar Multiplication</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2011.242</link>
     <description>Simple power attack (SPA) is a type of side-channel attack (SCA). Many SPA-resistant scalar multiplication algorithms have been proposed in the literature, but most are inefficient and not interoperable with other coding methods. To prevent SPA, Chevallier-Mames et al. proposed a technique called side-channel atomicity for pure binary number systems, which limits the extra cost of SPA protection. Although many researchers have extended this technique to other number systems, their algorithms address specific cases, and few provide implementation results. In this paper, we generalize the atomicity technique to protect nearly all existing fast coding methods and number systems. Our general framework provides security and flexibility while its efficiency is coupled to that of the coding methods. Moreover, we use our framework to protect the fastest known scalar multiplications by applying it to the GLV method for GLS curves. Proof-of-concept programs are written in C, with assembly for fast field operations, and run on AMD Athlon X2 245-based hardware.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2011.242</guid>
  </item>
  <item>
     <title>PrePrint: Increasing the Effectiveness of Directory Caches by Avoiding the Tracking of Non-Coherent Memory Blocks</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2011.241</link>
     <description>A key aspect in the design of multiprocessors is the cache coherence protocol. Although directory-based protocols are the most scalable approach, the limited size of directory caches, together with the growing size of systems, may cause frequent evictions and, consequently, the invalidation of cached blocks, which jeopardizes system performance. Directory caches keep track of every memory block stored in processor caches. However, a significant fraction of the cached memory blocks do not require coherence maintenance because they are either accessed by just one processor or never modified. We propose to deactivate the coherence protocol for those blocks. As a result, directory caches do not track non-coherent blocks, which reduces directory cache occupancy and increases its effectiveness. Since the detection of non-coherent blocks is carried out by the operating system, our proposal requires only minor modifications. Simulation results show that our proposal allows directory caches to avoid tracking about 66% of the blocks accessed by a wide range of applications, thereby improving their efficiency. This either shortens the runtime of parallel applications by 15% with the same directory cache size or maintains performance with directory caches 16 times smaller.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2011.241</guid>
  </item>
  <item>
     <title>PrePrint: Computing Accurate Performance Bounds for Best Effort Networks-on-Chip</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2011.240</link>
     <description>Real-time (RT) communication support is a critical requirement for many complex embedded applications that are currently targeted at network-on-chip (NoC) platforms. In this paper, we present novel methods to efficiently calculate worst-case bandwidth and latency bounds for RT traffic streams on wormhole-switched NoCs with arbitrary topology. The proposed methods apply to best-effort NoC architectures with no extra hardware dedicated to RT traffic support. By applying our methods to several realistic NoC designs, we show substantial improvements in bound tightness over existing approaches (on average, more than 30% for bandwidth and 50% for latency).</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2011.240</guid>
  </item>
  <item>
     <title>PrePrint: Elevator-First: a Deadlock-Free Distributed Routing Algorithm for Vertically Partially Connected 3D-NoCs</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2011.239</link>
     <description>In this paper, we propose a distributed routing algorithm for vertically partially connected regular 2D topologies of different shapes and sizes (e.g., 2D mesh, torus, ring). The topologies targeted by this algorithm are of practical interest in the 3D integration of heterogeneous dies using through-silicon vias (TSVs). Indeed, TSV-based 3D integration makes it possible to stack dies with different functions and technologies, using a 3D-NoC as the interconnect backbone. Intrinsically 3D topologies perform better, but yield and active area (and thus cost) depend on the number of TSVs, so designs tend to use only a subset of the available TSVs between two dies. Defining a blockage-free distributed deterministic routing with low implementation cost on this kind of topology is thus of both theoretical and practical interest. We formally prove that, regardless of the shape and dimensions of the planar topologies and of the number and placement of the TSVs, the proposed routing algorithm, which uses two virtual channels in the plane, is deadlock-free and livelock-free. We also show experimentally that the performance of this algorithm remains acceptable as the number of vertical connections decreases.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2011.239</guid>
  </item>
  <item>
     <title>PrePrint: SenSmart: Adaptive Stack Management for Multitasking Sensor Networks</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2011.238</link>
     <description>The networked application environment has motivated the development of multitasking operating systems for sensor networks and other low-power electronic devices, but their multitasking capability is severely limited because traditional stack management techniques perform poorly on small-memory systems without virtual memory support. In this paper, we show that combining binary translation with a new kernel runtime can lead to efficient OS designs on resource-constrained platforms. We introduce SenSmart, a multitasking OS for sensor networks, and present new OS design techniques that support preemptive multitask scheduling, memory isolation, and adaptive stack management. Our solution provides memory isolation and automatic stack relocation on typical sensornet platforms. Adaptive stack management frees programmers from the burden of estimating tasks' stack usage while enabling SenSmart to schedule and run more tasks than other multitasking OSes for sensor networks. We have implemented SenSmart on MICA2/MICAz motes. Our evaluation shows that SenSmart manages concurrent tasks significantly better than other sensornet operating systems.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2011.238</guid>
  </item>
  <item>
     <title>PrePrint: Elastic Buffer Flow Control for On-Chip Networks</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2011.237</link>
     <description>Networks-on-chip (NoCs) were developed to meet the communication requirements of large-scale systems. Most current NoCs spend considerable area and power on router buffers. In our past work, we developed elastic buffer (EB) flow control, which adds simple control logic in the channels so that pipeline flip-flops (FFs) can be used as EBs with two storage locations. In this way, channels act as distributed FIFOs, and input buffers are no longer required. Removing buffers and virtual channels (VCs) significantly simplifies router design. Compared to VC networks, EB networks provide up to 45% shorter cycle time, 43% more throughput per unit power, or 22% more throughput per unit area. EB networks provide traffic classes using duplicate physical subnetworks. However, this approach negates the cost gains or becomes infeasible for a large number of traffic classes. Therefore, in this paper we propose a hybrid EB-VC router that provides an arbitrary number of traffic classes by using an input buffer to drain flits facing severe contention or deadlock. Thus, hybrid routers operate as EB routers in the common case and as VC routers when necessary. As a result, the hybrid EB-VC scheme offers 21% more throughput per unit power than VC networks and 12% more than EB networks.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2011.237</guid>
  </item>
  <item>
     <title>PrePrint: Optimally Removing Inter-Core Communication Overhead for Streaming Applications on MPSoCs</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2011.236</link>
     <description>This paper aims to completely remove inter-core communication overhead through joint computation and communication task scheduling for streaming applications on MPSoCs (Multiprocessor Systems-on-Chip). Our basic idea is to let some computation and communication tasks execute in earlier periods (the added periods are called the prologue) so that inter-core data transfers finish before the tasks that need the data start executing. In particular, we solve the following problem: how can tasks be rescheduled so that the schedule length is minimized with the minimum prologue length (the number of periods in the prologue) while inter-core communication overhead is completely removed? To solve this problem, we first perform schedulability analysis and obtain an upper bound on the number of times each computation task needs to be rescheduled. We then formulate the problem as an integer linear programming (ILP) problem and obtain an optimal solution. We evaluate our technique with a set of benchmarks drawn from both real-life streaming applications and synthetic task graphs. The experimental results show that our technique achieves significant reductions in schedule length and energy consumption compared with previous work.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2011.236</guid>
  </item>
  <item>
     <title>PrePrint: Antecedence Graph Approach to Checkpointing for Fault Tolerance in Mobile Agent Systems</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2011.235</link>
     <description>The flexibility offered by mobile agents is quite noticeable in distributed computing environments. However, the greater flexibility of the mobile agent paradigm compared to the client/server computing paradigm comes with additional threats, since agent systems are prone to failures caused by bad communication, security attacks, agent server crashes, unavailability of system resources, or even deadlock situations. In such events, mobile agents either get lost or are damaged during execution. In this paper, we propose a parallel checkpointing approach based on antecedence graphs for providing fault tolerance in mobile agent systems. During normal computation and message transmission, the participating agents of a mobile agent group record the dependency information among mobile agents in the form of antecedence graphs. When a checkpointing procedure begins, the initiator concurrently informs the relevant mobile agents, which minimizes the identifying time. For fault tolerance, the proposed scheme uses the checkpointed information, which is stored in the form of antecedence graphs. In case of failure, the antecedence graphs and message logs are regenerated from the checkpointed information for recovery, after which normal operation resumes. Quantitative analysis and experimental simulation show that our algorithm outperforms other coordinated checkpointing schemes in terms of identifying time and the number of blocked mobile agents, and thus provides better system performance.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2011.235</guid>
  </item>
  <item>
     <title>PrePrint: QCA Systolic Array Design</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2011.234</link>
     <description>Quantum-dot Cellular Automata (QCA) technology is a promising potential alternative to CMOS technology. To explore the characteristics of QCA and suitable design methodologies, digital circuit design approaches have been investigated. Because of the inherent wire delay in QCA, pipelined architectures appear to be a particularly suitable design technique. Also, because of its pipelined nature, QCA technology is not suitable for complicated control system design. Systolic arrays take advantage of pipelining, parallelism, and simple control. Therefore, this paper investigates systolic array design in QCA technology. Two case studies, a matrix multiplier and a Galois field multiplier, are designed and analyzed based on both multilayer and coplanar crossings. The performance of these two types of interconnection is compared, and it is found that even though coplanar crossings are currently more practical, they tend to occupy a larger design area and incur slightly more clocking-zone delay than multilayer crossings. A general semiconductor QCA systolic array design methodology is also proposed. It is found that applying a systolic array structure in QCA design yields significant benefits, particularly for large systolic arrays, even more so than in CMOS-based technology.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2011.234</guid>
  </item>
  <item>
     <title>PrePrint: Taming Hardware Event Samples for Precise and Versatile Feedback Directed Optimizations</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2011.233</link>
     <description>Feedback-directed optimization (FDO) is effective in improving application performance but has not been widely adopted because of the tedious dual-compilation model, the difficulty of generating representative training data sets, and the high runtime overhead of profile collection. Using hardware-event sampling to generate estimated execution profiles overcomes these drawbacks. Yet, hardware event samples are typically not precise at the instruction or basic-block granularity. These inaccuracies lead to lost performance compared with instrumentation-based FDO. In this paper, we use PMU-based sampling to collect instruction frequency profiles. By collecting profiles using multiple events and applying heuristics to predict their accuracy, we improve the accuracy of the profile. We also show how emerging techniques can be used to further improve the accuracy of sampling-based profiles. Additionally, these emerging techniques are used to collect value profiles and to assist a lightweight inter-procedural optimizer. All these profiles are represented in a portable form, so they can be used across different platforms. We demonstrate that sampling-based FDO can achieve, on average, 92% of the performance gains obtained using instrumentation-based exact edge profiles for both the SPEC CINT2000 and CINT2006 benchmarks. The collection overhead is only 0.93% on average, while compiler-based instrumentation incurs 2.0%-351.5% overhead.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2011.233</guid>
  </item>
  <item>
     <title>PrePrint: SEDUM: Exploiting Social Networks in Utility-Based Distributed Routing for DTNs</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2011.232</link>
     <description>Probabilistic forwarding in DTNs, which forwards a message to a node with a higher delivery utility, enhances single-copy routing. However, current probabilistic forwarding methods consider only node contact frequency when calculating the utility and neglect the influence of contact duration on throughput, even though both contact frequency and contact duration reflect the node movement pattern in a social network. In this paper, we theoretically prove that considering both factors leads to higher throughput than considering only contact frequency. To fully exploit a social network for high throughput, we propose a Social network oriEnted and Duration Utility based distributed Multi-copy routing protocol (SEDUM). SEDUM is distinguished by three features. First, it considers both contact frequency and duration in the node movement patterns of social networks. Second, it uses multi-copy routing and can discover the minimum number of copies of a message needed to achieve a desired routing delay. Third, it has an effective buffer management mechanism to increase throughput and decrease routing delay. Theoretical analysis and simulation results show that SEDUM provides higher throughput than existing routing approaches. The results confirm our expectation that considering both contact frequency and duration for delivery utility in routing achieves higher throughput than considering only contact frequency.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2011.232</guid>
  </item>
  <item>
     <title>PrePrint: Fast Deep Packet Inspection with a Dual Finite Automata</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2011.231</link>
     <description>Deep packet inspection, in which the payload is matched against a large set of patterns, is an important algorithm in many networking applications. Non-deterministic finite automata (NFAs) and deterministic finite automata (DFAs) are the basis of existing algorithms. However, neither is ideal for real-world rule-sets: an NFA has the minimum storage but the maximum memory bandwidth, while a DFA has the minimum memory bandwidth but the maximum storage. Specifically, NFAs and DFAs cannot handle the presence of character sets, wildcards, and repetitions of character sets or wildcards in real-world rule-sets. In this paper, we propose and evaluate a dual finite automaton (dual FA) to address these shortcomings. The dual FA consists of a linear finite automaton (LFA) and an extended deterministic finite automaton (EDFA). The LFA is simple to implement and provides an alternative way to handle repetitions of character sets and wildcards (which could otherwise cause state explosion in a DFA) without increasing memory bandwidth. We evaluate the automaton on real-world rule-sets using different synthetic payload streams. The results show that the dual FA can reduce the number of states by up to five orders of magnitude while keeping memory bandwidth close to the minimum.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2011.231</guid>
  </item>
  <item>
     <title>PrePrint: LS-Sig: Locality-Sensitive Signatures for Transactional Memory</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2011.230</link>
     <description>Transactional Memory (TM) is an alternative to conventional multithreaded programming that eases the writing of concurrent programs. In the context of unbounded TM, concurrent threads may use hardware signatures to record all the memory addresses issued inside a transaction in order to detect conflicts. Signatures are usually implemented as fixed, per-thread hardware Bloom filters that summarize a very large number of read and write memory addresses at the cost of false conflicts (the detection of non-existent conflicts). In this paper, to reduce the probability of false conflicts, a novel signature design that exploits spatial locality is proposed. The design is based on new hash function mappings such that nearby addresses share some of the bits inserted in the filters. This is particularly favorable for large transactions, which usually exhibit some amount of spatial locality. Moreover, its implementation does not require extra hardware. The proposed signature was experimentally evaluated using the GEMS simulator and all the applications of the STAMP benchmark suite. In most cases, the results show significant improvement, particularly in applications that involve long-running, large-data transactions.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2011.230</guid>
  </item>
  <item>
     <title>PrePrint: Compiler-Directed Energy Reduction for Voltage Islands</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2011.229</link>
     <description>Addressing power- and energy-related issues early in the system design flow ensures good design and minimizes iterations, leading to faster turnaround. Recent research demonstrates that voltage islands provide the flexibility to reduce power by selectively shutting down different regions of the chip and/or running selected parts of the chip at different voltage/frequency levels. Unlike most prior work on voltage islands, which mainly focused on architecture design and IP placement issues, this paper studies the software compiler support needed for voltage islands. Specifically, we focus on an embedded multiprocessor architecture that supports both voltage islands and control domains within these islands, and we determine how an optimizing compiler can automatically map an embedded application onto this architecture. Such automated support is critical, since it is unrealistic to expect an application programmer to find a good mapping that correlates multiple factors, such as performance and energy, at the same time. Our experiments show that the proposed compiler support is very effective in reducing energy consumption. The experiments also show that the energy savings are consistent across a wide range of values of our major simulation parameters.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2011.229</guid>
  </item>
  <item>
     <title>PrePrint: A Quick Pessimistic Diagnosis Algorithm for Hypercube-like Multiprocessor Systems under the PMC Model</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2011.228</link>
     <description>Processor fault diagnosis is essential for the reliability of a multicomputer system. The precise strategy and the pessimistic strategy are two classical diagnostic strategies based on the well-known PMC model. The precise strategy has been widely discussed; it demands that all processors be identified correctly, that is, that all fault-free processors be identified as 'fault-free' and all faulty processors as 'faulty'. The pessimistic diagnosis strategy, proposed by Kavianpour and Friedman (1978), diagnoses faults such that all faulty processors are isolated within a set that contains at most one fault-free processor. In this paper, we study the pessimistic diagnosis strategy under the PMC model for hypercube-like multicomputer systems. Our contribution is an efficient pessimistic diagnosis algorithm for hypercube-like multicomputer systems. If $N$ denotes the total number of processors in the hypercube-like system to be diagnosed, the algorithm runs in $O(N)$ time.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2011.228</guid>
  </item>
  <item>
     <title>PrePrint: Efficient Hardware Implementations of BRW Polynomials and Tweakable Enciphering Schemes</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2011.227</link>
     <description>Bernstein (Bernstein 2007) introduced a new class of polynomials, which Sarkar later named Bernstein-Rabin-Winograd (BRW) polynomials (Sarkar 2009). For the purpose of authentication, BRW polynomials offer a considerable computational advantage over usual polynomials: $(m-1)$ multiplications for usual polynomial hashing versus $\lfloor \frac{m}{2}\rfloor$ multiplications and $\lceil \log_2 m\rceil$ squarings for BRW hashing, where $m$ is the number of message blocks to be authenticated. In this paper, we develop an efficient pipelined hardware architecture for computing BRW polynomials. BRW polynomials have a recursive structure that is amenable to parallelization. While exploring efficient ways to exploit the inherent parallelism in BRW polynomials, we discover some interesting combinatorial structural properties of such polynomials. These are used to design an algorithm that decides the order of the multiplications so as to minimize pipeline delays. Using these structural properties, we present a hardware architecture for the efficient computation of BRW polynomials. Finally, we provide implementations of the tweakable enciphering schemes proposed in Sarkar 2009, which use BRW polynomials. This leads to the fastest known implementation of disk encryption systems.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2011.227</guid>
  </item>
  <item>
     <title>PrePrint: Randomized Throughput-Optimal Oblivious Routing for Torus Networks</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2011.226</link>
     <description>In this paper, we study the problem of optimal oblivious routing for one- and two-dimensional torus networks. We introduce a new closed-form oblivious routing algorithm called W2TURN that is worst-case throughput optimal for 2D torus networks. W2TURN is based on a weighted random selection of paths that contain at most two turns. Restricting routing paths to at most two turns enables a simple deadlock-free implementation of W2TURN. In terms of average hop count, W2TURN outperforms IVAL (Improved Valiant), the best previously known closed-form worst-case throughput-optimal routing algorithm. When the network radix is odd, W2TURN achieves the minimum average hop count possible with 2-turn paths while remaining worst-case throughput optimal. When the network radix is even, W2TURN comes very close to achieving the minimum average hop count (within just 0.72% on a 12&amp;#x00D7;12 torus) while remaining worst-case throughput optimal. Finally, we present a new optimal weighted random routing algorithm for rings called WRD (Weighted Random Direction). WRD provides a closed-form expression for the optimal distribution of traffic along the minimal and non-minimal directions in a ring topology, achieving minimum average hop count while guaranteeing optimal worst-case throughput.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2011.226</guid>
  </item>
  <item>
     <title>PrePrint: Using Virtual Secure Circuit to Protect Embedded Software from Side-Channel Attacks</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2011.225</link>
     <description>Side-channel attacks (SCAs) can break a cryptographic implementation within a very short time and have therefore become a practical threat to embedded security. This work presents Virtual Secure Circuit (VSC) as a software countermeasure to SCAs. VSC protects software by emulating WDDL, an SCA-resistant hardware circuit style. VSC is algorithm-independent, which enables designers to protect different cryptographic software with a single solution. This work proposes the concept of VSC together with two implementation schemes: one based on a custom-instruction single-core processor architecture and the other on a dual-core architecture. Correspondingly, we built two prototypes on FPGA systems. Experiments with real-world side-channel power and electromagnetic attacks demonstrate that, compared with unprotected software, VSC on a single-core processor provides a 20-fold security improvement. The experiments also show that, although VSC on a dual-core processor does not thwart electromagnetic attacks, it offers a more than 25-fold security improvement against power attacks. We conclude that VSC is comparable to WDDL in security improvement but is more flexible and has much lower hardware cost.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2011.225</guid>
  </item>
  <item>
     <title>PrePrint: An Adjacent Switching Activity Metric Under Functional Broadside Tests</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2011.224</link>
     <description>The local switching activity of scan-based tests is important because scan-based tests may cause excessive power dissipation in certain subcircuits even when the total power dissipation is acceptable. This paper focuses on the local switching activity during the fast functional capture cycles of functional broadside tests. This switching activity is guaranteed not to exceed the switching activity possible during functional operation. Therefore, with functional broadside tests, it is possible to maximize the switching activity without causing excessive power dissipation. This is important for test quality since, in general, higher switching activity allows more delay defects to be detected. In addition, it allows smaller test sets to be obtained for delay faults. The paper defines a switching activity metric, called the adjacent switching activity, that captures the switching activity around the sites of detected transition faults, where additional switching activity is most likely to contribute to test quality. It compares the cases where the adjacent and the total switching activity of functional broadside tests for transition faults are maximized. The results demonstrate that the two objectives result in significantly different test sets. Moreover, maximizing the adjacent switching activity yields better-quality test sets.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2011.224</guid>
  </item>
  <item>
     <title>PrePrint: An Approach to Source-Code Plagiarism Detection and Investigation using Latent Semantic Analysis</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2011.223</link>
     <description>Plagiarism is a growing problem in academia. Academics often use plagiarism detection tools to detect similar source-code files. Once similar files are detected, the academic proceeds with the investigation process, which involves identifying the similar source-code fragments within them that could be used as evidence for proving plagiarism. This paper describes PlaGate, a novel tool that can be integrated with existing plagiarism detection tools to improve their detection performance. The tool also implements a new approach for investigating the similarity between source-code files with a view to gathering evidence for proving plagiarism. Graphical evidence is presented that allows source-code fragments to be investigated with regard to how much they contribute to the evidence for proving plagiarism. The graphical evidence indicates the relative importance of the given source-code fragments across files in a corpus. This is done by using the Latent Semantic Analysis information retrieval technique to detect how important the fragments are within the specific files under investigation relative to other files in the corpus.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2011.223</guid>
  </item>
  <item>
     <title>PrePrint: Peer-Assisted On-Demand Streaming: Characterizing Demands and Optimizing Supplies</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2011.222</link>
     <description>Peer-assisted on-demand streaming services are now widely deployed over the Internet. Two of the most distinctive and salient features of a peer-assisted on-demand streaming system are the differentiation in the demand (or request) and the prefetching capability with caching. In this paper, we develop a theoretical framework based on queueing models, in order to (1) justify the superiority of service prioritization based on a taxonomy of requests, and (2) understand the fundamental principles behind optimal prefetching and caching designs in peer-assisted on-demand streaming systems. The focus is on how limited uploading bandwidth resources and peer caching capacities can be used most efficiently to achieve better system performance. To achieve these objectives, we first use priority queueing analysis to prove how service quality and user experience can be statistically guaranteed by prioritizing requests in order of significance: urgent playback (e.g., random seeks or initial startup), normal playback, and prefetching. We then construct a fine-grained stochastic supply-demand model to investigate peer caching and prefetching as a global optimization problem. This not only provides insight into the fundamental characterization of demand, but also offers guidelines towards optimal prefetching and caching strategies in peer-assisted on-demand streaming systems.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2011.222</guid>
  </item>
  <item>
     <title>PrePrint: Efficient Byzantine Fault Tolerance</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2011.221</link>
     <description>We present two asynchronous Byzantine fault-tolerant state machine replication (BFT) algorithms, which improve on previous algorithms in terms of several metrics. First, they require only 2f+1 replicas, instead of the usual 3f+1. Second, the trusted service on which this reduction of replicas is based is quite simple, making a verified implementation straightforward (and even feasible using commercial trusted hardware). Third, in nice executions the two algorithms run in the minimum number of communication steps for non-speculative and speculative algorithms, respectively 4 and 3 steps. Besides the obvious benefits in terms of cost, resilience, and management complexity (fewer replicas are needed to tolerate a given number of faults), our algorithms are simpler than previous ones, being closer to crash fault-tolerant replication algorithms. The performance evaluation shows that, even with the trusted component access overhead, they can achieve better throughput than Castro and Liskov's PBFT, and better latency in networks with non-negligible communication delays.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2011.221</guid>
  </item>
  <item>
     <title>PrePrint: Power Control for Crossbar-based Input-Queued Switches</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2011.220</link>
     <description>We consider an N&amp;#x00D7;N input-queued switch with a crossbar-based switching fabric implemented on a single chip. The power consumed by the crossbar chip due to data transfer grows as N R^3, where R is the maximum bit rate. Thus, as the bit rate increases, power dissipation becomes increasingly challenging, limiting the scalability of the crossbar for high-performance switches. We propose to exploit Dynamic Voltage and Frequency Scaling (DVFS) techniques to control packet transmissions through each crosspoint of the switching fabric. Our power control operates independently of the packet scheduler and exploits knowledge of a traffic matrix obtained by on-line measurements. We propose a family of control algorithms to reduce the power consumption. The algorithms are particularly efficient in non-overloaded conditions. The actual potential of the proposed approach is also evaluated on a real design case synthesized in 90 nm CMOS technology.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2011.220</guid>
  </item>
  <item>
     <title>PrePrint: Optimal and Heuristic Application-Aware Oblivious Routing</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2011.219</link>
     <description>Conventional oblivious routing algorithms do not take into account the resource requirements (e.g., bandwidth, latency) of the various flows in a given application. Because they are not aware of application-specific flow demands, network resources can be poorly utilized, causing severe local congestion. We present a framework for application-aware routing that guarantees deadlock freedom with one or more virtual channels by forcing routes to conform to an acyclic channel dependence graph. Using the application-aware routing framework, we develop and evaluate a bandwidth-sensitive oblivious routing scheme that statically determines routes based on an application's communication characteristics. Our results show that, on popular synthetic benchmarks, our bandwidth-sensitive approach can achieve better performance than traditional deterministic and oblivious routing schemes. In addition, we present methods to statically and efficiently allocate virtual channels to flows or packets for oblivious routing when there are two or more virtual channels per link. We show that, when oblivious routing is used and there are more flows than virtual channels per link, statically assigning virtual channels to flows can help mitigate the head-of-line blocking that flows encounter when competing for virtual channels dynamically.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2011.219</guid>
  </item>
  <item>
     <title>PrePrint: Concurrent Multi-Resource Arbiter: Design and Applications</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2011.218</link>
     <description>This paper presents a novel type of asynchronous arbiter that allocates M interchangeable resources among N clients. This arbiter enables the concurrent utilisation of multiple resources and is a useful device in various load-balancing circuits. Dedicated request signals from the resources and the clients are used in pairs to form each new grant. The 2&amp;#x00D7;2 arbiter is examined as an accessible special case of the N&amp;#x00D7;M arbiter. A concurrent implementation is compared to a fully sequential design. It is shown that the sequential design can be more practical when the time between a grant and the withdrawal of the initial request is small. The concurrent design provides higher performance in a system with a longer resource utilisation time. A scalable tiled structure is developed to extend the arbiter beyond 2&amp;#x00D7;2 to support N clients and M resources. Models and subsequent implementations of the tiles are presented. The tiles can be repeated without additional connecting logic, enabling the construction of larger arbiters. Several examples demonstrate the usage of the arbiter.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2011.218</guid>
  </item>
   </channel>
</rss>
