<?xml version="1.0" encoding="ISO-8859-1"?>
<rss version="2.0">
<channel>
<title>IEEE Transactions on Computers</title>
<link>http://www.computer.org/tc</link>
<description>The IEEE Transactions on Computers is a monthly publication with a wide distribution to researchers, developers, technical managers, and educators in the computer field. It publishes papers, brief contributions, and comments on research in areas of current interest to the readers. These areas include, but are not limited to, the following: a) computer organizations and architectures; b) operating systems, software systems, and communication protocols; c) real-time systems and embedded systems; d) digital devices, computer components, and interconnection networks; e) specification, design, prototyping, and testing methods and tools; f) performance, fault tolerance, reliability, security, and testability;
g) case studies and experimental and theoretical evaluations; and h) new and important applications and trends.</description>
	<language>en-us</language>
	<pubDate>Sun, 19 May 2013 10:00:07 GMT</pubDate>
	<image>
		<url>http://csdl.computer.org/common/images/logos/tc.gif</url>
		<title>IEEE Computer Society</title>
		<description>List of recently published journal articles</description>
		<link>http://www.computer.org/tc</link>
	</image>
  <item>
     <title>PrePrint: Rapid Prototyping and Evaluation of Intelligence Functions of Active Storage Devices</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2013.101</link>
     <description>Active storage devices further improve their performance by executing &#x0022;intelligence functions,&#x0022; such as prefetching and data deduplication, in addition to handling the usual I/O requests they receive. Significant research has been done to develop effective intelligence functions for active storage devices. However, laborious and time-consuming efforts are usually required to set up a suitable experimental platform to evaluate each new intelligence function. Moreover, it is difficult to make such prototypes available to other researchers and users to gain valuable experience and feedback. To overcome these difficulties, we propose IOLab, a virtual machine (VM) based platform for evaluating intelligence functions of active storage devices. The VM-based structure of IOLab enables the evaluation of new (and existing) intelligence functions for different types of OSes and active storage devices with little additional effort. IOLab also supports real-time execution of intelligence functions, giving users the opportunity to experience the latest intelligence functions without waiting for their deployment in commercial products. Using a set of interesting case studies, we demonstrate the utility of IOLab, which adds negligible performance overhead beyond the VM's virtualization overhead.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2013.101</guid>
  </item>
  <item>
     <title>PrePrint: CLOCK-DWF: A Write-History-Aware Page Replacement Algorithm for Hybrid PCM and DRAM Memory Architectures</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2013.98</link>
     <description>Phase change memory (PCM) has emerged as one of the most promising technologies to incorporate into the memory hierarchy of future computer systems. However, PCM has two critical weaknesses that prevent it from replacing DRAM entirely. First, the number of write operations allowed to each PCM cell is limited. Second, the write access time of PCM is about 6-10 times longer than that of DRAM. To cope with this situation, hybrid memory architectures that use a small amount of DRAM together with PCM have been suggested. This paper presents a new memory management technique for hybrid PCM and DRAM memory architectures that efficiently hides the slow write performance of PCM. Specifically, we aim to estimate future write references accurately and then absorb frequent memory writes into DRAM. To do this, we analyze the characteristics of memory write references and find two noticeable phenomena. First, using write history alone performs better than using both read and write history in estimating future write references. Second, the frequency characteristic is a better estimator than temporal locality in predicting future memory writes. Based on these two observations, we present a new page replacement algorithm called CLOCK-DWF that significantly reduces the number of writes that occur on PCM and also increases the lifespan of PCM memory.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2013.98</guid>
  </item>
  <item>
     <title>PrePrint: Booting Time Minimization for Real-Time Embedded Systems with Non-Volatile Memory</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2013.96</link>
     <description>Minimizing the booting time of an embedded system has become a major technical issue for the success of many consumer electronics. In this paper, the booting time minimization problem for real-time embedded systems with the joint consideration of DRAM and non-volatile memory is formally formulated. We show this is an NP-hard problem, and propose an optimal but pseudo-polynomial-time algorithm with dynamic programming techniques. In considering polynomial-time solutions, a 0.25-approximation greedy algorithm is provided, and a polynomial-time approximation scheme is developed to trade the optimality of the derived solution for the time complexity according to a user-specified error bound. The proposed algorithms can manage real-time embedded systems consisting of not only real-time tasks, but also initialization tasks that are executed only once during system booting. The proposed algorithms were then evaluated with 65 real benchmarks from the MRTC and DSPstone benchmark suites, and the results showed that all of the proposed algorithms can reduce booting time for each benchmark set by more than 29%. Moreover, extensive simulations were conducted to show the capability of the proposed approaches when used with various hardware resources and software workloads.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2013.96</guid>
  </item>
  <item>
     <title>PrePrint: A DFA with Extended Character-Set for Fast Deep Packet Inspection</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2013.93</link>
     <description>Deep packet inspection (DPI), based on regular expressions, is expressive, compact, and efficient in specifying attack signatures. We focus on their implementations based on general-purpose processors that are cost-effective and flexible to update. In this paper, we propose a novel solution, called deterministic finite automata with extended character-set (DFA/EC), which can significantly decrease the number of states through doubling the size of the character-set. Unlike existing state reduction algorithms, our solution requires only a single main memory access for each byte in the traffic payload, which is the minimum. We perform experiments with several Snort rule-sets. Results show that, compared to DFAs, DFA/ECs are very compact and are over four orders of magnitude smaller in the best cases; DFA/ECs also have smaller memory bandwidth and run faster. We believe that DFA/EC will lay a groundwork for a new type of state compression technique in fast packet inspection.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2013.93</guid>
  </item>
  <item>
     <title>PrePrint: Reliable Multicast in Data Center Networks</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2013.91</link>
     <description>Reliable packet delivery is required in data center multicast for data-intensive computations. However, existing reliable multicast solutions for the Internet are not suitable for the data center environment. We present RDCM, a novel reliable multicast protocol for data center networks. The key idea of RDCM is to minimize the impact of packet loss on the multicast throughput by leveraging the rich link resources in data centers. A multicast-tree-aware backup overlay is explicitly built on group members for peer-to-peer packet repair. The backup overlay is organized in such a way that it causes little individual repair burden, control overhead, and overall repair traffic. RDCM also realizes window-based congestion control to adapt its sending rate to the traffic status in the network. Simulation results in typical data center networks show that RDCM can achieve higher application throughput and a smaller traffic footprint than other representative reliable multicast protocols. We have implemented RDCM as a user-level library on the Windows platform. The experiments on our test bed show that RDCM handles packet loss without obvious throughput degradation during high-speed data transmission, gracefully responds to link failure and receiver failure, and causes less than 10% CPU overhead to data center servers.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2013.91</guid>
  </item>
  <item>
     <title>PrePrint: A Parallel and Uniform k-Partition Method for Montgomery Multiplication</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2013.89</link>
     <description>A way to speed up the Montgomery Multiplication by distributing the multiplier operand bits into k partitions is proposed. All of them process in parallel and use an identical algorithm. Each partition executes its task in n/k steps. Even though the computation step operates in radix 2^k, the complexity is reduced by the use of a limited digit set. Experiments with a 90nm cell library show that the hardware cost and its complexity have a linear growth according to the number of partitions. Besides the gain in speed, the proposal reduces power consumption for multiplication operands with 256, 512, 1024, and 2048 bits. The uniform treatment of partition hardware design enables the realization of a fault-tolerant hardware.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2013.89</guid>
  </item>
  <item>
     <title>PrePrint: Truthful Mechanisms for Allocating a Single Processor to Sporadic Tasks in Competitive Real-Time Environments</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2013.86</link>
     <description>In a non-competitive environment, sporadic real-time task scheduling on a single processor is well understood. In this paper, we consider a competitive environment comprising several real-time tasks vying for execution upon a shared single processor. Each task obtains a value if the processor successfully schedules all its jobs. Our objective is to select a feasible subset of these tasks to maximize the sum of values of selected tasks. We consider both dynamic-priority and static-priority scheduling algorithms. There are algorithms for solving these problems in non-competitive settings. However, we consider these problems in an economic setting in which each task is owned by a selfish agent. Each agent reports the characteristics of her own task to the processor owner. The processor owner uses a mechanism to allocate the processor to a subset of agents and to determine the payment of each agent. Since agents are selfish, they may try to manipulate the mechanism to obtain the processor. We are interested in truthful mechanisms in which it is always in agents' best interest to report the true characteristics of their tasks. We design exact and approximate truthful mechanisms for this competitive environment and study their performance.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2013.86</guid>
  </item>
  <item>
     <title>PrePrint: High-Throughput Compact Delay-Insensitive Asynchronous NoC Router</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2013.81</link>
     <description>A new asynchronous delay-insensitive data-transmission method based on level-encoded dual-rail (LEDR) encoding with a novel packet-structure restriction is proposed to realize a high-throughput Network-on-Chip (NoC) router with compact hardware. LEDR encoding halves the number of communication steps and registers used compared with four-phase dual-rail encoding, because the spacer information of the four-phase scheme is eliminated, which significantly improves the network throughput. By using the proposed packet structure, the phase information of header and tail flits is uniquely determined. Since the router can be asynchronously controlled by ignoring the phase information, the circuit is compactly implemented. As a result, the proposed asynchronous NoC router in a 0.13&#x03BC;m CMOS technology has a 90% increase in throughput and a 34% decrease in energy dissipation with 25% area overhead in comparison with a conventional four-phase asynchronous NoC router under post-layout simulation. In a 4x4 2-D mesh topology, the proposed asynchronous NoC has a 140% increase in throughput and half the packet latency compared with the conventional one. We also fabricate the asynchronous NoC based on the proposed router in a 0.13&#x03BC;m CMOS technology and demonstrate that the chip operates correctly under a supply voltage of 0.6V to 1.8V.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2013.81</guid>
  </item>
  <item>
     <title>PrePrint: Self-Reconfigurable Evolvable Hardware System for Adaptive Image Processing</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2013.78</link>
     <description>This paper presents an evolvable hardware system, fully contained in an FPGA, which is capable of autonomously generating digital processing circuits, implemented on an array of Processing Elements (PEs). Candidate circuits are generated by an embedded evolutionary algorithm and implemented by means of dynamic partial reconfiguration, enabling evaluation in the final hardware. The PE array follows a systolic approach, and PEs do not contain extra logic such as path multiplexers or unused logic, so array performance is high. Hardware evaluation in the target device, together with the fast reconfiguration engine used, yields reconfiguration times smaller than evaluation times. This means the complete evaluation cycle is faster than software-based approaches and previous evolvable digital systems. The selected application is digital image filtering and edge detection. The evolved filters yield better quality than classic linear and non-linear filters using Mean Absolute Error as the standard comparison metric. Results show not only better circuit adaptation to different noise types and intensities, but also a non-degrading filtering behavior. This means the filters may be run iteratively in order to enhance filtering quality. These properties are retained even at high noise levels (40%). The system as a whole is a step towards fully autonomous, adaptive systems.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2013.78</guid>
  </item>
  <item>
     <title>PrePrint: Memristor-Based Neural Logic Blocks for Non-Linearly Separable Functions</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2013.75</link>
     <description>Neural logic blocks (NLBs) enable the realization of biologically-inspired reconfigurable hardware. Networks of NLBs can be trained to perform complex computations such as multi-level Boolean logic and optical character recognition (OCR) in an area- and energy-efficient manner. Recently, several groups have proposed perceptron-based NLB designs with thin-film memristor synapses. These designs are implemented using a static threshold activation function, limiting the set of learnable functions to be linearly-separable. In this work, we propose two NLB designs&#x2013;robust adaptive NLB (RANLB) and multi-threshold NLB (MTNLB)&#x2013;which overcome this limitation by allowing the effective activation function to be adapted during the training process. Consequently, both designs enable any logic function to be implemented in a single-layer NLB network. The proposed NLBs are trained to implement ISCAS-85 benchmark circuits, as well as OCR. The MTNLB achieves 90% improvement in the energy delay product (EDP) over lookup table (LUT)-based implementations of the ISCAS-85 benchmarks and up to a 99% improvement over a previous NLB implementation. As a compromise, the RANLB provides a smaller EDP improvement, but achieves faster training times for all tested functions.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2013.75</guid>
  </item>
  <item>
     <title>PrePrint: Adaptive Voltage Scaling with in-situ Detectors in Commercial FPGAs</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2013.73</link>
     <description>This paper investigates the limits of adaptive voltage scaling (AVS) applied to commercial FPGAs which do not specifically support voltage adaptation. An adaptive power architecture based on a modified design flow is created with in-situ detectors and dynamic reconfiguration of clock management resources. AVS is a power-saving technique that enables a device to regulate its own voltage and frequency based on workload, process and operating conditions in a closed-loop configuration. It results in significantly improved energy profiles compared with DVFS (Dynamic Voltage Frequency Scaling), in which the device uses a number of pre-calculated valid working points. The results of deploying AVS in FPGAs with in-situ detectors show power and energy savings exceeding 85% compared with nominal voltage operation at the same frequency. The in-situ detector approach compares favorably with critical path replication based on delay lines since it avoids the need for cumbersome and error-prone delay line calibration.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2013.73</guid>
  </item>
  <item>
     <title>PrePrint: LACS: A Locality-Aware Cost-Sensitive Cache Replacement Algorithm</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2013.61</link>
     <description>The design of an effective last-level cache (LLC) in general - and an effective cache replacement/partitioning algorithm in particular - is critical to the overall system performance. The processor's ability to hide the LLC miss penalty differs widely from one miss to another. The more instructions the processor manages to issue during the miss, the better it is capable of hiding the miss penalty and the lower the cost of that miss. This non-uniformity in the processor's ability to hide LLC miss latencies, and the resultant non-uniformity in the performance impact of LLC misses, opens up an opportunity for a new cost-sensitive cache replacement algorithm. This paper makes two key contributions. First, it proposes a framework for estimating the costs of cache blocks at run-time based on the processor's ability to (partially) hide their miss latencies. Second, it proposes a simple, low-hardware overhead, yet effective, cache replacement algorithm that is Locality-Aware and Cost-Sensitive (LACS). LACS is thoroughly evaluated using a detailed simulation environment. LACS speeds up 12 LLC-performance-constrained SPEC CPU2006 benchmarks by up to 51% and 11% on average. When evaluated using a dual/quad-core CMP with a shared LLC, LACS significantly outperforms LRU in terms of performance and fairness, achieving improvements up to 54%.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2013.61</guid>
  </item>
  <item>
     <title>PrePrint: Community-Aware Opportunistic Routing in Mobile Social Networks</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2013.55</link>
     <description>Mobile social networks (MSNs) are a kind of delay tolerant network that consists of many mobile nodes with social characteristics. Recently, many social-aware algorithms have been proposed to address routing problems in MSNs. However, these algorithms tend to forward messages to the nodes with locally optimal social characteristics, and thus cannot achieve the optimal performance. In this paper, we propose a distributed optimal Community-Aware Opportunistic Routing (CAOR) algorithm. Our main contributions are that we propose a home-aware community model, whereby we turn an MSN into a network that only includes community homes. We prove that, in the network of community homes, we still can compute the minimum expected delivery delays of nodes through a reverse Dijkstra algorithm and achieve the optimal opportunistic routing performance. Since the number of communities is far less than the number of nodes in magnitude, the computational cost and maintenance cost of contact information are greatly reduced. We demonstrate how our algorithm significantly outperforms the previous ones through extensive simulations, based on a real MSN trace and a synthetic MSN trace.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2013.55</guid>
  </item>
  <item>
     <title>PrePrint: Dynamic Scheduling of Real-Time Mixture-of-Experts Systems on Limited Resources</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2013.50</link>
     <description>A Mixture-of-Experts (MoE) system generates an output in each operating cycle by combining results of multiple models (the &#x0022;experts&#x0022;). The contribution of any given expert to a final solution depends on a parameter called responsibility, which can vary from cycle to cycle. When resources are insufficient to run all experts, two problems arise: (1) how much utilization is to be allocated to experts and (2) how can a schedule be created based on these allocations. Problem (1) can be formulated as a succession of optimization problems, each of which calculates experts' allocations in a cycle. Explicit mappings from responsibilities to allocation weights are needed to solve each of these problems in every cycle using a technique called &#x0022;task compression (TC)&#x0022;. We refer to this baseline approach as TT-TC. Two other proposed heuristics TT-TC* and TT-Top reduce TC's execution time to O(N) for N experts. To address (2), the proposed EPOC scheduler converts the heuristics' allocations into schedules that satisfy capacity, execution and learning constraints across cycles. Simulations demonstrate that our approaches enable real-time computation and significantly decrease the average percentage error of limited-resource outputs (i.e., 0.2-40% and 0.3-0.5% when scheduled with TT-TC* and TT-Top, respectively, versus 0.2-97% when using TT-TC).</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2013.50</guid>
  </item>
  <item>
     <title>PrePrint: Squashing Alternatives for Software-based Speculative Parallelization</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2013.46</link>
     <description>Speculative parallelization is a runtime technique that optimistically executes sequential code in parallel, checking that no dependence violations arise. In the case of a dependence violation, all mechanisms proposed so far either switch to sequential execution, or conservatively stop and restart the offending thread and all its successors, potentially discarding work that does not depend on this particular violation. In this work we systematically explore the design space of solutions for this problem, proposing a new mechanism that reduces the number of threads that should be restarted when a data dependence violation is found. Our new solution, called exclusive squashing, keeps track of inter-thread dependencies at runtime, selectively stopping and restarting offending threads, together with all threads that have consumed data from them. We have compared this new approach with existing solutions on a real system, executing different applications with loops that are not analyzable at compile time and present as much as 10% of inter-thread dependence violations at runtime. Our experimental results show a relative performance improvement of up to 14%, together with a one-third reduction in the number of squashed threads. The speculative parallelization scheme and benchmarks described in this paper are available upon request.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2013.46</guid>
  </item>
  <item>
     <title>PrePrint: APC: A Novel Memory Metric and Measurement Methodology for Modern Memory System</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2013.38</link>
     <description>Due to the infamous &#x0022;memory wall&#x0022; problem and a drastic increase in the number of data intensive applications, memory, rather than processors, has become the leading performance bottleneck in modern computing systems. Evaluating and understanding memory system performance is increasingly becoming the core of high-end computing. Conventional memory metrics, such as miss ratio, AMAT, etc., are designed to measure a given memory performance parameter, and do not reflect the overall performance or complexity of a modern memory system. On the other hand, widely used system-performance metrics, such as IPC, are designed to measure CPU performance, and do not directly reflect memory performance. In this paper, we propose a novel memory metric called Access Per Cycle (APC), which is the number of data accesses per cycle, to measure the overall memory performance with respect to the complexity of modern memory systems. A unique contribution of APC is its separation of memory evaluation from CPU evaluation; therefore, it provides a quantitative measurement of the &#x0022;data-intensiveness&#x0022; of an application. Simulation results show that the memory performance measured by APC captures the concurrency complexity of modern memory systems, while other metrics cannot. APC is simple and effective, and is significantly more appropriate than existing memory metrics in evaluating modern memory systems.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2013.38</guid>
  </item>
  <item>
     <title>PrePrint: Design and Analysis of a Highly User-Friendly, Secure, Privacy-Preserving, and Revocable Authentication Method</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2013.25</link>
     <description>A large portion of system breaches are caused by authentication failure, either during the login process or in the post-authentication session; these failures are themselves related to the limitations associated with existing authentication methods. Current authentication methods, whether proxy based or biometrics based, are not user-centric and/or endanger users' (biometric) security and privacy. In this paper, we propose a biometrics based user-centric authentication approach. This method involves introducing a reference subject (RS), securely fusing the user's biometrics with the RS, generating a BioCapsule (BC) from the fused biometrics, and employing BCs for authentication. Such an approach is user friendly, identity bearing yet privacy-preserving, resilient, and revocable once a BC is compromised. It also supports &#x0022;one-click sign-on&#x0022; across systems by fusing the user's biometrics with a distinct RS on each system. Moreover, active and non-intrusive authentication can be automatically performed during post-authentication sessions. We formally prove that the secure fusion based approach is secure against various attacks. Extensive experiments and detailed comparison with existing approaches show that its performance (i.e., authentication accuracy) is comparable to existing typical biometric approaches and the new BC based approach also possesses many desirable features such as diversity and revocability.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2013.25</guid>
  </item>
  <item>
     <title>PrePrint: Improving MapReduce Performance Using Smart Speculative Execution Strategy</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2013.15</link>
     <description>MapReduce is a widely used parallel computing framework for large scale data processing. The performance of MapReduce is seriously impacted by stragglers &#8211; machines on which tasks take an unusually long time to finish. Speculative execution is a common approach for dealing with this problem by backing up slow tasks on alternative machines. Existing strategies have some pitfalls: i) they identify slow tasks by average progress rate, although the progress rate can be unstable; ii) they pay little attention to data locality when choosing backup nodes. In this paper, we first provide a detailed analysis of pitfalls in existing strategies. Then we develop a new strategy named MCP (Maximum Cost Performance), which improves the effectiveness of speculative execution significantly. MCP provides the following methods: i) use EWMA to predict process speed and calculate a task&#8217;s remaining time; ii) determine which task to back up based on the load of the cluster using a cost-benefit model; iii) take both data locality and data skew into consideration when choosing proper nodes for backups. We evaluate MCP in a cluster of 101 virtual machines with several applications. Experiment results show that MCP can run jobs up to 39% faster and improve the cluster throughput by up to 44% compared to Hadoop-0.21.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2013.15</guid>
  </item>
  <item>
     <title>PrePrint: Cache Friendliness-Aware Management of Shared Last-Level Caches for High Performance Multi-Core Systems</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2013.18</link>
     <description>To achieve high efficiency and prevent destructive interference among multiple divergent workloads, the last-level cache of Chip Multiprocessors has to be carefully managed. Previously proposed cache management schemes suffer from inefficient cache capacity utilization, by either focusing on improving the absolute number of cache misses or by allocating cache capacity without taking into consideration the applications' memory sharing characteristics. Reduction of the overall number of misses does not always correlate with higher performance as Memory-level Parallelism can hide the latency penalty of a significant number of misses in out-of-order execution. In this work we describe a quasi-partitioning scheme for last-level caches that combines the memory-level parallelism, cache friendliness and interference sensitivity of competing applications, to efficiently manage the shared cache capacity. The proposed scheme improves both system throughput and execution fairness -- outperforming previous schemes that are oblivious to applications' memory behavior.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2013.18</guid>
  </item>
  <item>
     <title>PrePrint: Design and Implementation of an Asymmetric Block-Based Parallel File System</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2013.6</link>
     <description>Existing block-based parallel file systems, which are deployed in the Storage Area Network (SAN), blend metadata with data in underlying disks. Unfortunately, such a symmetric architecture is prone to system-level failures, as metadata on shared disks can be damaged by a malfunctioning client. In this paper, we propose an asymmetric block-based parallel file system, Redbud, which isolates the metadata storage in the metadata server (MDS) access domain. Although centralized metadata management can effectively improve the reliability of the system, it faces some challenges in providing high performance and availability. Towards this end, we introduce an embedded directory mechanism and a locality-aware namespace index to explore the disk bandwidth of the metadata storage; we also introduce adaptive layout operations to deliver high I/O throughput for various file access patterns. Besides, by taking the MDS's load into consideration, we propose an adaptive timeout algorithm to make the MDS failure detection adaptive to the evolving workloads, improving the system availability. Measurements of a wide range of workloads demonstrate the benefit of our design and that Redbud achieves good scalability.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2013.6</guid>
  </item>
  <item>
     <title>PrePrint: Exploiting Implementation Diversity and Partial Connection of Routers in Application-Specific Network-on-Chip Topology Synthesis</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2012.294</link>
     <description>This paper proposes a novel application-specific Network-on-Chip (NoC) topology synthesis method in which the partial connection and the implementation diversity of routers are exploited. In our observation, existing NoC topology synthesis methods resemble logic synthesis in several aspects. However, an outstanding difference is that the existing NoC topology synthesis methods consider only a single implementation for each size of router, whereas modern logic synthesis tools utilize multiple implementations of a cell to produce a better netlist. To tackle this drawback, we propose a novel NoC topology synthesis methodology where the implementation diversity of routers is exploited to produce optimal topologies in terms of area and/or power consumption. Two different approaches, the post-process approach and the in-process approach, are proposed for exploiting the implementation diversity in order to provide flexibility between synthesis time and design quality. Compared to the method in which the implementation diversity is exploited but the partial connection is not, the experimental results demonstrate that the proposed method can reduce the power consumption by up to 67.8%, and by 40.0% on average. Compared to the method in which the partial connection is exploited but the implementation diversity is not, our method can reduce the power consumption by up to 12.0%, and by 3.4% on average.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2012.294</guid>
  </item>
  <item>
     <title>PrePrint: An Effective Gray-Box Identification Procedure for Multicore Thermal Modelling</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2012.293</link>
     <description>Aggressive thermal management is a critical feature for high-end computing platforms, as worst-case thermal budgeting is becoming unaffordable. Reactive thermal management, which sets temperature thresholds to trigger thermal capping actions, is too &#x0022;near-sighted&#x0022;, and it may lead to severe performance degradation and thermal overshoots. More aggressive proactive thermal management techniques minimize the performance penalty with smooth optimal control. These techniques require knowledge of thermal models which have to be accurate and simple to make the controls effective, while keeping their complexity limited. Unfortunately, in practice, these models are not provided by manufacturers, and in most cases they strongly depend on the deployment environment. Hence, procedures to automatically derive thermal models in the field are needed. In this paper, we propose a gray-box procedure to learn a compact and physically-consistent model for multicore chips. We leverage the physical consistency of the proposed model to tame the model complexity and to face large quantization noise in measurements. Output Error structures along with Levenberg-Marquardt and Least Squares optimization algorithms have been exploited. We tackle the problem in a real-life context: we developed a complete infrastructure for model-building and thermal data collection in the Linux environment, and we tested it on an Intel Nehalem-based server CPU.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2012.293</guid>
  </item>
  <item>
     <title>PrePrint: Floorplan Optimization of Fat-Tree Based Networks-on-Chip for Chip Multiprocessors</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2012.295</link>
     <description>Chip multiprocessor (CMP) is becoming increasingly popular in the processor industry. Efficient network-on-chip (NoC) that has similar performance to the processor cores is important in CMP design. Fat-tree based on-chip network has many advantages over traditional mesh or torus based networks in terms of throughput, power efficiency and latency. It has a bright future in the development of CMP. However, the floorplan design of the fat-tree based NoC is very challenging because of the complexity of topology. There are a large number of crossings and long interconnects, which cause severe performance degradation in the network. In electronic NoCs, the parasitic capacitance and inductance will be significant. In optical ones, large crosstalk noise and power loss will be introduced. The novel contribution of this paper is to propose a method to optimize the fat-tree floorplan, which can effectively reduce the number of crossings and minimize the interconnect length. Two types of floorplans are proposed, which could be applied to fat-tree based networks of arbitrary size. Compared with the traditional one, our floorplans could reduce more than 87% of the crossings. We also present a method to calculate the optimum aspect ratio of the processor cores to minimize the traversal distance.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2012.295</guid>
  </item>
  <item>
     <title>PrePrint: Synthesis of Stochastic Flow Networks</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2012.270</link>
     <description>A stochastic flow network is a directed graph with incoming edges (inputs) and outgoing edges (outputs); tokens enter through the input edges, travel stochastically in the network, and can exit the network through the output edges. Each node in the network is a splitter, namely, a token can enter a node through an incoming edge and exit on one of the output edges according to a predefined probability distribution. Stochastic flow networks can be easily implemented by beam splitters, or by DNA-based chemical reactions, with promising applications in optical computing, molecular computing and stochastic computing. In this paper, we address a fundamental synthesis question: Given a finite set of possible splitters and an arbitrary rational probability distribution, design a stochastic flow network, such that every token that enters the input edge will exit the outputs with the prescribed probability distribution. We show that when each splitter has two outgoing edges and is unbiased, an arbitrary rational probability a/b with $/math/$ can be realized by a stochastic flow network of size n that is optimal. Compared to other stochastic systems, feedback (cycles in networks) strongly improves the expressibility of stochastic flow networks.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2012.270</guid>
  </item>
  <item>
     <title>PrePrint: E-Shadow: Lubricating Social Interaction using Mobile Phones</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2012.290</link>
     <description>In this paper, we propose E-Shadow, a distributed mobile phone-based local social networking system. E-Shadow has two main components: (1) Local profiles. They enable E-Shadow users to record and share their names, interests, and other information with fine-grained privacy controls. (2) Mobile phone based local social interaction tools. E-Shadow provides mobile phone software that enables rich social interactions. The software maps proximate users' local profiles to their human owners and enables user communication and content sharing. We have designed and implemented E-Shadow on mobile phones. In our E-Shadow system, we allow users to perform dynamic and layered information publishing, making use of interpersonal relevance in space and time. Our system also provides a mechanism to help users perform direction-driven localization of an E-Shadow and match it with its owner. Experiments on real world Windows Mobile phones and large-scale simulations show that our system disseminates information efficiently and helps receivers find the direction of a specific E-Shadow with accuracy. We believe our E-Shadow concept and system can lead to a more tightly-knit temporary community in one's physical vicinity.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2012.290</guid>
  </item>
  <item>
     <title>PrePrint: The Switch Reordering Contagion: Preventing a Few Late Packets from Ruining the Whole Party</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2012.288</link>
     <description>Packet reordering has now become one of the most significant bottlenecks in next-generation switch designs. A switch practically experiences a reordering delay contagion, such that a few late packets may affect a disproportionate number of other packets. This contagion can have two possible forms. First, since switch designers tend to keep the switch flow order, i.e. the order of packets arriving at the same switch input and departing from the same switch output, a packet may be delayed due to packets of other flows with little or no reason. Further, within a flow, if a single packet is delayed for a long time, then all the other packets of the same flow will have to wait for it and suffer as well. In this paper, we suggest solutions against this reordering contagion. We first suggest several hash-based counter schemes that prevent inter-flow blocking and reduce reordering delay. We further suggest schemes based on network coding to protect against rare events with high queueing delay within a flow. Last, we demonstrate using both analysis and simulations that the use of these solutions can indeed reduce the resequencing delay. For instance, resequencing delays are reduced by up to an order of magnitude using real-life traces and a real hashing function.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2012.288</guid>
  </item>
  <item>
     <title>PrePrint: Joint Design of Asynchronous Sleep-wake Scheduling and Opportunistic Routing in Wireless Sensor Networks</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2012.282</link>
     <description>Designing a lifetime-maximization routing in wireless sensor networks poses a great challenge mainly due to unreliable wireless links and limited power supply. Recently, two natural advantages of opportunistic routing, i.e., path diversity and the improvement of transmission reliability, are exploited to develop a lifetime-extended opportunistic routing for wireless sensor networks. Besides, asynchronous sleep-wake scheduling is an effective mechanism to reduce energy consumption by appropriately arranging sensor nodes to sleep. Hence, in this paper, we propose a joint design of Asynchronous Sleep-wake Schedules and Opportunistic Routing, called ASSORT, to maximize the network lifetime. Simulation results show that ASSORT effectively achieves network lifetime extension compared with other routing schemes.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2012.282</guid>
  </item>
  <item>
     <title>PrePrint: The well-connected processor array</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2012.280</link>
     <description>A new theoretical model for reconfigurable processor arrays is introduced. Most models in the literature are similar to the reconfigurable mesh (RMESH), in which each processing element (PE) is connected to its four neighbors by reconfigurable buses. In the new model, called the &#x0022;well-connected processor array&#x0022; (WECPAR), every PE is connected to each neighbor by k point-to-point lines, and it also controls the switching between those lines. k is called the connectivity of the WECPAR. Any line entering the PE can either be connected to the PE itself, or it can be connected by the PE to another line, thus enabling complex configurations. This model is suitable for arrays in which the computation and memory areas of a PE are very much larger than a switch area. The concept of a burden placed on a PE by the lines connected to or passing through it is introduced, and used to derive a sharp lower bound on the connectivity required to embed any graph with a given degree. Other issues include graph embeddings, algorithms, broadcasting, routing, and self-simulation. A novel transportation-type routing method utilizes the connectivity for efficient routing, and FFT-like algorithms can be implemented efficiently.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2012.280</guid>
  </item>
  <item>
     <title>PrePrint: Modelling and Tools for Power Supply Variations Analysis in Networks-on-Chip</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2012.272</link>
     <description>Power supply integrity has become a critical concern with the rapidly shrinking feature size and the ever-increasing power consumption in nanometre scale integration. In particular, on-chip communication, in platforms such as networks-on-chip (NoC), dictates the power dissipation and overall system performance in multi-core systems and embedded computing architectures. These architectures require a dedicated tool for analyzing the power supply noise which must embed distinctive communication characteristics and spatial parameters. In this paper, we present a tool dedicated to determining the on-chip VDD drops due to communication workload in NoCs. This tool integrates a fast power grid model, a NoC simulator, an on-chip link model and a microarchitectural power model for routers. The model has been rigorously verified using SPICE simulations. The proposed model and tools are further exemplified through analyzing the impact of power supply noise for NoC links. Statistical timing analysis of NoC links in the presence of power supply noise was performed to evaluate the bit error rates. This work would enable better understanding of the tradeoffs existing in the design of NoCs, and the induced power supply noise due to on-chip communication. This understanding is crucial for the analysis of the quality of service (QoS) of communication fabrics in NoCs at the early design stages.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2012.272</guid>
  </item>
  <item>
     <title>PrePrint: Efficiently Securing Systems from Code Reuse Attacks</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2012.269</link>
     <description>Code reuse attacks (CRAs) are recent security exploits that allow attackers to execute arbitrary code on a compromised machine. CRAs, exemplified by return-oriented and jump-oriented programming approaches, reuse fragments of the library code, thus avoiding the need for explicit injection of attack code on the stack. Since the executed code is reused existing code, CRAs bypass current hardware and software security measures that prevent execution from data or stack regions of memory. While software-based full control flow integrity (CFI) checking can protect against CRAs, it includes significant overhead, involves non-trivial effort of constructing a control flow graph, relies on proprietary tools and has potential vulnerabilities due to the presence of unintended branch instructions in architectures such as x86; those branches are not checked by the software CFI. We propose branch regulation (BR), a lightweight hardware-supported protection mechanism against the CRAs that addresses all limitations of software CFI. BR enforces simple control flow rules in hardware at the function granularity to disallow arbitrary control flow transfers from one function into the middle of another function. This prevents common classes of CRAs without the complexity and run-time overhead of full CFI enforcement.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2012.269</guid>
  </item>
  <item>
     <title>PrePrint: NFRA: Generalized Network Flow Based Resource Allocation for Hosting Centers</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2012.253</link>
     <description>Due to the prohibitive cost of datacenter setup and maintenance, many small-scale businesses rely on hosting centers to provide the cloud infrastructure to run their workloads. Hosting centers host services of the clients on their behalf and guarantee quality of service as defined by service level agreements (SLAs). To reduce energy consumption and to maximize profit, it is critical to optimally allocate resources to meet client SLAs. Optimal allocation is a non-trivial task due to 1) resource heterogeneity, where energy consumption of a client task varies depending on the allocated resources, and 2) lack of energy proportionality, where energy cost for a task varies based on server utilization. In this paper we introduce a generalized Network Flow based Resource Allocation framework, called NFRA, for energy minimization and profit maximization. NFRA provides a unified framework to model profit maximization under a wide range of SLAs. We demonstrate the simplicity of this unified framework by deriving optimal resource allocations for three different SLAs. We derive workload demands and server energy consumption data from SPECWeb2009 benchmark results to demonstrate the efficiency of the NFRA framework.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2012.253</guid>
  </item>
  <item>
     <title>PrePrint: Supporting Lock-Free Composition of Concurrent Data Objects: Moving Data Between Containers</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2012.248</link>
     <description>Lock-free data objects offer several advantages over their blocking counterparts, such as being immune to deadlocks, priority inversion and convoying. They have also been shown to work well in practice. However, composing the operations they provide into larger atomic operations, while still guaranteeing efficiency and lock-freedom, is a challenging algorithmic task. We present a lock-free methodology for composing a wide variety of concurrent linearizable objects together by unifying their linearization points. This makes it possible to relatively easily introduce atomic lock-free move operations to a wide range of concurrent lock-free containers. This move operation allows data to be transferred from one container to another, in a lock-free way, without blocking any of the operations supported by the original container. For a data object to be suitable for composition using our methodology it needs to fulfil a set of requirements. These requirements are, however, generic enough to be fulfilled by a large set of objects. To show this we have performed case studies on six commonly used lock-free objects (a stack, a queue, a skip-list, a deque, a doubly linked-list and a hash-table) to demonstrate the general applicability of the methodology. We also show that the operations originally supported by the data objects keep their performance behavior under our methodology.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2012.248</guid>
  </item>
  <item>
     <title>PrePrint: Period Selection for Minimal Hyper-period in Periodic Task Systems</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2012.243</link>
     <description>Task period selection is often used to adjust the workload to the available computational resources. In this paper, we propose a model where each selected period is not restricted to be a natural number, but can be any rational number within a range. Under this generalisation, we contribute a period selection algorithm that yields a much smaller hyper-period than that of previous works: with respect to the largest period, the hyper-period with integer constraints is exponentially bounded; with rational periods the worst case is only quadratic. By means of an integer approximation at each task activation, we show how our rational period approach can work under system clock granularity; it is thus compatible with scheduling analysis practice and implementation. Our finding has practical applications in several fields of real-time scheduling: lowering complexity in table driven schedulers, reducing search space in model checking analysis, generating synthetic workload for statistical analysis of real-time scheduling algorithms, etc.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2012.243</guid>
  </item>
  <item>
     <title>PrePrint: Analytical Leakage-Aware Thermal Modeling of a Real-Time System</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2012.237</link>
     <description>We consider a firm real-time system with a single processor working in two power modes depending on whether it is idle or executing a job. The system is equipped with dynamic thermal management through a cooling subsystem which can switch between two cooling modes. Real-time jobs that arrive at the system have stochastic properties and are prone to soft errors. A successful job is one that enters the system and completes its execution with no timing or soft error. Appropriateness of the system is evaluated based on its performance, temperature behavior, reliability, and energy consumption. It is noteworthy that these criteria interact with one another: the stochastic nature of the system affects the success ratio of jobs as well as the system dynamic power; the leakage and dynamic power affect the processor temperature; this temperature affects the leakage power, the cooling subsystem power, and the soft error rate; and the soft error rate in turn affects the system reliability and the success ratio of jobs. This paper proposes an analytical evaluation method with a Markovian view of the system that considers these reciprocal effects. A number of simulation experiments are carried out to validate the accuracy of the proposed method.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2012.237</guid>
  </item>
  <item>
     <title>PrePrint: SymPLFIED: Symbolic Program Level Fault Injection and Error Detection Framework</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2012.219</link>
     <description>This paper introduces SymPLFIED, a program-level framework that allows specification of arbitrary error detectors and the verification of their efficacy against hardware errors. SymPLFIED comprehensively enumerates all transient hardware errors in registers, memory and computation (expressed symbolically as value errors) that potentially evade detection and cause program failure. The framework uses symbolic execution to abstract the state of erroneous values in the program and model checking to comprehensively find all errors that evade detection. We demonstrate the use of SymPLFIED on a widely deployed aircraft collision avoidance application, tcas. Our results show that the SymPLFIED framework can be used to uncover hard-to-detect catastrophic cases caused by transient errors in programs that may not be exposed by random fault-injection based validation. Further, the errors exposed by the framework help us formulate a set of error detectors for the application to avoid the catastrophic case and other incorrect outcomes.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2012.219</guid>
  </item>
  <item>
     <title>PrePrint: MuSA: Multivariate Sampling Algorithm for Wireless Sensor Networks</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2012.229</link>
     <description>A wireless sensor network can be used to collect and process environmental data, which is often of multivariate nature. This work proposes a multivariate sampling algorithm based on component analysis techniques in wireless sensor networks. To improve the sampling, the algorithm uses component analysis techniques to rank the data. Once ranked, the most representative data is retained. Simulation results show that our technique reduces the data while keeping its representativeness. In addition, the energy consumption and delay to deliver the data on the network are reduced.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2012.229</guid>
  </item>
  <item>
     <title>PrePrint: Topology Control for Time-Evolving and Predictable Delay-Tolerant Networks</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2012.220</link>
     <description>Previous DTN research mainly focuses on routing and information propagation. However, with the participation of a large number of wireless devices, maintaining an efficient and dynamic topology of the DTN becomes crucial. In this paper, we study the topology control (TC) problem in a predictable DTN where the time-evolving network topology is known a priori or can be predicted. We first model such a time-evolving network as a directed space-time graph which includes both spatial and temporal information. The aim of TC is to build a sparse structure from the original space-time graph such that (1) the network is still connected over time and supports DTN routing between any two nodes; (2) the total cost of the structure is minimized. We prove that this problem is NP-hard, and then propose two greedy-based methods which can significantly reduce the total cost of topology while maintaining the connectivity over time. We also introduce another version of the TC problem by requiring that the least cost path for any two nodes in this constructed structure is still cost-efficient compared with the one in the original graph. Two greedy-based methods are provided for this problem. Simulations have been conducted on both random DTN networks and real-world DTN tracing data. Results demonstrate the efficiency of the proposed methods.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2012.220</guid>
  </item>
  <item>
     <title>PrePrint: Computation of an Equilibrium in Spectrum Markets for Cognitive Radio Networks</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2012.211</link>
     <description>In this paper, we investigate a market equilibrium in multi-channel sharing cognitive radio networks (CRNs): it is assumed that every subchannel is orthogonally licensed to a single primary user (PU), and can be shared with multiple secondary users (SUs). We model this sharing as a spectrum market where PUs offer SUs their subchannels while limiting the interference from SUs; the SUs purchase the right to transmit over the subchannels while observing the interference limits set by the PUs and their budget constraints. Moreover, we consider that each SU limits the total interference that can be invoked from all other SUs, and assume that every transmitting SU marks the interference charges to other transmitting SUs. The utility function of an SU is defined as the least achievable transmission rate, and that of a PU is given by the net profit. We define a market equilibrium in the context of the extended Fisher model, and show that the equilibrium is yielded by solving an optimization problem, the Eisenberg-Gale convex program. In order to make the solutions of the convex program meet the market equilibrium, we apply a monotone transformation to the utility function of each SU. Furthermore, we develop a distributed algorithm that yields the stationary solutions asymptotically equivalent to the solutions given by the convex program.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2012.211</guid>
  </item>
  <item>
     <title>PrePrint: Flow Problems in Multi-interface Networks</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2012.214</link>
     <description>In heterogeneous networks, devices communicate by means of multiple wired or wireless interfaces. By switching among interfaces or by combining the available ones, each device might establish several connections. A connection could be established when the devices at its endpoints share at least one active interface. In this paper, we consider two fundamental optimization problems. In the first one (Maximum Flow in Multi-Interface Networks, MFMI), we aim to establish the maximal bandwidth that can be guaranteed between two given nodes of the input network. In the second problem (Minimum-Cost Flow in Multi-Interface Networks, MCFMI), we look for activating the cheapest set of interfaces among a network in order to guarantee a minimum bandwidth B of communication between two specified nodes. We show that MFMI is polynomially solvable while MCFMI is NP-hard even for a bounded number of different interfaces and bounded degree networks. Moreover, we provide polynomial approximation algorithms for MCFMI and exact algorithms for relevant sub-problems. Finally, we experimentally analyze the proposed approximation algorithm, showing that in practical cases it guarantees a low approximation ratio.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2012.214</guid>
  </item>
  <item>
     <title>PrePrint: SPONGENT: The Design Space of Lightweight Cryptographic Hashing</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2012.196</link>
     <description>The design of secure yet efficiently implementable cryptographic algorithms is a fundamental problem of cryptography. Lately, lightweight cryptography - optimizing the algorithms to fit the most constrained environments - has received a great deal of attention, the recent research being mainly focused on building block ciphers. As opposed to that, the design of lightweight hash functions is still far from being well-investigated with only few proposals in the public domain. In this article, we aim to address this gap by exploring the design space of lightweight hash functions based on the sponge construction instantiated with PRESENT-type permutations. The resulting family of hash functions is called SPONGENT. We propose 13 SPONGENT variants - for different levels of collision and (second) preimage resistance as well as for various implementation constraints. For each of them we provide several ASIC hardware implementations - ranging from the lowest area to the highest throughput. We make efforts to address the fairness of comparison with other designs by providing an exhaustive hardware evaluation on various technologies. We also prove essential differential properties of SPONGENT permutations, give a security analysis in terms of collision and preimage resistance, as well as study in detail dedicated linear distinguishers.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2012.196</guid>
  </item>
  <item>
     <title>PrePrint: Measuring Temporal Lags in Delay-Tolerant Networks</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2012.208</link>
     <description>Delay-tolerant networks (DTNs) are characterized by a possible absence of end-to-end communication routes at any instant. Yet, connectivity can be established over time and space, so a route can be evaluated in terms of either topological length or temporal length. The problem of measuring temporal distances was addressed in social networks through processing interaction traces in which contacts have no duration. We focus on the distributed version of this problem and on the case of arbitrarily long contacts, asking whether each node can track in real-time how &#x0022;out-of-date&#x0022; it is with respect to every other node. Although straightforward with punctual contacts, this problem becomes substantially more complex with arbitrarily long contacts: consecutive hops may either be disconnected (intermittent connectedness) or connected (implying a continuum of path opportunities at times). The problem is further complicated by addressing continuous-time systems and non-negligible, though fixed, message latencies (time to propagate a message over a single link). We demonstrate that the problem remains solvable by generalizing a time-measurement vector clock construct to the case of &#x0022;non-punctual&#x0022; causality, which results in a tool called T-Clocks, of independent interest. The rest of the paper shows how T-Clocks can help solve concrete problems like building foremost broadcast trees, network backbones, or fastest broadcast trees in periodic DTNs.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2012.208</guid>
  </item>
  <item>
     <title>PrePrint: Parallel Simulation of Pore Networks Using Multicore CPUs</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2012.197</link>
     <description>Pore networks can be simulated in silico by using the Dual Site-Bond Model. In this approach, a set of cavities (sites) is interconnected by means of a set of throats (bonds), under the constraint that each site should always be larger than any of its delimiting bonds. The NoMISS greedy algorithm has been implemented recently in order to address this task; nevertheless, even though this procedure is relatively fast, problems related to large memory consumption and long computing time arise as pore networks become large. Here, three parallel methods are proposed to allow efficient construction of large pore networks. The first method is a parallel Monte Carlo procedure, which applies a number of exchanges among pore sizes in order to obtain a valid pore network. The other two methods are parallel versions of the pioneering NoMISS greedy algorithm. The first version uses a static data partitioning to speed up the running time, whilst the second applies a dynamic data distribution policy to improve the pore network quality. The obtained results show the behavior of each proposed version with respect to its performance and quality, by employing the resources of a 125-core Linux cluster.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2012.197</guid>
  </item>
  <item>
     <title>PrePrint: Generalized Hypercubes: Edge Disjoint Hamiltonian Cycles and Gray Codes</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2012.192</link>
     <description>Some new classes of Hamming metric Gray codes over $Z_p^n$ where $p$ is a prime and $n$ is an integer power of 2, are described; then, how these Gray codes can be used to generate the maximum number of edge disjoint Hamiltonian cycles in an $n$-dimensional generalized hypercube (GHC), $Q_p^n$, is shown. For $Q_p^n$, the number of edge disjoint Hamiltonian cycles generated using these methods is $n(p - 1)/2$ which is the maximum possible since the degree of each node in $Q_p^n$ is $n(p - 1)$. In addition, for any integers $p$ and $n$, $p$ not necessarily a prime and $n$ not necessarily a power of 2, how to generate the maximum number of edge disjoint Hamiltonian cycles in $Q_p^n$ is also described.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2012.192</guid>
  </item>
  <item>
     <title>PrePrint: Protein Sequence Pattern Matching: Leveraging Application Specific Hardware Accelerators</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2012.187</link>
     <description>Digitalization has brought a tremendous momentum to health care research. Recognition of patterns in proteins is crucial for identifying possible functions of newly discovered proteins, as well as analysis of known proteins for previously undetermined activity. In this paper the workload consists of locating patterns from the PROSITE database in protein sequences. We optimize the pattern search task by using a new breed of processors that merge network and server attributes. We leverage massive multithreading and Regular-Expression (RegX) hardware accelerators; the latter were designed and built for an entirely different application--high bandwidth deep-packet inspection. Our multithreading optimization achieves 18x improvement, but by harnessing a RegX accelerator we were able to further demonstrate a significant 392x improvement relative to software pattern matching. Moreover, performance per area and power consumption are improved by multiple orders of magnitude as well.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2012.187</guid>
  </item>
  <item>
     <title>PrePrint: SD3: An Efficient Dynamic Data-Dependence Profiling Mechanism</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2012.182</link>
     <description>Data-dependence profiling is an important program analysis technique to exploit parallelism in serial programs. More specifically, manual, semi-automatic, or automatic parallelization can use the outcomes of data-dependence profiling to guide where and how to parallelize a program. However, state-of-the-art data-dependence profiling techniques consume enormous resources, as they suffer from two major issues when profiling large and long-running applications: (1) runtime overhead and (2) memory overhead. Existing data-dependence profilers are either unable to profile large-scale applications with a typical resource budget or only report very limited information. In this paper, we propose an efficient approach to data-dependence profiling that can address both runtime and memory overhead in a single framework. Our technique, called SD3, reduces the runtime overhead by parallelizing the dependence profiling step itself. To reduce the memory overhead, we compress memory accesses that exhibit stride patterns and compute data dependences directly in a compressed format. We demonstrate that SD3 reduces the runtime overhead when profiling SPEC 2006 by a factor of 4.1x on eight cores. For the memory overhead, we successfully profile 22 SPEC 2006 benchmarks with the reference input. We also demonstrate the usefulness of SD3 by showing manual parallelization followed by data-dependence profiling results.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2012.182</guid>
  </item>
  <item>
     <title>PrePrint: Real-Time Query Scheduling for Wireless Sensor Networks</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2012.172</link>
     <description>Recent years have seen the emergence of wireless cyber-physical systems that must support real-time queries of physical environments through wireless sensor networks. This paper proposes Real-Time Query Scheduling (RTQS), a novel approach to conflict-free transmission scheduling for real-time queries in wireless sensor networks. First, we show that there is an inherent trade-off between latency and real-time capacity in query scheduling. We then present three new real-time schedulers. The non-preemptive query scheduler supports high real-time capacity but cannot provide low response times to high priority queries due to priority inversions. The preemptive query scheduler eliminates priority inversions at the cost of reduced real-time capacity. The slack stealing query scheduler combines the benefits of the preemptive and non-preemptive schedulers to improve the real-time capacity while meeting query deadlines. Furthermore, we provide schedulability analysis for each scheduler. The analysis and advantages of our approach are validated through NS2 simulations.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2012.172</guid>
  </item>
  <item>
     <title>PrePrint: Requirement-Aware Scheduling of Bag-of-Tasks Applications on Grids with Dynamic Resilience</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2012.164</link>
     <description>In this paper, we propose an online algorithm called Prudent Scheduling Algorithm (PSA) for scheduling Grid applications structured as Bag-of-tasks (BoT). PSA is shown to prudently make scheduling decisions that can tolerate prediction errors. PSA also adopts task duplication in an attempt to limit severe increases in schedule length. In addition, since the applications to be performed may vary widely in terms of their required hardware and software, we also capture the loads' various processing requirements in our algorithms, a unique feature that is applicable for running proprietary applications only on certain eligible processing nodes. Thus, in our problem formulation each application can only be processed by certain processors, as both the applications and processing nodes are heterogeneous. We then present a task selection policy, referred to as the Requirement-Aware Load Selection (RALS) policy, to handle the contention of multiple applications that have various processing requirements but share the same computing resources. We integrate RALS into PSA and RR to address the scheduling of multiple BoT applications with heterogeneous processing requirements and conduct rigorous performance evaluation studies to demonstrate the effectiveness and competitiveness of our approaches.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2012.164</guid>
  </item>
  <item>
     <title>PrePrint: One Attack to Rule Them All: Collision Timing Attack versus 42 AES ASIC Cores</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2012.154</link>
     <description>When complex functions, e.g., substitution boxes of block ciphers, are realized in hardware, timing attributes of the underlying combinational circuit depend on the input/output changes of the function. These characteristics can be exploited with the help of a relatively new scheme called fault sensitivity analysis. A collision timing attack which exploits the data-dependent timing characteristics of combinational circuits is demonstrated in this article. The attack is based on a correlation collision attack, also recently published, which avoids the need for a hypothetical timing model for the underlying combinational circuit to recover the secret materials. The target platforms of our proposed attack are 14 AES ASIC cores of the SASEBO LSI chips in three different process technologies, 130nm, 90nm, and 65nm. Successfully breaking all cores, including the DPA-protected and fault attack protected cores, indicates the strength of the attack.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2012.154</guid>
  </item>
  <item>
     <title>PrePrint: New Metrics for the Reliability of Approximate and Probabilistic Adders</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TC.2012.146</link>
     <description>Addition is a fundamental function in arithmetic operation; several adder designs have been proposed for implementations in inexact computing. These adders show different operational profiles; some of them are approximate in nature while others rely on probabilistic features of nanoscale circuits. However, there has been a lack of appropriate metrics to evaluate the efficacy of various inexact designs. In this paper, new metrics are proposed for evaluating the reliability as well as the power efficiency of approximate and probabilistic adders. Error distance (ED) is initially defined as the arithmetic distance between an erroneous output and the correct output for a given input. The mean error distance (MED) and normalized error distance (NED) are then proposed as unified figures that consider the averaging effect of multiple inputs and the normalization of multiple-bit adders. The MED is, therefore, useful in assessing the effectiveness of an approximate or probabilistic adder implementation, while the NED is useful in characterizing the reliability of a specific design. Since inexact adders are often used for saving power, the product of power and NED is further utilized for evaluating the tradeoffs between power consumption and precision.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TC.2012.146</guid>
  </item>
  <item>
     <title>IEEE Transactions on Computers</title>
     <link>http://www.computer.org/portal/site/tc/</link>
     <description>IEEE Transactions on Computers</description>
     <guid isPermaLink="true">http://www.computer.org/portal/site/tc/</guid>
  </item>
   </channel>
</rss>