<?xml version="1.0" encoding="ISO-8859-1"?>
<rss version="2.0">
<channel>
<title>IEEE Transactions on Parallel and Distributed Systems</title>
<link>http://www.computer.org/tpds</link>
<description>IEEE Transactions on Parallel and Distributed Systems (TPDS) is published monthly. The goal of TPDS is to publish a range of papers, comments on previously published papers, and survey articles that deal with the research areas of current importance to our readers. Current areas of particular interest include, but are not limited to the following: a) architectures: design, analysis, and implementation of multiple-processor systems (including multi-processors, multicomputers, and networks); impact of VLSI on system design; interprocessor communications; b) software: parallel languages and compilers; scheduling and task partitioning; databases, operating systems, and programming environments for multiple-processor systems; c) algorithms and applications: models of computation; analysis and design of parallel/distributed algorithms; application studies resulting in better multiple-processor systems; d) other issues: performance measurements, evaluation, modeling and simulation of multiple-processor systems; real-time, reliability and fault-tolerance issues; conversion of software from sequential-to-parallel forms.	</description>
	<language>en-us</language>
	<pubDate>Mon, 20 May 2013 10:00:04 GMT</pubDate>
	<image>
		<url>http://csdl.computer.org/common/images/logos/tpds.gif</url>
		<title>IEEE Computer Society</title>
		<description>List of recently published journal articles</description>
		<link>http://www.computer.org/tpds</link>
	</image>
  <item>
     <title>PrePrint: Cloning, Resource Exchange and Relation Adaptation: An Integrative Self-Organisation Mechanism in a Distributed Agent Network</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPDS.2013.120</link>
     <description>Self-organisation provides a suitable paradigm for developing self-managed complex distributed systems, e.g., grid computing and sensor networks. Towards this end, in this paper, an integrative self-organisation mechanism is proposed. Unlike current related studies, which concern only a single principle of self-organisation, this mechanism synthesises the three principles of self-organisation, i.e., cloning/spawning, resource exchange and relation adaptation. Based on this mechanism, an agent can autonomously generate new agents when it is overloaded, exchange resources with other agents if necessary, and modify relations with other agents to achieve a better agent network structure. In this way, agents can adapt to dynamic environments. The proposed mechanism is evaluated through a comparison with three other approaches, each of which represents state-of-the-art research in one of the three self-organisation principles. Experimental results demonstrate that the proposed mechanism outperforms the three approaches in various aspects.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPDS.2013.120</guid>
  </item>
  <item>
     <title>PrePrint: Trajectory Improves Data Delivery in Urban Vehicular Networks</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPDS.2013.118</link>
     <description>Efficient data delivery is of great importance, but highly challenging for vehicular networks because of frequent network disruption, fast topological change and mobility uncertainty. The vehicular trajectory knowledge plays a key role in data delivery. Existing algorithms have largely made predictions on the trajectory with coarse-grained patterns such as spatial distribution and/or the inter-meeting time distribution, which has led to poor data delivery performance. In this paper, we mine extensive datasets of vehicular traces from two large cities in China, i.e., Shanghai and Shenzhen. Through conditional entropy analysis, we find that there exists strong spatiotemporal regularity in vehicle mobility. By extracting mobility patterns from historical vehicular traces, we develop accurate trajectory predictions by using multiple order Markov chains. Based on an analytical model, we theoretically derive packet delivery probability with predicted trajectories. We then propose routing algorithms taking full advantage of predicted probabilistic vehicular trajectories. Finally, we carry out extensive simulations based on three large datasets of real GPS vehicular traces, i.e., Shanghai taxi dataset, Shanghai bus dataset and Shenzhen taxi dataset. The conclusive results demonstrate that our proposed routing algorithms can achieve a significantly higher delivery ratio at lower cost when compared with existing algorithms.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPDS.2013.118</guid>
  </item>
  <item>
     <title>PrePrint: BitTorrent Locality and Transit Traffic Reduction: When, Why and at What Cost?</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPDS.2013.109</link>
     <description>A substantial amount of work has recently gone into localizing BitTorrent traffic within an ISP in order to avoid excessive and oftentimes unnecessary transit costs. Several architectures and systems have been proposed and the initial results from specific ISPs and a few torrents have been encouraging. In this work we attempt to deepen and scale our understanding of locality and its potential. Looking at specific ISPs, we consider tens of thousands of concurrent torrents, and thus capture ISP-wide implications that cannot be appreciated by looking at only a handful of torrents. Secondly, we go beyond individual case studies and present results for the top 100 ISPs in terms of number of users represented in our dataset of up to 40K torrents involving more than 3.9M concurrent peers and more than 20M in the course of a day spread in 11K ASes. We develop scalable methodologies that allow us to process this huge dataset and get concrete quantitative answers rather than qualitative speculations to questions like: &#x201C;what is the minimum and the maximum transit traffic reduction across hundreds of ISPs?&#x201D;, &#x201C;what are the win-win boundaries for ISPs and their users?&#x201D;, &#x201C;what is the maximum amount of transit traffic that can be localized without requiring fine-grained control of inter-ISP overlay connections?&#x201D;, &#x201C;what is the impact on transit traffic from upgrades of residential broadband speeds?&#x201D;.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPDS.2013.109</guid>
  </item>
  <item>
     <title>PrePrint: Process Placement in Multicore Clusters: Algorithmic Issues and Practical Techniques</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPDS.2013.104</link>
     <description>Current generations of NUMA node clusters feature multicore or manycore processors. Programming such architectures efficiently is a challenge because numerous hardware characteristics have to be taken into account, especially the memory hierarchy. One appealing idea to improve the performance of parallel applications is to decrease their communication costs by matching the communication pattern to the underlying hardware architecture. In this paper, we detail the algorithm and techniques proposed to achieve such a result: first, we gather both the communication pattern information and the hardware details. Then we compute a relevant reordering of the various process ranks of the application. Finally, those new ranks are used to reduce the communication costs of the application.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPDS.2013.104</guid>
  </item>
  <item>
     <title>PrePrint: Modeling Object Flows from Distributed and Federated RFID Data Streams for Efficient Tracking and Tracing</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPDS.2013.99</link>
     <description>In the emerging environment of the Internet of Things (IoT), through the connection of billions of radio frequency identification (RFID) tags and sensors to the Internet, applications will generate an unprecedented number of transactions and amount of data that require novel approaches in RFID data stream processing and management. Unfortunately, it is difficult to maintain a distributed model without a shared directory or structured index. In this paper, we propose a fully distributed model for sovereign RFID data streams. This model combines two techniques, namely Tilted Time Frame and Histogram, to represent the patterns of object flows. Our model is efficient in space and can be stored in main memory. The model is built on top of an unstructured P2P overlay. To reduce the overhead of distributed data acquisition, we further propose several algorithms that use a statistically minimum number of network calls to maintain the model. The scalability and efficiency of the proposed model are demonstrated through an extensive set of experiments.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPDS.2013.99</guid>
  </item>
  <item>
     <title>PrePrint: Connectivity-Based Boundary Extraction of Large-Scale 3D Sensor Networks: Algorithm and Applications</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPDS.2013.97</link>
     <description>We present CABET, a novel Connectivity-bAsed Boundary Extraction scheme for large-scale Three-dimensional sensor networks. To the best of our knowledge, CABET is the first 3D-capable and pure connectivity-based solution for detecting sensor network boundaries. It is fully distributed, and is highly scalable, requiring overall message cost linear with the network size. A highlight of CABET is its nonuniform critical node sampling, called r&#x02B9;-sampling, that selects landmarks to form boundary surfaces with bias toward nodes embodying salient topological features. Simulations show that CABET is able to extract a well-connected boundary in the presence of holes and shape variation, with performance superior to that of some state-of-the-art alternatives. In addition, we show how CABET benefits a range of sensor network applications including 3D skeleton extraction, 3D segmentation, multi-resolution extraction, and 3D localization.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPDS.2013.97</guid>
  </item>
  <item>
     <title>PrePrint: On False Data Injection Attacks Against Power System State Estimation: Modeling and Countermeasures</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPDS.2013.92</link>
     <description>It is critical for a power system to estimate its operation state based on meter measurements in the field and the configuration of power grid networks. Recent studies show that an adversary can bypass the existing bad data detection schemes, posing dangerous threats to the operation of power grid systems. Nevertheless, two critical issues remain open: (i) how can an adversary choose the meters to compromise in order to cause the most significant deviation of the system state estimation, and (ii) how can a system operator defend against such attacks? To address these issues, we study the problem of finding the optimal attack strategy, i.e., a data-injection attacking strategy which selects a set of meters to manipulate so as to cause the maximum damage. We formalize the problem and develop efficient algorithms to identify the optimal meter set. We implement and test our attack strategy on various IEEE standard, and demonstrate its superiority over a baseline strategy of random selections. To defend against false data injection attacks, we propose a protection-based defense and a detection-based defense. For the detection-based defense, we develop the spatial-based and temporal-based detection schemes to accurately identify data injection attacks. Our experimental data show that our spatial-based detection algorithm can detect at least 95% of attacks when the adversary changes up to 6% of the magnitude values of state variables. Our temporal-based detection algorithm can identify compromised meters quickly.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPDS.2013.92</guid>
  </item>
  <item>
     <title>PrePrint: Distributed and Asynchronous Data Collection in Cognitive Radio Networks with Fairness Consideration</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPDS.2013.75</link>
     <description>As a promising communication paradigm, Cognitive Radio Networks (CRNs) have paved a road for Secondary Users (SUs) to opportunistically exploit unused licensed spectrum without causing unacceptable interference to Primary Users (PUs). In this paper, we study the distributed data collection problem for asynchronous CRNs, which has not been addressed before. We study the Proper Carrier-sensing Range (PCR) for SUs. By working with this PCR, an SU can successfully conduct data transmission without disturbing the activities of PUs and other SUs. Subsequently, based on the PCR, we propose an Asynchronous Distributed Data Collection (ADDC) algorithm with fairness consideration for CRNs. ADDC collects data of a snapshot to the base station in a distributed manner without any time synchronization requirement. The algorithm is scalable and more practical compared with centralized and synchronized algorithms. Through comprehensive theoretical analysis, we show that ADDC is order-optimal in terms of delay and capacity, as long as an SU has a positive probability to access the spectrum. Furthermore, we extend ADDC to the continuous data collection issue, and analyze the delay and capacity performance of ADDC in continuous data collection, which are also proven to be order-optimal. Finally, extensive simulation results indicate that ADDC can effectively finish a data collection task and significantly reduce data collection delay.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPDS.2013.75</guid>
  </item>
  <item>
     <title>PrePrint: Scalable Relative Debugging</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPDS.2013.86</link>
     <description>Detecting and isolating bugs that arise only at high processor counts is a challenging task. Over a number of years, we have implemented a special debugging method, called &#x0022;relative debugging&#x0022;, that supports debugging applications as they evolve or are ported to larger machines. It allows a user to compare the state of a suspect program against another reference version even as the number of processors is increased. The innovative idea is the comparison of runtime data in order to reason about the state of the suspect program. Whilst powerful, a na&#x00EF;ve implementation of the comparison phase does not scale to large problems running on large machines. In this paper, we propose two different solutions including a hash-based scheme and a direct point-to-point scheme. We demonstrate the implementation, a case study, as well as the performance, of our techniques on 20K cores of a Cray XE6 system.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPDS.2013.86</guid>
  </item>
  <item>
     <title>PrePrint: The Generalized Loneliness Detector and Weak System Models for k-set Agreement</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPDS.2013.77</link>
     <description>This paper introduces two weak partially synchronous system models MAntix[n-k] and MSinkx[n-k], which are just strong enough for solving k-set agreement. We do so by introducing the generalized (n-k)-loneliness failure detector L(k), which we first prove to be sufficient for solving k-set agreement, and showing that L(k) but not L(k-1) can be implemented in both models. MAntix[n-k] and MSinkx[n-k] are hence the first message passing models that lie between models where Omega (and therefore consensus) can be implemented and the purely asynchronous model. We also address k-set agreement in anonymous systems, that is, systems where (unique) process identifiers are not available. Since our novel k-set agreement algorithm using L(k) also works in anonymous systems, it turns out that the loneliness failure detector L=L(n-1) introduced by Delporte et al. is also the weakest failure detector for set agreement in anonymous systems. Finally, we analyze the relationship between L(k) and other k-set oriented failure detectors.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPDS.2013.77</guid>
  </item>
  <item>
     <title>PrePrint: Harmonic-Aware Multi-Core Scheduling For Fixed-Priority Real-Time Systems</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPDS.2013.71</link>
     <description>This paper presents a new semi-partitioned approach to schedule sporadic tasks on multi-core platforms based on the Rate Monotonic Scheduling (RMS) policy. Our approach exploits the well-known fact that harmonic tasks have better schedulability than non-harmonic ones on a uniprocessor. The challenge for our approach, however, is how to take advantage of this fact to assign and split appropriate tasks on different processors in the semi-partitioned approach, and how to guarantee the schedulability of real-time tasks. We formally prove that our scheduling approach can successfully schedule any task set with system utilization bounded by the Liu &amp; Layland bound. Our extensive experimental results demonstrate that the proposed algorithm can significantly improve the scheduling performance compared with previous work.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPDS.2013.71</guid>
  </item>
  <item>
     <title>PrePrint: A Tag Encoding Scheme Against Pollution Attack to Linear Network Coding</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPDS.2013.24</link>
     <description>Network coding allows intermediate nodes to encode data packets to improve network throughput and robustness. However, it increases the propagation speed of polluted data packets if a malicious node injects fake data packets into the network, which degrades the bandwidth efficiency greatly and leads to incorrect decoding at sinks. In this paper, insights on new mathematical relations in linear network coding are presented and a key pre-distribution based tag encoding scheme KEPTE is proposed, which enables all intermediate nodes and sinks to detect the correctness of the received data packets. Furthermore, the security of KEPTE with regard to pollution attack and tag pollution attack is quantitatively analyzed. The performance of KEPTE is competitive in terms of: (1) low computational complexity; (2) the ability that all intermediate nodes and sinks detect pollution attack; (3) the ability that all intermediate nodes and sinks detect tag pollution attack; (4) high fault-tolerance ability. To the best of our knowledge, the existing key pre-distribution based schemes aiming at pollution detection can only achieve at most three points as described above. Finally, discussions on the application of KEPTE to practical network coding are also presented.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPDS.2013.24</guid>
  </item>
  <item>
     <title>PrePrint: A Scalable Two-Phase Top-Down Specialization Approach for Data Anonymization Using MapReduce on Cloud</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPDS.2013.48</link>
     <description>A large number of cloud services require users to share private data like electronic health records for data analysis or mining, bringing privacy concerns. Anonymizing data sets via generalization to satisfy certain privacy requirements such as k-anonymity is a widely used category of privacy preserving techniques. At present, the scale of data in many cloud applications increases tremendously in accordance with the Big Data trend, thereby making it a challenge for commonly-used software tools to capture, manage and process such large-scale data within a tolerable elapsed time. As a result, it is a challenge for existing anonymization approaches to achieve privacy preservation on privacy-sensitive large-scale data sets due to their insufficiency of scalability. In this paper, we propose a scalable two-phase top-down specialization approach to anonymize large-scale data sets using the MapReduce framework on cloud. In both phases of our approach, we deliberately design a group of innovative MapReduce jobs to concretely accomplish the specialization computation in a highly scalable way. Experimental evaluation results demonstrate that with our approach, the scalability and efficiency of top-down specialization can be improved significantly over existing approaches.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPDS.2013.48</guid>
  </item>
  <item>
     <title>PrePrint: Impact of Brooks-Iyengar Distributed Sensing Algorithm on Real Time Systems</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPDS.2013.25</link>
     <description>From time to time, algorithms appear that significantly affect technology. One such algorithm is the Brooks-Iyengar Distributed Sensing Algorithm, which has had a profound impact on sensor technology, similar to the effect that the TCP/IP suite of protocols has had on data communication, that Dijkstra's algorithm has had on process synchronization, and that two-phase locking protocols have had on transaction serialization. It solved a number of complex issues in the deployment of large-scale sensor networks and continues to do so as the technology moves forward.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPDS.2013.25</guid>
  </item>
  <item>
     <title>PrePrint: Network Coding Aware Cooperative MAC Protocol for Wireless Ad Hoc Networks</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPDS.2013.22</link>
     <description>Cooperative communication, which utilizes neighboring nodes to relay overheard information, has been employed as an effective technique to deal with channel fading and to improve network performance. Network coding, which combines several packets together for transmission, is very helpful in reducing redundancy in the network and increasing the overall throughput. Introducing network coding into the cooperative retransmission process enables the relay node to assist other nodes while serving its own traffic simultaneously. To leverage the benefits brought by both of them, an efficient Medium Access Control (MAC) protocol is needed. In this paper, we propose a novel network coding aware cooperative MAC protocol, namely NCAC-MAC, for wireless ad hoc networks. The design objective of NCAC-MAC is to increase the throughput and reduce the delay. Simulation results reveal that NCAC-MAC can improve the network performance under general circumstances compared with two benchmarks.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPDS.2013.22</guid>
  </item>
  <item>
     <title>PrePrint: Scaling up Publish/Subscribe Overlays using Interest Correlation for Link Sharing</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPDS.2013.6</link>
     <description>Topic-based publish/subscribe is at the core of many distributed systems, ranging from application integration middleware to news dissemination. Therefore, much research was dedicated to publish/subscribe architectures and protocols, and in particular to the design of overlay networks for decentralized topic-based routing and efficient message dissemination. Nonetheless, existing systems fail to take full advantage of shared interests when disseminating information, hence suffering from high maintenance and traffic costs, or construct overlays that cope poorly with the scale and dynamism of large networks. In this paper we present StaN, a decentralized protocol that optimizes the properties of gossip-based overlay networks for topic-based publish/subscribe by sharing a large number of physical connections without disrupting its logical properties. StaN relies only on local knowledge and operates by leveraging common interests among participants to improve global resource usage and promote topic and event scalability. The experimental evaluation under two real workloads, both via a real deployment and through simulation, shows that StaN provides an attractive infrastructure for scalable topic-based publish/subscribe.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPDS.2013.6</guid>
  </item>
  <item>
     <title>PrePrint: On the Knowledge Soundness of a Cooperative Provable Data Possession Scheme in Multicloud Storage</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPDS.2013.16</link>
     <description>Provable data possession (PDP) is a probabilistic proof technique for cloud service providers (CSPs) to prove the clients' data integrity without downloading the whole data. In 2012, Zhu et al. proposed the construction of an efficient PDP scheme for multicloud storage. They studied the existence of multiple CSPs to cooperatively store and maintain the clients' data. Then, based on homomorphic verifiable response and hash index hierarchy, they presented a cooperative PDP (CPDP) scheme from the bilinear pairings. They claimed that their scheme satisfied the security property of knowledge soundness. Regrettably, this comment shows that any malicious cloud service provider (CSP) or the malicious organizer (O) can generate a valid response which can pass the verification even if they have deleted all the stored data, i.e., Zhu et al.'s CPDP scheme cannot satisfy the property of knowledge soundness. Then, we discuss the origin and severity of the security flaws. It implies that the attacker can get paid without storing the clients' data. It is important to clarify the scientific fact in order to design more secure and practical CPDP schemes in Zhu et al.'s system architecture and security model.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPDS.2013.16</guid>
  </item>
  <item>
     <title>PrePrint: Dynamic Scheduling for Wireless Data Center Networks</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPDS.2013.5</link>
     <description>Unbalanced traffic demands of different data center applications are an important issue in designing data center networks (DCNs). In this paper, we present our exploratory investigation on a hybrid DCN solution of utilizing wireless transmissions in DCNs. Our work aims to solve the congestion problem caused by a few hot nodes to improve the global performance. We model the wireless transmissions in DCNs by considering both the wireless interference and the adaptive transmission rate. Besides, both throughput and job completion time are considered to measure the impact of wireless transmissions on the global performance. Based on the model, we formulate the problem of channel allocation as an optimization problem. We also design an approximation algorithm with an approximation bound of 1/2 and a genetic algorithm to address the scheduling problem. A series of simulations are performed to evaluate and demonstrate the effectiveness of our wireless DCN scheme.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPDS.2013.5</guid>
  </item>
  <item>
     <title>PrePrint: A Scalable Work-Efficient and Depth-Optimal Parallel Scan for the GPGPU Environment</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPDS.2012.336</link>
     <description>The parallel scan is a basic tool that is used to parallelize algorithms which appear to have serial dependencies. The performance of these algorithms relies heavily on the efficiency of the parallel scan that is being used. To maintain work efficiency, current parallelization methods either sacrifice the overall depth or limit the scalability. In this study, we present a parallel scan method that is derived from the Han-Carlson parallel prefix graph and is both a work-efficient and a depth-optimal process. In this method, the depth is increased by a small constant value above the lower bound; therefore, the amount of computation and/or memory access is effectively reduced. We also employ a novel cascaded thread-block execution method to exploit the single-program-multiple-data (SPMD) nature of the Compute Unified Device Architecture (CUDA) environment developed by NVIDIA. The proposed method facilitates the low-latency inter-thread accessible shared memory and the single-instruction-multiple-thread (SIMT) characteristics of the graphics hardware to reduce high-latency global memory access and costly barrier synchronization. Our experimental results demonstrate an average speed up of approximately 40% and 10% over the CUDA Data Parallel Primitives (CUDPP) library derivation of the Kogge-Stone prefix tree and an implementation of Merrill and Grimshaw's method with coarser combination of the Kogge-Stone graph and the Brent-Kung prefix graph, respectively.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPDS.2012.336</guid>
  </item>
  <item>
     <title>PrePrint: Partial Probing for Scaling Overlay Routing</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPDS.2012.326</link>
     <description>Recent work has demonstrated that path diversity is an effective way to improve the end-to-end performance of network applications. For every node pair in a full-mesh network with n nodes, this paper presents a family of new approaches for efficiently identifying an acceptable indirect path whose performance is similar to or even better than that of the direct path, hence considerably scaling the network at the cost of low per-node traffic overhead. In prior techniques, every node frequently incurs O(n^1.5) traffic overhead to probe the links from itself to all other nodes and to broadcast its probing results to a small set of nodes. In contrast, in our approaches, each node measures its links to only O(&#x221A;n) other nodes and transmits the measuring results to O(&#x221A;n) other nodes, where the two node sets of size O(&#x221A;n) are determined by the partial sampling schemes presented in this paper. Mathematical analysis and trace-driven simulations show that our approaches dramatically reduce the per-node traffic overhead to O(n) while maintaining an acceptable backup path for every node pair with high probability. More precisely, our approaches, which are based on the enhanced and rotational partial sampling schemes, would be capable of increasing said probability to about 65% and 85%, respectively. For many network applications, this is sufficiently high such that the increased scalability outweighs this drawback. In addition, it is not desirable to absolutely identify an outstanding backup path for every node pair in reality, due to the variable link quality.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPDS.2012.326</guid>
  </item>
  <item>
     <title>PrePrint: Identification of Peer-to-Peer VoIP Sessions Using Entropy and Codec Properties</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPDS.2012.316</link>
     <description>Voice over Internet Protocol (VoIP) applications based on peer-to-peer (P2P) communications have been experiencing considerable growth in terms of number of users. To overcome filtering policies or protect the privacy of their users, most of these applications implement mechanisms such as protocol obfuscation or payload encryption that avoid the inspection of their traffic, making it difficult to identify its nature. The incapacity to determine the application that is responsible for a certain flow raises challenges for the effective management of the network. In this article, a new method for the identification of VoIP sessions is presented. The proposed mechanism classifies the flows, in real-time, based on the speech codec used in the session. In order to make the classification lightweight, the behavioral signatures for each analyzed codec were created using only the lengths of the packets. Unlike most previous approaches, the classifier does not use the lengths of the packets individually. Instead, it explores their level of heterogeneity in real-time, using entropy to emphasize such feature. The results of the performance evaluation show that the proposed method is able to identify VoIP sessions accurately and simultaneously recognize the used speech codec.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPDS.2012.316</guid>
  </item>
  <item>
     <title>PrePrint: Enhancing Intra-Domain Scalability of IMS-based Services</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPDS.2012.312</link>
     <description>IP Multimedia Subsystem (IMS) and IMS-based services are increasingly providing interoperable session control and mobility for next generation all-IP networks. However, clear design guidelines and techniques for the support of scalable IMS-based deployment, especially for data intensive services such as mobility management, presence, and instant messaging, are still missing. That could block or at least significantly slow down IMS acceptance by network operators and application providers. To address these challenges, this paper thoroughly analyzes IMS scalability with special attention to intra-domain deployment issues. Then, it proposes a novel solution with three core original contributions toward intra-domain scalability: i) data-centric dissemination of session state with limited overhead; ii) service-aware routing for fast intra-domain load balancing; iii) service-aware load monitoring and component de-/activation for long-term intra-domain load partitioning at both service and infrastructure levels. The reported experimental results indicate that our solution can significantly increase intra-domain scalability with very limited costs.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPDS.2012.312</guid>
  </item>
  <item>
     <title>PrePrint: Minimum Message Waiting Time Scheduling in Distributed Systems</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPDS.2012.284</link>
     <description>In this paper, we examine the problem of packet scheduling in a single-hop multichannel system, with the goal of minimizing the average message waiting time. Such an objective function represents the delay incurred by the users before receiving the desired data. We show that the problem of finding a schedule with minimum message waiting time is NP-complete, by means of a polynomial-time reduction of the time table design problem to our problem. We also present several heuristics which result in outcomes very close to the optimal ones. We compare these heuristics by means of extensive simulations.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPDS.2012.284</guid>
  </item>
  <item>
     <title>PrePrint: Sociality-Aware Access Point Selection in Enterprise Wireless LANs</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPDS.2012.301</link>
     <description>Most load balancing solutions in WLANs focus on the optimization of AP operations, assuming that the arrivals and departures of users are independent. However, through the analysis of AP usage based on a real WLAN trace, we find that such an assumption does not hold. In fact, due to users&#8217; social activities, which is particularly true for enterprise environments, they tend to arrive or leave in unison, which would disruptively affect the load balance among APs. In this paper, we propose a novel AP allocation scheme to tackle the load balancing problem in WLANs, taking into account the social relationships of users. In this scheme, users with intense social relationships are assigned to different APs so that the joint departure of those users would have minor impact on the load balance of APs. Given that the problem of allocating an AP for each user so that the average of the sums of social relation intensity between any pair of users in each AP is minimized is NP-complete, we propose an online greedy algorithm. Extensive trace-driven simulations demonstrate the efficacy of our scheme. Compared to the state-of-the-art method, we can achieve about 64.7% balancing performance gain on average during peak hours on workdays.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPDS.2012.301</guid>
  </item>
  <item>
     <title>PrePrint: Parallel Sparse Approximate Inverse Preconditioning on Graphic Processing Units</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPDS.2012.286</link>
     <description>Accelerating numerical algorithms for solving sparse linear systems on parallel architectures has attracted the attention of many researchers due to their applicability to many engineering and scientific problems. The solution of sparse systems often dominates the overall execution time of such problems and is mainly solved by iterative methods. Preconditioners are used to accelerate the convergence rate of these solvers and reduce the total execution time. Sparse Approximate Inverse (SAI) preconditioners are a popular class of preconditioners designed to improve the condition number of large sparse matrices and accelerate the convergence rate of iterative solvers for sparse linear systems. We propose a GPU accelerated SAI preconditioning technique called GSAI, which parallelizes the computation of this preconditioner on NVIDIA graphic cards. The preconditioner is then used to enhance the convergence rate of the BiConjugate Gradient Stabilized (BiCGStab) iterative solver on the GPU. The SAI preconditioner is generated on average 28 and 23 times faster on the NVIDIA GTX480 and TESLA M2070 graphic cards respectively compared to ParaSails (a popular implementation of SAI preconditioners on CPU) single processor/core results. The proposed GSAI technique computes the SAI preconditioner in approximately the same time as ParaSails generates the same preconditioner on 16 AMD Opteron 252 processors.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPDS.2012.286</guid>
  </item>
  <item>
     <title>PrePrint: Defending Against Unidentifiable Attacks in Electric Power Grids</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPDS.2012.273</link>
     <description>The electric power grid is a crucial infrastructure in our society and is always a target of malicious users and attackers. In this paper, we first introduce the concept of an unidentifiable attack, in which the control center cannot identify the attack even though it detects its presence. Thus, the control center cannot obtain deterministic state estimates, since there may be several feasible cases and the control center cannot simply favor one over the others. Given an unidentifiable attack, we present algorithms to enumerate all feasible cases, and propose an optimization strategy from the perspective of the control center to deal with an unidentifiable attack. Furthermore, we propose a heuristic algorithm from the view of an attacker to find good attack regions such that the number of meters that must be compromised is as small as possible. We also formulate the problem of how to distinguish all feasible cases if the control center has some limited resources to verify some meters, and solve it with standard algorithms. Finally, we briefly evaluate and validate our enumerating algorithms and optimization strategy.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPDS.2012.273</guid>
  </item>
  <item>
     <title>PrePrint: A 3.42-Approximation Algorithm for Scheduling Malleable Tasks under Precedence Constraints</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPDS.2012.258</link>
     <description>Scheduling malleable tasks under general precedence constraints involves finding a minimum makespan (maximum completion time) by a feasible allotment. Based on the monotonous penalty assumptions of Blayo et al. [Blayo99], this work defines two assumptions concerning malleable tasks: the processing time of a malleable task is non-increasing in the number of processors, while the work of a malleable task is non-decreasing in the number of processors. Additionally, the work function is assumed herein to be convex in the processing time. The proposed algorithm reformulates the linear program of [Jansen06], and this algorithm and associated proofs are inspired by the ones of [Jansen06]. This work describes a novel polynomial-time approximation algorithm that is capable of achieving an approximation ratio of $2+\sqrt{2}\approx 3.4142$. This work further demonstrates that the proposed algorithm can yield an approximation ratio of $2.9549$ when the processing time is strictly decreasing in the number of the processors allocated to the task. This finding represents an improvement upon the previous best approximation ratio of $100/63+100(\sqrt{6469}+137)/5481\approx 3.2920$ [Jansen11] achieved under the same assumptions.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPDS.2012.258</guid>
  </item>
  <item>
     <title>PrePrint: An Efficient Penalty-Aware Cache to Improve the Performance of Parity-Based Disk Arrays under Faulty Conditions</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPDS.2012.266</link>
     <description>The buffer cache plays an essential role in smoothing the gap between the upper-level computational components and the lower-level storage devices. A good buffer cache management scheme should be beneficial to not only the computational components, but also the storage components by reducing disk I/Os. Existing cache replacement algorithms are well optimized for disks in normal mode, but inefficient under faulty scenarios, such as a parity-based disk array with faulty disk(s). To address this issue, we propose a novel asymmetric buffer cache replacement strategy, named Victim Disk(s) First (VDF) cache, to improve the reliability and performance of a storage system consisting of a buffer cache and disk arrays. VDF cache gives higher priority to caching the blocks on the faulty disks when the disk array fails, thus reducing the I/Os addressed directly to the faulty disks. To verify the effectiveness of the VDF cache, we have integrated VDF into the popular cache algorithms LFU and LRU, named VDF-LFU and VDF-LRU, respectively. We have conducted intensive simulations as well as a prototype implementation for disk arrays to tolerate one disk failure (RAID-5) and two disk failures (RAID-6). The results show that VDF can effectively reduce disk I/Os to surviving disks, thus speeding up the online recovery and/or improving the maximum system service rate.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPDS.2012.266</guid>
  </item>
  <item>
     <title>PrePrint: High-Accuracy TDOA-Based Localization without Time Synchronization</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPDS.2012.248</link>
     <description>Localization is of great importance in mobile and wireless network applications. TDOA (Time Difference of Arrival) is one of the widely used localization schemes, in which the target (source) emits a signal and a number of anchors (receivers) record the arrival time of the source signal. By calculating the time difference between different receivers, the location of the target is estimated. In such a scheme, receivers must be precisely time-synchronized. But time synchronization adds computational cost, and brings errors which may lower localization accuracy. Previous studies have shown that existing time synchronization approaches using low-cost devices are insufficiently accurate, or even infeasible under high accuracy requirements. In our scheme (called Whistle), several asynchronous receivers record a target signal and a successive signal that is generated artificially. By two-signal sensing and sample counting techniques, the time synchronization requirement can be removed, while high time resolution can be achieved. This design fundamentally changes TDOA by removing the synchronization requirement and avoiding many sources of errors caused by time synchronization. We implement Whistle on commercial off-the-shelf (COTS) cell phones with acoustic signals and perform simulations with UWB signals. In particular, we use Whistle to localize nodes of large-scale wireless networks and achieve desirable results. The extensive real-world experiments and simulations show that Whistle can be widely used with good accuracy.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPDS.2012.248</guid>
  </item>
  <item>
     <title>PrePrint: Robust and Scalable String Pattern Matching for Deep Packet Inspection on Multi-core Processors</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPDS.2012.217</link>
     <description>Conventionally, dictionary-based string pattern matching (SPM) has been implemented as Aho-Corasick deterministic finite automaton (AC-DFA). Due to its large memory footprint, a large-dictionary AC-DFA can experience poor cache performance when matching against inputs with high match ratio on multi-core processors. Such behaviors provide opportunities for performance-based denial-of-service attack in the SPM application. We propose a head-body finite automaton (HBFA) which implements SPM in two parts: a head DFA (H-DFA) and a body NFA (B-NFA). The H-DFA matches the dictionary up to a predefined prefix length in the same way as AC-DFA, but with a much smaller memory footprint. The B-NFA extends the matching to full dictionary lengths in a compact variable-stride branch data structure, accelerated by single-instruction multiple-data (SIMD) operations. A branch grafting mechanism is proposed to opportunistically advance the state of the H-DFA with the matching progress in the B-NFA, further reducing computation and memory bandwidth overhead. Compared with a fully-populated AC-DFA, our HBFA prototype has &lt;1/5 construction time, requires &lt;1/20 run-time memory, and achieves 3x to 8x throughput when matching real-life large dictionaries against inputs with high match ratios. The throughput scales up 27x to over 34 Gbps on a 32-core Intel Manycore Testing Lab machine based on the Intel Xeon X7560 processors.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPDS.2012.217</guid>
  </item>
  <item>
     <title>PrePrint: Link Scheduling for Exploiting Spatial Reuse in Multi-hop MIMO Networks</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPDS.2012.181</link>
     <description>MIMO has great potential for enhancing the throughput of multi-hop wireless networks via spatial multiplexing or spatial reuse. Spatial reuse with stream control (SC) provides a considerable improvement of the network throughput over spatial multiplexing. The gain of spatial reuse, however, is still not fully exploited. There exist large numbers of additional data streams, which could be transmitted concurrently with those data streams scheduled by stream control at certain time slots and vicinities. In this paper, we address the issue of MIMO link scheduling to maximize the gain of spatial reuse and thus network throughput. We propose a receiver-oriented interference suppression model (ROIS), based on which we design both centralized and distributed link scheduling algorithms to fully exploit the gain of spatial reuse in multi-hop MIMO networks. Further, we address the traffic-aware link scheduling problem by injecting non-uniform traffic load into the network. Through theoretical analysis and comprehensive performance evaluation, we achieve the following results: (1) Link scheduling based on ROIS achieves significantly higher network throughput than that based on stream control, with any interference range, number of antennas, and average hop length of data flows. (2) The traffic-aware scheduling is enticingly complementary to the link scheduling based on the ROIS model. Accordingly, the two scheduling schemes can be combined to further enhance the network throughput.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPDS.2012.181</guid>
  </item>
  <item>
     <title>IEEE Transactions on Parallel and Distributed Systems - </title>
     <link>http://opac.ieeecomputersociety.org/opac?year=2013&amp;volume=10&amp;issue=01&amp;acronym=tpds</link>
     <description>IEEE Transactions on Parallel and Distributed Systems</description>
     <guid isPermaLink="true">http://www.computer.org/portal/site/tpds/</guid>
  </item>
  <item>
     <title>PrePrint: How to Conduct Distributed Incomplete Pattern Matching</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPDS.2013.128</link>
     <description>In this paper, we first propose a very interesting and practical problem, pattern matching in a distributed mobile environment (e.g., mobile phone networks), where one person&#x2019;s pattern could be separately stored in a number of different stations, and such a local pattern is incomplete compared with the global pattern. A simple solution to pattern matching over a mobile environment is to collect all the data distributed in base stations to a data center and conduct pattern matching at the data center afterwards. Clearly, such a simple solution will generate a huge amount of communication traffic, which could make the communication bottleneck caused by the limited wireless bandwidth even worse. Therefore, a communication-efficient and search-effective solution is necessary. In our work, we present a novel solution, called Distributed Incomplete pattern matching (DI-matching), which is based on our well-designed Weighted Bloom Filter (WBF), to find target patterns over a distributed mobile environment. Specifically, to save communication cost and ensure pattern matching in distributed incomplete patterns, we use WBF to encode a query pattern and disseminate the encoded data to each base station. Each base station conducts a local pattern search according to the received WBF. Only qualified IDs and corresponding weights in each base station are sent to the data center for aggregation and verification. Through non-trivial theoretical analysis and extensive empirical experiments on a real city-scale mobile network data set, we demonstrate the effectiveness and efficiency of our proposed solutions.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPDS.2013.128</guid>
  </item>
  <item>
     <title>PrePrint: Architectural Support for Handling Jitter in Shared Memory based Parallel Applications</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPDS.2013.127</link>
     <description>With an increasing number of cores per chip, it is becoming harder to guarantee optimal performance for parallel shared memory applications due to interference caused by kernel threads, interrupts, bus contention, and temperature management schemes (referred to as jitter). We demonstrate that the performance of parallel programs is reduced (by up to 42%) in large CMP based systems. In this paper, we characterize the jitter for large multi-core processors, and evaluate the loss in performance. We propose a novel jitter measurement unit that uses a distributed protocol to keep track of the number of wasted cycles. Subsequently, we try to compensate for jitter by using DVFS across a region of timing critical instructions called a frame. By performing detailed cycle accurate simulations, we show that we are able to execute a suite of Splash2 and Parsec benchmarks with a deterministic timing overhead limited to 7.5% for 15 out of 18 benchmarks with modest DVFS factors. We reduce the overall jitter by an average of 67.7% for Splash2 and 83.8% for Parsec. The area overhead of our scheme is limited to 0.5%.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPDS.2013.127</guid>
  </item>
  <item>
     <title>PrePrint: Hybrid Dataflow/Von-Neumann Architectures</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPDS.2013.125</link>
     <description>General purpose hybrid dataflow/von-Neumann architectures are gaining traction as effective parallel platforms. Although different implementations differ in the way they merge the conceptually different computational models, they all follow similar principles: they harness the parallelism and data synchronization inherent to the dataflow model, yet maintain the programmability of the von-Neumann model. In this paper, we classify hybrid dataflow/von-Neumann models according to two different taxonomies: one based on the execution model used for inter- and intra-block execution, and the other based on the integration level of both control and dataflow execution models. The paper reviews the basic concepts of von-Neumann and dataflow computing models, highlights their inherent advantages and limitations, and motivates the exploration of a synergistic hybrid computing model. Finally, we compare a representative set of recent general purpose hybrid dataflow/von-Neumann architectures, discuss their different approaches, and explore the evolution of these hybrid processors.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPDS.2013.125</guid>
  </item>
  <item>
     <title>PrePrint: Hop-by-Hop Message Authentication and Source Privacy in Wireless Sensor Networks</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPDS.2013.119</link>
     <description>Message authentication is one of the most effective ways to prevent unauthorized and corrupted messages from being forwarded in wireless sensor networks (WSNs). For this reason, many message authentication schemes have been developed, based on either a symmetric-key cryptosystem or a public-key cryptosystem. Most of them, however, have the limitations of high computational and communication overhead in addition to lack of scalability and resilience to node compromise attacks. To address these issues, a polynomial-based scheme was recently introduced. However, this scheme and its extensions all have the weakness of a built-in threshold determined by the degree of the polynomial: when the number of messages transmitted is larger than this threshold, the adversary can fully recover the polynomial. In this paper, we propose a scalable authentication scheme based on elliptic curve cryptography (ECC). While enabling intermediate node authentication, our proposed scheme allows any node to transmit an unlimited number of messages without suffering the threshold problem. In addition, our scheme can also provide message source privacy. Both theoretical analysis and simulation results demonstrate that our proposed scheme is more efficient than the polynomial-based approach in terms of computational and communication overhead under comparable security levels while providing message source privacy.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPDS.2013.119</guid>
  </item>
  <item>
     <title>PrePrint: Dynamic Trust Management for Delay Tolerant Networks and Its Application to Secure Routing</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPDS.2013.116</link>
     <description>Delay tolerant networks (DTNs) are characterized by high end-to-end latency, frequent disconnection, and opportunistic communication over unreliable wireless links. In this paper, we design and validate a dynamic trust management protocol for secure routing optimization in DTN environments in the presence of well-behaved, selfish and malicious nodes. We develop a novel model-based methodology for the analysis of our trust protocol and validate it via extensive simulation. Moreover, we address dynamic trust management, i.e., determining and applying the best operational settings at runtime in response to dynamically changing network conditions to minimize trust bias and to maximize the routing application performance. We perform a comparative analysis of our proposed routing protocol against Bayesian trust-based and non-trust based (PROPHET and epidemic) routing protocols. The results demonstrate that our protocol is able to deal with selfish behaviors and is resilient against trust-related attacks. Furthermore, our trust-based routing protocol can effectively trade off message overhead and message delay for a significant gain in delivery ratio. Our trust-based routing protocol operating under identified best settings outperforms Bayesian trust-based routing and PROPHET, and approaches the ideal performance of epidemic routing in delivery ratio and message delay without incurring high message or protocol maintenance overhead.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPDS.2013.116</guid>
  </item>
  <item>
     <title>PrePrint: Automated and Agile Server Parameter Tuning by Coordinated Learning and Control</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPDS.2013.115</link>
     <description>Automated server parameter tuning is crucial to performance and availability of Internet applications hosted in cloud environments. It is challenging due to high dynamics and burstiness of workloads, multi-tier service architecture, and virtualized server infrastructure. In this paper, we investigate automated and agile server parameter tuning for maximizing effective throughput of multi-tier Internet applications. A recent study proposed a reinforcement learning based server parameter tuning approach for minimizing average response time of multi-tier applications. Reinforcement learning is a decision making process determining the parameter tuning direction based on trial-and-error, instead of quantitative values for agile parameter tuning. It relies on a predefined adjustment value for each tuning action. However, it is nontrivial or even infeasible to find an optimal value under highly dynamic and bursty workloads. We design a neural fuzzy control based approach that combines the strengths of fast online learning and self-adaptiveness of neural networks and fuzzy control. Due to its model independence, it is robust to highly dynamic and bursty workloads. It is agile in server parameter tuning due to its quantitative control outputs. We implemented the new approach on a testbed of virtualized data center hosting RUBiS and WikiBench benchmark applications. Experimental results demonstrate that the new approach significantly outperforms the reinforcement learning based approach in both improving effective system throughput and minimizing average response time.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPDS.2013.115</guid>
  </item>
  <item>
     <title>PrePrint: Constructing Sub-Arrays with Short Interconnects from Degradable VLSI Arrays</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPDS.2013.114</link>
     <description>Reducing the interconnection length of VLSI arrays leads to less capacitance, power dissipation and dynamic communication cost between the processing elements (PEs). This paper develops efficient algorithms for constructing tightly-coupled subarrays from the mesh-connected VLSI arrays with faulty PEs. For a given size r &#x22C5; s of the target (logical) array, the proposed algorithm searches and reroutes a physical r &#x00D7; s subarray that has the least number of faults, resulting in an approximate target array, which is subsequently extended to the desired target array. Experimental results show that over 65% of redundant interconnects can be eliminated for a 64 &#x00D7; 64 target array on the 512 &#x00D7; 512 host array with no more than 1% faults. In addition, we propose a recursive divide-and-conquer algorithm for constructing the maximum target array (MTA). The lower bound of the total interconnection length of the MTA has been established. Experimental results show that the proposed algorithm is capable of reducing the long interconnects by over 33% for the MTA derived from the 512 &#x00D7; 512 host array with no more than 1% faults. Moreover, the total interconnection length of the target array produced by the proposed algorithm is close to the lower bound for the cases with relatively few faults.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPDS.2013.114</guid>
  </item>
  <item>
     <title>PrePrint: An Approximation Algorithm for Constructing Degree-Dependent Node-Weighted Multicast Trees</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPDS.2013.108</link>
     <description>This paper studies the problem of constructing a minimum-cost multicast tree (or Steiner tree) in which each node is associated with a cost that is dependent on its degree in the multicast tree. The cost of a node may depend on its degree in the multicast tree due to a number of reasons. For example, a node may need to perform various processing for sending messages to each of its neighbors in the multicast tree. Thus, the overhead for processing the messages increases as the number of neighbors increases. This paper devises a novel technique to deal with the degree-dependent node costs and applies the technique to develop an approximation algorithm for the degree-dependent node-weighted Steiner tree problem. The bound on the cost of the tree constructed by the proposed approximation algorithm is derived to be 2((ln (k/2))+1)(W_T&#x2217; + B), where k is the size of the set of multicast members, W_T&#x2217; is the cost of a minimum-cost Steiner tree T&#x2217;, and B is related to the degree-dependent node costs. Simulations are carried out to study the performance of the proposed algorithm. A distributed implementation of the proposed algorithm is presented. In addition, the proposed algorithm is generalized to solve the degree-dependent node-weighted constrained forest problem.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPDS.2013.108</guid>
  </item>
  <item>
     <title>PrePrint: Hamiltonicity of Product Networks with Faulty Elements</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPDS.2013.103</link>
     <description>A graph G is k-fault Hamiltonian (resp. Hamiltonian-connected) if after deleting at most k vertices and/or edges from G, the resulting graph remains Hamiltonian (resp. Hamiltonian-connected). Let &#x03B4;_i be the minimum degree of G_i for i=0,1. Given (&#x03B4;_i-2)-fault Hamiltonian and (&#x03B4;_i-3)-fault Hamiltonian-connected graph G_i for i=0,1, this study shows that the Cartesian product network G_0 &#x00D7; G_1 is (&#x03B4;_0+&#x03B4;_1-2)-fault Hamiltonian and (&#x03B4;_0+&#x03B4;_1-3)-fault Hamiltonian-connected. We then apply the result to determine the fault-tolerant Hamiltonicity and Hamiltonian-connectivity of two multiprocessor systems, namely the generalized hypercube and the nearest neighbor mesh hypercube, both of which belong to Cartesian product networks. This study also demonstrates that these results are worst-case optimal with respect to the number of faults tolerated.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPDS.2013.103</guid>
  </item>
  <item>
     <title>PrePrint: QoF: Towards Comprehensive Path Quality Measurement in Wireless Sensor Networks</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPDS.2013.98</link>
     <description>Due to its large scale and constrained communication radius, a wireless sensor network mostly relies on multi-hop transmissions to deliver a data packet. It is of essential importance to measure the forwarding quality of multi-hop paths. Existing metrics like ETX and ETF mainly focus on quantifying the link performance between the nodes while overlooking the forwarding capabilities inside the sensor nodes. The experience of operating GreenOrbs, a large-scale sensor network with 330 nodes, reveals that the quality of forwarding inside each sensor node is at least an equally important factor that contributes to the path quality in data delivery. In this paper we propose QoF, Quality of Forwarding, a new metric which explores the performance in the gray zone inside a node left unattended in previous studies. By combining the QoF measurements within a node and over a link, we are able to comprehensively measure the intact path quality in designing efficient multi-hop routing protocols. We implement QoF and evaluate the data collection performance in a test-bed consisting of 50 TelosB nodes. The experimental results show that our approach takes both transmission cost and forwarding reliability into consideration, thus achieving a high throughput for data collection.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPDS.2013.98</guid>
  </item>
  <item>
     <title>PrePrint: Network Performance Aware MPI Collective Communication Operations in the Cloud</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPDS.2013.96</link>
     <description>This paper examines the performance of collective communication operations in the Message Passing Interface (MPI) in the cloud computing environment. The awareness of network topology has been a key factor in performance optimizations for existing MPI implementations. However, virtualization in the cloud environment not only hides the network topology information from the users, but also introduces traffic interference and dynamics into network performance. Existing topology-aware optimizations are no longer feasible in the cloud environment. Therefore, we develop novel network performance aware algorithms for a series of collective communication operations including broadcast, reduce, gather and scatter. We further implement two common applications, N-body and conjugate gradient (CG). We have conducted our experiments with two complementary methods (on Amazon EC2 and simulations). Our experimental results show that the network performance awareness results in 25.4% and 28.3% performance improvement over MPICH2 on Amazon EC2 and on simulations, respectively. Evaluations on N-body and CG show application performance improvements of 41.6% and 14.3%, respectively.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPDS.2013.96</guid>
  </item>
  <item>
     <title>PrePrint: Detecting Movements of a Target Using Face Tracking in Wireless Sensor Networks</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPDS.2013.91</link>
     <description>Target tracking is one of the key applications of wireless sensor networks (WSNs). Existing work mostly requires organizing groups of sensor nodes with measurements of a target's movements or accurate distance measurements from the nodes to the target, and predicting those movements. These are, however, often difficult to achieve accurately in practice, especially in the case of unpredictable environments, sensor faults, etc. In this paper, we propose a new tracking framework, called FaceTrack, which employs the nodes of a spatial region surrounding a target, called a face. Instead of predicting the target location separately in a face, we estimate the target's movement toward another face. We introduce an edge detection algorithm to generate each face further in such a way that the nodes can prepare ahead of the target's movement, which greatly helps in tracking the target in a timely fashion and recovering from special cases, e.g., sensor fault, loss of tracking. Also, we develop an optimal selection algorithm to select which sensors of faces to query and to forward the tracking data. Simulation results, compared with existing work, show that FaceTrack achieves better tracking accuracy and energy efficiency. We also validate its effectiveness via a proof-of-concept system on the Imote2 sensor platform.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPDS.2013.91</guid>
  </item>
  <item>
     <title>PrePrint: Air Indexing for On-Demand XML Data Broadcast</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPDS.2013.87</link>
     <description>XML data broadcast is an efficient way to disseminate semi-structured information in wireless mobile environments. In this paper, we propose a novel two-tier index structure to facilitate access to XML documents in an on-demand broadcast system. It provides the clients with an overall image of all the XML documents available at the server side and hence enables the clients to locate complete result sets accordingly. A pruning strategy is developed to cut down the index size and a two-tier structure is proposed to further remove any redundant information. In addition, two index distribution strategies, namely naive distribution and partial distribution, have been designed to interleave the index information with the XML documents in the wireless channels. Theoretical analysis and simulation experiments are also presented to show the benefits of our indexing methods.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPDS.2013.87</guid>
  </item>
  <item>
     <title>PrePrint: Robust Component-Based Localization in Sparse Networks</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPDS.2013.85</link>
     <description>Accurate localization is crucial for wireless ad-hoc and sensor networks. Among the localization schemes, component-based approaches specialize in localization performance. By grouping nodes into increasingly large rigid components, component-based localization algorithms can properly conquer network sparseness and anchor sparseness. However, such a design is sensitive to measurement errors. Existing robust localization methods focus on eliminating the positioning error of a single node. Indeed, a single node has two degrees of freedom in 2D space and only suffers from one type of transformation: translation. As a rigid 2D structure, a component suffers from three possible transformations: translation, rotation, and reflection. A higher degree of freedom brings about complicated cases of error production and difficulties in error control. This study is the first work addressing how to deal with ranging noise for component-based methods. By exploiting a set of robust patterns, we present an Error-TOlerant Component-based algorithm (ETOC) that not only inherits the high-performance characteristic of component-based methods, but also achieves robustness of the result. We evaluate ETOC through a real-world sensor network consisting of 120 TelosB motes as well as extensive large-scale simulations. Experimental results show that, compared with state-of-the-art designs, ETOC can work properly in sparse networks and provide more accurate localization results.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPDS.2013.85</guid>
  </item>
  <item>
     <title>PrePrint: Phoenix: Storage Using an Autonomous Mobile Infrastructure</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPDS.2013.84</link>
     <description>We propose a system that makes opportunistic use of mobile computing devices and ad-hoc networking to provide a transient storage service to clients in a localized geographical region. It is assumed that mobile devices are autonomous, i.e., their mobility patterns are individually determined and not under the control of a central entity. The main challenge is to offset the potential data loss caused by node mobility with inter-node communication. We first argue, on the basis of simulation and theory, that such a service is feasible, given a sufficiently high density of mobile devices. A distributed communication and storage protocol is then presented, and it is shown through testbed experiments and simulation that the protocol operates correctly and makes efficient use of storage space and communication bandwidth, while maximizing the longevity of stored data.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPDS.2013.84</guid>
  </item>
  <item>
     <title>PrePrint: ADAPT-POLICY: Task Assignment in Server Farms when the Service Time Distribution of Tasks is not Known a Priori</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPDS.2013.76</link>
     <description>The service time distribution of certain computing workloads (tasks) such as static web content is well known. However, for many other computing workloads (e.g., dynamic web content, scientific workloads) the service time distribution is not well understood and it is not correct to assume that these tasks follow a particular distribution. In this paper we consider task assignment in server farms when both the service time distribution of tasks and (actual) sizes of tasks are not known a priori. We propose an adaptive task assignment policy, called ADAPT-POLICY, which is based on the concept of multiple static-based task assignment policies. ADAPT-POLICY defines a set of policies for a given system taking into account the specific properties of the system. These policies are selected in such a way that they have different performance characteristics under different workload conditions (i.e., service time distributions, etc.). The objective is to use the task assignment policy with the best performance (i.e., the one with the least expected waiting time) to assign tasks. Which task assignment policy performs the best depends on the traffic conditions that vary over time. ADAPT-POLICY determines the best task assignment policy using the service time distribution of tasks (and various other traffic properties), which is estimated on-line, and then it adaptively changes the task assignment policy to suit the most recent traffic conditions. The experimental results show that ADAPT-POLICY can result in significant performance improvements over both static and dynamic task assignment policies.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPDS.2013.76</guid>
  </item>
  <item>
     <title>PrePrint: Performance Modeling of Atomic Additions on GPU Scratchpad Memory</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPDS.2012.319</link>
     <description>GPU application implementations using scatter approaches suffer from write contention due to atomic updates of output elements when these result from more than one input element. Colliding threads are serialized, seriously harming performance. Dealing with these issues requires a proper understanding of the behavior of the scratchpad or shared memory under conflicting accesses caused by concurrent threads. Thus, this paper presents an exhaustive microbenchmark-based analysis of atomic additions in shared memory that quantifies the impact of access conflicts on latency and throughput. This analysis has led us to discover the lock mechanism that enables atomic updates to shared memory and to propose a performance model to estimate the latency penalties due to collisions by position or bank conflicts. Then, we have derived experiments from this model that show us the way to optimize applications using atomic operations. Position and bank conflicts can be diminished by replication and padding, respectively. The benefits of such techniques are illustrated with the optimization of two widely-used voting processes: the centroid updating in k-means clustering, and histogram calculation.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPDS.2012.319</guid>
  </item>
  <item>
     <title>PrePrint: The Client Assignment Problem for Continuous Distributed Interactive Applications: Analysis, Algorithms, and Evaluation</title>
     <link>http://doi.ieeecomputersociety.org/10.1109/TPDS.2013.47</link>
     <description>Interactivity is a primary performance measure for distributed interactive applications (DIAs) that enable participants at different locations to interact with each other in real time. Wide geographical spreads of participants in large-scale DIAs necessitate distributed deployment of servers to improve interactivity. In a distributed server architecture, the interactivity performance depends not only on client-to-server network latencies but also on inter-server network latencies as well as synchronization delays to meet the consistency and fairness requirements of DIAs. All of these factors are directly affected by how the clients are assigned to the servers. In this paper, we investigate the problem of effectively assigning clients to servers for maximizing the interactivity of DIAs. We focus on continuous DIAs that change their states not only in response to user operations but also due to the passing of time. We analyze the minimum achievable interaction time for DIAs to preserve consistency and provide fairness among clients, and formulate the client assignment problem as a combinatorial optimization problem. We prove that this problem is NP-complete. Three heuristic assignment algorithms are proposed and their approximation ratios are theoretically analyzed. The performance of the algorithms is also experimentally evaluated using real Internet latency data. The experimental results show that our proposed Greedy Assignment and Distributed-Modify Assignment algorithms generally produce near-optimal interactivity and significantly reduce the interaction time between clients compared to the intuitive algorithm that assigns each client to its nearest server.</description>
     <guid isPermaLink="true">http://doi.ieeecomputersociety.org/10.1109/TPDS.2013.47</guid>
  </item>
   </channel>
</rss>