DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/SP.2018.00034
Huichen Li, Shanghai Jiao Tong University
Xiaojun Xu, Shanghai Jiao Tong University
Chang Liu, University of California, Berkeley
Teng Ren, TouchPal Inc.
Kun Wu, TouchPal Inc.
Xuezhi Cao, Shanghai Jiao Tong University
Weinan Zhang, Shanghai Jiao Tong University
Yong Yu, Shanghai Jiao Tong University
Dawn Song, University of California, Berkeley
Malicious calls, i.e., telephony spam and scams, have been a long-standing and challenging issue that causes billions of dollars of annual financial loss worldwide. This work presents the first machine learning-based solution that does not rely on any particular assumptions about the underlying telephony network infrastructure. The main challenge of this decade-long problem is that it is unclear how to construct effective features without access to the telephony networks' infrastructure. We solve this problem by combining several innovations. We first develop a TouchPal user interface on top of a mobile app that allows users to tag malicious calls. This allows us to maintain a large-scale call log database. We then conduct a measurement study over three months of call logs comprising 9 billion records. Based on the results, we design 29 features so that machine learning algorithms can be used to predict malicious calls. We extensively evaluate different state-of-the-art machine learning approaches using the proposed features, and the results show that the best approach can reduce unblocked malicious calls by up to 90% while maintaining a precision over 99.99% on benign call traffic. The results also show that the models are efficient to implement and do not incur significant latency overhead. We also conduct an ablation analysis, which reveals that using 10 of the 29 features achieves performance comparable to using all features.
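The trade-off reported above, blocking most malicious calls while leaving almost all benign calls untouched, can be illustrated with a minimal sketch. It assumes a classifier has already assigned each call a score (higher meaning more likely malicious) and that a call is blocked when its score exceeds a threshold. The function names `pick_threshold` and `blocked_fraction`, the scoring convention, and the toy numbers are hypothetical and are not taken from the paper; the paper's own metric definitions may differ.

```python
def pick_threshold(benign_scores, max_benign_blocked):
    """Return a threshold such that the fraction of benign calls with a
    score strictly above it is at most max_benign_blocked.

    Assumes benign_scores is non-empty (hypothetical helper)."""
    ranked = sorted(benign_scores, reverse=True)
    # Number of benign calls we are allowed to block.
    allowed = int(max_benign_blocked * len(ranked))
    if allowed >= len(ranked):
        return float("-inf")  # every call may be blocked
    # At most `allowed` benign scores lie strictly above this value.
    return ranked[allowed]


def blocked_fraction(scores, threshold):
    """Fraction of calls whose score is strictly above the threshold.

    Assumes scores is non-empty (hypothetical helper)."""
    return sum(s > threshold for s in scores) / len(scores)


# Toy, made-up scores purely for illustration.
benign = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.9]
malicious = [0.95, 0.8, 0.65, 0.3]

threshold = pick_threshold(benign, max_benign_blocked=0.25)
print(threshold)                               # 0.6
print(blocked_fraction(benign, threshold))     # 0.25
print(blocked_fraction(malicious, threshold))  # 0.75
```

In a setting like the one described, the allowed benign-blocking budget would be far smaller than in this toy example (on the order of 0.01% rather than 25%), which is why a very large labeled call log is needed to estimate the threshold reliably.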