$\alpha$ Diff: Cross-Version Binary Code Similarity Detection with DNN

ASE(2018)

引用 142|浏览139
暂无评分
摘要
Binary code similarity detection (BCSD) has many applications, including patch analysis, plagiarism detection, malware detection, and vulnerability search etc. Existing solutions usually perform comparisons over specific syntactic features extracted from binary code, based on expert knowledge. They have either high performance overheads or low detection accuracy. Moreover, few solutions are suitable for detecting similarities between cross-version binaries, which may not only diverge in syntactic structures but also diverge slightly in semantics. In this paper, we propose a solution α Diff, employing three semantic features, to address the cross-version BCSD challenge. It first extracts the intra-function feature of each binary function using a deep neural network (DNN). The DNN works directly on raw bytes of each function, rather than features (e.g., syntactic structures) provided by experts. α Diff further analyzes the function call graph of each binary, which are relatively stable in cross-version binaries, and extracts the inter-function and inter-module features. Then, a distance is computed based on these three features and used for BCSD. We have implemented a prototype of α Diff, and evaluated it on a dataset with about 2.5 million samples. The result shows that α Diff outperforms state-of-the-art static solutions by over 10 percentages on average in different BCSD settings.
更多
查看译文
关键词
Code Similarity Detection,DNN
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要