B2SFinder - Detecting Open-Source Software Reuse in COTS Software.
ASE (2019)
Abstract
COTS software products are developed extensively on top of OSS projects, resulting in OSS reuse vulnerabilities. To detect such vulnerabilities, finding OSS reuses in COTS software has become imperative. While scalable to tens of thousands of OSS projects, existing binary-to-source matching approaches are severely imprecise in analyzing COTS software products, since they support only a limited number of code features, compute matching scores only approximately in measuring OSS reuses, and neglect the code structures in OSS projects.
We introduce a novel binary-to-source matching approach, called B2SFinder, to address these limitations. First of all, B2SFinder can reason about seven kinds of code features that are traceable in both binary and source code. In order to compute matching scores precisely, B2SFinder employs a weighted feature matching algorithm that combines three matching methods (for dealing with different code features) with two importance-weighting methods (for computing the weight of an instance of a code feature in a given COTS software application based on its specificity and occurrence frequency). Finally, B2SFinder identifies different types of code reuses based on matching scores and code structures of OSS projects. We have implemented B2SFinder using an optimized data structure. We have evaluated B2SFinder using 21,991 binaries from 1,000 popular COTS software products and 2,189 candidate OSS projects. Our experimental results show that B2SFinder is not only precise but also scalable. Compared with the state of the art, B2SFinder has successfully found up to 2.15x as many reuse cases in 53.85 seconds per binary file on average. We also discuss how B2SFinder can be leveraged in detecting OSS reuse vulnerabilities in practice.
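The idea of weighting feature instances by specificity and occurrence frequency can be illustrated with a minimal sketch. This is not B2SFinder's implementation: the abstract does not give the formulas, so the sketch assumes a single feature kind (string literals), a TF-IDF-style inverse frequency over the candidate OSS projects, and a length-based specificity. All function names, the length cap of 32, and the 0.5 threshold are hypothetical.

```python
import math
from collections import Counter


def frequency_weights(oss_features):
    """Assumed frequency weighting: features found in fewer OSS projects weigh more."""
    n_projects = len(oss_features)
    doc_freq = Counter()
    for features in oss_features.values():
        doc_freq.update(features)
    return {f: math.log(n_projects / c) + 1.0 for f, c in doc_freq.items()}


def specificity(feature, cap=32):
    """Assumed specificity: longer strings are treated as more specific, up to a cap."""
    return min(len(feature), cap) / cap


def match_score(binary_features, project_features, weights):
    """Weighted share of a project's features that also occur in the binary."""
    total = sum(weights[f] * specificity(f) for f in project_features)
    if total == 0:
        return 0.0
    matched = sum(
        weights[f] * specificity(f)
        for f in project_features
        if f in binary_features
    )
    return matched / total


def rank_reuses(binary_features, oss_features, threshold=0.5):
    """Return (project, score) pairs at or above the threshold, best first."""
    weights = frequency_weights(oss_features)
    scored = (
        (name, match_score(binary_features, features, weights))
        for name, features in oss_features.items()
    )
    return sorted(
        ((name, s) for name, s in scored if s >= threshold),
        key=lambda pair: -pair[1],
    )


# Toy data: features extracted from three OSS projects and one binary.
oss = {
    "zlib": {"inflate: invalid block type", "deflate 1.2.11 Copyright"},
    "libpng": {"libpng warning: %s", "error"},
    "misc": {"error", "misc internal failure"},
}
binary = {"inflate: invalid block type", "deflate 1.2.11 Copyright", "error"}
reuses = rank_reuses(binary, oss)
```

In this toy data, the short and widely shared string "error" contributes little to any score, so matching it alone does not push libpng or misc over the threshold. That is the intuition behind weighting by specificity and frequency rather than counting matched features equally.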
Key words
COTS Software, OSS, Code Reuse, One-Day Vulnerability, Code Feature, Binary-to-Source Matching