Cost-effective Detection of Drive-by-Download Attacks
with Hybrid Client Honeypots
<p>With the increasing connectivity of and reliance on computers and networks, important aspects of computer systems are under a constant threat. In particular, drive-by-download attacks have emerged as a new threat to the integrity of computer systems. Drive-by-download attacks are clientside attacks that originate fromweb servers that are visited byweb browsers. As a vulnerable web browser retrieves a malicious web page, the malicious web server can push malware to a user's machine that can be executed without their notice or consent. The detection of malicious web pages that exist on the Internet is prohibitively expensive. It is estimated that approximately 150 million malicious web pages that launch drive-by-download attacks exist today. Socalled high-interaction client honeypots are devices that are able to detect these malicious web pages, but they are slow and known to miss attacks. Detection ofmaliciousweb pages in these quantitieswith client honeypots would cost millions of US dollars. Therefore, we have designed a more scalable system called a hybrid client honeypot. It consists of lightweight client honeypots, the so-called low-interaction client honeypots, and traditional high-interaction client honeypots. The lightweight low-interaction client honeypots inspect web pages at high speed and forward only likely malicious web pages to the high-interaction client honeypot for a final classification. For the comparison of client honeypots and evaluation of the hybrid client honeypot system, we have chosen a cost-based evaluation method: the true positive cost curve (TPCC). It allows us to evaluate client honeypots against their primary purpose of identification of malicious web pages. We show that costs of identifying malicious web pages with the developed hybrid client honeypot systems are reduced by a factor of nine compared to traditional high-interaction client honeypots. The five main contributions of our work are: High-Interaction Client Honeypot The first main contribution of our work is the design and implementation of a high-interaction client honeypot Capture-HPC. It is an open-source, publicly available client honeypot research platform, which allows researchers and security professionals to conduct research on malicious web pages and client honeypots. Based on our client honeypot implementation and analysis of existing client honeypots, we developed a component model of client honeypots. This model allows researchers to agree on the object of study, allows for focus of specific areas within the object of study, and provides a framework for communication of research around client honeypots. True Positive Cost Curve As mentioned above, we have chosen a cost-based evaluationmethod to compare and evaluate client honeypots against their primary purpose of identification ofmaliciousweb pages: the true positive cost curve. It takes into account the unique characteristics of client honeypots, speed, detection accuracy, and resource cost and provides a simple, cost-based mechanism to evaluate and compare client honeypots in an operating environment. As such, the TPCC provides a foundation for improving client honeypot technology. The TPCC is the second main contribution of our work. Mitigation of Risks to the Experimental Design with HAZOP - Mitigation of risks to internal and external validity on the experimental design using hazard and operability (HAZOP) study is the third main contribution. This methodology addresses risks to intent (internal validity) as well as generalizability of results beyond the experimental setting (external validity) in a systematic and thorough manner. Low-Interaction Client Honeypots - Malicious web pages are usually part of a malware distribution network that consists of several servers that are involved as part of the drive-by-download attack. Development and evaluation of classification methods that assess whether a web page is part of a malware distribution network is the fourth main contribution. Hybrid Client Honeypot System - The fifth main contribution is the hybrid client honeypot system. It incorporates the mentioned classification methods in the form of a low-interaction client honeypot and a high-interaction client honeypot into a hybrid client honeypot systemthat is capable of identifying malicious web pages in a cost effective way on a large scale. The hybrid client honeypot system outperforms a high-interaction client honeypot with identical resources and identical false positive rate.</p>