Cost-effective Detection of Drive-by-Download Attacks with Hybrid Client Honeypots

10.26686/wgtn.16972285.v1 ◽

2021 ◽

Author(s):

Christian Seifert

Keyword(s):

Experimental Design ◽

External Validity ◽

Cost Effective ◽

Web Pages ◽

Cost Curve ◽

True Positive ◽

Classification Methods ◽

Web Page ◽

High Interaction ◽

Object Of Study

With the increasing connectivity of and reliance on computers and networks, important aspects of computer systems are under constant threat. In particular, drive-by-download attacks have emerged as a new threat to the integrity of computer systems. Drive-by-download attacks are client-side attacks that originate from web servers that are visited by web browsers. As a vulnerable web browser retrieves a malicious web page, the malicious web server can push malware to a user's machine that can be executed without their notice or consent. The detection of malicious web pages that exist on the Internet is prohibitively expensive. It is estimated that approximately 150 million malicious web pages that launch drive-by-download attacks exist today. So-called high-interaction client honeypots are devices that are able to detect these malicious web pages, but they are slow and known to miss attacks. Detection of malicious web pages in these quantities with client honeypots would cost millions of US dollars. Therefore, we have designed a more scalable system called a hybrid client honeypot. It consists of lightweight client honeypots, the so-called low-interaction client honeypots, and traditional high-interaction client honeypots. The lightweight low-interaction client honeypots inspect web pages at high speed and forward only likely malicious web pages to the high-interaction client honeypot for a final classification. For the comparison of client honeypots and the evaluation of the hybrid client honeypot system, we have chosen a cost-based evaluation method: the true positive cost curve (TPCC). It allows us to evaluate client honeypots against their primary purpose, the identification of malicious web pages. We show that the costs of identifying malicious web pages with the developed hybrid client honeypot system are reduced by a factor of nine compared to traditional high-interaction client honeypots.

The five main contributions of our work are:

High-Interaction Client Honeypot - The first main contribution of our work is the design and implementation of a high-interaction client honeypot, Capture-HPC. It is an open-source, publicly available client honeypot research platform, which allows researchers and security professionals to conduct research on malicious web pages and client honeypots. Based on our client honeypot implementation and an analysis of existing client honeypots, we developed a component model of client honeypots. This model allows researchers to agree on the object of study, allows a focus on specific areas within the object of study, and provides a framework for communicating research on client honeypots.

True Positive Cost Curve - As mentioned above, we have chosen a cost-based evaluation method to compare and evaluate client honeypots against their primary purpose, the identification of malicious web pages: the true positive cost curve. It takes into account the unique characteristics of client honeypots (speed, detection accuracy, and resource cost) and provides a simple, cost-based mechanism to evaluate and compare client honeypots in an operating environment. As such, the TPCC provides a foundation for improving client honeypot technology. The TPCC is the second main contribution of our work.

Mitigation of Risks to the Experimental Design with HAZOP - Mitigation of risks to the internal and external validity of the experimental design using a hazard and operability (HAZOP) study is the third main contribution. This methodology addresses risks to intent (internal validity) as well as to the generalizability of results beyond the experimental setting (external validity) in a systematic and thorough manner.

Low-Interaction Client Honeypots - Malicious web pages are usually part of a malware distribution network that consists of several servers involved in the drive-by-download attack. The development and evaluation of classification methods that assess whether a web page is part of a malware distribution network is the fourth main contribution.

Hybrid Client Honeypot System - The fifth main contribution is the hybrid client honeypot system. It combines the mentioned classification methods, in the form of a low-interaction client honeypot, with a high-interaction client honeypot to form a hybrid client honeypot system that is capable of identifying malicious web pages cost-effectively on a large scale. The hybrid client honeypot system outperforms a high-interaction client honeypot with identical resources and an identical false positive rate.
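The cost argument for putting a fast filter in front of a slow, accurate classifier can be illustrated with a small calculation. The sketch below is not the TPCC defined in the thesis; it only computes a cost per confirmed malicious page for a chain of stages. The `Stage` class, the function name, and every number used with it are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    """One classifier stage. All values passed in are illustrative assumptions."""
    cost_per_page: float        # cost of inspecting a single page
    true_positive_rate: float   # fraction of malicious pages that are flagged
    false_positive_rate: float  # fraction of benign pages that are flagged


def cost_per_true_positive(pages, base_rate, stages):
    """Send `pages` through `stages` in order; only flagged pages move on.

    Returns total inspection cost divided by the number of malicious pages
    flagged by the last stage. Assumes at least one malicious page survives.
    """
    malicious = pages * base_rate
    benign = pages - malicious
    total_cost = 0.0
    for stage in stages:
        # Every page that reaches this stage has to be inspected by it.
        total_cost += (malicious + benign) * stage.cost_per_page
        malicious *= stage.true_positive_rate
        benign *= stage.false_positive_rate
    return total_cost / malicious


# Hypothetical stages: a cheap, noisy filter and an expensive, precise one.
low_interaction = Stage(cost_per_page=0.001, true_positive_rate=0.9,
                        false_positive_rate=0.05)
high_interaction = Stage(cost_per_page=0.02, true_positive_rate=0.8,
                         false_positive_rate=0.0)

high_only = cost_per_true_positive(1_000_000, 0.001, [high_interaction])
hybrid = cost_per_true_positive(1_000_000, 0.001,
                                [low_interaction, high_interaction])
```

With these made-up numbers the two-stage chain is cheaper per confirmed page, at the price of confirming fewer malicious pages in total because the first stage misses some. The size of the saving depends entirely on the assumed rates and costs.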

Automatic Web Page Classification System with Improved Accuracy

Webology ◽

10.14704/web/v18i2/web18318 ◽

2021 ◽

Vol 18 (2) ◽

pp. 225-242

Author(s):

Chaithra ◽

Dr. G.M. Lingaraju ◽

Dr. S. Jagannatha

Keyword(s):

Research Work ◽

Web Pages ◽

Automated Classification ◽

Classification Methods ◽

Web Page ◽

Web Page Classification ◽

Chi Squared ◽

The Web ◽

Page Classification

Nowadays, the Internet contains a wide variety of online documents, making it hard to find useful information about a given subject and easy to retrieve irrelevant pages. Web document and page recognition software is useful in a variety of fields, including news, medicine, fitness, research, and information technology. To enhance search capability, a large number of web page classification methods have been proposed, especially for news web pages. Furthermore, existing classification approaches seek to distinguish news web pages while still reducing the high dimensionality of the features derived from these pages. Due to the lack of automated classification methods, this paper focuses on the classification of news web pages based on their scarcity and importance. This work establishes different models for the identification and classification of web pages. The data sets used in this paper were collected from popular news websites. In this research work we used the BBC dataset, which has five predefined categories. First, the input source is preprocessed and errors are eliminated. Features are then extracted from the web page reviews using term frequency-inverse document frequency (TF-IDF) vectorization. In this work, 2225 documents are represented by 15286 features, which are the TF-IDF scores of different unigrams and bigrams. This representation is used not only for the classification task but also to analyze the dataset. Feature selection is done with the chi-squared test, which finds the terms that are most correlated with each of the categories. Finally, the web page is classified by the chosen classifier. The results showed that list has obtained the highest percentage, which reflects its effectiveness in the classification of web pages.
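The two feature steps named in the abstract, TF-IDF scoring of unigrams and bigrams followed by chi-squared term selection, can be sketched in a few lines of plain Python. This is a minimal illustration and not the authors' pipeline. The four toy documents, their labels, and all function names are assumptions. The weighting is plain `tf * log(N / df)`, and the chi-squared statistic is computed on term presence.

```python
import math
from collections import Counter


def ngrams(text):
    """Lowercased unigrams and bigrams of a whitespace-tokenised text."""
    words = text.lower().split()
    return words + [" ".join(pair) for pair in zip(words, words[1:])]


def tfidf(docs):
    """Return one {term: weight} dict per document, weight = tf * log(N / df)."""
    counts = [Counter(ngrams(doc)) for doc in docs]
    doc_freq = Counter(term for count in counts for term in count)
    n_docs = len(docs)
    return [
        {term: tf * math.log(n_docs / doc_freq[term]) for term, tf in count.items()}
        for count in counts
    ]


def chi2_score(docs, labels, term, category):
    """Chi-squared statistic of the 2x2 table: term present? x in category?"""
    a = b = c = d = 0
    for doc, label in zip(docs, labels):
        present = term in ngrams(doc)
        if present and label == category:
            a += 1      # term present, document in category
        elif present:
            b += 1      # term present, other category
        elif label == category:
            c += 1      # term absent, document in category
        else:
            d += 1      # term absent, other category
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


# Toy corpus (hypothetical, not taken from the BBC dataset).
docs = [
    "stocks fall as markets slide",
    "markets rally on bank profits",
    "striker scores late goal",
    "late goal wins cup final",
]
labels = ["business", "business", "sport", "sport"]

weights = tfidf(docs)
markets_score = chi2_score(docs, labels, "markets", "business")
stocks_score = chi2_score(docs, labels, "stocks", "business")
```

A term that appears in every document of one category and in none of the other, such as "markets" here, scores higher than a term that appears in only one document, which is why the test is a natural filter for category-specific vocabulary.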

True Positive Cost Curve: A Cost-Based Evaluation Method for High-Interaction Client Honeypots

2009 Third International Conference on Emerging Security Information, Systems and Technologies ◽

10.1109/securware.2009.17 ◽

2009 ◽

Cited By ~ 8

Author(s):

Christian Seifert ◽

Peter Komisarczuk ◽

Ian Welch

Keyword(s):

Evaluation Method ◽

Cost Curve ◽

True Positive ◽

High Interaction

Logical Structure for User Friendly dynamic Web Page Visualization for Small Screen Terminals Promoting E-Business

Recent Patents on Engineering ◽

10.2174/1872212114999201109204536 ◽

2020 ◽

Vol 14 ◽

Author(s):

Shefali Singhal ◽

Poonam Tanwar

Keyword(s):

Logical Structure ◽

Vital Role ◽

Web Pages ◽

Main Concern ◽

Web Page ◽

Large Screen ◽

Tree Data ◽

Tree Data Structure ◽

User Friendly ◽

Small Screen

Abstract: Nowadays, as everything is being digitalized, the internet and the web play a vital role in everyone's life. Whenever one has a question or an online task to perform, one has to use the internet to access the relevant web pages. These web pages are mainly designed for large screen terminals. But for reasons of mobility, handiness and economy, most people use small screen terminals (SST) such as mobile phones, palmtops, pagers, tablet computers and many more. Reading a web page designed for a large screen terminal on a small screen is a time-consuming and cumbersome task, because many irrelevant content parts, advertisements and the like have to be scrolled past. The main concern here is e-business users. To overcome such issues, the source code of a web page is organized in a tree data structure. In this paper we arrange every main heading as a root node and all the content under that heading as a child node of the logical structure. Using this structure, we regenerate a web page automatically according to the SST size.

Background: DOM and VIPS algorithms are the main background techniques supporting the current research.

Objective: To restructure a web page into a more user-friendly format that presents its content.

Method: Backtracking.

Results: Web page heading queue generation.

Conclusion: The concept of logical structure supports every SST.
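The heading-as-root, content-as-child arrangement described in the abstract can be approximated with Python's standard `html.parser`. The sketch below is a simplified stand-in and not the paper's DOM/VIPS-based method. The class name, the function name, and the sample markup are assumptions. Text that appears before the first heading is dropped.

```python
from html.parser import HTMLParser


class HeadingTree(HTMLParser):
    """Collect page text under the most recent heading (h1 to h6)."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.sections = []        # list of [heading_text, [content, ...]]
        self._in_heading = False

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            # A new heading starts a new root node with no children yet.
            self._in_heading = True
            self.sections.append(["", []])

    def handle_endtag(self, tag):
        if tag in self.HEADINGS:
            self._in_heading = False

    def handle_data(self, data):
        text = data.strip()
        if not text or not self.sections:
            return                # skip whitespace and text before any heading
        if self._in_heading:
            self.sections[-1][0] += text
        else:
            self.sections[-1][1].append(text)


def heading_queue(html):
    """Return the page as an ordered list of [heading, contents] pairs."""
    parser = HeadingTree()
    parser.feed(html)
    parser.close()
    return parser.sections


sample = ("<h1>Offers</h1><p>Ten percent off.</p><div>ad</div>"
          "<h2>Contact</h2><p>Call us.</p>")
sections = heading_queue(sample)
```

A small-screen renderer could then show only the headings first and expand the content of one section at a time, which is one plausible reading of regenerating the page according to the SST size.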

Automatic Ontology Learning from Multiple Knowledge Sources of Text

International Journal of Intelligent Information Technologies ◽

10.4018/ijiit.2018040101 ◽

2018 ◽

Vol 14 (2) ◽

pp. 1-21 ◽

Cited By ~ 2

Author(s):

B Sathiya ◽

T.V. Geetha

Keyword(s):

Knowledge Base ◽

Web Pages ◽

Knowledge Sources ◽

Statistical Measure ◽

Ontology Learning ◽

Web Page ◽

Semantic Query ◽

Probabilistic Knowledge ◽

Discovery Algorithms ◽

Different Sources

The prime textual sources used for ontology learning are a domain corpus and dynamic large text from web pages. The first source is limited and possibly outdated, while the second is uncertain. To overcome these shortcomings, a novel ontology learning methodology is proposed that utilizes different sources of text, such as a corpus, web pages and the massive probabilistic knowledge base Probase, for an effective automated construction of an ontology. Specifically, to discover taxonomical relations among the concepts of the ontology, a new web page based two-level semantic query formation methodology using lexical syntactic patterns (LSP) and a novel scoring measure, Fitness, built on Probase, are proposed. Also, a syntactic and statistical measure called COS (Co-occurrence Strength) scoring, and Domain and Range-NTRD (Non-Taxonomical Relation Discovery) algorithms are proposed to accurately identify non-taxonomical relations (NTR) among concepts, using evidence from the corpus and web pages.
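To make the idea of a lexical syntactic pattern concrete, the sketch below matches one widely used pattern, "X such as A, B and C", and reads it as taxonomical (is-a) evidence. It is only an illustration of what an LSP is, not the paper's query formation method or its Fitness measure. The regular expressions and the function name are assumptions, and only single-word concepts are handled.

```python
import re

# "X such as A, B and C": X is the hypernym, A/B/C are hyponyms.
PATTERN = re.compile(
    r"(\w+) such as (\w+(?:(?:, and |, or |, | and | or )\w+)*)"
)
SEPARATORS = re.compile(r", and |, or |, | and | or ")


def hyponym_pairs(sentence):
    """Return (hyponym, hypernym) pairs found by the 'such as' pattern."""
    pairs = []
    for match in PATTERN.finditer(sentence):
        hypernym = match.group(1)
        for hyponym in SEPARATORS.split(match.group(2)):
            pairs.append((hyponym, hypernym))
    return pairs


pairs = hyponym_pairs("We study fruits such as apples, pears and plums.")
```

Pairs extracted this way are noisy, which is presumably why the abstract couples pattern-based evidence with a scoring measure before accepting a relation.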

Semi-Automatic Online Tagging with K-Medoid Clustering

International Journal of Software Engineering and Knowledge Engineering ◽

10.1142/s0218194014400075 ◽

2014 ◽

Vol 24 (08) ◽

pp. 1115-1130 ◽

Cited By ~ 1

Author(s):

He Hu ◽

Xiaoyong Du

Keyword(s):

Clustering Algorithm ◽

Prototype System ◽

Web Pages ◽

Web Page ◽

Web Browser ◽

User Input ◽

Browser Extension ◽

Annotation Process ◽

Efficiency And Effectiveness ◽

Automatic Mechanism

Online tagging is crucial for the acquisition and organization of web knowledge. We present TYG (Tag-as-You-Go) in this paper, a web browser extension for online tagging of personal knowledge on standard web pages. We investigate an approach to combine a K-Medoid-style clustering algorithm with the user input to achieve semi-automatic web page annotation. The annotation process supports user-defined tagging schema and comprises an automatic mechanism that is built upon clustering techniques, which can automatically group similar HTML DOM nodes into clusters corresponding to the user specification. TYG is a prototype system illustrating the proposed approach. Experiments with TYG show that our approach can achieve both efficiency and effectiveness in real world annotation scenarios.
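A K-Medoid-style loop seeded by user input can be sketched as follows. This is a generic alternating assign/update variant and not TYG's algorithm. Representing a DOM node as a `(depth, child_count, text_length)` tuple, treating the user's selections as initial medoids, and the Manhattan distance are all assumptions made for illustration. The sketch also assumes the points are pairwise distinct, so that no cluster becomes empty.

```python
def manhattan(p, q):
    """Sum of absolute coordinate differences between two feature tuples."""
    return sum(abs(a - b) for a, b in zip(p, q))


def k_medoids(points, seeds, distance, max_iter=20):
    """Cluster `points` around medoids, starting from the indices in `seeds`.

    Returns {medoid_index: [member_index, ...]}.
    """
    medoids = list(seeds)
    clusters = {}
    for _ in range(max_iter):
        # Assignment step: every point joins its nearest medoid.
        clusters = {m: [] for m in medoids}
        for i, point in enumerate(points):
            nearest = min(medoids, key=lambda m: distance(point, points[m]))
            clusters[nearest].append(i)
        # Update step: the member with the smallest total distance to the
        # rest of its cluster becomes that cluster's new medoid.
        new_medoids = [
            min(members,
                key=lambda c: sum(distance(points[c], points[j]) for j in members))
            for members in clusters.values()
        ]
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return clusters


# Hypothetical DOM node features: (depth, child_count, text_length).
nodes = [(3, 0, 12), (3, 0, 15), (3, 0, 11), (5, 4, 0), (5, 5, 0), (5, 3, 0)]
# Indices 1 and 4 stand for two nodes the user clicked on.
clusters = k_medoids(nodes, seeds=[1, 4], distance=manhattan)
```

Unlike k-means, the cluster centres are always actual nodes, so each cluster can be presented to the user through a real page element.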

A Simulation of the Structure of the World-Wide Web

Sociological Research Online ◽

10.5153/sro.684 ◽

2002 ◽

Vol 7 (1) ◽

pp. 9-25 ◽

Cited By ~ 2

Author(s):

Moses Boudourides ◽

Gerasimos Antypas

Keyword(s):

World Wide Web ◽

Power Law ◽

Web Sites ◽

World Wide ◽

The Internet ◽

Web Pages ◽

Small Worlds ◽

Web Page ◽

Simple Simulation ◽

The World

In this paper we present a simple simulation of the Internet World-Wide Web, where one observes the appearance of web pages belonging to different web sites, covering a number of different thematic topics and possessing links to other web pages. The goal of our simulation is to reproduce the form of the observed World-Wide Web and of its growth, using a small number of simple assumptions. In our simulation, existing web pages may generate new ones as follows: First, each web page is equipped with a topic concerning its contents. Second, links between web pages are established according to common topics. Next, new web pages may be randomly generated and subsequently they might be equipped with a topic and be assigned to web sites. By repeated iterations of these rules, our simulation appears to exhibit the observed structure of the World-Wide Web and, in particular, a power law type of growth. In order to visualise the network of web pages, we have followed N. Gilbert's (1997) methodology of scientometric simulation, assuming that web pages can be represented by points in the plane. Furthermore, the simulated graph is found to possess the property of small worlds, as is the case with a large number of other complex networks.
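The three rules listed in the abstract (pages carry a topic, links follow common topics, new pages are generated at random and assigned to sites) can be turned into a very small growth loop. The sketch below is a loose illustration under assumed parameters and is not the authors' model. The function name, the parameter values, and the uniform choice of link targets are assumptions, and the sketch does not attempt to reproduce the power-law or small-world results.

```python
import random


def simulate_web(steps, topics=5, sites=10, links_per_page=2, seed=0):
    """Grow a toy web graph one page per step.

    Each new page gets a random topic and site, then links to up to
    `links_per_page` existing pages that share its topic.
    """
    rng = random.Random(seed)   # seeded so that runs are repeatable
    pages = []                  # each page: {"topic": int, "site": int}
    links = []                  # (source_index, target_index)
    for _ in range(steps):
        page = {"topic": rng.randrange(topics), "site": rng.randrange(sites)}
        same_topic = [i for i, other in enumerate(pages)
                      if other["topic"] == page["topic"]]
        new_index = len(pages)
        pages.append(page)
        targets = rng.sample(same_topic, min(links_per_page, len(same_topic)))
        for target in targets:
            links.append((new_index, target))
    return pages, links


pages, links = simulate_web(200)
```

Counting how many links each page receives and plotting the distribution is the natural next step for checking what kind of degree distribution a given set of rules produces.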

Exploring the customer orientation of Spanish pharmacy websites

International Journal of Pharmaceutical and Healthcare Marketing ◽

10.1108/ijphm-04-2018-0025 ◽

2018 ◽

Vol 12 (4) ◽

pp. 447-462 ◽

Cited By ~ 2

Author(s):

Carmen Domínguez-Falcón ◽

Domingo Verano-Tacoronte ◽

Marta Suárez-Fuentes

Keyword(s):

Web 2.0 ◽

Customer Orientation ◽

Community Pharmacies ◽

Web Pages ◽

Web Page ◽

Pharmaceutical Sector ◽

Content Type ◽

Web Page Design ◽

Page Design ◽

The Web

Purpose: The strong regulation of the Spanish pharmaceutical sector encourages pharmacies to modify their business model, giving the customer a more relevant role by integrating 2.0 tools. However, the study of the implementation of these tools is still quite limited, especially in terms of a customer-oriented web page design. This paper aims to analyze the online presence of Spanish community pharmacies by studying the profile of their web pages to classify them by their degree of customer orientation.

Design/methodology/approach: In total, 710 community pharmacies were analyzed, of which 160 had web pages. Using items drawn from the literature, content analysis was performed to evaluate the presence of these items on the web pages. Then, after analyzing the scores on the items, a cluster analysis was conducted to classify the pharmacies according to the degree of development of their online customer orientation strategy.

Findings: The number of pharmacies with a web page is quite low. The development of these websites is limited, and they have a more informational than relational role. The statistical analysis allows the pharmacies to be classified into four groups according to their level of development.

Practical implications: Pharmacists should make incremental use of their websites to facilitate real two-way communication with customers and other stakeholders, and to maintain a relationship with them by incorporating Web 2.0 and social media (SM) platforms.

Originality/value: This study analyses, from a marketing perspective, the degree of Web 2.0 adoption and the characteristics of the websites, in terms of aiding communication and interaction with customers in the Spanish pharmaceutical sector.

An Approach to Cost-Effective Wind Tunnel Test Campaign Using Experimental Design and Real-Time Modeling for a Single Use Autonomous Air Vehicle

AIAA Scitech 2020 Forum ◽

10.2514/6.2020-1646 ◽

2020 ◽

Author(s):

Eren Topbas ◽

Vefa N. Yavuztürk ◽

Ozgun Savas

Keyword(s):

Experimental Design ◽

Wind Tunnel ◽

Real Time ◽

Wind Tunnel Test ◽

Cost Effective ◽

Time Modeling ◽

Air Vehicle ◽

Single Use

A Web Page Clustering Method Based on Formal Concept Analysis

Information ◽

10.3390/info9090228 ◽

2018 ◽

Vol 9 (9) ◽

pp. 228 ◽

Cited By ~ 1

Author(s):

Zuping Zhang ◽

Jing Zhao ◽

Xiping Yan

Keyword(s):

Formal Concept Analysis ◽

Concept Lattice ◽

Concept Analysis ◽

Formal Context ◽

Formal Concept ◽

Web Pages ◽

Web Page ◽

Data Links ◽

Web Page Clustering ◽

The Web

Web page clustering is an important technology for sorting network resources. By extraction and clustering based on the similarity of Web pages, a large amount of information on a Web page can be organized effectively. In this paper, after describing the extraction of Web feature words, calculation methods for the weighting of feature words are studied in depth. Taking Web pages as objects and Web feature words as attributes, a formal context is constructed for use with formal concept analysis. An algorithm for constructing a concept lattice based on cross data links is proposed and successfully applied. This method can be used to cluster the Web pages using the concept lattice hierarchy. Experimental results indicate that the proposed algorithm is better than previous competitors with regard to time consumption and the clustering effect.
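The formal context described in the abstract, with Web pages as objects and feature words as attributes, can be explored with a brute-force enumeration of its formal concepts. The sketch below is not the paper's cross-data-link lattice construction. It closes every subset of objects, which is exponential and only practical for tiny examples. The page names, feature words, and function names are assumptions.

```python
from itertools import combinations


def common_attributes(context, objects):
    """Attributes shared by all `objects` (every attribute if none are given)."""
    shared = set().union(*context.values())
    for obj in objects:
        shared &= context[obj]
    return shared


def objects_having(context, attributes):
    """Objects whose attribute set contains all of `attributes`."""
    return {obj for obj, attrs in context.items() if attributes <= attrs}


def formal_concepts(context):
    """Enumerate (extent, intent) pairs by closing every subset of objects."""
    concepts = set()
    names = list(context)
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            intent = common_attributes(context, subset)
            extent = objects_having(context, intent)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


# Hypothetical formal context: page -> set of feature words.
context = {
    "page1": {"python", "tutorial"},
    "page2": {"python", "library"},
    "page3": {"football", "league"},
}
concepts = formal_concepts(context)
```

Each concept pairs a group of pages with exactly the feature words they all share, so the concept whose intent is `{"python"}` already acts as a cluster containing `page1` and `page2`. Ordering the concepts by inclusion of their extents gives the lattice hierarchy that the abstract uses for clustering.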
