web crawling
Recently Published Documents


TOTAL DOCUMENTS: 255 (five years: 62)
H-INDEX: 17 (five years: 2)

2022, Vol 32 (3), pp. 1617-1632
Author(s):  
S. Neelakandan ◽  
A. Arun ◽  
Raghu Ram Bhukya ◽  
Bhalchandra M. Hardas ◽  
T. Ch. Anil Kumar ◽  
...  

Author(s):  
Palika Jajoo

Web crawling is the method by which topics and information are browsed on the World Wide Web and stored in large storage systems, from which users can access them as needed. This paper explains the use of web crawling in the digital world and the difference it makes for search engines. A variety of web crawling techniques exist, and they are explained briefly in this paper. Web crawlers have many advantages over traditional methods of searching for information online, and many tools are available that support web crawling and make the process easy.
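As a rough illustration of the browse-and-store loop this abstract describes, the following is a minimal Python sketch of a breadth-first crawler. It assumes the widely used requests and beautifulsoup4 packages; the seed URL, page limit, and in-memory store are illustrative stand-ins for a real crawler's fetcher and storage device.

```python
# A minimal breadth-first crawl sketch: fetch a page, store its text,
# queue the links it contains. Assumes `requests` and `beautifulsoup4`
# are installed; seed URL and page limit are illustrative.
from collections import deque
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup

def crawl(seed_url, max_pages=50):
    seen, queue, store = {seed_url}, deque([seed_url]), {}
    while queue and len(store) < max_pages:
        url = queue.popleft()
        try:
            resp = requests.get(url, timeout=10)
        except requests.RequestException:
            continue  # skip unreachable pages
        soup = BeautifulSoup(resp.text, "html.parser")
        store[url] = soup.get_text(" ", strip=True)  # stand-in for the "storage device"
        for link in soup.find_all("a", href=True):
            nxt = urljoin(url, link["href"])
            if urlparse(nxt).scheme in ("http", "https") and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return store
```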


Author(s):  
Moaiad Khder

Web scraping or web crawling refers to the procedure of automatic extraction of data from websites using software. It is a process that is particularly important in fields such as Business Intelligence in the modern age. Web scraping is a technology that allows us to extract structured data from text such as HTML, and it is extremely useful in situations where data is not provided in a machine-readable format such as JSON or XML. Using web scraping to gather data allows us, for example, to collect prices in near real time from retail store sites along with further details; web scraping can also be used to gather intelligence on illicit businesses, such as drug marketplaces on the darknet, providing law enforcement and researchers with valuable data, such as drug prices and varieties, that would be unavailable through conventional methods. It has been found that using a web scraping program yields data that is far more thorough, accurate, and consistent than manual entry. Based on these results, it is concluded that web scraping is a highly useful tool in the information age and an essential one in many modern fields. Implementing web scraping properly requires multiple technologies, such as spidering and pattern matching, which are discussed. This paper looks into what web scraping is, how it works, its stages and technologies, how it relates to Business Intelligence, artificial intelligence, data science, big data, and cybersecurity, and how it can be done with the Python language; it also covers some of the main benefits of web scraping and what the future of web scraping may look like, with a special degree of emphasis placed on highlighting the ethical and legal issues. Keywords: Web Scraping, Web Crawling, Python Language, Business Intelligence, Data Science, Artificial Intelligence, Big Data, Cloud Computing, Cybersecurity, Legal, Ethical.
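Since the abstract names Python and pattern matching as the implementation route, here is a minimal sketch of that stage: turning retail-page HTML into structured price records. The CSS classes ("product", "name", "price") and the price pattern are hypothetical; any real site requires inspecting its actual markup.

```python
# A minimal scraping sketch turning HTML into structured records.
# The CSS selectors below are hypothetical placeholders.
import re

import requests
from bs4 import BeautifulSoup

def scrape_prices(url):
    soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")
    records = []
    for item in soup.select("div.product"):
        name = item.select_one(".name")
        price = item.select_one(".price")
        if not (name and price):
            continue
        # pattern matching: pull a numeric amount out of free-form price text
        match = re.search(r"(\d+(?:\.\d{2})?)", price.get_text())
        if match:
            records.append({"name": name.get_text(strip=True),
                            "price": float(match.group(1))})
    return records  # structured data, ready for JSON/CSV export
```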


2021, Vol 10 (5), pp. 01-07
Author(s):  
Ikechukwu Onyenwe ◽  
Ebele Onyedinma ◽  
Chidinma Nwafor ◽  
Obinna Agbata

Websites are regarded as domains of limitless information which anyone and everyone can access. New trends in technology have shaped the way we do and manage our businesses. Today, advancements in Internet technology have given rise to a proliferation of e-commerce websites. This, in turn, has made the activities and lifestyles of marketers/vendors, retailers, and consumers (collectively regarded as users in this paper) easier, as it provides convenient platforms for selling and ordering items over the Internet. Unfortunately, these desirable benefits are not without drawbacks, as these platforms require users to spend a lot of time and effort searching for the best product deals, product updates, and offers on e-commerce websites. Furthermore, users need to filter and compare search results by themselves, which takes a lot of time and can produce ambiguous results. In this paper, we applied web crawling and scraping methods on an e-commerce website to obtain HTML data for identifying product updates based on the current time. These HTML data are preprocessed to extract product details such as name, price, and post date and time, to serve as useful information for users.
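A minimal sketch of the preprocessing step this abstract describes might look as follows: parse the crawled HTML into (name, price, posted) records and keep only those posted within a recent window. The tag names, attribute, and timestamp format are hypothetical, not the paper's actual markup.

```python
# Sketch: extract product details from crawled HTML and filter by recency.
# Selectors, the data-posted attribute, and the time format are illustrative.
from datetime import datetime, timedelta

from bs4 import BeautifulSoup

def recent_products(html, hours=24):
    soup = BeautifulSoup(html, "html.parser")
    cutoff = datetime.now() - timedelta(hours=hours)
    products = []
    for card in soup.select("li.product-card"):
        posted = datetime.strptime(card["data-posted"], "%Y-%m-%d %H:%M")
        if posted >= cutoff:  # "product updates based on the current time"
            products.append({
                "name": card.select_one(".title").get_text(strip=True),
                "price": card.select_one(".price").get_text(strip=True),
                "posted": posted,
            })
    return products
```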


2021, Vol 13 (21), pp. 11694
Author(s):  
Jaehong Kim ◽  
Sangpil Youm ◽  
Yongwei Shan ◽  
Jonghoon Kim

Fire safety on construction sites has rarely been studied because fire accidents occur less frequently than construction's "Fatal Four". Despite their lower frequency, construction fire accidents tend to have more severe impacts. This study aims to use news media data and big data analysis techniques to identify patterns and factors related to fire accidents on construction sites. News reports on various construction accidents were first collected through web crawling. Then, the authors identified the level of media exposure for various keywords related to construction accidents and analyzed the similarities between them. The results show that the level of media exposure for fire accidents on construction sites is much higher than for fall accidents, which suggests that fire accidents may have a greater impact on their surroundings than other accidents. It was found that the main causes of fire accidents on construction sites are violations of fire safety regulations and the absence of inspections, both of which are preventable. This study contributes to the body of knowledge by exploring factors related to fire safety on construction sites and their interrelationships, as well as by providing evidence that the fire accident type should be emphasized in safety-related regulations and codes for construction sites.
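A toy sketch of the exposure-and-similarity analysis described here: count how many crawled articles mention each accident keyword, then compare keywords by the cosine similarity of their article co-occurrence vectors. The keyword list and the choice of cosine similarity are assumptions; the paper's actual keywords and measures may differ.

```python
# Sketch: media exposure per keyword and pairwise keyword similarity,
# computed over a list of crawled article texts. Keywords are illustrative.
import math

KEYWORDS = ["fire", "fall", "electrocution", "struck-by"]

def exposure_and_similarity(articles):
    # one binary occurrence vector per keyword, one slot per article
    vectors = {kw: [int(kw in art.lower()) for art in articles]
               for kw in KEYWORDS}
    exposure = {kw: sum(v) for kw, v in vectors.items()}  # media exposure level

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    similarity = {(k1, k2): cosine(vectors[k1], vectors[k2])
                  for i, k1 in enumerate(KEYWORDS)
                  for k2 in KEYWORDS[i + 1:]}
    return exposure, similarity
```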


Author(s):  
Ahrii Kim ◽  
Yunju Bak ◽  
Jimin Sun ◽  
Sungwon Lyu ◽  
Changmin Lee

With the advent of Neural Machine Translation, the more often human-machine parity is claimed at WMT, the more we come to ask ourselves whether the evaluation environment can be trusted. In this paper, we argue that the low quality of the source test set of the news track at WMT may lead to an overrated human parity claim. First, we report nine types of so-called technical contaminants in the data set, originating from the absence of meticulous inspection after web crawling. Our empirical findings show that when they are corrected, about 5% of the segments that previously achieved a human parity claim turn out to be statistically invalid. This tendency becomes more evident when only the contaminated sentences are considered. To the best of our knowledge, this is the first attempt to question the "source" side of the test set as a potential cause of the overclaim of human parity. We present evidence for this phenomenon: according to sentence-level TER scores, these trivial errors change a good part of the system translations. We conclude that overlooking them would be a mistake, especially when it comes to NMT evaluation.
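One way to picture the sentence-level TER check mentioned above is the following sketch: translate a segment before and after its contaminant is fixed, then score how much the system output shifted. It assumes the sacrebleu package's TER implementation; translate is a stand-in for whatever NMT system is under evaluation, not the paper's setup.

```python
# Sketch: measure how much a system's output changes when a trivial
# source-side contaminant is corrected, via sentence-level TER.
# Assumes the `sacrebleu` package; `translate` is a placeholder.
from sacrebleu.metrics import TER

ter = TER()

def output_shift(translate, raw_src, clean_src):
    """TER between translations of the contaminated and corrected source."""
    raw_out = translate(raw_src)      # translation of the raw, crawled segment
    clean_out = translate(clean_src)  # same segment with the contaminant fixed
    # treat the clean-source translation as the reference
    return ter.sentence_score(raw_out, [clean_out]).score
```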


2021, Vol 23 (3), pp. 87-104
Author(s):  
Daniel Borysowski

The author of this article has collected approximately 2.7 million Russian-language online news items. The main aims of this text are: to discuss the concept of web crawling in relation to acquiring textual data from the Internet, to discuss the issue of structuring such data in unannotated text corpora, and to present selected aspects of the analysis of data structured in this way. The author treats online news items as a combination of the main text and the metadata that identifies and characterizes it (distinguished during their automatic extraction from web pages). Separating news into main text and metadata makes it possible to analyze them from two perspectives, textual and meta-informational (and additionally, e.g. for chronologization studies, from a perspective combining both levels). The author complements this outline of possible linguistic studies of the collected material with an evaluation of selected multi-word units extracted from these texts using the delimiting function of quotation marks.
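As a loose illustration of the two-level structure and the quotation-mark extraction described above, here is a small Python sketch. The metadata fields and the regular expression (covering Russian guillemets « » as well as plain quotes) are illustrative assumptions, not the author's actual corpus schema.

```python
# Sketch: a news item as main text plus identifying metadata, and
# extraction of quote-delimited multi-word units from the text.
import re

QUOTED = re.compile(r'[«"]([^»"]{2,80})[»"]')

def to_record(url, published, title, body):
    """Split a crawled news item into main text and its metadata."""
    return {"text": body,
            "meta": {"url": url, "published": published, "title": title}}

def quoted_units(record):
    """Multi-word units delimited by quotation marks in the main text."""
    return [m for m in QUOTED.findall(record["text"]) if " " in m]
```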


2021, pp. 291-300
Author(s):  
Kapil Madan ◽  
Rajesh Bhatia
