Automated document metadata extraction

Web documents are available in various forms, most of which do not carry additional semantics. This paper presents a model for general document metadata extraction. The model, which combines segmentation by keywords and pattern matching techniques, was implemented using PHP, MySQL, JavaScript and HTML. The system was tested with 40 randomly selected PDF documents (mainly theses) and evaluated using standard measures, namely precision, recall, accuracy and F-measure. The results show that the model is relatively effective for the task of metadata extraction, especially for theses and dissertations. A combination of machine learning with these rule-based methods will be explored in the future for better results.
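
To make the idea of combining keyword segmentation with pattern matching concrete, here is a minimal Python sketch. It is not the authors' implementation (which used PHP and MySQL); the keyword list, the year pattern, the rule that the first non-empty line is the title, and all function names are assumptions chosen only for illustration.

```python
import re

# Hypothetical section keywords used to cut front-matter text into segments.
KEYWORDS = ["abstract", "keywords"]

# Hypothetical pattern: a four-digit year starting with 19 or 20.
YEAR_RE = re.compile(r"\b(?:19|20)\d{2}\b")


def segment_by_keywords(text, keywords=KEYWORDS):
    """Split text into labelled segments at lines that start with a keyword."""
    segments = {"front": []}
    current = "front"
    for line in text.splitlines():
        stripped = line.strip()
        lowered = stripped.lower()
        matched = next((k for k in keywords if lowered.startswith(k)), None)
        if matched:
            current = matched
            segments[current] = []
            # Keep any text that follows the keyword on the same line.
            rest = stripped[len(matched):].lstrip(" :")
            if rest:
                segments[current].append(rest)
        elif stripped:
            segments[current].append(stripped)
    return {name: " ".join(lines) for name, lines in segments.items()}


def extract_metadata(text):
    """Apply simple patterns to each segment to fill metadata fields."""
    segments = segment_by_keywords(text)
    lines = [l.strip() for l in text.splitlines() if l.strip()]
    year = YEAR_RE.search(segments.get("front", ""))
    keyword_text = segments.get("keywords", "")
    return {
        "title": lines[0] if lines else None,  # assumed: first line is the title
        "year": year.group(0) if year else None,
        "abstract": segments.get("abstract"),
        "keywords": [k.strip() for k in keyword_text.split(",") if k.strip()],
    }


sample = """Automated Metadata Extraction for Theses
Jane Doe
2015
Abstract: This thesis studies extraction.
It uses rules.
Keywords: metadata, pattern matching
"""
result = extract_metadata(sample)
print(result)
```

Rules of this kind are brittle on layouts they were not written for, which is consistent with the abstract's note that the approach worked best on theses and dissertations.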

A NOVEL AND EFFICIENT METHOD FOR PARSING UNRESTRICTED TEXTS OF QUASI FREE WORD ORDER LANGUAGES

International Journal of Artificial Intelligence Tools ◽ 10.1142/s0218213095000152 ◽ 1995 ◽ Vol 04 (03) ◽ pp. 301-321 ◽ Cited By ~ 3

Author(s): S.E. MICHOS ◽ N. FAKOTAKIS ◽ G. KOKKINAKIS

Keyword(s): Pattern Matching ◽ Language Processing ◽ Word Order ◽ Theoretical Background ◽ Greek Language ◽ Rule Based ◽ Early Processing ◽ Matching Techniques ◽ Free Word ◽ Large Category

This paper deals with the problems stemming from the parsing of long sentences in quasi free word order languages. Due to the word order freedom of a large category of languages including Greek and the limitations of rule-based grammar parsers in parsing unrestricted texts of such languages, we propose a flexible and effective method for parsing long sentences of such languages that combines heuristic information and pattern-matching techniques in early processing levels. This method is deeply characterized by its simplicity and robustness. Although it has been developed and tested for the Greek language, its theoretical background, implementation algorithm and results are language independent and can be of considerable value for many practical natural language processing (NLP) applications involving parsing of unrestricted texts.

A Brief Survey on Text Classification Using Various Machine Learning Techniques

International Journal of Advanced Research in Computer Science and Software Engineering ◽ 10.23956/ijarcsse.v8i1.521 ◽ 2018 ◽ Vol 8 (1) ◽ pp. 14

Author(s): Padmavathi .S ◽ M. Chidambaram

Keyword(s): Machine Learning ◽ Text Classification ◽ Fixed Number ◽ Machine Learning Techniques ◽ Online Information ◽ Rule Based ◽ Learning Techniques ◽ Machine Learning Approach ◽ Rule Based Approach

Text classification has grown more significant in managing and organizing text data due to the tremendous growth of online information. It classifies documents into a fixed number of predefined categories. The rule-based approach and the machine learning approach are the two ways of performing text classification. In the rule-based approach, documents are classified using manually defined rules. In the machine learning approach, classification rules or a classifier are defined automatically from example documents; this approach offers higher recall and faster processing. This paper presents an investigation of text classification using different machine learning techniques.
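
As a rough illustration of the rule-based side of this contrast, the Python sketch below assigns a document to one of a fixed set of categories using manually defined keyword rules. The categories, keyword sets, and the `classify` function are hypothetical and are not taken from the surveyed paper.

```python
import re

# Hypothetical hand-written rules: each category is triggered by a keyword set.
RULES = {
    "sports": {"match", "goal", "team"},
    "finance": {"stock", "market", "bank"},
}


def classify(text, rules=RULES, default="other"):
    """Return the category whose keyword set overlaps most with the text."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    scores = {label: len(tokens & words) for label, words in rules.items()}
    best = max(scores, key=scores.get)
    # Fall back to a default label when no rule fires at all.
    return best if scores[best] > 0 else default


print(classify("The team scored a late goal"))
print(classify("The bank raised rates as the stock market fell"))
print(classify("Hello world"))
```

A machine learning classifier would instead learn the association between words and categories from labelled example documents, so nobody has to write or maintain the keyword sets by hand.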

Clinical applications of artificial intelligence and machine learning in cancer diagnosis: looking into the future

Cancer Cell International ◽ 10.1186/s12935-021-01981-1 ◽ 2021 ◽ Vol 21 (1)

Author(s): Muhammad Javed Iqbal ◽ Zeeshan Javed ◽ Haleema Sadia ◽ Ijaz A. Qureshi ◽ Asma Irshad ◽ ...

Keyword(s): Artificial Intelligence ◽ Machine Learning ◽ Cancer Diagnosis ◽ Disease Risk ◽ Clinical Applications ◽ Optimal Decision ◽ Great Promise ◽ Time Range ◽ Base System ◽ The Future

Artificial intelligence (AI) is the use of mathematical algorithms to mimic human cognitive abilities and to address difficult healthcare challenges, including complex biological abnormalities like cancer. The exponential growth of AI in the last decade has made it a potential platform for optimal decision-making by super-intelligence, where the human mind is limited in its ability to process huge amounts of data in a narrow time range. Cancer is a complex and multifaceted disorder with thousands of genetic and epigenetic variations. AI-based algorithms hold great promise to pave the way to identify these genetic mutations and aberrant protein interactions at a very early stage. Modern biomedical research is also focused on bringing AI technology to clinics safely and ethically. AI-based assistance to pathologists and physicians could be a great leap forward towards prediction of disease risk, diagnosis, prognosis, and treatments. Clinical applications of AI and machine learning (ML) in cancer diagnosis and treatment are the future of medical guidance towards faster mapping of a new treatment for every individual. By using an AI-based system approach, researchers can collaborate in real time and share knowledge digitally to potentially heal millions. In this review, we present game-changing technology of the future in clinics by connecting biology with artificial intelligence, and explain how AI-based assistance helps oncologists deliver precise treatment.

Text Classification in Clinical Practice Guidelines Using Machine-Learning Assisted Pattern-Based Approach

Applied Sciences ◽ 10.3390/app11083296 ◽ 2021 ◽ Vol 11 (8) ◽ pp. 3296

Author(s): Musarrat Hussain ◽ Jamil Hussain ◽ Taqdir Ali ◽ Syed Imran Ali ◽ Hafiz Syed Muhammad Bilal ◽ ...

Keyword(s): Machine Learning ◽ Decision Making ◽ Clinical Practice ◽ Clinical Practice Guidelines ◽ Practice Guidelines ◽ Machine Learning Algorithms ◽ Nominal Group ◽ Specific Information ◽ Matching Techniques ◽ Disease Specific

Clinical Practice Guidelines (CPGs) aim to optimize patient care by assisting physicians during the decision-making process. However, guideline adherence is highly affected by the guidelines' unstructured format and the aggregation of background information with disease-specific information. The objective of our study is to extract disease-specific information from CPGs to enhance their adherence ratio. In this research, we propose a semi-automatic mechanism for extracting disease-specific information from CPGs using pattern-matching techniques. We apply supervised and unsupervised machine-learning algorithms to a CPG to extract a list of salient terms that help distinguish recommendation sentences (RS) from non-recommendation sentences (NRS). Simultaneously, a group of experts analyzes the same CPG and extracts the initial patterns (“heuristic patterns”) using a group decision-making method, the nominal group technique (NGT). We provide the list of salient terms to the experts and ask them to refine their extracted patterns. The experts refine the patterns considering the provided salient terms. The extracted heuristic patterns depend on specific terms and suffer from the specialization problem due to synonymy and polysemy. Therefore, we generalize the heuristic patterns to part-of-speech (POS) patterns and Unified Medical Language System (UMLS) patterns, which makes the proposed method generalizable to all types of CPGs. We evaluated the initial extracted patterns on asthma, rhinosinusitis, and hypertension guidelines with accuracies of 76.92%, 84.63%, and 89.16%, respectively. The accuracy increased to 78.89%, 85.32%, and 92.07%, respectively, with the refined machine-learning-assisted patterns. Our system assists physicians by locating disease-specific information in the CPGs, which enhances the physicians’ performance and reduces CPG processing time. Additionally, it is beneficial for CPG content annotation.

Integrating machine learning and blockchain to develop a system to veto the forgeries and provide efficient results in education sector

Visual Computing for Industry Biomedicine and Art ◽ 10.1186/s42492-021-00084-y ◽ 2021 ◽ Vol 4 (1)

Author(s): Dhruvil Shah ◽ Devarsh Patel ◽ Jainish Adesara ◽ Pruthvi Hingu ◽ Manan Shah

Keyword(s): Machine Learning ◽ Education Sector ◽ Student Records ◽ Job Roles ◽ Student Data ◽ Disruptive Technologies ◽ Valid Data ◽ The Future ◽ Student Achievements ◽ The University

Although the education sector is improving more quickly than ever with the help of advancing technologies, there are still many areas yet to be discovered, and there will always be room for further enhancements. Two of the most disruptive technologies, machine learning (ML) and blockchain, have helped replace conventional approaches used in the education sector with highly technical and effective methods. In this study, a system is proposed that combines these two radiant technologies and helps resolve problems such as forgeries of educational records and fake degrees. The idea here is that if these technologies can be merged and a system can be developed that uses blockchain to store student data and ML to accurately predict the future job roles for students after graduation, the problems of further counterfeiting and insecurity in the student achievements can be avoided. Further, ML models will be used to train and predict valid data. This system will provide the university with an official decentralized database of the records of students who have graduated from it. In addition, this system provides employers with a platform where the educational records of their employees can be verified. Students can share their educational information in their e-portfolios on platforms such as LinkedIn, which is a platform for managing professional profiles. This allows students, companies, and other industries to find approval for student data more easily.

Machine learning-guided phenotyping of dilated cardiomyopathy and treatment of heart failure by antisense oligonucleotides: the future has begun

European Heart Journal ◽ 10.1093/eurheartj/ehaa1063 ◽ 2021 ◽ Vol 42 (2) ◽ pp. 139-142

Author(s): Filippo Crea

Keyword(s): Machine Learning ◽ Heart Failure ◽ Dilated Cardiomyopathy ◽ Antisense Oligonucleotides ◽ The Future

An Experimental Study of Diversity of Diabetes Disease Features by Bagging and Boosting Ensemble Method with Rule Based Machine Learning Classifier Algorithms

SN Computer Science ◽ 10.1007/s42979-020-00446-y ◽ 2021 ◽ Vol 2 (1)

Author(s): Dhyan Chandra Yadav ◽ Saurabh Pal

Keyword(s): Machine Learning ◽ Experimental Study ◽ Ensemble Method ◽ Rule Based ◽ Learning Classifier ◽ Classifier Algorithms

A Comparison of Rule-Based and Machine Learning Models for Classification of Human Factors Aviation Safety Event Reports

Proceedings of the Human Factors and Ergonomics Society Annual Meeting ◽ 10.1177/1071181320641034 ◽ 2020 ◽ Vol 64 (1) ◽ pp. 129-133

Author(s): Katherine Darveau ◽ Daniel Hannon ◽ Chad Foster

Keyword(s): Machine Learning ◽ Human Factors ◽ Human Error ◽ Data Science ◽ Aircraft Engine ◽ Rule Based ◽ Root Cause ◽ Textual Data ◽ Safety Event

There is growing interest in the study and practice of applying data science (DS) and machine learning (ML) to automate decision making in safety-critical industries. As an alternative or augmentation to human review, there are opportunities to explore these methods for classifying aviation operational events by root cause. This study seeks to apply a thoughtful approach to design, compare, and combine rule-based and ML techniques to classify events caused by human error in aircraft/engine assembly, maintenance or operation. Event reports contain a combination of continuous parameters, unstructured text entries, and categorical selections. A Human Factors approach to classifier development prioritizes the evaluation of distinct data features and entry methods to improve modeling. Findings, including the performance of tested models, led to recommendations for the design of textual data collection systems and classification approaches.

Generating real-world evidence from unstructured clinical notes to examine clinical utility of genetic tests: use case in BRCAness

BMC Medical Informatics and Decision Making ◽ 10.1186/s12911-020-01364-y ◽ 2021 ◽ Vol 21 (1)

Author(s): Yiqing Zhao ◽ Saravut J. Weroha ◽ Ellen L. Goode ◽ Hongfang Liu ◽ Chen Wang

Keyword(s): Targeted Therapy ◽ Data Quality ◽ Real World ◽ Genetic Information ◽ Genetic Data ◽ Real World Data ◽ Rule Based ◽ Clinical Notes ◽ Real World Evidence ◽ F Measure

Background: Next-generation sequencing provides comprehensive information about individuals’ genetic makeup and is commonplace in oncology clinical practice. However, the utility of genetic information in the clinical decision-making process has not been examined extensively from a real-world, data-driven perspective. Through mining real-world data (RWD) from clinical notes, we could extract patients’ genetic information and further associate treatment decisions with genetic information.

Methods: We proposed a real-world evidence (RWE) study framework that incorporates context-based natural language processing (NLP) methods and data quality examination before final association analysis. The framework was demonstrated in a Foundation-tested women cancer cohort (N = 196). Upon retrieval of patients’ genetic information using NLP system, we assessed the completeness of genetic data captured in unstructured clinical notes according to a genetic data-model. We examined the distribution of different topics regarding BRCA1/2 throughout patients’ treatment process, and then analyzed the association between BRCA1/2 mutation status and the discussion/prescription of targeted therapy.

Results: We identified seven topics in the clinical context of genetic mentions including: Information, Evaluation, Insurance, Order, Negative, Positive, and Variants of unknown significance. Our rule-based system achieved a precision of 0.87, recall of 0.93 and F-measure of 0.91. Our machine learning system achieved a precision of 0.901, recall of 0.899 and F-measure of 0.9 for four-topic classification and a precision of 0.833, recall of 0.823 and F-measure of 0.82 for seven-topic classification. We found in result-containing sentences, the capture of BRCA1/2 mutation information was 75%, but detailed variant information (e.g. variant types) is largely missing. Using cleaned RWD, significant associations were found between BRCA1/2 positive mutation and targeted therapies.

Conclusions: In conclusion, we demonstrated a framework to generate RWE using RWD from different clinical sources. Rule-based NLP system achieved the best performance for resolving contextual variability when extracting RWD from unstructured clinical notes. Data quality issues such as incompleteness and discrepancies exist thus manual data cleaning is needed before further analysis can be performed. Finally, we were able to use cleaned RWD to evaluate the real-world utility of genetic information to initiate a prescription of targeted therapy.
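
For readers unfamiliar with the evaluation measures named in this abstract, the Python sketch below shows how precision, recall and F-measure are commonly computed from counts of true positives, false positives and false negatives. It assumes the balanced F1 variant of the F-measure; the abstract does not state which variant or averaging scheme the authors used, and the counts in the usage lines are made up for illustration.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and balanced F1 from raw counts.

    tp: true positives, fp: false positives, fn: false negatives.
    Each measure falls back to 0.0 when its denominator is zero.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        # Harmonic mean of precision and recall.
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Illustrative counts only; they are not taken from the study.
p, r, f = precision_recall_f1(tp=8, fp=2, fn=2)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Because F1 is a harmonic mean, it is pulled toward the lower of the two values, so a system cannot score well by maximizing precision or recall alone.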

An algorithm for rule-based layout pattern matching

Proceedings of the 39th International Conference on Computer-Aided Design ◽ 10.1145/3400302.3415606 ◽ 2020

Author(s): Sheng-Hao Wang ◽ Yen-Jong Chen ◽ Ting-Chi Wang ◽ Oscar Chen

Keyword(s): Pattern Matching ◽ Rule Based
