Data Compression as a Comprehensive Framework for Graph Drawing and Representation Learning

Digging for the truth: the case for active annotation in evaluating the credibility of online medical information (Preprint)

10.2196/preprints.25920 ◽ 2020

Author(s): Mikołaj Morzy ◽ Bartłomiej Balcerzak ◽ Adam Wierzbicki

Keyword(s): Machine Learning ◽ Medical Information ◽ Representation Learning ◽ Training Dataset ◽ Highly Qualified ◽ Human In The Loop ◽ Annotation Process ◽ Comprehensive Framework ◽ Online Sources ◽ The Web

BACKGROUND: With false medical information spreading ever faster on the Web, establishing the credibility of online sources of medical information has become a pressing necessity. The sheer number of websites that present questionable, possibly harmful medical information as reliable and actionable advice imposes an additional requirement: any solution has to scale to the size of the problem. Machine learning is one such solution and, when properly deployed, can be an effective tool in fighting medical disinformation on the Web.

OBJECTIVE: We present a comprehensive framework for designing and curating machine learning training datasets for assessing the credibility of online medical information. We show how the annotation process should be constructed and which pitfalls should be avoided. Our main objective is to provide researchers from the medical and computer science communities with guidelines on how to construct datasets for machine learning models for various areas of medical information wars.

METHODS: The key component of our approach is the active annotation process. We begin by outlining the annotation protocol for curating a high-quality training dataset, which can then be augmented and rapidly extended by applying the human-in-the-loop paradigm to machine learning training. To circumvent the cold start problem of insufficient gold standard annotations, we propose a pre-processing pipeline consisting of representation learning, clustering, and re-ranking of sentences, which accelerates the training process and optimizes the human resources involved in the annotation.

RESULTS: We collect over 10 000 annotations of sentences related to selected subjects (psychiatry, cholesterol, autism, antibiotics, vaccines, steroids, birth methods, food allergy testing) for less than $7 000, employing 9 highly qualified annotators (certified medical professionals), and we release this dataset to the general public. We develop an active annotation framework for more efficient annotation of non-credible medical statements. The results of the qualitative analysis support our claims of the efficacy of the presented method.

CONCLUSIONS: A very diverse set of incentives is driving the widespread dissemination of medical disinformation on the Web. An effective strategy for countering this spread is to use machine learning to establish the credibility of online medical information automatically. This, however, requires a thoughtful design of the training pipeline. In this paper we present a comprehensive framework of active annotation. In addition, we publish a large curated dataset of medical statements labelled as credible, non-credible, or neutral.
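
The pre-processing pipeline named under METHODS (representation learning, clustering, re-ranking of sentences) might be sketched roughly as below. This is an illustration only, not the authors' implementation: the abstract does not say which representation, clustering algorithm, or ranking criterion is used. Here bag-of-words counts stand in for a learned representation, plain k-means stands in for the clustering step, and the re-ranking rule (round-robin over clusters, most central sentence first) is an assumption. All function names and the toy sentences are hypothetical.

```python
import math
from collections import Counter


def embed(sentences):
    """Bag-of-words count vectors; a stand-in for a learned sentence representation."""
    vocab = sorted({w for s in sentences for w in s.lower().split()})
    vectors = []
    for s in sentences:
        counts = Counter(s.lower().split())
        vectors.append([float(counts[w]) for w in vocab])
    return vectors


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(vectors, k, iterations=10):
    """Plain k-means; the first k vectors serve as initial centroids (deterministic)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each vector to its nearest centroid.
        labels = [min(range(k), key=lambda c: distance(v, centroids[c]))
                  for v in vectors]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, l in zip(vectors, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def rerank(sentences, vectors, labels, centroids):
    """Assumed criterion: round-robin over clusters, most central sentence first."""
    queues = []
    for c in range(len(centroids)):
        idx = [i for i, l in enumerate(labels) if l == c]
        idx.sort(key=lambda i: distance(vectors[i], centroids[c]))
        queues.append(idx)
    order = []
    while any(queues):
        for q in queues:
            if q:
                order.append(q.pop(0))
    return [sentences[i] for i in order]


# Hypothetical toy input; real input would be sentences harvested from websites.
sentences = [
    "vaccines cause autism",
    "antibiotics treat bacterial infections",
    "vaccines do not cause autism",
    "antibiotics do not treat viral infections",
]
vectors = embed(sentences)
labels, centroids = kmeans(vectors, k=2)
queue = rerank(sentences, vectors, labels, centroids)
for sentence in queue:
    print(sentence)
```

With an ordering of this kind, the first sentences shown to annotators come from different clusters, so a small annotation budget covers several topics before any one topic is exhausted. Whether this matches the criterion used in the paper is not stated in the abstract.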


Massively Parallel Graph Drawing and Representation Learning

2020 IEEE International Conference on Big Data (Big Data) ◽ 10.1109/bigdata50022.2020.9377976 ◽ 2020

Author(s): Christian Bohm ◽ Claudia Plant

Keyword(s): Graph Drawing ◽ Representation Learning ◽ Massively Parallel ◽ Parallel Graph


Applications of Formalized Information Theory

Formalized Probability Theory and Applications Using Theorem Proving ◽ 10.4018/978-1-4666-8315-0.ch011 ◽ 2015 ◽ pp. 159-178

Keyword(s): Information Theory ◽ Data Compression ◽ Theorem Proving ◽ Formal Analysis ◽ Theoretic Analysis ◽ Information Theoretic ◽ Wide Range ◽ Comprehensive Framework

In previous chapters, the authors provided a comprehensive framework that can be used in the formal probabilistic and information-theoretic analysis of a wide range of systems and protocols. In this chapter, they illustrate the usefulness of conducting this analysis with theorem proving by tackling several applications: a data compression application, the formal analysis of an anonymity-based MIX channel, and the properties of the one-time pad encryption system.


Digging for the truth: the case for active annotation in evaluating the credibility of online medical information (Preprint)

10.2196/preprints.26065 ◽ 2020

Author(s): Aleksandra Nabożny ◽ Bartłomiej Balcerzak ◽ Adam Wierzbicki ◽ Mikołaj Morzy

Keyword(s): Machine Learning ◽ Medical Information ◽ Representation Learning ◽ Training Dataset ◽ Highly Qualified ◽ Human In The Loop ◽ Annotation Process ◽ Comprehensive Framework ◽ Online Sources ◽ The Web

BACKGROUND: With false medical information spreading ever faster on the Web, establishing the credibility of online sources of medical information has become a pressing necessity. The sheer number of websites that present questionable, possibly harmful medical information as reliable and actionable advice imposes an additional requirement: any solution has to scale to the size of the problem. Machine learning is one such solution and, when properly deployed, can be an effective tool in fighting medical disinformation on the Web.

OBJECTIVE: We present a comprehensive framework for designing and curating machine learning training datasets for assessing the credibility of online medical information. We show how the annotation process should be constructed and which pitfalls should be avoided. Our main objective is to provide researchers from the medical and computer science communities with guidelines on how to construct datasets for machine learning models for various areas of medical information wars.

METHODS: The key component of our approach is the active annotation process. We begin by outlining the annotation protocol for curating a high-quality training dataset, which can then be augmented and rapidly extended by applying the human-in-the-loop paradigm to machine learning training. To circumvent the cold start problem of insufficient gold standard annotations, we propose a pre-processing pipeline consisting of representation learning, clustering, and re-ranking of sentences, which accelerates the training process and optimizes the human resources involved in the annotation.

RESULTS: We collect over 10 000 annotations of sentences related to selected subjects (psychiatry, cholesterol, autism, antibiotics, vaccines, steroids, birth methods, food allergy testing) for less than $7 000, employing 9 highly qualified annotators (certified medical professionals), and we release this dataset to the general public. We develop an active annotation framework for more efficient annotation of non-credible medical statements. The results of the qualitative analysis support our claims of the efficacy of the presented method.

CONCLUSIONS: A very diverse set of incentives is driving the widespread dissemination of medical disinformation on the Web. An effective strategy for countering this spread is to use machine learning to establish the credibility of online medical information automatically. This, however, requires a thoughtful design of the training pipeline. In this paper we present a comprehensive framework of active annotation. In addition, we publish a large curated dataset of medical statements labelled as credible, non-credible, or neutral.


Effect of data compression on intracardiac bipolar atrial electrograms

EP Europace ◽ 10.1016/s1099-5129(01)80354-8 ◽ 2001 ◽ Vol 2 ◽ pp. A89-A89

Keyword(s): Data Compression ◽ Atrial Electrograms


Data compression

AccessScience ◽ 10.1036/1097-8542.757264 ◽ 2015

Keyword(s): Data Compression


Erratum: Image data compression, using 2-D lattice modelling method

IEE Proceedings F Communications Radar and Signal Processing ◽ 10.1049/ip-f-1.1987.0112 ◽ 1987 ◽ Vol 134 (7) ◽ pp. 680

Author(s): H.K. Kwan ◽ Y.C. Lui

Keyword(s): Data Compression ◽ Image Data ◽ Modelling Method ◽ Image Data Compression


Communicating the Value of Design

Conference Proceedings of the Academy for Design Innovation Management ◽ 10.33114/adim.2019.04.184 ◽ 2019 ◽ Vol 2 (1)

Author(s): George Edward TORRENS ◽ Nicholas Samuel JOHNSON ◽ Ian STORER

Keyword(s): Design Decision ◽ Card Sorting ◽ Practical Application ◽ Product Packaging ◽ Sorting Method ◽ Packaging Design ◽ Design Considerations ◽ Modified Delphi ◽ Fast Moving Consumer Goods ◽ Comprehensive Framework

Product packaging design is often produced through the practical application of tacit knowledge, rule of thumb and professional connoisseurship. Stakeholders increasingly demand that design practitioners provide clarity of reasoning and accountability for their design proposals. Therefore, a better framework for the design of fast-moving consumer goods (FMCG) is required. This paper proposes a comprehensive taxonomy of ‘design considerations’ to assist the development of low-involvement FMCG packaging and to aid in communicating the rationale for design solutions. A total of 302 academic sources were reviewed, inductive content analysis was performed to code topics, and the output was validated with academic and industry experts (n=9) through a modified-Delphi card sorting method. The research moves towards a comprehensive framework and a common dialogue between stakeholders, practitioners and managers, helping them communicate more effectively the value that design can offer to FMCGs. The constructed taxonomy provides a set of 156 ‘design considerations’ to support objective and informed design decision-making.
