Outlier Detection for Pandemic-Related Data Using Compositional Functional Data Analysis

2021 ◽  
pp. 251-266
Author(s):  
Christopher Rieser ◽  
Peter Filzmoser

Abstract: With accurate data, governments can make the most informed decisions to keep people safer through pandemics such as the COVID-19 coronavirus. In such events, data reliability is crucial and therefore outlier detection is an important and even unavoidable issue. Outliers are often considered the most interesting observations, because the fact that they differ from the data majority may lead to relevant findings in the subject area. Outlier detection has also been addressed in the context of multivariate functional data, that is, smooth functions of several characteristics, often derived from measurements at different time points (Hubert et al. in Stat Methods Appl 24(2):177–202, 2015b). Here the underlying data are regarded as compositions, with the compositional parts forming the multivariate information, and thus only relative information in terms of log-ratios between these parts is considered relevant for the analysis. The multivariate functional data thus have to be derived as smooth functions by utilising this relative information. Subsequently, already established multivariate functional outlier detection procedures can be used, but for interpretation purposes, the functional data need to be presented in an appropriate space. The methodology is illustrated with publicly available data around the COVID-19 pandemic to find countries displaying outlying trends.
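The idea that only log-ratios between compositional parts carry information can be illustrated with a small sketch. The abstract does not say which log-ratio representation the authors use; the centred log-ratio (clr) transform below is one common choice and is an assumption here, as are the function name `clr` and the example numbers.

```python
import math


def clr(parts):
    """Centred log-ratio transform of a composition with strictly positive parts.

    Each part is expressed as the log of its ratio to the geometric mean
    of all parts, so only relative information is retained.
    """
    if any(p <= 0 for p in parts):
        raise ValueError("all parts must be strictly positive")
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean
    return [v - mean_log for v in logs]


# Hypothetical three-part composition, given once as raw counts and once
# as shares of the total (illustrative numbers, not data from the paper).
counts = [120.0, 30.0, 850.0]
total = sum(counts)
shares = [c / total for c in counts]

print(clr(counts))
print(clr(shares))
```

Both calls give the same coordinates up to rounding, because rescaling a composition leaves every ratio between parts unchanged. The clr coordinates also sum to zero.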

2018 ◽  
Vol 2018 ◽  
pp. 1-13 ◽  
Author(s):  
Laura Millán-Roures ◽  
Irene Epifanio ◽  
Vicente Martínez

A functional data analysis (FDA) based methodology for detecting anomalous flows in urban water networks is introduced. Primary hydraulic variables are recorded in real time by telecontrol systems, so they are functional data (FD). In the first stage, the data are validated (false data are detected) and reconstructed, since there could be not only false data, but also missing and noisy data. FDA tools such as tolerance bands for FD and smoothing for dense and sparse FD are used. In the second stage, functional outlier detection tools are used in two phases. In Phase I, the data are cleared of anomalies to ensure that they are representative of the in-control system. The objective of Phase II is system monitoring. A new functional outlier detection method based on archetypal analysis is also proposed. The methodology is applied and illustrated with real data. A simulation study is also carried out to assess the performance of the outlier detection techniques, including our proposal. The results are very promising.


2020 ◽  
Vol 52 (8) ◽  
pp. 1049-1066
Author(s):  
Peter Filzmoser ◽  
Mariella Gregorich

Abstract: Outliers are encountered in all practical situations of data analysis, regardless of the discipline of application. However, the term outlier is not uniformly defined across all these fields since the differentiation between regular and irregular behaviour is naturally embedded in the subject area under consideration. Generalized approaches for outlier identification have to be modified to allow the diligent search for potential outliers. Therefore, an overview of different techniques for multivariate outlier detection is presented within the scope of selected kinds of data frequently found in the field of geosciences. In particular, three common types of data in geological studies are explored: spatial, compositional and flat data. All of these formats motivate new outlier concepts, such as local outlyingness, where the spatial information of the data is used to define a neighbourhood structure. Another type is compositional data, which nicely illustrate the fact that some kinds of data require not only adaptations to standard outlier approaches, but also transformations of the data themselves before conducting the outlier search. Finally, the very recently developed concept of cellwise outlyingness, typically used for high-dimensional data, allows one to identify atypical cells in a data matrix. In practice, the different data formats can be mixed, and it is demonstrated in various examples how to proceed in such situations.


2020 ◽  
Author(s):  
Sokhna DIENG ◽  
Pierre Michel ◽  
Abdoulaye Guindo ◽  
Kankoe Sallah ◽  
El-hadj Ba ◽  
...  

Abstract. Background: Effective targeting of malaria control in low transmission areas requires identification of transmission foci or hotspots. We investigated the use of functional data analysis to identify and describe spatio-temporal patterns of malaria incidence in an area with seasonal transmission in west-central Senegal. Method: Malaria surveillance was maintained over 5 years from 2008 to 2012 at health facilities serving a population of 500,000 in 575 villages in two health districts in Senegal. Smooth functions were fitted to the time series of malaria incidence for each village, using cubic B-spline basis functions. The resulting smooth functions for each village were classified using hierarchical clustering (Ward’s method) with several different dissimilarity measures. The optimal number of clusters was then determined based on four cluster validity indices, to identify the main types of distinct temporal pattern of malaria incidence. Epidemiological indicators characterizing the resulting malaria incidence patterns in terms of the timing of seasonal outbreaks were calculated based on the slope (velocity) and rate of change of the slope (acceleration) of the incidence over time. Results: Three distinct patterns of malaria incidence were identified: a pattern characterized by high incidence, in 12/575 (2%) villages, with average incidence of 114 cases/1000 person-years over the 5-year study period; a pattern with intermediate incidence in 97 villages (17%), with average incidence of 13 cases/1000 person-years; and a pattern with low incidence in 466 (81%) villages, with average incidence of 2.6 cases/1000 person-years. Epidemiological indicators characterizing the fluctuations in malaria incidence showed that seasonal outbreaks started later, and ended earlier, in the low incidence pattern. Conclusion: Functional data analysis can be used to classify communities based on time series of malaria incidence, and to identify high incidence communities. Indicators can be derived from the fitted functions which characterize the timing of outbreaks. These tools may help to better target control measures.
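The velocity and acceleration indicators mentioned in this abstract are the first and second derivatives of an incidence curve. The study differentiates fitted cubic B-spline functions; the sketch below swaps in plain central finite differences on evenly spaced values as a simpler stand-in. The function name `derivatives` and the input series are hypothetical.

```python
def derivatives(values, step=1.0):
    """Central-difference velocity and acceleration at the interior points.

    `values` are curve evaluations on an evenly spaced grid with spacing
    `step`; the first and last grid points are dropped because they lack
    a neighbour on one side.
    """
    interior = range(1, len(values) - 1)
    velocity = [(values[i + 1] - values[i - 1]) / (2 * step) for i in interior]
    acceleration = [
        (values[i + 1] - 2 * values[i] + values[i - 1]) / step ** 2
        for i in interior
    ]
    return velocity, acceleration


# Hypothetical smoothed incidence values. A quadratic is used so the
# result can be checked by hand: slope 2t and constant curvature 2.
incidence = [t ** 2 for t in range(6)]
velocity, acceleration = derivatives(incidence)
print(velocity)
print(acceleration)
```

On real data the differences would be taken on the fitted smooth function rather than on raw counts, since differencing amplifies noise.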


2021 ◽  
Author(s):  
Wenlin Dai ◽  
Stavros Athanasiadis ◽  
Tomáš Mrkvička

Clustering is an essential task in functional data analysis. In this study, we propose a framework for a clustering procedure based on functional rankings or depth. Our methods naturally combine various types of between-cluster variation equally, which caters to various discriminative sources of functional data; for example, they combine raw data with transformed data or various components of multivariate functional data with their covariance. Our methods also enhance the clustering results with a visualization tool that allows intrinsic graphical interpretation. Finally, our methods are model-free and nonparametric and hence are robust to heavy-tailed distribution or potential outliers. The implementation and performance of the proposed methods are illustrated with a simulation study and applied to three real-world applications.


2020 ◽  
Vol 198 ◽  
pp. 105960
Author(s):  
Clément Lejeune ◽  
Josiane Mothe ◽  
Adil Soubki ◽  
Olivier Teste

2020 ◽  
Vol 10 (3) ◽  
pp. 881
Author(s):  
Myeong-Hun Jeong ◽  
Seung-Bae Jeon ◽  
Tae-Young Lee ◽  
Min Kyo Youm ◽  
Dong-Ha Lee

This study provides an automatic shipping-route construction method using functional data analysis (FDA), which analyzes information about curves, such as multiple data points over time. The proposed approach includes two steps: outlier detection and shipping-route construction. This study uses automatic-identification system (AIS) data for the experiments. The effectiveness of the proposed method is demonstrated through case studies, wherein our approach is compared with the Mahalanobis distance method for trajectory-outlier detection, and the performance of vessel trajectory reconstruction is compared with that of a density-based approach. The proposed method improves understanding of vessel-movement dynamics, thereby improving maritime monitoring and security.
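For orientation, the Mahalanobis distance baseline named in this abstract can be sketched for two-dimensional positions as follows. This is a generic illustration, not the authors' implementation; the function name `mahalanobis_2d` and the sample points are hypothetical.

```python
import math


def mahalanobis_2d(points):
    """Mahalanobis distance of each 2-D point from the sample mean.

    Uses the sample covariance matrix (divisor n - 1) and its closed-form
    2x2 inverse. Assumes at least three points that are not all collinear.
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # Quadratic form d^T S^{-1} d with S^{-1} = [[syy, -sxy], [-sxy, sxx]] / det
        squared = (dx * dx * syy - 2 * dx * dy * sxy + dy * dy * sxx) / det
        distances.append(math.sqrt(squared))
    return distances


# Hypothetical positions: five roughly on a line, the last one far off it.
track = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.1), (3.0, 2.9), (4.0, 4.0), (2.0, 8.0)]
distances = mahalanobis_2d(track)
most_outlying = max(range(len(track)), key=lambda i: distances[i])
print(most_outlying, distances)
```

Because the mean and covariance are estimated from the same points being scored, a strong outlier inflates the covariance and partly masks itself. That is one reason robust or functional alternatives are of interest.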


2012 ◽  
Vol 2012 ◽  
pp. 1-30 ◽  
Author(s):  
Piotr Kokoszka

This paper reviews recent research on dependent functional data. After providing an introduction to functional data analysis, we focus on two types of dependent functional data structures: time series of curves and spatially distributed curves. We review statistical models, inferential methodology, and possible extensions. The paper is intended to provide a concise introduction to the subject with plentiful references.


2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Mirosław Krzyśko ◽  
Waldemar Wołyński ◽  
Marek Domin ◽  
Zofia Hanusz ◽  
Leszek Rydzak ◽  
...  

The study tested how the cooking process can change the dimensions of rice grains. The impact of set times of cooking or steaming process on the characteristics such as length, width, and height of two varieties of rice, namely, long-grain white and parboiled, was investigated. The measurements of the dimension characteristics obtained at different times of the cooking process were converted to functional data. Different methods of multivariate functional data analysis, namely, functional multivariate analysis of variance, functional discriminant coordinates, and cluster analysis, were applied to discover the differences between the two varieties and the two heat treatment methods.


Author(s):  
Sokhna Dieng ◽  
Pierre Michel ◽  
Abdoulaye Guindo ◽  
Kankoe Sallah ◽  
El-Hadj Ba ◽  
...  

We introduce an approach based on functional data analysis to identify patterns of malaria incidence to guide effective targeting of malaria control in a seasonal transmission area. Using a functional data method, a smooth function (functional data or curve) was fitted to the time series of observed malaria incidence for each of 575 villages in west-central Senegal from 2008 to 2012. These 575 smooth functions were classified using hierarchical clustering (Ward’s method) with several different dissimilarity measures. Validity indices were used to determine the number of distinct temporal patterns of malaria incidence. Epidemiological indicators characterizing the resulting malaria incidence patterns were determined from the velocity and acceleration of their incidences over time. We identified three distinct patterns of malaria incidence: high-, intermediate-, and low-incidence patterns in 2% (12/575), 17% (97/575), and 81% (466/575) of villages, respectively. Epidemiological indicators characterizing the fluctuations in malaria incidence showed that seasonal outbreaks started later, and ended earlier, in the low-incidence pattern. Functional data analysis can be used to identify patterns of malaria incidence by considering their temporal dynamics. Epidemiological indicators derived from their velocities and accelerations may help to target control measures according to these patterns.

