EVALUATING UNIFORM MANIFOLD APPROXIMATION AND PROJECTION FOR DIMENSION REDUCTION AND VISUALIZATION OF POLINSAR FEATURES

Author(s):  
S. Schmitz ◽  
U. Weidner ◽  
H. Hammer ◽  
A. Thiele

Abstract. In this paper, the nonlinear dimension reduction algorithm Uniform Manifold Approximation and Projection (UMAP) is investigated to visualize information contained in high-dimensional feature representations of Polarimetric Interferometric Synthetic Aperture Radar (PolInSAR) data. Based on polarimetric parameters, target decomposition methods and interferometric coherences, a wide range of features is extracted that spans the high-dimensional feature space. UMAP is applied to determine a representation of the data in 2D and 3D Euclidean space that preserves local and global structures of the data and is still suited for classification. The performance of UMAP in terms of generating expressive visualizations is evaluated on PolInSAR data acquired by the F-SAR sensor and compared to that of Principal Component Analysis (PCA), Laplacian Eigenmaps (LE) and t-distributed Stochastic Neighbor Embedding (t-SNE). For this purpose, a visual analysis of the 2D embeddings is performed. In addition, a quantitative analysis is provided to evaluate the preservation of information in the low-dimensional representations with respect to the separability of different land cover classes. The results show that UMAP exceeds the capability of PCA and LE in these regards and is competitive with t-SNE.
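For orientation, a minimal sketch of such a comparison is given below, using the standard scikit-learn and umap-learn interfaces. The PolInSAR feature extraction itself is not reproduced; X is a random placeholder for the stacked feature vectors, and the neighborhood/perplexity settings are illustrative defaults rather than the values used by the authors.

```python
# Compare 2D embeddings from PCA, Laplacian Eigenmaps, t-SNE and UMAP on a
# generic high-dimensional feature matrix (placeholder for PolInSAR features).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import SpectralEmbedding, TSNE
import umap  # umap-learn package

X = np.random.rand(1000, 30)          # stand-in for stacked PolInSAR features

embeddings = {
    "PCA":   PCA(n_components=2).fit_transform(X),
    "LE":    SpectralEmbedding(n_components=2, n_neighbors=15).fit_transform(X),
    "t-SNE": TSNE(n_components=2, perplexity=30).fit_transform(X),
    "UMAP":  umap.UMAP(n_components=2, n_neighbors=15, min_dist=0.1).fit_transform(X),
}
for name, Y in embeddings.items():
    print(name, Y.shape)              # each is an (n_samples, 2) embedding
```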

Mathematics ◽  
2021 ◽  
Vol 9 (16) ◽  
pp. 1985
Author(s):  
Yaohui Li ◽  
Junjun Shi ◽  
Zhifeng Yin ◽  
Jingfang Shen ◽  
Yizhong Wu ◽  
...  

The Kriging surrogate model in complex simulation problems uses as few expensive objective evaluations as possible to establish a global or local approximate interpolation. However, due to the inversion of the covariance correlation matrix and the solving of the Kriging-related parameters, the Kriging approximation process for high-dimensional problems is time consuming and sometimes even impossible to construct. For this reason, a high-dimensional Kriging modeling method through principal component dimension reduction (HDKM-PCDR) is proposed by considering the correlation parameters and the design variables of a Kriging model. It uses PCDR to transform the high-dimensional correlation parameter vector in Kriging into a low-dimensional one, which is used to reconstruct a new correlation function. In this way, the time consumption of correlation parameter optimization and correlation function matrix construction in the Kriging modeling process is greatly reduced. Compared with the original Kriging method and the high-dimensional Kriging modeling method based on partial least squares, the proposed method achieves faster modeling under the premise of meeting certain accuracy requirements.
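The sketch below illustrates only the general idea of combining PCA with Kriging, not the authors' exact PCDR reconstruction of the correlation function: the design sites are compressed with PCA so that the anisotropic kernel has far fewer length-scale (correlation) parameters to optimize. The data and dimensions are arbitrary placeholders.

```python
# Minimal sketch (not the paper's PCDR formulation): PCA-compress the inputs,
# then fit an anisotropic Kriging / Gaussian process model in the reduced space.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 50))            # high-dimensional design sites
y = np.sin(X[:, :5].sum(axis=1)) + 0.01 * rng.standard_normal(200)

pca = PCA(n_components=5).fit(X)                  # reduced space: 5 components
Z = pca.transform(X)

# anisotropic RBF: one correlation parameter (length-scale) per reduced dimension
kernel = ConstantKernel(1.0) * RBF(length_scale=np.ones(Z.shape[1]))
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(Z, y)

z_new = pca.transform(rng.uniform(-1, 1, size=(5, 50)))
print(gp.predict(z_new))                          # surrogate predictions
```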


2021 ◽  
Vol 50 (1) ◽  
pp. 138-152
Author(s):  
Mujeeb Ur Rehman ◽  
Dost Muhammad Khan

Recently, anomaly detection has attracted considerable attention from data mining researchers, as its importance has grown steadily in practical domains such as product marketing, fraud detection, medical diagnosis and fault detection. High-dimensional data subjected to outlier detection pose exceptional challenges for data mining experts because of the curse of dimensionality and the growing resemblance of distant and neighbouring points. Traditional algorithms and techniques perform outlier detection on the full feature space. Such customary methodologies concentrate largely on low-dimensional data and hence prove ineffective when discovering anomalies in a data set comprising a high number of dimensions. It becomes a very difficult and tiresome job to dig out anomalies present in a high-dimensional data set when all subsets of projections need to be explored. All data points in high-dimensional data behave like similar observations because of an intrinsic property of such data: the distance between observations approaches zero as the number of dimensions tends towards infinity. This research work proposes a novel technique that explores the deviation among all data points and embeds its findings inside well-established density-based techniques. The technique opens a new breadth of research towards resolving an inherent problem of high-dimensional data, namely outliers residing within clusters of different densities. A high-dimensional dataset from the UCI Machine Learning Repository is chosen to test the proposed technique, and its results are compared with those of density-based techniques to evaluate its efficiency.
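As a point of reference, the snippet below shows a standard density-based detector (Local Outlier Factor) applied to high-dimensional data; the deviation-embedding step proposed in the paper is not reproduced, and the data are a synthetic placeholder.

```python
# Baseline density-based outlier detection with LOF on high-dimensional data.
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(1)
X_inliers = rng.standard_normal((500, 40))          # dense "normal" observations
X_outliers = rng.uniform(-6, 6, size=(20, 40))      # scattered anomalies
X = np.vstack([X_inliers, X_outliers])

lof = LocalOutlierFactor(n_neighbors=20)
labels = lof.fit_predict(X)                         # -1 marks flagged outliers
print((labels == -1).sum(), "points flagged as outliers")
```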


2020 ◽  
pp. 147387162097820
Author(s):  
Haili Zhang ◽  
Pu Wang ◽  
Xuejin Gao ◽  
Yongsheng Qi ◽  
Huihui Gao

T-distributed stochastic neighbor embedding (t-SNE) is an effective visualization method. However, it is non-parametric and cannot be applied to streaming data or online scenarios. Although kernel t-SNE provides an explicit projection from a high-dimensional data space to a low-dimensional feature space, some outliers are not well projected. In this paper, bi-kernel t-SNE is proposed for out-of-sample data visualization. Gaussian kernel matrices of the input and feature spaces are used to approximate the explicit projection, and principal component analysis is then applied to reduce the dimensionality of the feature kernel matrix. Thus, the difference between inliers and outliers is revealed, and any new sample can be mapped reliably. The performance of the proposed method for out-of-sample projection is tested on several benchmark datasets by comparing it with other state-of-the-art algorithms.
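The sketch below shows the general out-of-sample idea only, not the authors' bi-kernel formulation: t-SNE coordinates are learned on training data, and an explicit RBF-kernel regression from the input space to those coordinates is then used to project new samples without rerunning t-SNE. All data and hyperparameters are placeholders.

```python
# Generic kernel-based out-of-sample extension for a t-SNE embedding.
import numpy as np
from sklearn.manifold import TSNE
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(2)
X_train = rng.standard_normal((400, 20))
X_new = rng.standard_normal((10, 20))          # out-of-sample points

Y_train = TSNE(n_components=2, perplexity=30).fit_transform(X_train)

# learn an explicit mapping from input space to embedding coordinates
mapper = KernelRidge(kernel="rbf", gamma=0.05, alpha=1e-3).fit(X_train, Y_train)
Y_new = mapper.predict(X_new)                  # approximate 2D coordinates
print(Y_new.shape)
```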


2013 ◽  
Vol 303-306 ◽  
pp. 1101-1104 ◽  
Author(s):  
Yong De Hu ◽  
Jing Chang Pan ◽  
Xin Tan

Kernel entropy component analysis (KECA) reveals the structure of the original data through its kernel matrix. This structure is related to the Renyi entropy of the data, and KECA preserves it by keeping the data's Renyi entropy unchanged. This paper describes the original data by several components for the purpose of dimension reduction. KECA is then applied to celestial spectra reduction and compared with Principal Component Analysis (PCA) and Kernel Principal Component Analysis (KPCA) in experiments. The experimental results show that KECA is a good method for high-dimensional data reduction.
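A compact sketch of the KECA selection rule follows: the RBF kernel matrix is eigen-decomposed and the eigenpairs retained are those contributing most to the Renyi entropy estimate, lambda_i * (1^T e_i)^2, rather than simply the largest eigenvalues as in KPCA. The spectra are replaced by a random placeholder and sigma is an illustrative bandwidth.

```python
# Kernel entropy component analysis, ranking eigenpairs by entropy contribution.
import numpy as np
from scipy.spatial.distance import cdist

def keca(X, n_components=2, sigma=1.0):
    K = np.exp(-cdist(X, X, "sqeuclidean") / (2 * sigma**2))   # RBF kernel matrix
    lam, E = np.linalg.eigh(K)                                  # ascending eigenvalues
    entropy_terms = lam * (E.sum(axis=0) ** 2)                  # lambda_i * (1^T e_i)^2
    idx = np.argsort(entropy_terms)[::-1][:n_components]        # top entropy components
    return E[:, idx] * np.sqrt(np.clip(lam[idx], 0, None))      # projected training data

X = np.random.rand(300, 100)        # placeholder for high-dimensional spectra
print(keca(X, n_components=3).shape)
```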


2014 ◽  
Vol 2014 ◽  
pp. 1-11 ◽  
Author(s):  
Jimmy C. Azar ◽  
Martin Simonsson ◽  
Ewert Bengtsson ◽  
Anders Hast

Comparing the staining patterns of paired antibodies designed against a specific protein but targeting different epitopes of that protein provides quality control over the binding and over the antibodies' ability to identify the target protein correctly and exclusively. We present a method for automated quantification of immunostaining patterns for antibodies in breast tissue using the Human Protein Atlas database. In such tissue, the dark brown dye 3,3′-diaminobenzidine is used as an antibody-specific stain, whereas the blue dye hematoxylin is used as a counterstain. The proposed method is based on clustering and relative scaling of features following principal component analysis. Our method is able (1) to accurately segment and identify staining patterns and quantify the amount of staining and (2) to detect paired antibodies by correlating the segmentation results among different cases. Moreover, the method is simple, operates in a low-dimensional feature space, and is computationally efficient, which makes it suitable for high-throughput processing of tissue microarrays.
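The following is only a generic sketch of the pipeline's core shape (PCA on pixel features followed by clustering to separate stain, counterstain and background); the Human Protein Atlas image handling and the relative-scaling step are not reproduced, and the image is a random stand-in.

```python
# Generic PCA + clustering segmentation of pixel colour features.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(3)
image = rng.random((128, 128, 3))                  # placeholder RGB tissue image
pixels = image.reshape(-1, 3)

features = PCA(n_components=2).fit_transform(pixels)
labels = KMeans(n_clusters=3, n_init=10).fit_predict(features)
segmentation = labels.reshape(image.shape[:2])     # per-pixel cluster map
print(np.bincount(labels))                         # pixel counts per cluster
```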


2016 ◽  
Vol 27 (5) ◽  
pp. 1331-1350 ◽  
Author(s):  
Maxime Turgeon ◽  
Karim Oualkacha ◽  
Antonio Ciampi ◽  
Hanane Miftah ◽  
Golsa Dehghan ◽  
...  

The genomics era has led to an increase in the dimensionality of data collected in the investigation of biological questions. In this context, dimension-reduction techniques can be used to summarise high-dimensional signals into low-dimensional ones, in order to further test for association with one or more covariates of interest. This paper revisits one such approach, previously known as principal component of heritability and renamed here as principal component of explained variance (PCEV). As its name suggests, PCEV seeks a linear combination of outcomes in an optimal manner, by maximising the proportion of variance explained by one or several covariates of interest. By construction, this method optimises power; however, due to its computational complexity, it has unfortunately received little attention in the past. Here, we propose a general analytical PCEV framework that builds on the assets of the original method, i.e. it is conceptually simple and free of tuning parameters. Moreover, our framework extends the range of applications of the original procedure by providing a computationally simple strategy for high-dimensional outcomes, along with exact and asymptotic testing procedures that drastically reduce its computational cost. We investigate the merits of PCEV using an extensive set of simulations. Furthermore, the use of the PCEV approach is illustrated with three examples taken from the fields of epigenetics and brain imaging.
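A minimal sketch of the PCEV criterion, under the assumption of simulated data, is given below: the loading vector maximises the ratio of covariate-explained variance to residual variance and is obtained from a generalised eigenvalue problem. The paper's exact and asymptotic testing procedures are not reproduced.

```python
# PCEV loading vector via a generalised eigenvalue problem on simulated data.
import numpy as np
from scipy.linalg import eigh, lstsq

rng = np.random.default_rng(4)
n, p, q = 500, 10, 2                       # samples, outcomes, covariates
X = rng.standard_normal((n, q))
B_true = rng.standard_normal((q, p)) * 0.3
Y = X @ B_true + rng.standard_normal((n, p))

# multivariate least squares regression of the outcomes on the covariates
design = np.column_stack([np.ones(n), X])
coef, *_ = lstsq(design, Y)
fitted = design @ coef
resid = Y - fitted

fitted_c = fitted - Y.mean(axis=0)
V_model = fitted_c.T @ fitted_c / n        # variance explained by the covariates
V_resid = resid.T @ resid / n              # residual variance

# largest generalised eigenvector maximises w' V_model w / w' V_resid w
vals, vecs = eigh(V_model, V_resid)
w = vecs[:, -1]
pcev_score = Y @ w                         # principal component of explained variance
print(vals[-1])                            # maximised variance ratio
```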


2014 ◽  
Vol 490-491 ◽  
pp. 1293-1297
Author(s):  
Liang Zhu ◽  
Fei Fei Liu ◽  
Wu Chen ◽  
Qing Ma

Top-N queries are employed in a wide range of applications to obtain a ranked list of data objects that have the highest aggregate scores over certain attributes. The threshold algorithm (TA) is an important method in many scenarios. However, TA is effective only when the ranking function is monotone and the query point is fixed. In this paper, we propose an approach that alleviates the limitations of TA-like methods for processing top-N queries. Using p-norm distances as ranking functions, our methods exploit a fundamental principle of functional analysis so that the candidate tuples of a top-N query with a p-norm distance can be obtained from the maximum distance. We conduct extensive experiments to prove the effectiveness and efficiency of our method for both low-dimensional (2, 3 and 4) and high-dimensional (25, 50 and 104) data.
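For illustration, the naive form of such a query is simply ranking all tuples by their p-norm distance to the query point and returning the N closest; the TA-style candidate pruning described in the paper is not reproduced here.

```python
# Naive top-N retrieval by p-norm distance to a query point.
import numpy as np

def top_n(data, query, n=10, p=2):
    dist = np.linalg.norm(data - query, ord=p, axis=1)   # p-norm distances
    idx = np.argpartition(dist, n)[:n]                   # N smallest, unordered
    return idx[np.argsort(dist[idx])]                    # ordered by distance

rng = np.random.default_rng(5)
data = rng.random((100_000, 4))        # low-dimensional example (d = 4)
query = rng.random(4)
print(top_n(data, query, n=5, p=3))    # indices of the 5 closest tuples
```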


2019 ◽  
Vol 9 (18) ◽  
pp. 3723
Author(s):  
Sharif ◽  
Mumtaz ◽  
Shafiq ◽  
Riaz ◽  
Ali ◽  
...  

The rise of social media has led to an increasing online cyber-war via hate and violent comments or speeches, and even slick videos that lead to the promotion of extremism and radicalization. Sensing cyber-extreme content on microblogging sites, specifically Twitter, is a challenging and evolving research area, since such content is short, noisy, context-dependent and dynamic in nature. The related tweets were crawled using query words and then carefully labelled into two classes: Extreme (having two sub-classes: pro-Afghanistan government and pro-Taliban) and Neutral. An Exploratory Data Analysis (EDA) using Principal Component Analysis (PCA) was performed on the tweet data (represented by Term Frequency-Inverse Document Frequency (TF-IDF) features) to reduce the high-dimensional data space into a low-dimensional (usually 2-D or 3-D) space. The PCA-based visualization showed better cluster separation between the two classes (extreme and neutral), whereas cluster separation within the sub-classes of the extreme class was not clear. The paper also discusses the pros and cons of applying PCA as an EDA tool for textual data that is usually represented by a high-dimensional feature set. Furthermore, classification algorithms such as naïve Bayes, K Nearest Neighbors (KNN), random forest, Support Vector Machine (SVM) and ensemble classification methods (with bagging and boosting) were applied with the PCA-based reduced features and with the complete set of features (TF-IDF features extracted from n-gram terms in the tweets). The analysis shows that the SVM achieved an average accuracy of 84% compared with the other classification models. It is pertinent to mention that this is novel research work reported in the context of the Afghanistan war zone for Twitter content analysis using machine learning methods.
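A hedged sketch of the pipeline's shape (TF-IDF features, a 2-D PCA view for exploration, and an SVM classifier) is shown below. The labelled Twitter corpus is not available here, so a toy list of strings and labels stands in for it, and no claim is made about reproducing the reported 84% accuracy.

```python
# TF-IDF -> PCA (exploratory view) -> SVM classification on toy text data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

texts = ["example tweet one", "another example tweet", "yet another sample text",
         "completely different content here"] * 50
labels = [0, 0, 1, 1] * 50                       # 0 = neutral, 1 = extreme (toy labels)

tfidf = TfidfVectorizer(ngram_range=(1, 2))
X = tfidf.fit_transform(texts).toarray()         # dense matrix for PCA

X_2d = PCA(n_components=2).fit_transform(X)      # 2-D EDA view of class separation

X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.25, random_state=0)
clf = SVC(kernel="linear").fit(X_tr, y_tr)
print("accuracy:", clf.score(X_te, y_te))
```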


2020 ◽  
Vol 10 (23) ◽  
pp. 8474
Author(s):  
Ibrahim Furkan Ince

Human activity recognition (HAR) has been an active area in computer vision with a broad range of applications, such as education, security surveillance, and healthcare. HAR is in general a time series classification problem, and LSTMs are widely used for time series classification tasks. However, they work well with high-dimensional feature vectors, which reduce the processing speed of an LSTM in real-time applications. Therefore, dimension reduction is required to create a low-dimensional feature space. As shown in a previous study, an LSTM with dimension reduction yielded the worst performance among the compared classifiers, which are not deep learning methods. Therefore, in this paper, a novel scale- and rotation-invariant human activity recognition system that can also work in a low-dimensional feature space is presented. For this purpose, the Kinect depth sensor is employed to obtain skeleton joints. Since angles are used, the proposed system is already scale invariant. In order to provide rotation invariance, the body-relative direction in egocentric coordinates is calculated: the 3D vector between the right hip and the left hip is used to define the horizontal axis, and its cross product with the vertical axis of the global coordinate system is assumed to be the depth axis of the proposed local coordinate system. Instead of using 3D joint angles, eight limbs and their corresponding 3D angles with the X, Y, and Z axes of the proposed coordinate system are compressed with several dimension reduction methods, such as an averaging filter, the Haar wavelet transform (HWT), and the discrete cosine transform (DCT), and employed as the feature vector. Finally, the extracted features are trained and tested with an LSTM (long short-term memory) network, which is an artificial recurrent neural network (RNN) architecture. Experimental and benchmarking results indicate that the proposed framework boosts the performance of the LSTM by approximately 30% in accuracy in the low-dimensional feature space.
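Only the feature-compression step is sketched here: per-frame limb angles are compressed with a discrete cosine transform before being passed to the recurrent network, which is omitted. The angle values, frame count and number of retained coefficients are placeholders, not the paper's settings.

```python
# DCT compression of per-frame limb-angle features for a HAR sequence.
import numpy as np
from scipy.fft import dct

rng = np.random.default_rng(6)
n_frames, n_limbs = 120, 8
angles = rng.uniform(0, np.pi, size=(n_frames, n_limbs * 3))   # 8 limbs x (X, Y, Z) angles

k = 8                                                          # retained DCT coefficients
compressed = dct(angles, axis=1, norm="ortho")[:, :k]          # low-dimensional features
print(compressed.shape)                                        # (120, 8) sequence for the LSTM
```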


2014 ◽  
Vol 519-520 ◽  
pp. 661-666
Author(s):  
Qing Zhu ◽  
Jie Zhang

Abstract. This paper proposes an incomplete GEI gait recognition method based on Random Forests. Numerous methods exist for gait recognition, but they all lead to high-dimensional feature spaces. To address the problem of a high-dimensional feature space, we propose using the Random Forest algorithm to rank feature importance. In order to search subspaces efficiently, we apply a backward feature elimination search strategy. This demonstrates that static areas of a GEI also contain useful information. Then, we project the selected features to a low-dimensional feature subspace via the newly proposed two-dimensional locality preserving projections (2DLPP) method. As a consequence, we further improve the discriminative power of the extracted features. Experimental results on the CASIA gait database demonstrate the effectiveness of the proposed method.
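The sketch below covers only the feature-selection part: Random Forest importances drive recursive backward elimination. The GEI extraction and the 2DLPP projection are not reproduced, and the data are a random stand-in for vectorised gait features.

```python
# Random Forest importance-driven backward feature elimination (RFE).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

rng = np.random.default_rng(7)
X = rng.random((300, 200))                     # placeholder for vectorised GEI features
y = rng.integers(0, 5, size=300)               # placeholder subject labels

rf = RandomForestClassifier(n_estimators=200, random_state=0)
selector = RFE(rf, n_features_to_select=40, step=10).fit(X, y)   # backward elimination
X_reduced = X[:, selector.support_]            # retained feature subspace
print(X_reduced.shape)
```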

