nonlinear dimension reduction
Recently Published Documents


Total documents: 57 (five years: 12)

H-index: 12 (five years: 1)

2021 ◽  
Vol 2021 ◽  
pp. 1-15
Author(s):  
Sajjad Jahanbakhsh Gudakahriz ◽  
Amir Masoud Eftekhari Moghadam ◽  
Fariborz Mahmoudi

Opinion texts are now published rapidly and in high volume on websites and social networks, typically as short texts spanning many fields. Because these texts reflect the opinions of many users, processing and analyzing them, for example by clustering, is useful in applications across politics, industry, commerce, and economics. The high dimensionality of text representations reduces clustering efficiency, and an effective remedy is to reduce the dimensionality of the texts. Manifold learning is a powerful tool for nonlinear dimension reduction of high-dimensional data. In this paper, to make clustering of opinion texts more efficient, manifold learning is used to reduce the dimensions of the represented opinion texts based on sentiment and semantics and to extract their intrinsic dimensions. A clustering algorithm is then applied to the dimension-reduced texts. The proposed approach clusters opinion texts while considering sentiment and semantics simultaneously, which has received very little attention in previous work. This type of clustering helps users obtain more useful information from opinion texts and yields more accurate summaries in applications such as opinion summarization. Experimental results on three datasets show that the proposed approach performs better on the main measures of clustering efficiency. On the third dataset, clustering based on sentiment and semantics improves accuracy by about 9%.


2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Xiao Liao ◽  
WeiJia Wang ◽  
Wei Wang ◽  
Chong Liang

Image matching analyzes the gray-scale and texture information of a reference image and an image to be matched. First, because the scale-invariant feature transform (SIFT) algorithm takes a long time to compute descriptors and has poor real-time performance, a nonlinear dimension reduction method based on locally linear embedding (LLE) is proposed to preserve as much of the nonlinear information in the original data space as possible, shorten the running time of the algorithm, and improve the matching accuracy. Second, because computing the Euclidean distance is expensive in the matching process, the Manhattan distance is used instead to calculate the similarity between the reference image and the image to be matched, further reducing the running time. Experimental results show that the improved LLE-SIFT algorithm achieves a high matching rate and improves the matching speed.
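The Manhattan-distance matching step described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`manhattan`, `euclidean`, `match_descriptors`) and the toy 4-dimensional descriptors are hypothetical, and the LLE reduction step is assumed to have been applied already.

```python
import math

def manhattan(a, b):
    """L1 distance: sum of absolute coordinate differences (no squares or roots)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean(a, b):
    """L2 distance, included only for comparison with the L1 version."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def match_descriptors(reference, candidates, distance=manhattan):
    """For each reference descriptor, return the index of the nearest candidate."""
    matches = []
    for ref in reference:
        best = min(range(len(candidates)),
                   key=lambda j: distance(ref, candidates[j]))
        matches.append(best)
    return matches

# Toy descriptors (hypothetical values, standing in for dimension-reduced SIFT output).
reference = [[0.0, 1.0, 0.0, 2.0], [3.0, 3.0, 1.0, 0.0]]
candidates = [[3.0, 2.5, 1.0, 0.0], [0.0, 1.0, 0.5, 2.0], [9.0, 9.0, 9.0, 9.0]]

matches = match_descriptors(reference, candidates)
print(matches)
```

Passing `distance=euclidean` to `match_descriptors` swaps the metric, which makes it easy to compare matching results and timing under both distances.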


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Miao Zhang ◽  
Yiwen Liu ◽  
Hua Zhou ◽  
Joseph Watkins ◽  
Jin Zhou

Background: Low-depth sequencing allows researchers to increase sample size at the expense of lower accuracy. To incorporate uncertainties while maintaining statistical power, we introduce MCPCA_PopGen to analyze the population structure of low-depth sequencing data. Results: The method optimizes the choice of nonlinear transformations of dosages to maximize the Ky Fan norm of the covariance matrix. The transformation incorporates the uncertainty in calling between heterozygotes and the common homozygotes for loci having a rare allele and is more linear when both variants are common. Conclusions: We apply MCPCA_PopGen to samples from two indigenous Siberian populations and reveal hidden population structure accurately using only a single chromosome. The package is available at https://github.com/yiwenstat/MCPCA_PopGen.
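For reference, the Ky Fan norm mentioned above has a standard definition, written out here as a reading aid. How the authors choose $k$ and parametrize the dosage transformations is not stated in this abstract, so the symbols $\Sigma$, $p$, and $k$ are notation assumed for illustration. For a $p \times p$ covariance matrix $\Sigma$ with eigenvalues $\lambda_1 \ge \dots \ge \lambda_p \ge 0$, the singular values $\sigma_i$ coincide with the eigenvalues, so:

```latex
\|\Sigma\|_{(k)} \;=\; \sum_{i=1}^{k} \sigma_i(\Sigma) \;=\; \sum_{i=1}^{k} \lambda_i(\Sigma), \qquad 1 \le k \le p .
```

Under this definition, maximizing the Ky Fan $k$-norm of the covariance matrix amounts to maximizing the variance captured by the leading $k$ principal components.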


2021 ◽  
Vol 17 (3) ◽  
pp. e1008741
Author(s):  
Ya-Wei Eileen Lin ◽  
Tal Shnitzer ◽  
Ronen Talmon ◽  
Franz Villarroel-Espindola ◽  
Shruti Desai ◽  
...  

Imaging Mass Cytometry (IMC) combines laser ablation and mass spectrometry to quantitate metal-conjugated primary antibodies incubated in intact tumor tissue slides. This strategy allows spatially resolved multiplexing of dozens of simultaneous protein targets with 1 μm resolution. Each slide is a spatial assay consisting of high-dimensional multivariate observations (m-dimensional feature space) collected at different spatial positions and capturing data from a single biological sample or even representative spots from multiple samples when using tissue microarrays. Often, each of these spatial assays could be characterized by several regions of interest (ROIs). To extract meaningful information from the multi-dimensional observations recorded at different ROIs across different assays, we propose to analyze such datasets using a two-step graph-based approach. We first construct for each ROI a graph representing the interactions between the m covariates and compute an m-dimensional vector characterizing the steady-state distribution among features. We then use all these m-dimensional vectors to construct a graph between the ROIs from all assays. This second graph is subjected to a nonlinear dimension reduction analysis, retrieving the intrinsic geometric representation of the ROIs. Such a representation provides the foundation for efficient and accurate organization of the different ROIs that correlates with their phenotypes. Theoretically, we show that when the ROIs have a particular bi-modal distribution, the new representation gives rise to a better distinction between the two modalities compared to the maximum a posteriori (MAP) estimator. We applied our method to predict the sensitivity to PD-1 axis blockers treatment of lung cancer subjects based on IMC data, achieving 97.3% average accuracy on two IMC datasets. This serves as empirical evidence that the graph-of-graphs approach enables us to integrate multiple ROIs and the intra-relationships between the features at each ROI, giving rise to an informative representation that is strongly associated with the phenotypic state of the entire image.
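The first step described above, computing a steady-state distribution over features from a per-ROI graph, can be sketched with plain power iteration. This is only an illustration under stated assumptions: the abstract does not specify how the feature graph is built, so the symmetric affinity matrix `weights`, the row normalization, and the function names below are all hypothetical.

```python
def row_normalize(weights):
    """Turn a nonnegative affinity matrix into a row-stochastic transition matrix."""
    normalized = []
    for row in weights:
        total = sum(row)
        normalized.append([w / total for w in row])
    return normalized

def stationary_distribution(transition, iterations=1000, tol=1e-12):
    """Power iteration: apply pi <- pi P until pi stops changing."""
    n = len(transition)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(iterations):
        new_pi = [sum(pi[i] * transition[i][j] for i in range(n))
                  for j in range(n)]
        if max(abs(a - b) for a, b in zip(new_pi, pi)) < tol:
            return new_pi
        pi = new_pi
    return pi

# Hypothetical affinities between m = 3 features within one ROI.
# Positive diagonal entries keep the random walk aperiodic.
weights = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.3],
    [0.1, 0.3, 1.0],
]

pi = stationary_distribution(row_normalize(weights))
print(pi)
```

For a symmetric affinity matrix such as this one, the result is proportional to the row sums (the node degrees), which gives a quick sanity check. In the two-step approach, one such m-dimensional vector per ROI would then serve as the input to the second graph.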


Author(s):  
Ahmed Lasisi ◽  
Nii Attoh-Okine

Track geometry parameters from rail track inspection are regulated within safety limits specific to each track class. This paper focuses on developing an index that combines safety and track quality, motivated by the inefficiency of performing corrective maintenance between routine maintenance cycles when federal geometry limits are violated. The combination is achieved by summarizing multivariate track geometry parameters, improving on previous linear approaches to the problem of inefficient track geometry maintenance programs. The study evaluates the use of nonlinear dimension reduction (t-distributed stochastic neighbor embedding, t-SNE) for developing a Hybrid Track Quality Index, as well as the influence of time-based parameters on track quality. Results show that the probability of geometry defects is correlated with principal components, but t-SNE gave the best predictions on train-test splits despite its poor performance on a blind validation set. The absence of observable correlation between track geometry and acceleration data calls for further investigation.


2020 ◽  
Vol 1 (3) ◽  
Author(s):  
Mahwish Yousaf ◽  
Tanzeel U. Rehman ◽  
Li Jing
