Swarm Intelligence for Dimensionality Reduction

2016 ◽  
pp. 1564-1589 ◽  
Author(s):  
Andreas Janecek ◽  
Ying Tan

Low-rank approximations allow for compact representations of data with reduced storage and runtime requirements as well as reduced redundancy and noise. Non-Negative Matrix Factorization (NMF) is a special low-rank approximation that allows for an additive, parts-based, interpretable representation of the data. Several properties of NMF connect it naturally to Swarm Intelligence (SI) methods: most NMF objective functions, like most SI fitness functions, are non-convex, discontinuous, and may possess many local minima. This chapter summarizes efforts on improving the convergence, approximation quality, and classification accuracy of NMF using five different meta-heuristics based on SI and evolutionary computation. The authors present (1) new initialization strategies for NMF and (2) an iterative update strategy for NMF. The applicability of the approach is illustrated on data sets from the areas of spam filtering and email classification. Experimental results show that both optimization strategies are able to improve NMF in terms of faster convergence, lower approximation error, and/or better classification accuracy.
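
For illustration, the following is a minimal numpy sketch of the classical multiplicative-update NMF algorithm (Lee and Seung) that such initialization and update strategies build on; the function name and parameters are illustrative, not the authors' code.

    # A minimal sketch of multiplicative-update NMF; updates keep the
    # factors nonnegative by construction.
    import numpy as np

    def nmf_mu(A, k, iters=200, eps=1e-9, seed=0):
        """Factor a nonnegative matrix A (m x n) as A ~ W @ H with W, H >= 0."""
        rng = np.random.default_rng(seed)
        m, n = A.shape
        W = rng.random((m, k))
        H = rng.random((k, n))
        for _ in range(iters):
            H *= (W.T @ A) / (W.T @ W @ H + eps)
            W *= (A @ H.T) / (W @ H @ H.T + eps)
        return W, H

    A = np.abs(np.random.default_rng(1).random((50, 40)))
    W, H = nmf_mu(A, k=5)
    print(np.linalg.norm(A - W @ H, "fro"))  # Frobenius approximation error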


2011 ◽  
Vol 2 (4) ◽  
pp. 12-34 ◽  
Author(s):  
Andreas Janecek ◽  
Ying Tan

Non-negative Matrix Factorization (NMF) is a special low-rank approximation which allows for an additive, parts-based, interpretable representation of the data. This article presents efforts to improve the convergence, approximation quality, and classification accuracy of NMF using five different meta-heuristics based on swarm intelligence. Several properties of the NMF objective function motivate the utilization of meta-heuristics: this function is non-convex, discontinuous, and may possess many local minima. The proposed optimization strategies are twofold: on the one hand, a new initialization strategy initializes the NMF factors prior to the factorization; on the other hand, an iterative update strategy improves the accuracy per runtime of the multiplicative update NMF algorithm. The success of the proposed optimization strategies is shown by applying them to synthetic data and to data sets from the areas of spam filtering and email classification, and by evaluating them in their application context. Experimental results show that both optimization strategies are able to improve NMF in terms of faster convergence, lower approximation error, and better classification accuracy. The initialization strategy in particular leads to significant reductions of the runtime-per-accuracy ratio for both the NMF approximation and the classification results achieved with NMF.
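
For illustration, a hedged sketch of the general idea behind meta-heuristic initialization: run a small particle swarm over candidate factor pairs (W, H), score particles by Frobenius error, and hand the best particle to the multiplicative-update algorithm as its starting point. This is a simplified stand-in under assumed PSO parameters, not the authors' exact initialization scheme.

    # Particle swarm optimization over flattened (W, H) candidates; the best
    # particle serves as an NMF starting point instead of random factors.
    import numpy as np

    def pso_init_nmf(A, k, particles=20, steps=50, seed=0):
        rng = np.random.default_rng(seed)
        m, n = A.shape
        dim = (m + n) * k                      # each particle encodes W and H flattened
        X = rng.random((particles, dim))       # positions (kept nonnegative below)
        V = np.zeros_like(X)
        def cost(x):
            W = x[:m * k].reshape(m, k); H = x[m * k:].reshape(k, n)
            return np.linalg.norm(A - W @ H, "fro")
        pbest, pcost = X.copy(), np.array([cost(x) for x in X])
        g = pbest[np.argmin(pcost)]
        for _ in range(steps):
            r1, r2 = rng.random((2, particles, 1))
            V = 0.7 * V + 1.5 * r1 * (pbest - X) + 1.5 * r2 * (g - X)
            X = np.clip(X + V, 0.0, None)      # enforce nonnegativity
            c = np.array([cost(x) for x in X])
            better = c < pcost
            pbest[better], pcost[better] = X[better], c[better]
            g = pbest[np.argmin(pcost)]
        return g[:m * k].reshape(m, k), g[m * k:].reshape(k, n)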


2021 ◽  
Vol 15 (4) ◽  
pp. 1731-1750
Author(s):  
Olalekan Babaniyi ◽  
Ruanui Nicholson ◽  
Umberto Villa ◽  
Noémi Petra

Abstract. We consider the problem of inferring the basal sliding coefficient field for an uncertain Stokes ice sheet forward model from synthetic surface velocity measurements. The uncertainty in the forward model stems from unknown (or uncertain) auxiliary parameters (e.g., rheology parameters). This inverse problem is posed within the Bayesian framework, which provides a systematic means of quantifying uncertainty in the solution. To account for the associated model uncertainty (error), we employ the Bayesian approximation error (BAE) approach to approximately premarginalize simultaneously over both the noise in measurements and the uncertainty in the forward model. We also carry out approximate posterior uncertainty quantification based on a linearization of the parameter-to-observable map centered at the maximum a posteriori (MAP) basal sliding coefficient estimate, i.e., by taking the Laplace approximation. The MAP estimate is found by minimizing the negative log posterior using an inexact Newton conjugate gradient method. The actions of the gradient and Hessian on vectors are efficiently computed using adjoints. Sampling from the approximate covariance is made tractable by invoking a low-rank approximation of the data misfit component of the Hessian. We study the performance of the BAE approach in the context of three numerical examples in two and three dimensions. For each example, the basal sliding coefficient field is the parameter of primary interest which we seek to infer, and the rheology parameters (e.g., the flow rate factor or the Glen's flow law exponent coefficient field) represent so-called nuisance (secondary uncertain) parameters. Our results indicate that accounting for model uncertainty stemming from the presence of nuisance parameters is crucial; namely, our findings suggest that using nominal values for these parameters, as is often done in practice, without taking the resulting modeling error into account can lead to overconfident and heavily biased results. We also show that the BAE approach can be used to account for the additional model uncertainty at no additional cost at the online stage.
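
For illustration, a hedged numpy sketch of the BAE idea on a linear-Gaussian toy problem: the model error induced by an uncertain nuisance parameter is sampled offline, and its mean and covariance are folded into the noise model before computing the MAP estimate and the (here exact) Laplace approximation. The toy forward model and all names are illustrative, not the paper's Stokes ice sheet model.

    # BAE premarginalization on a linear toy inverse problem.
    import numpy as np
    rng = np.random.default_rng(0)

    m, n = 30, 20                       # observations, parameters
    F = rng.standard_normal((m, n))     # nominal (linear) forward model
    def F_nuisance(theta):              # forward model perturbed by a nuisance parameter
        return F + theta * rng.standard_normal((m, n))

    # Offline: sample the model error e = (F_perturbed - F_nominal) x over prior draws.
    G_pr = np.eye(n)                    # prior covariance (identity for simplicity)
    samples = []
    for _ in range(500):
        x = rng.multivariate_normal(np.zeros(n), G_pr)
        samples.append((F_nuisance(rng.uniform(0.0, 0.2)) - F) @ x)
    E = np.array(samples)
    e_mean, G_e = E.mean(axis=0), np.cov(E.T)

    sigma2 = 0.01
    G_noise = sigma2 * np.eye(m) + G_e  # BAE: total error = measurement noise + model error

    x_true = rng.standard_normal(n)
    y = F_nuisance(0.1) @ x_true + np.sqrt(sigma2) * rng.standard_normal(m)

    # Laplace approximation (exact here since the model is linear-Gaussian):
    H = F.T @ np.linalg.solve(G_noise, F) + np.linalg.inv(G_pr)   # posterior precision
    x_map = np.linalg.solve(H, F.T @ np.linalg.solve(G_noise, y - e_mean))
    G_post = np.linalg.inv(H)           # posterior covariance about the MAP point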


2021 ◽  
Vol 47 (3) ◽  
pp. 1-37
Author(s):  
Srinivas Eswar ◽  
Koby Hayashi ◽  
Grey Ballard ◽  
Ramakrishnan Kannan ◽  
Michael A. Matheson ◽  
...  

We consider the problem of low-rank approximation of massive dense nonnegative tensor data, for example, to discover latent patterns in video and imaging applications. As the size of data sets grows, single workstations are hitting bottlenecks in both computation time and available memory. We propose a distributed-memory parallel computing solution to handle massive data sets, loading the input data across the memories of multiple nodes, and performing efficient and scalable parallel algorithms to compute the low-rank approximation. We present a software package called Parallel Low-rank Approximation with Nonnegativity Constraints, which implements our solution and allows for extension in terms of data (dense or sparse, matrices or tensors of any order), algorithm (e.g., from multiplicative updating techniques to alternating direction method of multipliers), and architecture (we exploit GPUs to accelerate the computation in this work). We describe our parallel distributions and algorithms, which are careful to avoid unnecessary communication and computation, show how to extend the software to include new algorithms and/or constraints, and report efficiency and scalability results for both synthetic and real-world data sets.
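
For illustration, a hedged mpi4py sketch of the row-distributed multiplicative-update pattern for the matrix case: each rank owns a block of rows of A and W, the small factor H is replicated, and only k x k and k x n Gram matrices are communicated. The actual package additionally covers tensors, sparse data, other algorithms, and GPUs; this is the communication pattern only, not its implementation.

    # Row-distributed multiplicative-update NMF; run with e.g.
    #   mpirun -n 4 python dist_nmf.py   (illustrative filename)
    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    m_local, n, k, eps = 100, 80, 10, 1e-9
    A = np.abs(np.random.default_rng(rank).random((m_local, n)))   # local row block
    W = np.abs(np.random.default_rng(100 + rank).random((m_local, k)))
    H = np.abs(np.random.default_rng(7).random((k, n)))            # replicated (shared seed)

    for _ in range(100):
        # Global Gram matrices via allreduce; only k x k and k x n traffic.
        WtW = comm.allreduce(W.T @ W)            # sum of local W_i^T W_i
        WtA = comm.allreduce(W.T @ A)            # sum of local W_i^T A_i
        H *= WtA / (WtW @ H + eps)               # H stays consistent on all ranks
        W *= (A @ H.T) / (W @ (H @ H.T) + eps)   # purely local update

    err2 = comm.allreduce(np.linalg.norm(A - W @ H, "fro") ** 2)
    if rank == 0:
        print("global error:", np.sqrt(err2))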


2012 ◽  
Vol 21 (06) ◽  
pp. 1250033
Author(s):  
MANOLIS G. VOZALIS ◽  
ANGELOS I. MARKOS ◽  
KONSTANTINOS G. MARGARITIS

Collaborative Filtering (CF) is a popular technique employed by Recommender Systems, a term used to describe intelligent methods that generate personalized recommendations. Some of the most efficient approaches to CF are based on latent factor models and nearest neighbor methods, and have received considerable attention in the recent literature. Latent factor models can tackle some fundamental challenges of CF, such as data sparsity and scalability. In this work, we present an optimal scaling framework to address these problems, using Categorical Principal Component Analysis (CatPCA) for the low-rank approximation of the user-item ratings matrix, followed by a neighborhood formation step. CatPCA is a versatile technique that utilizes an optimal scaling process in which the original data are transformed so that their overall variance is maximized. We considered both smooth and non-smooth transformations for the observed variables (items), such as numeric, (spline) ordinal, (spline) nominal, and multiple nominal. The method was extended to handle missing data and to incorporate differential weighting for items. Experiments were executed on three data sets of different sparsity and size (MovieLens 100k, MovieLens 1M, and Jester) to evaluate the aforementioned options in terms of accuracy. A combined approach with a multiple nominal transformation and a "passive" missing-data strategy clearly outperformed the other tested options on all three data sets. The results are comparable with those reported for single methods in the CF literature.
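
For illustration, a hedged sketch of the "low-rank approximation, then neighborhood formation" pipeline, with plain truncated SVD standing in for CatPCA (whose optimal scaling of ordinal and nominal items is substantially more involved):

    # Truncated-SVD latent factors followed by cosine-similarity neighborhoods.
    import numpy as np

    def cf_neighbors(R, k=20, n_neighbors=10):
        """R: user-item ratings matrix with np.nan for missing entries."""
        filled = np.where(np.isnan(R), np.nanmean(R, axis=0), R)  # mean-impute items
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        users = U[:, :k] * s[:k]                     # users in k-dim latent space
        unit = users / np.linalg.norm(users, axis=1, keepdims=True)
        sims = unit @ unit.T                         # cosine similarity
        np.fill_diagonal(sims, -np.inf)              # exclude self from neighborhoods
        return np.argsort(-sims, axis=1)[:, :n_neighbors]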


2020 ◽  
Vol 36 (36) ◽  
pp. 694-697
Author(s):  
Chi-Kwong Li ◽  
Gilbert Strang

An elementary proof is given for Mirsky's result on best low-rank approximations of a given matrix with respect to all unitarily invariant norms.
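
For illustration, a small numerical check of the theorem: the truncated SVD attains the optimal rank-k error simultaneously in the Frobenius and spectral norms, both of which are unitarily invariant.

    # Eckart-Young-Mirsky: the rank-k truncated SVD is the best rank-k
    # approximation, with error given by the discarded singular values.
    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((8, 6))
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = 2
    Ak = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # best rank-k approximation

    print(np.linalg.norm(A - Ak, "fro"), np.sqrt(np.sum(s[k:] ** 2)))  # equal
    print(np.linalg.norm(A - Ak, 2), s[k])                             # equal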


10.2196/20597 ◽  
2020 ◽  
Vol 8 (12) ◽  
pp. e20597
Author(s):  
Ki-Hun Kim ◽  
Kwang-Jae Kim

Background
A lifelogs-based wellness index (LWI) is a function for calculating wellness scores based on health behavior lifelogs (eg, daily walking steps and sleep times collected via a smartwatch). A wellness score intuitively shows the users of smart wellness services the overall condition of their health behaviors. LWI development includes estimation (ie, estimating coefficients in LWI with data). A panel data set comprising health behavior lifelogs allows LWI estimation to control for unobserved variables, thereby resulting in less bias. However, these data sets typically have missing data due to events that occur in daily life (eg, smart devices stop collecting data when batteries are depleted), which can introduce biases into LWI coefficients. Thus, the appropriate choice of method to handle missing data is important for reducing biases in LWI estimations with panel data. However, there is a lack of research in this area.

Objective
This study aims to identify a suitable missing-data handling method for LWI estimation with panel data.

Methods
Listwise deletion, mean imputation, expectation maximization–based multiple imputation, predictive-mean matching–based multiple imputation, k-nearest neighbors–based imputation, and low-rank approximation–based imputation were comparatively evaluated by simulating an existing case of LWI development. A panel data set comprising health behavior lifelogs of 41 college students over 4 weeks was transformed into a reference data set without any missing data. Then, 200 simulated data sets were generated by randomly introducing missing data at proportions from 1% to 80%. The missing-data handling methods were each applied to transform the simulated data sets into complete data sets, and coefficients in a linear LWI were estimated for each complete data set. For each proportion for each method, a bias measure was calculated by comparing the estimated coefficient values with values estimated from the reference data set.

Results
Methods performed differently depending on the proportion of missing data. For 1% to 30% proportions, low-rank approximation–based imputation, predictive-mean matching–based multiple imputation, and expectation maximization–based multiple imputation were superior. For 31% to 60% proportions, low-rank approximation–based imputation and predictive-mean matching–based multiple imputation performed best. For over 60% proportions, only low-rank approximation–based imputation performed acceptably.

Conclusions
Low-rank approximation–based imputation was the best of the 6 data-handling methods regardless of the proportion of missing data. This superiority is generalizable to other panel data sets comprising health behavior lifelogs given their verified low-rank nature, for which low-rank approximation–based imputation is known to perform effectively. This result will guide missing-data handling in reducing coefficient biases in new development cases of linear LWIs with panel data.
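
For illustration, a hedged sketch of low-rank approximation–based imputation via iterative truncated SVD (in the spirit of hard-impute); the study's exact imputation variant may differ.

    # Iteratively fit a rank-k approximation and refill only the missing cells.
    import numpy as np

    def lowrank_impute(X, k=3, iters=100, tol=1e-6):
        """X: panel data matrix with np.nan marking missing lifelog entries."""
        miss = np.isnan(X)
        filled = np.where(miss, np.nanmean(X, axis=0), X)  # start from column means
        for _ in range(iters):
            U, s, Vt = np.linalg.svd(filled, full_matrices=False)
            low = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]    # rank-k approximation
            new = np.where(miss, low, X)                   # keep observed values fixed
            if np.linalg.norm(new - filled) < tol * np.linalg.norm(filled):
                return new
            filled = new
        return filled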


2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Yong Zeng ◽  
Yixin Li ◽  
Zhongyuan Jiang ◽  
Jianfeng Ma

It is crucial to generate random graphs with specific structural properties from real graphs, which could anonymize graphs or generate targeted graph data sets. The state-of-the-art method, called spectral graph forge (SGF), was proposed at INFOCOM 2018. This method uses a low-rank approximation of the matrix obtained by throwing away part of the spectrum, which provides privacy protection after distributing graphs while ensuring data availability to a certain extent. As shown in SGF, at least 20% of the spectrum must be discarded to defend against deanonymization attacks. However, data availability decreases significantly as more of the spectrum is discarded. Thus, is there a way to generate a graph that guarantees maximum spectrum preservation and anonymity at the same time? To solve this problem, this paper proposes graph nonlinear scaling (GNS). We formally prove that GNS preserves all eigenvectors while providing high anonymity for the forged graph. Specifically, GNS scales the eigenvalues of the original spectrum and constructs the forged graph from the scaled eigenvalues and original eigenvectors. This approach maximizes the preservation of spectrum information to guarantee data availability, while providing high robustness against deanonymization attacks. The experimental results show that when SGF discards only 10% of the spectrum, the forged graph has high data availability; at this point, if the distance vector deanonymization algorithm is used to attack the forged graph, almost 100% of the nodes can be identified, whereas at the same availability only about 20% of the nodes in the forged graph obtained from GNS can be identified. Moreover, our method is better than SGF at capturing the real graph's structure in terms of modularity, the number of partitions, and average clustering.
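
For illustration, a hedged sketch of the eigenvalue-scaling idea: keep every eigenvector of the symmetric adjacency matrix, nonlinearly rescale the eigenvalues, reconstruct, and binarize back to a graph. The power-scaling function and the density-matching threshold are illustrative choices, not necessarily the paper's.

    # Forge a graph with the original eigenvectors but a nonlinearly
    # scaled spectrum, then threshold back to a 0/1 adjacency matrix.
    import numpy as np

    def gns_forge(A, alpha=0.8, density=None):
        """A: symmetric 0/1 adjacency matrix."""
        vals, vecs = np.linalg.eigh(A)
        scaled = np.sign(vals) * np.abs(vals) ** alpha   # sign-preserving power scaling
        B = vecs @ np.diag(scaled) @ vecs.T              # same eigenvectors, new spectrum
        if density is None:
            density = A.sum() / (A.size - A.shape[0])    # match original edge density
        offdiag = B[~np.eye(len(B), dtype=bool)]
        thresh = np.quantile(offdiag, 1 - density)       # keep roughly that many edges
        forged = (B > thresh).astype(int)
        np.fill_diagonal(forged, 0)
        return np.maximum(forged, forged.T)              # keep it symmetric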

