similarity function
Recently Published Documents


Total documents: 231 (five years: 68)

H-index: 19 (five years: 3)

Author(s):  
Bruno Ordozgoiti ◽  
Ananth Mahadevan ◽  
Antonis Matakos ◽  
Aristides Gionis

When searching for information in a data collection, we are often interested not only in finding relevant items, but also in assembling a diverse set, so as to explore different concepts that are present in the data. This problem has been researched extensively. However, finding a set of items with minimal pairwise similarities can be computationally challenging, and most existing works striving for quality guarantees assume that item relatedness is measured by a distance function. Given the widespread use of similarity functions in many domains, we believe this to be an important gap in the literature. In this paper we study the problem of finding a diverse set of items, when item relatedness is measured by a similarity function. We formulate the diversification task using a flexible, broadly applicable minimization objective, consisting of the sum of pairwise similarities of the selected items and a relevance penalty term. To find good solutions we adopt a randomized rounding strategy, which is challenging to analyze because of the cardinality constraint present in our formulation. Even though this obstacle can be overcome using dependent rounding, we show that it is possible to obtain provably good solutions using an independent approach, which is faster, simpler to implement and completely parallelizable. Our analysis relies on a novel bound for the ratio of Poisson-Binomial densities, which is of independent interest and has potential implications for other combinatorial-optimization problems. We leverage this result to design an efficient randomized algorithm that provides a lower-order additive approximation guarantee. We validate our method using several benchmark datasets, and show that it consistently outperforms the greedy approaches that are commonly used in the literature.
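The objective described in this abstract (sum of pairwise similarities of the selected items plus a relevance penalty) can be illustrated with a small sketch. The exact form of the penalty term is not given here, so the code assumes a simple per-item penalty; the names `diversification_cost` and `best_subset` and the toy data are hypothetical. The brute-force search is only for illustration on tiny inputs and is not the randomized rounding algorithm of the paper.

```python
from itertools import combinations


def diversification_cost(similarity, penalty, selected):
    """Sum of pairwise similarities of the selected items plus their penalties.

    Assumption: the relevance penalty is additive per item (lower = more relevant).
    """
    pairwise = sum(similarity[i][j] for i, j in combinations(sorted(selected), 2))
    relevance = sum(penalty[i] for i in selected)
    return pairwise + relevance


def best_subset(similarity, penalty, k):
    """Exhaustively find the size-k subset with minimal cost (tiny inputs only)."""
    n = len(penalty)
    return min(
        combinations(range(n), k),
        key=lambda subset: diversification_cost(similarity, penalty, subset),
    )


# Toy data: items 0 and 1 are near-duplicates, as are items 2 and 3.
similarity = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
]
penalty = [0.0, 0.1, 0.0, 0.3]

chosen = best_subset(similarity, penalty, 2)
cost = diversification_cost(similarity, penalty, chosen)
print(chosen, cost)
```

On this toy input the cheapest pair takes one item from each near-duplicate group, which is the behavior a diversification objective is meant to encourage.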


Author(s):  
Shi-Wei Ren

In this paper, the geometric structures and the melting-like processes of 13-atom pure copper and pure cobalt clusters and their 13-atom mixed clusters are investigated and compared using the molecular dynamics method. The calculations show that the pure copper and cobalt clusters have standard icosahedral structures, while the mixed clusters take on deformed icosahedral structures. Quantitative analysis shows that the deformations are slight. Moreover, an element similarity function is introduced, with which the contribution of cluster composition to the deformation of the mixed clusters is analyzed and discussed. As the temperature increases, migration and recombination of atoms on the cluster surfaces are observed, indicating the onset of the transition from a solid-like to a liquid-like state. By calculating the relative root-mean-squared pair separation fluctuation and monitoring the dynamical structures of the clusters, it is found that the mixed clusters undergo a multi-step transition.
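The relative root-mean-squared pair separation fluctuation mentioned above is commonly defined as the average, over all atom pairs, of the standard deviation of the pair distance divided by its time-averaged value. The sketch below assumes that common definition (the abstract does not state the formula the author used) and a hypothetical trajectory format: a list of frames, each a list of (x, y, z) positions.

```python
import math
from itertools import combinations


def pair_separation_fluctuation(trajectory):
    """Relative RMS pair separation fluctuation of a trajectory.

    Assumed definition: delta = 2 / (N (N - 1)) * sum over pairs i < j of
    sqrt(<r_ij^2> - <r_ij>^2) / <r_ij>, with <.> a time average over frames.
    """
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    total = 0.0
    for i, j in combinations(range(n_atoms), 2):
        dists = [math.dist(frame[i], frame[j]) for frame in trajectory]
        mean = sum(dists) / n_frames
        mean_sq = sum(d * d for d in dists) / n_frames
        variance = max(mean_sq - mean * mean, 0.0)  # guard against rounding
        total += math.sqrt(variance) / mean
    return 2.0 * total / (n_atoms * (n_atoms - 1))


# Two atoms whose separation alternates between 1 and 3 length units.
moving = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
]
# The same two atoms held rigid.
rigid = [moving[0], moving[0]]

delta_moving = pair_separation_fluctuation(moving)
delta_rigid = pair_separation_fluctuation(rigid)
print(delta_moving, delta_rigid)
```

A rigid cluster gives a value near zero, while larger values indicate that pair distances fluctuate strongly, as expected in a liquid-like state.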


2021 ◽  
Vol 11 (24) ◽  
pp. 11974
Author(s):  
Shijie Zhang ◽  
Gang Wu

Logs, which record system runtime information, are frequently used to ensure software system reliability. As the first and foremost step of typical log analysis, many data-driven methods have been proposed for automated log parsing. Most existing log parsers work offline, requiring a time-consuming training process and retraining as the system is upgraded. Meanwhile, the state-of-the-art online log parsers are tree-based and still have defects in robustness and efficiency. To overcome these limitations, we abandon the tree structure in favor of a hash-like method. In this paper, we propose LogPunk, an efficient online log parsing method. The core of LogPunk is a novel log signature method based on log punctuation and length features. Using the signature, we can quickly find a small set of candidate templates. The most suitable template is then returned by traversing the candidate set with our log similarity function. We evaluated LogPunk on 16 public datasets from LogHub and compared it with five other log parsers. LogPunk achieves the best parsing accuracy of 91.9%. Evaluation results also demonstrate its superiority in terms of robustness and efficiency.
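The signature-then-similarity idea can be sketched as follows. This is a toy illustration, not LogPunk itself: the abstract does not specify the signature or the similarity function, so the code assumes a signature made of the token count plus the sequence of punctuation characters, and a position-wise token match ratio as the similarity. `ToyParser`, `signature`, `token_similarity`, and the threshold value are hypothetical.

```python
import string
from collections import defaultdict

PUNCTUATION = set(string.punctuation)


def signature(line):
    """Hash key assumed here: (number of tokens, punctuation characters in order)."""
    punct = "".join(ch for ch in line if ch in PUNCTUATION)
    return (len(line.split()), punct)


def token_similarity(a, b):
    """Fraction of positions at which two equal-length token lists agree."""
    ta, tb = a.split(), b.split()
    if len(ta) != len(tb):
        return 0.0
    if not ta:
        return 1.0
    return sum(x == y for x, y in zip(ta, tb)) / len(ta)


class ToyParser:
    def __init__(self, threshold=0.5):
        self.threshold = threshold
        self.buckets = defaultdict(list)  # signature -> candidate templates

    def parse(self, line):
        """Return the best-matching stored template, or register the line as new."""
        candidates = self.buckets[signature(line)]
        best = max(candidates, key=lambda t: token_similarity(t, line), default=None)
        if best is not None and token_similarity(best, line) >= self.threshold:
            return best
        candidates.append(line)
        return line


parser = ToyParser()
first = parser.parse("Connection from 10.0.0.1 closed")
second = parser.parse("Connection from 10.0.0.2 closed")
third = parser.parse("User alice logged in: ok")
print(first, second, third, sep="\n")
```

Because the signature is a dictionary key, only lines in the same bucket are compared, which is what makes a hash-like lookup cheaper than walking a tree. A real parser would also replace the varying tokens in a template with wildcards; that step is omitted here.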


Author(s):  
Xun Han ◽  
Min Liu ◽  
Zhong Huang ◽  
Hui Huang ◽  
Xinping Long ◽  
...  

2021 ◽  
pp. 107863
Author(s):  
Marcelo B.A. Veras ◽  
Bishnu Sarker ◽  
Sabeur Aridhi ◽  
João P.P. Gomes ◽  
José A.F. Macêdo ◽  
...  

Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-15
Author(s):  
Chuanting Zhang ◽  
Ke-Ke Shang ◽  
Jingping Qiao

Link prediction is a fundamental problem of data science, which usually calls for unfolding the mechanisms that govern the micro-dynamics of networks. In this regard, using features obtained from network embedding for predicting links has drawn widespread attention. Although methods based on edge features or node similarity have been proposed to solve the link prediction problem, many technical challenges still exist due to the unique structural properties of networks, especially when the networks are sparse. From the graph mining perspective, we first give empirical evidence of the inconsistency between heuristic and learned edge features. Then, we propose a novel link prediction framework, AdaSim, by introducing an Adaptive Similarity function using features obtained from network embedding based on random walks. The node feature representations are obtained by optimizing a graph-based objective function. Instead of generating edge features using binary operators, we perform link prediction solely leveraging the node features of the network. We define a flexible similarity function with one tunable parameter, which serves as a penalty on the original similarity measure. The optimal value is learned through supervised learning and is thus adaptive to the data distribution. To evaluate the performance of our proposed algorithm, we conduct extensive experiments on eleven disparate real-world networks. Experimental results show that AdaSim achieves better performance than state-of-the-art algorithms and is robust to different sparsities of the networks.


Author(s):  
Chaoxun Hang ◽  
Holly J. Oldroyd ◽  
Marco G. Giometto ◽  
Eric R. Pardyjak ◽  
Marc B. Parlange

2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Fu Wei

To address the difficulty of selecting online course resources for physical education, a method for recommending online course resources based on machine learning algorithms is proposed. The information recommendation model is built from a collaborative filtering algorithm and a resource feedback matrix. From the feedback scores that users give to the same data resource in the project set, an interest matching degree is established by comparative analysis. The matching degree is substituted into the cosine similarity function to calculate the similarity threshold between items; once the similarity thresholds of all items have been calculated, the project resource that best matches the user is selected according to these thresholds, completing the recommendation. The experimental results show that the proposed method for recommending physical education network course resources performs relatively well in recommendation accuracy and efficiency, and that it can support innovation in the teaching mode of higher physical education network courses.
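The cosine similarity step in item-based collaborative filtering can be sketched as below. The feedback matrix, course names, and helper `most_similar_item` are hypothetical, and the sketch covers only the similarity computation, not the paper's matching-degree or threshold procedure.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length score vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention assumed here for items with no feedback
    return dot / (norm_u * norm_v)


def most_similar_item(feedback, target):
    """Return the item whose feedback vector is closest to the target item's."""
    others = (name for name in feedback if name != target)
    return max(others, key=lambda name: cosine_similarity(feedback[target], feedback[name]))


# Hypothetical feedback matrix: one vector of user scores per course (0 = no score).
feedback = {
    "yoga": [5, 4, 0],
    "pilates": [4, 5, 0],
    "sprint": [0, 1, 5],
}

sim_yoga_pilates = cosine_similarity(feedback["yoga"], feedback["pilates"])
sim_yoga_sprint = cosine_similarity(feedback["yoga"], feedback["sprint"])
nearest = most_similar_item(feedback, "yoga")
print(sim_yoga_pilates, sim_yoga_sprint, nearest)
```

Courses scored similarly by the same users end up with a cosine close to 1, so a user who liked one is a natural candidate for the other.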


2021 ◽  
Author(s):  
Chance Michael Nowak ◽  
Tyler Quarton ◽  
Leonidas Bleris

Cell cycle synchronization has been pivotal in the development of our understanding of cell population dynamics. Intriguingly, when cells are released from a synchronized state, they do not maintain synchronized cell division and rapidly become asynchronous. Here, using a combination of experiments and model simulations, we investigate this process of "cell cycle desynchronization" in cervical cancer cells (HeLa) that are arrested at the G1/S boundary. We tracked DNA content over time at regular intervals to monitor cell cycle progression and developed a custom auto-similarity function to quantify the convergence to asynchronicity. In parallel, using experimental data, we developed a single-cell phenomenological model that returns DNA concentration across the cell cycle stages from a desynchronizing cell population. Our simulations revealed that desynchronization is primarily sensitive to cell cycle variability. We tested this prediction by introducing lipopolysaccharide to increase cellular noise, which resulted in greater cell cycle variability with an enhanced rate of desynchronization. Our results show that the desynchronization rate of cell populations can be used as a proxy for the degree of variance in cell cycle periodicity.


Author(s):  
Veronica dos Santos ◽  
Sérgio Lifschitz

Information retrieval systems usually employ syntactic search techniques that match a set of keywords against the indexed content to retrieve results. But pure keyword-based matching fails to capture the user's search intention and context, and suffers from natural language ambiguity and vocabulary mismatch. Considering this scenario, the hypothesis raised is that the use of embeddings in a semantic search approach will make search results more meaningful. Embeddings help minimize problems arising from terminology and context mismatch. This work proposes a semantic similarity function to support semantic search based on hyper-relational knowledge graphs. This function uses embeddings to find the most similar nodes that satisfy a user query.

