Linear space string correction algorithm using the Damerau-Levenshtein distance

Chunchun Zhao; Sartaj Sahni

doi:10.1186/s12859-019-3184-8

BMC Bioinformatics ◽

10.1186/s12859-019-3184-8 ◽

2020 ◽

Vol 21 (S1) ◽

Author(s):

Chunchun Zhao ◽

Sartaj Sahni

Keyword(s):

Linear Space ◽

Minimum Cost ◽

Efficient Algorithms ◽

Levenshtein Distance ◽

Correction Algorithm ◽

Similar Region ◽

String Correction ◽

Cache Efficient ◽

Run Time ◽

New Algorithms

Abstract

Background: The Damerau-Levenshtein (DL) distance metric has been widely used in the biological sciences. It identifies similar regions of DNA, RNA, and protein sequences by transforming one sequence into another using substitution, insertion, deletion, and transposition operations. Lowrance and Wagner developed an O(mn) time, O(mn) space algorithm to find the minimum-cost edit sequence between strings of length m and n, respectively. In our previous research, we developed algorithms that run in O(mn) time using only O(s∗min{m,n}+m+n) space, where s is the size of the alphabet comprising the strings, to compute the DL distance as well as the corresponding edit sequence. These are so far the fastest and most space-efficient algorithms. In this paper, we focus on the development of algorithms whose asymptotic space complexity is linear.

Results: We develop linear-space algorithms to compute the Damerau-Levenshtein (DL) distance between two strings and determine the optimal trace (the corresponding edit operations). Extensive experiments conducted on three computational platforms (Xeon E5 2603, I7-x980, and Xeon E5 2695) show that our algorithms, in addition to using less space, are much faster than earlier algorithms.

Conclusion: Besides using less space than the previously known algorithms, our new algorithms showed significant run-time improvement on all three of our experimental platforms. On all platforms, our linear-space cache-efficient algorithms reduced run time by as much as 56.4% and 57.4% for computing the DL distance and an optimal edit sequence, respectively, compared to previous algorithms. Our multi-core algorithms reduced the run time by up to 59.3% compared to the best previously known multi-core algorithms.
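For orientation, the following is a minimal sketch of the classical quadratic-space dynamic program for the DL distance, in the style attributed above to Lowrance and Wagner. It is not the linear-space algorithm of this paper. It assumes unit costs for all four operations, and the function name `dl_distance` and the variable names are illustrative choices, not taken from the source.

```python
def dl_distance(a: str, b: str) -> int:
    """Unrestricted Damerau-Levenshtein distance with unit costs (assumed).

    Uses an (m+2) x (n+2) table, i.e. O(mn) time and O(mn) space.
    """
    m, n = len(a), len(b)
    inf = m + n  # larger than any achievable distance
    # H[i+1][j+1] holds the distance between a[:i] and b[:j];
    # row 0 and column 0 are sentinel borders filled with inf.
    H = [[0] * (n + 2) for _ in range(m + 2)]
    H[0][0] = inf
    for i in range(m + 1):
        H[i + 1][0] = inf
        H[i + 1][1] = i
    for j in range(n + 1):
        H[0][j + 1] = inf
        H[1][j + 1] = j

    last_row = {}  # last row (1-based) of a in which each character occurred
    for i in range(1, m + 1):
        last_col = 0  # last column in this row where a[i-1] matched b
        for j in range(1, n + 1):
            k = last_row.get(b[j - 1], 0)
            l = last_col
            if a[i - 1] == b[j - 1]:
                cost = 0
                last_col = j
            else:
                cost = 1
            H[i + 1][j + 1] = min(
                H[i][j] + cost,      # match or substitution
                H[i + 1][j] + 1,     # insertion
                H[i][j + 1] + 1,     # deletion
                # transposition of a[k-1] and a[i-1], plus the deletions
                # and insertions needed to bring them together
                H[k][l] + (i - k - 1) + 1 + (j - l - 1),
            )
        last_row[a[i - 1]] = i
    return H[m + 1][n + 1]
```

For example, `dl_distance("ca", "abc")` evaluates to 2 (transpose to "ac", then insert "b"). The table here is what the abstract's linear-space algorithms avoid storing in full.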

Linear Space String Correction Algorithm Using The Damerau-Levenshtein Distance

2018 IEEE 8th International Conference on Computational Advances in Bio and Medical Sciences (ICCABS) ◽

10.1109/iccabs.2018.8541927 ◽

2018 ◽

Author(s):

Chunchun Zhao ◽

Sartaj Sahni

Keyword(s):

Linear Space ◽

Levenshtein Distance ◽

Correction Algorithm ◽

String Correction

Efficient Algorithms for Identifying Loop Formation and Computing θ Value for Solving Minimum Cost Flow Network Problems

WSEAS TRANSACTIONS ON CIRCUITS AND SYSTEMS ◽

10.37394/23201.2021.20.14 ◽

2021 ◽

Vol 20 ◽

pp. 107-117

Author(s):

TIMOTHY MICHAEL CHÁVEZ ◽

DUC THAI NGUYEN

Keyword(s):

Real Life ◽

Minimum Cost ◽

Simplex Algorithm ◽

Efficient Algorithms ◽

Computer Implementation ◽

Loop Formation ◽

Minimum Cost Flow ◽

Flow Network ◽

Network Problems ◽

Cost Flow

While minimum cost flow (MCF) problems have been well documented in many publications because of their broad applications, little or no effort has been devoted to explaining the algorithms for identifying loop formation and computing the θ value needed to solve MCF network problems. This paper proposes efficient algorithms, along with a MATLAB implementation, for solving MCF problems. Several academic and real-life network problems have been solved to validate the proposed algorithms; the numerical results obtained by the developed MCF code match those of the built-in MATLAB function linprog (Simplex algorithm), providing further validation.

Efficient algorithms for delay-bounded minimum cost path problem in communication networks

Proceedings. Fifth International Conference on High Performance Computing (Cat. No. 98EX238) ◽

10.1109/hipc.1998.737982 ◽

2002 ◽

Cited By ~ 7

Author(s):

G. Kumar ◽

N. Narang ◽

C.P. Ravikumar

Keyword(s):

Communication Networks ◽

Minimum Cost ◽

Efficient Algorithms

Effectiveness of Implementing Load Balancing via SDN

10.5753/sbrc_estendido.2019.7796 ◽

2019 ◽

Author(s):

Leonardo Aguilar ◽

Daniel Macêdo Batista

Keyword(s):

Load Balancing ◽

Software Defined Networking ◽

Free Software ◽

The Real ◽

Network Applications ◽

Run Time ◽

The Creation ◽

Wide Dissemination ◽

New Algorithms

Software-Defined Networking (SDN) is an architecture that allows the creation, management and customization of the network through programmable switches and centralized controllers via a well-defined protocol. Despite the wide dissemination of the general advantages of using SDN, it is always important to evaluate the real advantages for specific network applications. In line with this, the purpose of this work is to analyze the effectiveness of using SDN for load balancing by developing a balancer, made available as free software, that can execute three different algorithms. The administrator can choose, at run time, which algorithm is used and how it is configured, and can also implement new algorithms.

New and Efficient Algorithms for Producing Frequent Itemsets with the Map-Reduce Framework

Algorithms ◽

10.3390/a11120194 ◽

2018 ◽

Vol 11 (12) ◽

pp. 194

Author(s):

Yaron Gonen ◽

Ehud Gudes ◽

Kirill Kandalov

Keyword(s):

Data Mining ◽

Big Data ◽

Experimental Evaluation ◽

Distributed Databases ◽

Frequent Itemsets ◽

Parallel Architectures ◽

Efficient Algorithms ◽

Map Reduce ◽

Closed Frequent Itemsets ◽

New Algorithms

The Map-Reduce (MR) framework has become a popular framework for developing new parallel algorithms for Big Data. Developing efficient algorithms for data mining of big data and distributed databases has become an important problem. In this paper we focus on algorithms producing association rules and frequent itemsets. After reviewing the most recent algorithms that perform this task within the MR framework, we present two new algorithms: one for producing closed frequent itemsets, and a second for producing frequent itemsets when the database is updated and new data is added to the old database. Both algorithms include novel optimizations which are suited to the MR framework, as well as to other parallel architectures. A detailed experimental evaluation shows the effectiveness and advantages of the algorithms over existing methods when it comes to large distributed databases.

Application of new algorithms on asymmetric cascaded multilevel inverter

COMPEL The International Journal for Computation and Mathematics in Electrical and Electronic Engineering ◽

10.1108/compel-02-2020-0082 ◽

2020 ◽

Vol 39 (4) ◽

pp. 943-958

Author(s):

Ashraf Yahya ◽

Syed M. Usman Ali ◽

Muhammad Farhan Khan

Keyword(s):

Minimum Cost ◽

Input Voltage ◽

Symmetric Design ◽

Power Sharing ◽

Multilevel Inverter ◽

Design Parameters ◽

Content Type ◽

Voltage Range ◽

Cascaded Multilevel Inverter ◽

New Algorithms

Purpose: The multilevel inverter (MLI) is an established design approach for inverter applications in the medium-voltage and high-voltage ranges. An asymmetric design synthesizes multiple DC input voltage sources of unequal magnitudes to generate a high-quality staircase sine wave comprising a large number of steps or levels. However, using sources of unequal magnitudes requires a large variety of inverter switches and a higher total blocking voltage (TBV) rating of the inverter, both of which increase the cost. The purpose of this study is to present a solution based on algorithms for establishing DC source magnitudes and other design parameters.

Design/methodology/approach: The approach used in this study is to develop algorithms that bring an asymmetric cascaded MLI (ACMLI) design close to a symmetric design. This approach reduces the variety of switch ratings and minimizes the TBV of the inverter. Thus, the benefits of both asymmetric design (generation of a large number of voltage levels in the output waveform) and symmetric design (modularity) are achieved. The proposed algorithms can be applied to a number of ACMLI topologies, including the classical cascaded H-bridge (CHB). The effectiveness of the proposed algorithms is validated by simulation in Matlab-Simulink and by an experimental setup.

Findings: Two new algorithms are proposed that reduce the variety of switches to just three. The variety can be further reduced to two under a specified condition. The algorithms are compared with existing ones, and the results are promising in minimizing the TBV rating of the inverter, which also reduces cost. For a specific case of four CHBs, the proposed Algorithm-1 produced 27% and Algorithm-2 produced 53% higher levels. Moreover, the presented algorithms produced minimum values of the TBV and resulted in the minimum inverter cost.

Originality/value: The proposed algorithms are novel in structure and have achieved the targeted values of minimized switch variety and reduced TBV ratings. Because of the reduced variety, the inverter achieves a near-symmetric design, which brings the added advantages of modularity and a smaller difference in power sharing among the DC sources.

Efficient Algorithms for gcd and Cubic Residuosity in the Ring of Eisenstein Integers

BRICS Report Series ◽

10.7146/brics.v10i8.21779 ◽

2003 ◽

Vol 10 (8) ◽

Cited By ~ 1

Author(s):

Ivan B. Damgård ◽

Gudmund Skovbjerg Frandsen

Keyword(s):

Fast Algorithms ◽

Cryptographic Protocols ◽

Efficient Algorithms ◽

Jacobi Symbol ◽

Gaussian Integers ◽

Root Of Unity ◽

New Algorithms

We present simple and efficient algorithms for computing gcd and cubic residuosity in the ring of Eisenstein integers, Z[ζ], i.e. the integers extended with ζ, a complex primitive third root of unity. The algorithms are similar and may be seen as generalisations of the binary integer gcd and derived Jacobi symbol algorithms. Our algorithms take time O(n^2) for n-bit input. This is an improvement over the known results based on the Euclidean algorithm, which take time O(n·M(n)), where M(n) denotes the complexity of multiplying n-bit integers. The new algorithms have applications in practical primality tests and the implementation of cryptographic protocols. The technique underlying our algorithms can be used to obtain equally fast algorithms for gcd and quartic residuosity in the ring of Gaussian integers, Z[i].
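As background for the abstract above, here is a minimal sketch of the classical binary gcd on ordinary integers, the algorithm the authors say their method generalises. It is not the Eisenstein-integer algorithm itself, and the function name `binary_gcd` is an illustrative choice. The point it shows is that only shifts, parity tests, and subtractions are used, with no division or multiplication.

```python
def binary_gcd(u: int, v: int) -> int:
    """Classical binary gcd on ordinary integers (background sketch only)."""
    u, v = abs(u), abs(v)
    if u == 0:
        return v
    if v == 0:
        return u

    # Factor out the powers of two common to both arguments.
    shift = 0
    while (u | v) & 1 == 0:
        u >>= 1
        v >>= 1
        shift += 1

    # Make u odd; any remaining factors of two in u are not common.
    while u & 1 == 0:
        u >>= 1

    while v != 0:
        # Strip factors of two from v; u is odd, so they are not common.
        while v & 1 == 0:
            v >>= 1
        # Keep u <= v, then subtract: gcd(u, v) == gcd(u, v - u).
        if u > v:
            u, v = v, u
        v -= u

    # Restore the common powers of two.
    return u << shift
```

Each pass through the main loop removes at least one bit from `v`, which gives the quadratic bit-complexity that the abstract contrasts with Euclidean-style algorithms.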

Efficient algorithms for minimum-cost flow problems with piecewise-linear convex costs

Algorithmica ◽

10.1007/bf01240736 ◽

1994 ◽

Vol 11 (3) ◽

pp. 256-277 ◽

Cited By ~ 4

Author(s):

Yaron Pinto ◽

Ron Shamir

Keyword(s):

Piecewise Linear ◽

Minimum Cost ◽

Efficient Algorithms ◽

Minimum Cost Flow ◽

Flow Problems ◽

Cost Flow ◽

Convex Costs ◽

Minimum Cost Flow Problems

String correction using the Damerau-Levenshtein distance

BMC Bioinformatics ◽

10.1186/s12859-019-2819-0 ◽

2019 ◽

Vol 20 (S11) ◽

Cited By ~ 3

Author(s):

Chunchun Zhao ◽

Sartaj Sahni

Keyword(s):

Levenshtein Distance ◽

String Correction

Towards a theory of cache-efficient algorithms

Journal of the ACM ◽

10.1145/602220.602225 ◽

2002 ◽

Vol 49 (6) ◽

pp. 828-858 ◽

Cited By ~ 30

Author(s):

Sandeep Sen ◽

Siddhartha Chatterjee ◽

Neeraj Dumir

Keyword(s):

Efficient Algorithms ◽

Cache Efficient
