Distance-based pattern matching of DNA sequences for evaluating primary mutation

Author(s):  
Berlian Al Kindhi ◽  
Muhammad Afif Hendrawan ◽  
Diana Purwitasari ◽  
Tri Arief Sardjono ◽  
Mauridhi Hery Purnomo
2017 ◽  
Vol 80 ◽  
pp. 162-170 ◽  
Author(s):  
Muhammad Tahir ◽  
Muhammad Sardaraz ◽  
Ataul Aziz Ikram

2014 ◽  
Vol 53 ◽  
Author(s):  
Loek Cleophas ◽  
Derrick G. Kourie ◽  
Bruce W. Watson

In indexing of, and pattern matching on, DNA and text sequences, it is often important to represent all factors of a sequence. One efficient, compact representation is the factor oracle (FO). At the same time, any classical deterministic finite automaton (DFA) can be transformed into a so-called failure DFA (FDFA), which may use failure transitions to replace multiple symbol transitions, potentially yielding a more compact representation. We combine the two ideas and directly construct a failure factor oracle (FFO) from a given sequence, in contrast to ex post facto transformation to an FDFA. The algorithm is suitable for both short and long sequences. We empirically compared the resulting FFOs and FOs on number of transitions for many DNA sequences of lengths 4 to 512, showing gains of up to 10% in total number of transitions, with failure transitions also taking up less space than symbol transitions. The resulting FFOs can be used for indexing, as well as in a variant of the FO-using backward oracle matching algorithm. We discuss and classify this pattern matching algorithm in terms of the keyword pattern matching taxonomies of Watson, Cleophas and Zwaan. We also empirically compared the use of FOs and FFOs in such backward reading pattern matching algorithms, using both DNA and natural language (English) data sets. The results indicate that the decrease in pattern matching performance when an algorithm uses an FFO instead of an FO may outweigh the gain in representation space.
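For orientation, the classical factor oracle that the abstract takes as its starting point can be built online in one left-to-right pass. The sketch below is a minimal illustration of that standard FO construction (symbol transitions plus a supply function), not the failure factor oracle (FFO) construction the authors propose. The names `build_factor_oracle` and `accepts` are hypothetical and chosen only for this example.

```python
def build_factor_oracle(seq):
    """Build a classical factor oracle for seq (illustrative sketch).

    Returns (trans, supply): trans[q] maps a symbol to the target state,
    and supply[q] is the supply (suffix-link-like) state of q.
    States are numbered 0..len(seq); state 0 is the start state.
    """
    n = len(seq)
    trans = [dict() for _ in range(n + 1)]
    supply = [-1] * (n + 1)          # supply of the start state is undefined (-1)
    for i, symbol in enumerate(seq):
        trans[i][symbol] = i + 1     # internal transition along the sequence
        k = supply[i]
        # Walk the supply chain, adding external transitions where missing.
        while k > -1 and symbol not in trans[k]:
            trans[k][symbol] = i + 1
            k = supply[k]
        supply[i + 1] = 0 if k == -1 else trans[k][symbol]
    return trans, supply


def accepts(trans, word):
    """Return True if the oracle can read word from the start state."""
    state = 0
    for symbol in word:
        if symbol not in trans[state]:
            return False
        state = trans[state][symbol]
    return True


if __name__ == "__main__":
    dna = "GATTACA"
    trans, supply = build_factor_oracle(dna)
    total = sum(len(t) for t in trans)
    print("states:", len(trans), "transitions:", total)
    print("accepts 'TTAC':", accepts(trans, "TTAC"))
```

A factor oracle accepts every factor of the sequence but may also accept some strings that are not factors, which is why oracle-based matchers verify candidate matches against the text.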


2007 ◽  
Vol 31 (4) ◽  
pp. 247-253 ◽  
Author(s):  
K. Basu ◽  
N. Sriraam ◽  
R. J. A. Richard

2019 ◽  
Vol 2019 ◽  
pp. 1-9 ◽  
Author(s):  
Maleeha Najam ◽  
Raihan Ur Rasool ◽  
Hafiz Farooq Ahmad ◽  
Usman Ashraf ◽  
Asad Waqar Malik

Storing and processing large DNA sequences has always been a major problem because of the increasing volume of DNA sequence data. A number of solutions have been proposed, but they require significant computation and memory. Therefore, an efficient storage and pattern matching solution is required for DNA sequencing data. Bloom filters (BFs) are an efficient data structure, mostly used in bioinformatics for the classification of DNA sequences. In this paper, we explore further uses of BFs beyond classification. The proposed solution is based on Multiple Bloom Filters (MBFs) and finds all the locations and the number of repetitions of a specified pattern inside a DNA sequence. Both of these factors are extremely important in determining the type and intensity of any disease. This paper serves as a first effort towards optimizing the search for the location and frequency of substrings in DNA sequences using MBFs. We expect that further optimizations of the proposed solution can bring remarkable results, as this paper presents a proof-of-concept implementation for a given set of data using the proposed MBF technique. Performance evaluation shows improved accuracy and time efficiency of the proposed approach.
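The abstract does not describe how the MBFs are organised, so the following is only a sketch of one plausible arrangement, assumed for illustration: the sequence is cut into fixed-size blocks, each block gets its own Bloom filter over the k-mers that start in it, and a query scans only the blocks whose filter reports a possible hit. The class `BloomFilter`, the functions `build_block_filters` and `find_pattern`, and all parameter values are hypothetical and are not taken from the paper.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter using salted SHA-256 hashes (illustrative only)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting the item with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for p in self._positions(item):
            self.bits[p >> 3] |= 1 << (p & 7)

    def __contains__(self, item):
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(item))


def build_block_filters(seq, k, block_size):
    """One filter per block, holding every k-mer that starts in that block."""
    filters = []
    for start in range(0, len(seq), block_size):
        bf = BloomFilter()
        end = min(start + block_size, len(seq) - k + 1)
        for i in range(start, end):
            bf.add(seq[i:i + k])
        filters.append(bf)
    return filters


def find_pattern(seq, pattern, filters, block_size):
    """Return all start positions of pattern (its length must equal k)."""
    k = len(pattern)
    hits = []
    for b, bf in enumerate(filters):
        if pattern in bf:  # possible hit; false positives are filtered below
            start = b * block_size
            end = min(start + block_size, len(seq) - k + 1)
            hits.extend(i for i in range(start, end) if seq[i:i + k] == pattern)
    return hits


if __name__ == "__main__":
    dna = "ACGTACGTTTACG"
    filters = build_block_filters(dna, k=3, block_size=4)
    positions = find_pattern(dna, "ACG", filters, block_size=4)
    print("positions:", positions, "count:", len(positions))
```

Because a Bloom filter never produces false negatives, no block containing the pattern is skipped. The direct comparison inside each candidate block removes false positives, so the reported locations and repetition count are exact in this sketch.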

