Efficient similarity search in large databases of tree structured objects

Image Similarity Search in Large Databases Using a Fast Machine Learning Approach

Studies in Computational Intelligence - New Directions in Intelligent Interactive Multimedia ◽

10.1007/978-3-540-68127-4_9 ◽

2008 ◽

pp. 85-93

Author(s):

Smiljan Šinjur ◽

Damjan Zazula

Keyword(s):

Machine Learning ◽

Similarity Search ◽

Image Similarity ◽

Learning Approach ◽

Large Databases ◽

Machine Learning Approach

SketchSort: Fast All Pairs Similarity Search for Large Databases of Molecular Fingerprints

Molecular Informatics ◽

10.1002/minf.201100050 ◽

2011 ◽

Vol 30 (9) ◽

pp. 801-807 ◽

Cited By ~ 11

Author(s):

Yasuo Tabei ◽

Koji Tsuda

Keyword(s):

Similarity Search ◽

Molecular Fingerprints ◽

Large Databases

Content-based video copy detection in large databases: a local fingerprints statistical similarity search approach

IEEE International Conference on Image Processing 2005 ◽

10.1109/icip.2005.1529798 ◽

2005 ◽

Cited By ~ 15

Author(s):

A. Joly ◽

C. Frelicot ◽

O. Buisson

Keyword(s):

Similarity Search ◽

Video Copy Detection ◽

Copy Detection ◽

Large Databases ◽

Statistical Similarity ◽

Search Approach

RAFTS3: Rapid Alignment-Free Tool for Sequence Similarity Search

10.1101/055269 ◽

2016 ◽

Cited By ~ 11

Author(s):

Ricardo Assunção Vialle ◽

Fábio de Oliveira Pedrosa ◽

Vinicius Almir Weiss ◽

Dieval Guizelini ◽

Juliana Helena Tibaes ◽

...

Keyword(s):

Similarity Search ◽

Sequence Similarity ◽

Biological Data ◽

Amino Acid Residues ◽

Binary Matrix ◽

Protein Databases ◽

Alignment Free ◽

Large Databases ◽

Search Tool ◽

Time Required

Abstract. Background: Similarity search of a given protein sequence against a database is an essential task in genome analysis. Sequence alignment is the most widely used method for such analysis. Although this approach is efficient, the time required to perform searches against large databases is always a challenge. Alignment-free techniques offer alternatives for comparing sequences without the need for alignment. Results: Here we developed RAFTS3, a fast protein similarity search tool that uses a filter step for candidate selection based on shared k-mers and a comparison measure using a binary matrix of co-occurrence of amino acid residues. RAFTS3 performed searches many times faster than BLASTp against large protein databases, such as NR, Pfam or UniRef, with a small loss of sensitivity depending on the degree of similarity of the sequences. Conclusions: RAFTS3 is a new alternative for fast comparison of protein sequences, genome annotation and biological data mining. The source code and the standalone files for Windows and Linux platforms are available at: https://sourceforge.net/projects/rafts3/
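
The candidate-selection idea mentioned in the abstract (keeping only database sequences that share k-mers with the query) can be illustrated with a minimal Python sketch. This is not RAFTS3's implementation: the function names, the values k=3 and min_shared=2, and the toy sequences are all assumptions made for illustration, and the second stage (the comparison based on a binary co-occurrence matrix of amino acid residues) is omitted entirely.

```python
from collections import defaultdict


def kmers(seq, k=3):
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(database, k=3):
    """Map each k-mer to the ids of the database sequences containing it."""
    index = defaultdict(set)
    for seq_id, seq in database.items():
        for kmer in kmers(seq, k):
            index[kmer].add(seq_id)
    return index


def candidates(query, index, k=3, min_shared=2):
    """Return {id: shared k-mer count} for ids sharing >= min_shared k-mers."""
    counts = defaultdict(int)
    for kmer in kmers(query, k):
        # .get avoids inserting empty entries into the defaultdict index
        for seq_id in index.get(kmer, ()):
            counts[seq_id] += 1
    return {seq_id: n for seq_id, n in counts.items() if n >= min_shared}


# Toy database (hypothetical sequences, not taken from NR, Pfam or UniRef)
database = {
    "p1": "MKTAYIAKQR",
    "p2": "MKTAYLAKQR",  # one substitution relative to p1
    "p3": "GGGSSSPPPL",  # unrelated
}
index = build_index(database)
hits = candidates("MKTAYIAKQR", index)
print(hits)
```

Only the sequences that survive this filter would be passed on to a more expensive comparison, which is where the speed-up over scanning the whole database would come from in a scheme of this kind.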

A non-linear dimensionality-reduction technique for fast similarity search in large databases

Proceedings of the 2006 ACM SIGMOD international conference on Management of data - SIGMOD '06 ◽

10.1145/1142473.1142532 ◽

2006 ◽

Cited By ~ 14

Author(s):

Khanh Vu ◽

Kien A. Hua ◽

Hao Cheng ◽

Sheau-Dong Lang

Keyword(s):

Dimensionality Reduction ◽

Similarity Search ◽

Reduction Technique ◽

Large Databases ◽

Dimensionality Reduction Technique ◽

Non Linear ◽

Linear Dimensionality Reduction

Molecular-level similarity search brings computing to DNA data storage

Nature Communications ◽

10.1038/s41467-021-24991-z ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Callista Bee ◽

Yuan-Jyue Chen ◽

Melissa Queen ◽

David Ward ◽

Xiaomeng Liu ◽

...

Keyword(s):

Data Storage ◽

Similarity Search ◽

Molecular Mechanisms ◽

Error Correcting Codes ◽

Synthetic DNA ◽

Digital Storage ◽

Hybridization Probes ◽

Large Databases ◽

Similar Images ◽

Storage Technologies

Abstract: As global demand for digital storage capacity grows, storage technologies based on synthetic DNA have emerged as a dense and durable alternative to traditional media. Existing approaches leverage robust error correcting codes and precise molecular mechanisms to reliably retrieve specific files from large databases. Typically, files are retrieved using a pre-specified key, analogous to a filename. However, these approaches lack the ability to perform more complex computations over the stored data, such as similarity search: e.g., finding images that look similar to an image of interest without prior knowledge of their file names. Here we demonstrate a technique for executing similarity search over a DNA-based database of 1.6 million images. Queries are implemented as hybridization probes, and a key step in our approach was to learn an image-to-sequence encoding ensuring that queries preferentially bind to targets representing visually similar images. Experimental results show that our molecular implementation performs comparably to state-of-the-art in silico algorithms for similarity search.

Efficient Similarity Search for Hierarchical Data in Large Databases

Advances in Database Technology - EDBT 2004 - Lecture Notes in Computer Science ◽

10.1007/978-3-540-24741-8_39 ◽

2004 ◽

pp. 676-693 ◽

Cited By ~ 30

Author(s):

Karin Kailing ◽

Hans-Peter Kriegel ◽

Stefan Schönauer ◽

Thomas Seidl

Keyword(s):

Similarity Search ◽

Hierarchical Data ◽

Large Databases

Adaptive similarity search in large databases - application to image/video copy detection

2008 International Workshop on Content-Based Multimedia Indexing ◽

10.1109/cbmi.2008.4564988 ◽

2008 ◽

Cited By ~ 3

Author(s):

Nicolas Gengembre ◽

Sid-Ahmed Berrani ◽

Patrick Lechat

Keyword(s):

Similarity Search ◽

Video Copy Detection ◽

Copy Detection ◽

Large Databases

Converging Evidence for Developmental Trauma Disorder: Empirical Support From Large Databases

PsycEXTRA Dataset ◽

10.1037/e517302011-231 ◽

2008 ◽

Author(s):

Bradley C. Stolbach ◽

Frank Putnam ◽

Melissa Perry ◽

Karen Putnam ◽

William Harris ◽

...

Keyword(s):

Empirical Support ◽

Developmental Trauma ◽

Developmental Trauma Disorder ◽

Large Databases

ClusPhylo: Spark Based Fast and Reliable Approach for Reconstruction of Phylogenetic Network Using Large Databases

Journal of Biology and Today's World ◽

10.15412/j.jbtw.01060602 ◽

2017 ◽

Vol 6 (6) ◽

Author(s):

S. Malik ◽

S. Khatri ◽

D. Sharma

Keyword(s):

Phylogenetic Network ◽

Large Databases
