molecular sequences
Recently Published Documents


TOTAL DOCUMENTS: 122 (FIVE YEARS: 23)
H-INDEX: 28 (FIVE YEARS: 3)

Author(s):  
Claire Goodwin ◽  
Judith Brown ◽  
Rachel Downey ◽  
Nhu Trieu ◽  
Paul E. Brewin ◽  
...  

Abstract We surveyed the shallow-water sponges of Ascension Island by scuba diving. In total, we collected 58 sponge specimens from 17 locations at depths of 0.5–30 m. We also compiled historical records of sponges. We describe nine species new to science: Niphates verityae sp. nov., Petrosia (Petrosia) ernesti sp. nov., Monanchora downesae sp. nov., Svenzea weberorum sp. nov., Erylus williamsae sp. nov., Ircinia nolanae sp. nov., Ircinia richardsoni sp. nov., Ircinia simae sp. nov. and Chondrosia browningorum sp. nov. We provide molecular sequences for three of the new species. We have increased the number of known species by 50% and added two genera and one family to the known Ascension Island sponge fauna. Twenty-six species, from 16 genera and 13 families, are now reported from Ascension's shallow waters. Many of these may be endemic to the island. We discuss the biogeographic affinities of Ascension Island and emphasize the need for further surveys of the sponge fauna of remote islands such as Ascension.


Parasitology ◽  
2021 ◽  
pp. 1-26
Author(s):  
Diane P. Barton ◽  
Xiaocheng Zhu ◽  
Vanessa Lee ◽  
Shokoofeh Shamsi

2021 ◽  
Author(s):  
Dong Chen ◽  
Guowei Wei ◽  
Feng Pan

Abstract Although deep learning can automatically extract features in relatively simple tasks such as image analysis, constructing appropriate representations remains essential for molecular prediction because of the intricate complexity of molecules. Additionally, generating labeled data for supervised learning in the molecular sciences is often expensive, time-consuming, and ethically constrained, which leaves small and diverse datasets that are challenging to learn from. In this work, we develop a self-supervised learning approach that uses a masking strategy to pre-train transformer models on over 700 million unlabeled molecules drawn from multiple databases. The intrinsic chemical logic learned in this way enables the extraction of predictive representations from task-specific molecular sequences during fine-tuning. To understand the importance of self-supervised learning from unlabeled molecules, we assemble three models from different combinations of databases. Moreover, we propose a new protocol based on data traits to automatically select the optimal model for a specific predictive task. To validate the proposed representation and protocol, we consider 10 benchmark datasets in addition to 38 ligand-based virtual screening datasets. Extensive validation indicates that the proposed representation and protocol perform strongly.
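The abstract names a masking strategy for pre-training but gives no details. As a minimal sketch only, the code below assumes BERT-style masking: molecules are written as SMILES strings, split with a simplified tokenizer, and each token is replaced by a hypothetical `[MASK]` token with probability `mask_rate` (the 15% default is a common convention, not a value from the paper). The model would then be trained to recover the original tokens at the masked positions.

```python
import random
import re
from typing import List, Optional, Tuple

# Simplified SMILES tokenizer (an assumption): bracket atoms such as [NH4+],
# the two-letter halogens Br and Cl, otherwise one character per token.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|.")
MASK = "[MASK]"  # hypothetical mask token


def tokenize(smiles: str) -> List[str]:
    """Split a SMILES string into tokens."""
    return SMILES_TOKEN.findall(smiles)


def mask_tokens(
    tokens: List[str],
    mask_rate: float = 0.15,
    rng: Optional[random.Random] = None,
) -> Tuple[List[str], List[Optional[str]]]:
    """Replace roughly `mask_rate` of the tokens with MASK.

    Returns (masked, targets): targets[i] is the original token at each
    masked position and None elsewhere; these are the prediction targets.
    """
    rng = rng if rng is not None else random.Random()
    masked: List[str] = []
    targets: List[Optional[str]] = []
    for tok in tokens:
        if rng.random() < mask_rate:
            masked.append(MASK)
            targets.append(tok)
        else:
            masked.append(tok)
            targets.append(None)
    # Guarantee at least one prediction target for a non-empty sequence.
    if tokens and all(t is None for t in targets):
        i = rng.randrange(len(tokens))
        masked[i], targets[i] = MASK, tokens[i]
    return masked, targets
```

For example, `mask_tokens(tokenize("CC(=O)O"), rng=random.Random(42))` yields one masked view of acetic acid; drawing a fresh mask each epoch gives the model many different reconstruction tasks from the same unlabeled molecule.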


2021 ◽  
Author(s):  
Leon K Tran ◽  
Dai-Wei Huang ◽  
Nien-Kung Li ◽  
Julia Palacios ◽  
Hsiao-Han Chang ◽  
...  

To quantify the impact of COVID-19-related control measures on the spread of human influenza virus, we analyzed case numbers, viral molecular sequences, personal behavior data, and policy stringency data from various countries, and found consistent evidence of a decrease in influenza incidence after the emergence of COVID-19.


Author(s):  
Mariana L Santana-Cisneros ◽  
Rossanna Rodríguez-Canul ◽  
Jesús Alejandro Zamora-Briseño ◽  
Monica Améndola-Pimenta ◽  
Roxana De Silva-Dávila ◽  
...  

Paralarvae (PL) are crucial to understanding the life cycle and population dynamics of cephalopods. Misidentification of species with similar morphology is a problem that hampers understanding of cephalopod composition and distribution. In this study, we used morphological and molecular approaches to carry out a comprehensive identification of Octopoda PL that inhabit two main areas (Tamaulipas and Yucatán) in the southern Gulf of Mexico (GoM). A total of 189 paralarvae were identified using morphological criteria. Of these, 52 PL were analyzed molecularly by sequencing the mitochondrial cytochrome c oxidase subunit I (COI) gene. We identified four species and five morphotypes. The molecular tools corroborated three of the four species, while the molecular sequences of three of the four sequenced morphotypes indicated that they belong to three different species. All the genetic sequences had high similarities (99.3%–100%) with previous records. One species and one morphotype could not be sequenced because of unsatisfactory fixation; one morphotype remained unresolved after the molecular analysis. An identification tree was constructed for the species identified with the molecular approach. The species found off the Yucatán platform were Octopus vulgaris Type I, Octopus americanus, Macrotritopus defilippi, Amphioctopus burryi, A. cf. burryi, Octopus sp., and Callistoctopus furvus. The species identified off the Tamaulipas coast were Octopus insularis and M. defilippi. Paralarvae of O. vulgaris Type I and M. defilippi were the most abundant during 2016–2017. This study provides the first record of Octopoda PL in the southern GoM, including morphological descriptions and molecular sequences of the analyzed taxa.
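The abstract reports 99.3%–100% similarity between the COI sequences and previous records without stating how similarity was computed (search tools such as BLAST are commonly used for this). Purely as an illustration, the sketch below computes percent identity from a Needleman–Wunsch global alignment with assumed scores (match +1, mismatch -1, gap -1), defining identity as identical aligned positions divided by alignment length. The scoring scheme, the identity definition, and the function names are assumptions, not the authors' pipeline.

```python
from typing import List, Tuple


def global_align(a: str, b: str, match: int = 1, mismatch: int = -1,
                 gap: int = -1) -> Tuple[str, str]:
    """Needleman-Wunsch global alignment; returns the two gapped strings."""
    n, m = len(a), len(b)
    score: List[List[int]] = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    # Fill the dynamic-programming matrix.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a: List[str] = []
    out_b: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a: str, b: str) -> float:
    """Identical aligned positions as a percentage of alignment length."""
    if not a or not b:
        raise ValueError("both sequences must be non-empty")
    aa, bb = global_align(a.upper(), b.upper())
    matches = sum(x == y and x != "-" for x, y in zip(aa, bb))
    return 100.0 * matches / len(aa)
```

Other conventions (for example, dividing by the shorter sequence length, or ignoring terminal gaps) give different percentages for the same pair, so reported similarity values are only comparable when the definition is known.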


PeerJ ◽  
2020 ◽  
Vol 8 ◽  
pp. e10063
Author(s):  
Sam Humphrey ◽  
Alastair Kerr ◽  
Magnus Rattray ◽  
Caroline Dive ◽  
Crispin J. Miller

Molecular sequences carry information. Analysis of sequence conservation between homologous loci is a proven approach with which to explore the information content of molecular sequences. This is often done using multiple sequence alignments to support comparisons between homologous loci. These methods therefore rely on sufficient underlying sequence similarity with which to construct a representative alignment. Here we describe a method using a formal metric of information, surprisal, to analyse biological sub-sequences without alignment constraints. We applied our model to the genomes of five different species to reveal similar patterns across a panel of eukaryotes. Because the surprisal of a sub-sequence falls as its frequency of occurrence within the genome rises, the optimal sub-sequence length was selected separately for each species under consideration. With the model optimized, we found a strong correlation between surprisal and CG dinucleotide usage. The utility of our model was tested by examining the sequences of genes known to undergo splicing. We demonstrate that our model can identify biological features of interest such as known donor and acceptor sites. Analysis across all annotated coding exon junctions in Homo sapiens reveals the information content of coding exons to be greater than that of the surrounding intron regions, a consequence of increased suppression of the CG dinucleotide in intronic space. Sequences within coding regions proximal to exon junctions exhibited novel patterns within DNA and coding mRNA that are not a function of the encoded amino acid sequence. Our findings are consistent with the presence of secondary information-encoding features, such as DNA and RNA binding sites, multiplexed through the coding sequence and independent of the information required to define the corresponding amino acid sequence.
We conclude that surprisal provides a complementary methodology with which to locate regions of interest in the genome, particularly in situations that lack an appropriate multiple sequence alignment.
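Surprisal of an event with probability p is -log2(p) bits, so rare sub-sequences carry more information than common ones. The sketch below is a minimal illustration, assuming that a sub-sequence is an overlapping k-mer and that p is estimated as the k-mer's empirical frequency among all k-length windows of the input; the estimator, the function names, and the per-position profile are our assumptions, not the authors' implementation.

```python
import math
from collections import Counter
from typing import Dict, List


def kmer_surprisal(sequence: str, k: int) -> Dict[str, float]:
    """Surprisal in bits, -log2(p), for every k-mer observed in `sequence`.

    p is the k-mer's empirical frequency among all overlapping k-length
    windows (an assumed estimator, for illustration only).
    """
    sequence = sequence.upper()
    windows = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    if not windows:
        raise ValueError("sequence is shorter than k")
    counts = Counter(windows)
    total = len(windows)
    return {kmer: -math.log2(n / total) for kmer, n in counts.items()}


def surprisal_profile(sequence: str, k: int, table: Dict[str, float]) -> List[float]:
    """Surprisal of the k-mer starting at each position, for scanning a region."""
    sequence = sequence.upper()
    return [table[sequence[i:i + k]] for i in range(len(sequence) - k + 1)]
```

In practice `table` would be built once from a whole genome and the profile computed over a region of interest (for example, around an exon junction), where peaks mark sub-sequences that are rare genome-wide. Choosing k trades off resolution against sparsity, which is why the length needs tuning per genome.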


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Vlad Novitsky ◽  
Jon A. Steingrimsson ◽  
Mark Howison ◽  
Fizza S. Gillani ◽  
Yuanning Li ◽  
...  

Abstract Public health interventions guided by clustering of HIV-1 molecular sequences may be affected by the choice of analytical approach. We identified commonly used clustering approaches, applied them to 1,886 HIV-1 sequences from Rhode Island (2004–2018), and compared their concordance in identifying molecular HIV-1 clusters within and between approaches. We used strict (topological support ≥ 0.95; distance 0.015 substitutions/site) and relaxed (topological support 0.80–0.95; distance 0.030–0.045 substitutions/site) thresholds to reflect different epidemiological scenarios. We found that clustering differed by method and threshold and depended more on distance than on topological support thresholds. Clustering concordance analyses demonstrated some differences across analytical approaches, with RAxML having the highest (91%) mean summary percent concordance when strict thresholds were applied, and three (RAxML-, FastTree regular bootstrap- and IQ-Tree regular bootstrap-based) analytical approaches having the highest (86%) mean summary percent concordance when relaxed thresholds were applied. We conclude that different analytical approaches can yield diverse HIV-1 clustering outcomes and may need to be applied differently in different public health scenarios. Recognizing the variability and limitations of commonly used methods in cluster identification is important for guiding clustering-triggered interventions to disrupt new transmissions and end the HIV epidemic.
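The approaches compared in the study are tree-based (RAxML, FastTree, IQ-Tree) and combine topological support with distance thresholds. As a much simpler, distance-only illustration of why the threshold matters, the sketch below links any two pre-aligned sequences whose uncorrected p-distance is at or below a threshold and reports connected components with two or more members as clusters. The p-distance, the gap/`N` handling, and the single-linkage rule are our assumptions; this is not one of the approaches evaluated in the study, and p-distance is not the same quantity as the tree-based substitutions/site values quoted in the abstract.

```python
from itertools import combinations
from typing import Dict, List, Set


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned, equal-length
    sequences, skipping sites where either has a gap '-' or ambiguous 'N'."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "-N" or y in "-N":
            continue
        compared += 1
        diffs += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


def threshold_clusters(seqs: Dict[str, str], threshold: float) -> List[Set[str]]:
    """Single-linkage clusters: connected components of the graph whose edges
    join pairs with p-distance <= threshold (singletons are dropped)."""
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if p_distance(seqs[a], seqs[b]) <= threshold:
            parent[find(a)] = find(b)

    groups: Dict[str, Set[str]] = {}
    for name in seqs:
        groups.setdefault(find(name), set()).add(name)
    return [g for g in groups.values() if len(g) >= 2]
```

Running `threshold_clusters` on the same input with 0.015 and then 0.03 shows, in miniature, how a looser distance cutoff can pull additional sequences into clusters; the all-pairs loop is quadratic, which is acceptable for a sketch but not for very large datasets.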


2020 ◽  
Vol 21 (S12) ◽  
Author(s):  
Tatiana Dvorkina ◽  
Dmitry Antipov ◽  
Anton Korobeynikov ◽  
Sergey Nurk