Automated Categorisation of Patent Claims that Reference Human Genome Sequences

Abstract The completion of human genome sequences and the advancement of next-generation sequencing technologies have engendered a clear understanding of all human genes. Overlapping genes are usually observed in compact genomes, such as those of bacteria and viruses. Notably, overlapping protein-coding genes also exist in human genome sequences. Accordingly, we used the current Ensembl gene annotations to identify overlapping human protein-coding genes. We analysed 19,200 well-annotated protein-coding genes and determined that 4,951 of them overlapped with their adjacent genes. Approximately a quarter of all human protein-coding genes were therefore overlapping genes. We observed clusters of overlapping protein-coding genes of different sizes, ranging from two genes (paired overlapping genes) to 22 genes. We also divided the paired overlapping protein-coding gene groups into four subtypes. We found that the divergent overlapping gene subtype had a stronger expression association than the 5ʹ-tandem overlapping and 3ʹ-tandem overlapping subtypes. The majority of paired overlapping genes exhibited comparable coincidental tissue expression profiles; however, a few overlapping gene pairs displayed distinctive tissue expression association patterns. In summary, we have carefully examined the genomic features and distribution of human overlapping protein-coding genes and found coincidental expression in tissues for most of them.
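The clustering step described in the abstract above can be illustrated with a minimal sketch. It assumes each gene is given as a hypothetical `(name, start, end)` tuple with inclusive coordinates on a single chromosome; the function name `overlap_clusters` and the example coordinates are illustrative and are not taken from the paper or from Ensembl.

```python
from typing import List, Tuple

# Hypothetical record: (gene name, start, end), inclusive, one chromosome.
Gene = Tuple[str, int, int]


def overlap_clusters(genes: List[Gene]) -> List[List[str]]:
    """Group genes whose intervals overlap, directly or through a chain.

    Genes that overlap nothing are left out, so every returned cluster
    holds at least two genes (a "paired" cluster holds exactly two).
    """
    clusters: List[List[str]] = []
    current: List[str] = []
    current_end = -1
    # Sweep left to right; a gene joins the open cluster if it starts
    # before the furthest end seen so far in that cluster.
    for name, start, end in sorted(genes, key=lambda g: (g[1], g[2])):
        if current and start <= current_end:
            current.append(name)
            current_end = max(current_end, end)
        else:
            if len(current) > 1:
                clusters.append(current)
            current = [name]
            current_end = end
    if len(current) > 1:
        clusters.append(current)
    return clusters


example = [
    ("A", 100, 200), ("B", 150, 300), ("C", 400, 500),
    ("D", 290, 310), ("E", 600, 700), ("F", 650, 660),
]
clusters = overlap_clusters(example)
```

A real analysis would run this per chromosome and would also need strand information to assign the paired subtypes, which this sketch does not attempt.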

Interpretation, Stratification and Evidence for Sequence Variants Affecting mRNA Splicing in Complete Human Genome Sequences

Genomics Proteomics & Bioinformatics ◽

10.1016/j.gpb.2013.01.008 ◽

2013 ◽

Vol 11 (2) ◽

pp. 77-85 ◽

Cited By ~ 21

Author(s):

Ben C. Shirley ◽

Eliseos J. Mucaki ◽

Tyson Whitehead ◽

Paul I. Costea ◽

Pelin Akan ◽

...

Keyword(s):

Human Genome ◽

mRNA Splicing ◽

Sequence Variants ◽

Genome Sequences

Evaluation and Fuzzy Classification of Gene Finding Programs on Human Genome Sequences

Fuzzy Systems and Knowledge Discovery - Lecture Notes in Computer Science ◽

10.1007/11540007_102 ◽

2005 ◽

pp. 821-829 ◽

Cited By ~ 2

Author(s):

Atulya Nagar ◽

Sujita Purushothaman ◽

Hissam Tawfik

Keyword(s):

Human Genome ◽

Fuzzy Classification ◽

Gene Finding ◽

Genome Sequences

Towards the Human Cancer Genome Project: A Sequence-Ready Physical Map of a Follicular Lymphoma Genome.

Blood ◽

10.1182/blood.v106.11.605.605 ◽

2005 ◽

Vol 106 (11) ◽

pp. 605-605

Author(s):

Marco A. Marra ◽

Martin Krzywinski ◽

Readman Chiu ◽

Matthew Field ◽

Inanc Birol ◽

...

Keyword(s):

Follicular Lymphoma ◽

Human Genome ◽

Large Scale ◽

Reference Genome ◽

Reference Sequence ◽

Whole Genome ◽

BAC Clones ◽

Genome Maps ◽

Tumor Genome ◽

Reference Human Genome

Abstract With the aim of identifying and sequencing mutations in follicular lymphoma genomes, we have begun a project to generate at least 24 deeply redundant, sequence-ready Bacterial Artificial Chromosome (BAC)-based whole-genome maps, each from a different individual’s lymphoma. BAC-array CGH and Affymetrix whole-genome sampling assays (WGSA) will be used along with the mapping data to identify genomic amplifications and losses in the lymphomas. Results from the mapping and array studies will be used to prioritize BAC clones for sequence analysis. Because each map will span essentially the entire genome of the corresponding lymphoma, we anticipate that essentially all regions of each tumor genome will be represented in easily sequenced BAC clones. This approach facilitates targeted sequencing of genomic regions of interest, including those containing genes relevant to cancer or harboring amplifications or deletions. Our mapping strategy hinges on the successful creation of deeply redundant, high-quality BAC libraries from primary lymphomas and on large-scale, high-throughput restriction enzyme fingerprinting of individual BACs with a version of the technology we used to map the human, mouse, rat and other genomes. The effort is large-scale and will result in the generation of at least 2.5 million fingerprinted BAC clones over the next three years. Using the fingerprints, we will align the BACs to the reference human genome to assess genome coverage and to identify candidate genome rearrangements. In parallel, we will assemble the fingerprints into genome maps, looking for larger-scale genome variations between the lymphoma maps and the reference genome sequence. To test the feasibility of our approach, we obtained two restriction digest fingerprints from each of 140,000 individual BAC clones. BACs were sampled from a 7-fold redundant BAC library that had been created from genomic DNA purified from a primary follicular lymphoma sample.
The fingerprints are being assembled into a clone map with the intent of reconstructing the entire tumor genome. A total of 90,377 fingerprinted clones with unambiguous single alignments to the reference sequence were automatically assembled into 15,538 contigs. Subsequent rounds of semi-automatic contig merging further reduced the number of contigs to 5,433. Only 1,241 clones remained unassembled. We anchored the tumor genome map to the reference human genome sequence by aligning the clone fingerprints to the restriction map computed from the reference sequence assembly. As a result, we identified a BAC that captured the canonical t(14;18) translocation characteristic of follicular lymphomas. We sequenced this BAC and confirmed that it contains the expected translocation. Almost 2.6 gigabases (~91%) of the reference genome are represented in the evolving map, with an additional 50,000 clone fingerprints awaiting incorporation into the map assembly. Among these are repeat-rich and other clones that may well harbor genome rearrangements. Additional prioritization of sequencing targets will be undertaken when map construction and analysis of genome copy number alterations are complete.
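The idea of a restriction map computed from a reference sequence can be sketched as an in silico digest. The abstract does not name the enzyme or the software used, so the example below assumes the HindIII recognition site (AAGCTT, cut after the first A) purely for illustration, and `digest_fragment_sizes` is a hypothetical helper, not the authors' pipeline.

```python
from typing import List


def digest_fragment_sizes(sequence: str,
                          site: str = "AAGCTT",   # assumed enzyme: HindIII
                          cut_offset: int = 1) -> List[int]:
    """Return fragment lengths after cutting a linear sequence at every site.

    cut_offset is the number of bases into the recognition site at which
    the strand is cut (1 for HindIII, which cuts A^AGCTT).
    """
    seq = sequence.upper()
    cuts: List[int] = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    # Fragment sizes are the gaps between consecutive cut positions,
    # including the two sequence ends.
    bounds = [0] + cuts + [len(seq)]
    return [right - left for left, right in zip(bounds, bounds[1:])]


toy_sequence = "GG" + "AAGCTT" + "CCCC" + "AAGCTT" + "T"
fragments = digest_fragment_sizes(toy_sequence)
```

The resulting list of fragment sizes is the kind of pattern a clone fingerprint could be compared against. Matching real fingerprints would also need a tolerance for sizing error, which is outside the scope of this sketch.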

GENCODE: The reference human genome annotation for The ENCODE Project

Genome Research ◽

10.1101/gr.135350.111 ◽

2012 ◽

Vol 22 (9) ◽

pp. 1760-1774 ◽

Cited By ~ 2787

Author(s):

J. Harrow ◽

A. Frankish ◽

J. M. Gonzalez ◽

E. Tapanari ◽

M. Diekhans ◽

...

Keyword(s):

Human Genome ◽

Genome Annotation ◽

ENCODE Project ◽

Reference Human Genome

A standard variation file format for human genome sequences

Genome Biology ◽

10.1186/gb-2010-11-8-r88 ◽

2010 ◽

Vol 11 (8) ◽

pp. R88 ◽

Cited By ~ 66

Author(s):

Martin G Reese ◽

Barry Moore ◽

Colin Batchelor ◽

Fidel Salas ◽

Fiona Cunningham ◽

...

Keyword(s):

Human Genome ◽

File Format ◽

Genome Sequences

The Need for a Human Pangenome Reference Sequence

Annual Review of Genomics and Human Genetics ◽

10.1146/annurev-genom-120120-081921 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Karen H. Miga ◽

Ting Wang

Keyword(s):

Human Genome ◽

Genome Sequence ◽

Human Genetics ◽

Genomic Diversity ◽

Human Genome Sequence ◽

Reference Sequence ◽

Annual Review ◽

Publication Date ◽

Reference Structure ◽

Reference Human Genome

The reference human genome sequence is inarguably the most important and widely used resource in the fields of human genetics and genomics. It has transformed the conduct of biomedical sciences and brought invaluable benefits to the understanding and improvement of human health. However, the commonly used reference sequence has profound limitations, because across much of its span, it represents the sequence of just one human haplotype. This single, monoploid reference structure presents a critical barrier to representing the broad genomic diversity in the human population. In this review, we discuss the modernization of the reference human genome sequence to a more complete reference of human genomic diversity, known as a human pangenome. Expected final online publication date for the Annual Review of Genomics and Human Genetics, Volume 22 is August 2021. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.

Generation of small interfering RNA (siRNA) database from SARS-CoV-2 genome sequences

10.21203/rs.3.pex-1207/v1 ◽

2020 ◽

Author(s):

Inácio Gomes Medeiros ◽

André Salim Khayat ◽

Beatriz Stransky ◽

Sidney Emanuel Batista dos Santos ◽

Paulo Pimentel de Assumpção ◽

...

Keyword(s):

Human Genome ◽

Design Process ◽

Small Interfering RNA ◽

Reference Genome ◽

Second Phase ◽

Computational Power ◽

Genome Sequences ◽

Interfering RNA ◽

Reference Genomes

Abstract This protocol describes how to build a database of SARS-CoV-2 targets for siRNA approaches. Starting from the virus reference genome, we derive sequences 18 to 21 nt long and check their similarity against the human genome, the coding and non-coding transcriptome, and the genomes of related viruses. We also calculate a set of thermodynamic features for those sequences and infer their efficiencies using three different predictors. The protocol has two main phases: in the first, we align the sequences against the reference genomes; in the second, we extract the features. The duration of the first phase varies with the computational power of the machine and the number of reference genomes. The second phase takes about thirty minutes to execute, also depending on the number of cores of the machine. The constructed database aims to speed up the design process by providing a broad set of possible SARS-CoV-2 target sequences and siRNA sequences.
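The first step of the protocol, deriving every 18 to 21 nt sequence from the reference genome, can be sketched as a sliding-window enumeration. This is a minimal illustration under the assumption that the genome is available as a plain string; `candidate_windows` is a hypothetical name, and the alignment and feature-extraction phases are not shown.

```python
from typing import Iterator, Tuple


def candidate_windows(genome: str,
                      min_len: int = 18,
                      max_len: int = 21) -> Iterator[Tuple[int, str]]:
    """Yield (0-based start, subsequence) for every window of each length."""
    seq = genome.upper()
    for length in range(min_len, max_len + 1):
        # A sequence of n bases contains n - length + 1 windows of this length.
        for start in range(len(seq) - length + 1):
            yield start, seq[start:start + length]


toy_genome = "ACGT" * 5 + "AC"  # 22 nt stand-in, not real viral sequence
windows = list(candidate_windows(toy_genome))
```

Each yielded window would then be passed to the similarity checks against the reference genomes, so the number of windows directly drives the cost of the first phase.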