Automated Categorisation of Patent Claims that Reference Human Genome Sequences

Author(s):  
Donglu Wang ◽  
Gabriela Ferraro ◽  
Hanna Suominen ◽  
Osmat A. Jefferson
2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Chao-Hsin Chen ◽  
Chao-Yu Pan ◽  
Wen-chang Lin

Abstract The completion of human genome sequences and the advancement of next-generation sequencing technologies have engendered a clear understanding of all human genes. Overlapping genes are usually observed in compact genomes, such as those of bacteria and viruses. Notably, overlapping protein-coding genes do exist in human genome sequences. Accordingly, we used the current Ensembl gene annotations to identify overlapping human protein-coding genes. We analysed 19,200 well-annotated protein-coding genes and determined that 4,951 protein-coding genes overlapped with their adjacent genes. Approximately a quarter of all human protein-coding genes were overlapping genes. We observed different clusters of overlapping protein-coding genes, ranging from two genes (paired overlapping genes) to 22 genes. We also divided the paired overlapping protein-coding gene groups into four subtypes. We found that the divergent overlapping gene subtype had a stronger expression association than did the 5ʹ-tandem and 3ʹ-tandem overlapping subtypes. The majority of paired overlapping genes exhibited comparable coincidental tissue expression profiles; however, a few overlapping gene pairs displayed distinctive tissue expression association patterns. In summary, we have carefully examined the genomic features and distributions of human overlapping protein-coding genes and found coincidental expression in tissues for most overlapping protein-coding genes.
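The grouping of overlapping genes into clusters described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the `Gene` record, the `overlap_clusters` function, and the rule that any shared base on the same chromosome counts as an overlap regardless of strand are all assumptions made for illustration.

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    """Hypothetical gene record; coordinates are 1-based and inclusive."""
    name: str
    chrom: str
    start: int
    end: int
    strand: str


def overlap_clusters(genes: List[Gene]) -> List[List[Gene]]:
    """Group genes whose coordinates overlap on the same chromosome.

    Assumption: strand is ignored, so genes on opposite strands that
    share at least one base are placed in the same cluster.
    Only clusters with two or more genes are returned.
    """
    clusters: List[List[Gene]] = []
    current: List[Gene] = []
    current_chrom = None
    current_end = -1
    # Sweep genes in coordinate order, extending the open cluster
    # while the next gene starts before the cluster's rightmost end.
    for gene in sorted(genes, key=lambda g: (g.chrom, g.start, g.end)):
        if current and gene.chrom == current_chrom and gene.start <= current_end:
            current.append(gene)
            current_end = max(current_end, gene.end)
        else:
            if len(current) > 1:
                clusters.append(current)
            current = [gene]
            current_chrom = gene.chrom
            current_end = gene.end
    if len(current) > 1:
        clusters.append(current)
    return clusters
```

Under this sketch, a cluster of length two corresponds to a paired overlapping gene group. As a check on the reported proportion, 4,951 of 19,200 genes is about 25.8%, consistent with "approximately a quarter".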


2013 ◽  
Vol 11 (2) ◽  
pp. 77-85 ◽  
Author(s):  
Ben C. Shirley ◽  
Eliseos J. Mucaki ◽  
Tyson Whitehead ◽  
Paul I. Costea ◽  
Pelin Akan ◽  
...  

Blood ◽  
2005 ◽  
Vol 106 (11) ◽  
pp. 605-605
Author(s):  
Marco A. Marra ◽  
Martin Krzywinski ◽  
Readman Chiu ◽  
Matthew Field ◽  
Inanc Birol ◽  
...  

Abstract With the aim of identifying and sequencing mutations in follicular lymphoma genomes, we have begun a project to generate at least 24 deeply redundant, sequence-ready Bacterial Artificial Chromosome (BAC)-based whole-genome maps, each from a different individual’s lymphoma. BAC-array CGH and Affymetrix whole-genome sampling assays (WGSA) will be used along with the mapping data to identify genomic amplifications and losses in the lymphomas. Results from the mapping and array studies will be used to prioritize BAC clones for sequence analysis. Because each map will span essentially the entire genome of the corresponding lymphoma, we anticipate that essentially all regions of each tumor genome will be represented in easily sequenced BAC clones. This approach facilitates targeted sequencing of genomic regions of interest, including those containing genes relevant to cancer or harboring amplifications or deletions. Our mapping strategy hinges on the successful creation of deeply redundant, high-quality BAC libraries from primary lymphomas and large-scale, high-throughput restriction enzyme fingerprinting of individual BACs with a version of the technology we used to map the human, mouse, rat and other genomes. The effort is large-scale and will result in the generation of at least 2.5 million fingerprinted BAC clones over the next three years. Using the fingerprints, we will align the BACs to the reference human genome to assess genome coverage and to identify candidate genome rearrangements. In parallel, we will assemble the fingerprints into genome maps, looking for larger-scale genome variations between the lymphoma maps and the reference genome sequence. To test the feasibility of our approach, we obtained two restriction digest fingerprints from each of 140,000 individual BAC clones. BACs were sampled from a 7-fold redundant BAC library that had been created from genomic DNA purified from a primary follicular lymphoma sample.
The fingerprints are being assembled into a clone map with the intent of reconstructing the entire tumor genome. 90,377 fingerprinted clones with unambiguous single alignments to the reference sequence were automatically assembled into 15,538 contigs. Subsequent rounds of semi-automatic contig merging further reduced the number of contigs to 5,433. Only 1,241 clones remained unassembled. We anchored the tumor genome map to the reference human genome sequence by aligning the clone fingerprints to the restriction map computed from the reference sequence assembly. As a result, we identified a BAC that captured the canonical t(14;18) translocation characteristic of follicular lymphomas. We sequenced this BAC and confirmed that it contains the expected translocation. Almost 2.6 gigabases (~91%) of the reference genome are represented in the evolving map, with an additional 50,000 clone fingerprints awaiting incorporation into the map assembly. Among these are repeat-rich and other clones that may well harbor genome rearrangements. Additional prioritization of sequencing targets will be undertaken when map construction and analysis of genome copy number alterations are complete.
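The idea of a restriction map computed from a reference sequence can be sketched as an in-silico digest that lists fragment sizes, which is the kind of profile a clone fingerprint would be compared against. This is only an illustration under stated assumptions: the abstract does not name the enzymes used, so the HindIII site (`AAGCTT`, cut after the first base) is an assumed example, the function name `digest_fragment_sizes` is hypothetical, and the molecule is treated as linear for simplicity.

```python
from typing import List


def digest_fragment_sizes(sequence: str, site: str = "AAGCTT",
                          cut_offset: int = 1) -> List[int]:
    """Return sorted fragment lengths from a simulated restriction digest.

    Assumptions: the sequence is linear, the default site is HindIII
    (A^AGCTT), and cut_offset is the number of bases from the start of
    the site to the cut position on the top strand.
    """
    sequence = sequence.upper()
    cuts = []
    pos = sequence.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        # Continue one base further so overlapping sites are not missed.
        pos = sequence.find(site, pos + 1)
    # Fragment boundaries are the two ends plus every cut position.
    bounds = [0] + cuts + [len(sequence)]
    return sorted(b - a for a, b in zip(bounds, bounds[1:]))
```

For example, `digest_fragment_sizes("GGGAAGCTTCCCCAAGCTTTT")` returns `[4, 7, 10]`. A real fingerprint comparison would also need to tolerate fragment sizing error, which this sketch does not model.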


2012 ◽  
Vol 22 (9) ◽  
pp. 1760-1774 ◽  
Author(s):  
J. Harrow ◽  
A. Frankish ◽  
J. M. Gonzalez ◽  
E. Tapanari ◽  
M. Diekhans ◽  
...  

2010 ◽  
Vol 11 (8) ◽  
pp. R88 ◽  
Author(s):  
Martin G Reese ◽  
Barry Moore ◽  
Colin Batchelor ◽  
Fidel Salas ◽  
Fiona Cunningham ◽  
...  

Author(s):  
Karen H. Miga ◽  
Ting Wang

The reference human genome sequence is inarguably the most important and widely used resource in the fields of human genetics and genomics. It has transformed the conduct of biomedical sciences and brought invaluable benefits to the understanding and improvement of human health. However, the commonly used reference sequence has profound limitations, because across much of its span, it represents the sequence of just one human haplotype. This single, monoploid reference structure presents a critical barrier to representing the broad genomic diversity in the human population. In this review, we discuss the modernization of the reference human genome sequence to a more complete reference of human genomic diversity, known as a human pangenome. Expected final online publication date for the Annual Review of Genomics and Human Genetics, Volume 22 is August 2021. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.


2020 ◽  
Author(s):  
Inácio Gomes Medeiros ◽  
André Salim Khayat ◽  
Beatriz Stransky ◽  
Sidney Emanuel Batista dos Santos ◽  
Paulo Pimentel de Assumpção ◽  
...  

Abstract This protocol describes the building of a database of SARS-CoV-2 targets for siRNA approaches. Starting from the virus reference genome, we will derive sequences 18 to 21 nt long and verify their similarity against the human genome and coding and non-coding transcriptome, as well as genomes from related viruses. We will also calculate a set of thermodynamic features for those sequences and infer their efficiencies using three different predictors. The protocol has two main phases: in the first, we align sequences against reference genomes; in the second, we extract the features. The duration of the first phase varies with the computational power of the machine running it and the number of reference genomes. The second phase takes about thirty minutes, also depending on the number of cores of the machine. The constructed database aims to speed up the design process by providing a broad set of possible SARS-CoV-2 target sequences and siRNA sequences.
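The first step of the protocol, deriving 18 to 21 nt sequences from the virus reference genome, can be sketched as a sliding-window enumeration. The function name `candidate_windows` and the choice to step one base at a time are assumptions for illustration; the abstract does not specify how the windows are generated, and the alignment and feature-extraction phases are not covered here.

```python
from typing import Iterator, Tuple


def candidate_windows(genome: str, min_len: int = 18,
                      max_len: int = 21) -> Iterator[Tuple[int, str]]:
    """Yield (0-based start, subsequence) for every window of each length.

    Assumption: windows advance one base at a time, so a genome of
    length n yields n - k + 1 windows for each window length k.
    """
    genome = genome.upper()
    for length in range(min_len, max_len + 1):
        for start in range(len(genome) - length + 1):
            yield start, genome[start:start + length]
```

With a 25-base input this yields 8 + 7 + 6 + 5 = 26 windows, one group per length from 18 to 21. For a genome of roughly 30 kb, the count would be close to four times the genome length, which gives a sense of why the alignment phase dominates the running time.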


2017 ◽  
Vol 26 (16) ◽  
pp. 4145-4157 ◽  
Author(s):  
Michael D. Martin ◽  
Flora Jay ◽  
Sergi Castellano ◽  
Montgomery Slatkin
