Genome-wide Imputation Using the Practical Haplotype Graph in the Heterozygous Crop Cassava
Abstract
Genomic applications such as genomic selection and genome-wide association have become increasingly common since the advent of genome sequencing. Although the cost of sequencing has decreased over the past two decades, genotyping costs remain prohibitive for gathering the large datasets these applications require, especially in non-model species, where resources are less abundant. Genotype imputation infers whole-genome information from limited input data, making large-scale sampling for genomic applications more feasible. Imputation is more difficult in heterozygous species, where haplotypes must be phased. The Practical Haplotype Graph (PHG) is a recently developed tool that can accurately impute genotypes using a reference panel of haplotypes. We showcase the ability of the PHG to impute genomic information in the highly heterozygous crop cassava (Manihot esculenta). To populate the PHG, we sampled accurately phased haplotypes from runs of homozygosity across a diverse panel of individuals, an approach that proved more accurate than relying on computational phasing methods. The PHG achieved high imputation accuracy from sparse skim-sequencing input, which translated to substantial genomic prediction accuracy in cross-validation testing. The PHG also showed improved imputation accuracy compared to the standard imputation tool Beagle, especially in predicting rare alleles.