Resolving Complex Microbial Populations and Transmission Networks Through Haplotype Reconstruction
Abstract
Next-generation sequencing (NGS) is frequently applied to mixtures of genomes from complex populations that must be bulk-sequenced. In virology, for instance, sequencing a within-host viral population generally yields data from a mixture of closely related variants, collectively called a ‘quasispecies’. For downstream analyses such as studies of within-host evolution, the individual haplotypes that make up the quasispecies must be reconstructed by deconvoluting the aggregated variation data in silico. In this project, I have contributed to the fields of within- and between-host evolutionary analysis by 1) designing and implementing PoolHapX, a state-of-the-art haplotype reconstruction program, and 2) quantifying how accurately transmission relationships can be inferred from the consensus sequences of patients' pathogens. Existing haplotype reconstruction tools usually use either read-based genomic information or statistics-based linkage sharing across one or more populations. PoolHapX is the first haplotype reconstruction tool to integrate both read-based genomic information and statistics-based linkage sharing across the population. This integration allows it to handle very long sequences and opens new avenues for studying complex within-host populations. I have additionally demonstrated that, with consensus sequences alone, at most 67% of person-to-person transmission relationships can be accurately recovered. In the future, within-host haplotypes will be integrated into transmission inference methods to improve inference accuracy. Improved resolution of within- and between-host linkage patterns will empower local epidemiological control, for example by identifying the genetic properties of high-risk transmission groups that can then be targeted for clinical support.