Genomic variant-identification methods may alter Mycobacterium tuberculosis transmission inferences

  • Katharine S. Walter*
  • Caroline Colijn
  • Ted Cohen
  • Barun Mathema
  • Qingyun Liu
  • Jolene Bowers
  • David M. Engelthaler
  • Apurva Narechania
  • Darrin Lemmer
  • Julio Croda
  • Jason R. Andrews
  • *Corresponding author for this work
  • Stanford University
  • Simon Fraser University
  • Yale University
  • Columbia University
  • Fudan University
  • Translational Genomics Research Institute
  • American Museum of Natural History
  • Universidade Federal de Mato Grosso do Sul
  • Fundação Oswaldo Cruz

Research output: Contribution to journal, Article, Academic, peer-review

Abstract

Pathogen genomic data are increasingly used to characterize global and local transmission patterns of important human pathogens and to inform public health interventions. Yet, there is no current consensus on how to measure genomic variation. To test the effect of the variant-identification approach on transmission inferences for Mycobacterium tuberculosis, we conducted an experiment in which five genomic epidemiology groups applied variant-identification pipelines to the same outbreak sequence data. We compared the variants identified by each group in addition to transmission and phylogenetic inferences made with each variant set. To measure the performance of commonly used variant-identification tools, we simulated an outbreak. We compared the performance of three mapping algorithms, five variant callers and two variant filters in recovering true outbreak variants. Finally, we investigated the effect of applying increasingly stringent filters on transmission inferences and phylogenies. We found that variant-calling approaches used by different groups do not recover consistent sets of variants, which can lead to conflicting transmission inferences. Further, performance in recovering true variation varied widely across approaches. While no single variant-identification approach outperforms others in both recovering true genome-wide and outbreak-level variation, variant-identification algorithms calibrated upon real sequence data or that incorporate local reassembly outperform others in recovering true pairwise differences between isolates. The choice of variant filters contributed to extensive differences across pipelines, and applying increasingly stringent filters rapidly eroded the accuracy of transmission inferences and quality of phylogenies reconstructed from outbreak variation. Commonly used approaches to identify M. tuberculosis genomic variation have variable performance, particularly when predicting potential transmission links from pairwise genetic distances. Phylogenetic reconstruction may be improved by less stringent variant filtering. Approaches that improve variant identification in repetitive, hypervariable regions, such as long-read assemblies, may improve transmission inference.
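To illustrate how the choice of variant set can change which isolate pairs are flagged as potential transmission links, here is a minimal sketch. It is not the authors' pipeline. It assumes each isolate is represented as a set of variant positions relative to a shared reference, that pairwise distance is the count of positions called in only one of the two isolates, and that a pair is flagged when its distance is at or below a threshold. The isolate names, positions, pipeline labels and the threshold value are all hypothetical toy values, not data or results from the study.

```python
from itertools import combinations


def pairwise_snp_distances(variants_by_isolate):
    """Map each isolate pair to the number of variant positions
    called in exactly one of the two isolates (symmetric difference)."""
    return {
        (a, b): len(variants_by_isolate[a] ^ variants_by_isolate[b])
        for a, b in combinations(sorted(variants_by_isolate), 2)
    }


def putative_links(distances, threshold):
    """Return the isolate pairs whose distance is at or below the threshold."""
    return {pair for pair, dist in distances.items() if dist <= threshold}


# Hypothetical toy call sets for the same three isolates from two pipelines.
# Pipeline B stands in for a more stringent filter that drops some positions.
pipeline_a = {
    "iso1": {101, 250, 999},
    "iso2": {101, 250, 1200},
    "iso3": {101, 3000, 3100, 3200, 3300, 3400, 3500},
}
pipeline_b = {
    "iso1": {101, 250},
    "iso2": {101, 250},
    "iso3": {101, 3000, 3100},
}

THRESHOLD = 3  # illustrative assumption, not a recommended cutoff

links_a = putative_links(pairwise_snp_distances(pipeline_a), THRESHOLD)
links_b = putative_links(pairwise_snp_distances(pipeline_b), THRESHOLD)

# Pairs flagged under one call set but not the other.
print(sorted(links_a ^ links_b))
```

In this toy case the stricter call set removes the positions that separate iso3 from the other two isolates, so two extra pairs fall under the threshold. A real analysis would also need to handle alleles, missing calls and masked regions, which this sketch ignores.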
Original language: English
Article number: 000418
Pages (from-to): 1-16
Number of pages: 16
Journal: Microbial Genomics
Volume: 6
Issue number: 8
DOIs
Publication status: Published - 2020

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 3 - Good Health and Well-being

Keywords

  • Genomic epidemiology
  • Pathogen genomics
  • Transmission
  • Tuberculosis
  • Variant identification
