Characterization and modeling of the Haemophilus influenzae core and supragenomes based on the complete genomic sequences of Rd and 12 clinical nontypeable strains

Justin S. Hogg, Fen Z. Hu, Benjamin Janto, Robert Boissy, Jay Hayes, Randy Keefe, J. Christopher Post, Garth D. Ehrlich

Research output: Contribution to journal › Article

170 Citations (Scopus)

Abstract

Background: The distributed genome hypothesis (DGH) posits that chronic bacterial pathogens utilize polyclonal infection and reassortment of genic characters to ensure persistence in the face of adaptive host defenses. Studies based on random sequencing of multiple strain libraries suggested that free-living bacterial species possess a supragenome that is much larger than the genome of any single bacterium.

Results: We derived high depth genomic coverage of nine nontypeable Haemophilus influenzae (NTHi) clinical isolates, bringing to 13 the number of sequenced NTHi genomes. Clustering identified 2,786 genes, of which 1,461 were common to all strains, with each of the remaining 1,328 found in a subset of strains; the number of clusters ranged from 1,686 to 1,878 per strain. Genic differences of between 96 and 585 were identified per strain pair. Comparisons of each of the NTHi strains with the Rd strain revealed between 107 and 158 insertions and 100 and 213 deletions per genome. The mean insertion and deletion sizes were 1,356 and 1,020 base-pairs, respectively, with mean maximum insertions and deletions of 26,977 and 37,299 base-pairs. This relatively large number of small rearrangements among strains is in keeping with what is known about the transformation mechanisms in this naturally competent pathogen.

Conclusion: A finite supragenome model was developed to explain the distribution of genes among strains. The model predicts that the NTHi supragenome contains between 4,425 and 6,052 genes with most uncertainty regarding the number of rare genes, those that have a frequency of <0.1 among strains; collectively, these results support the DGH.
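The abstract's core, distributed, and pairwise counts can be illustrated with a minimal sketch. It assumes each strain is represented as a set of gene-cluster identifiers, and it treats a pairwise "genic difference" as the size of the symmetric difference between two strains' sets, which is an assumption rather than the paper's stated definition. The strain names, gene names, and the `summarize` helper are hypothetical toy values, not data from the study.

```python
from itertools import combinations

# Toy presence/absence data (hypothetical, not taken from the paper):
# each strain maps to the set of gene clusters it carries.
strains = {
    "strainA": {"g1", "g2", "g3", "g4"},
    "strainB": {"g1", "g2", "g4", "g5"},
    "strainC": {"g1", "g2", "g3", "g6"},
}


def summarize(strain_genes):
    """Return (core, distributed, pairwise differences) for a strain -> gene-set map."""
    gene_sets = list(strain_genes.values())
    observed = set.union(*gene_sets)       # every cluster seen in at least one strain
    core = set.intersection(*gene_sets)    # clusters common to all strains
    distributed = observed - core          # clusters found only in a subset of strains
    # Assumption: genic difference = number of clusters present in exactly one
    # strain of the pair (symmetric difference).
    pairwise = {
        (a, b): len(strain_genes[a] ^ strain_genes[b])
        for a, b in combinations(sorted(strain_genes), 2)
    }
    return core, distributed, pairwise


core, distributed, pairwise = summarize(strains)
print(len(core), len(distributed))  # core and distributed cluster counts
print(pairwise)                     # genic differences per strain pair
```

Counts from a finite sample such as this only describe the observed genes; estimating the size of the full supragenome, including rare genes not yet sampled, requires a statistical model like the one the abstract describes.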

Original language: English (US)
Article number: R103
Journal: Genome Biology
Volume: 8
Issue number: 6
DOI: 10.1186/gb-2007-8-6-r103
State: Published - Jun 5 2007

Fingerprint

Haemophilus influenzae
influenza
genomics
genome
gene
modeling
Base Pairing
Genes
pathogen
Libraries
Uncertainty
Cluster Analysis
persistence
Bacteria
bacterium
pathogens
Infection

ASJC Scopus subject areas

  • Ecology, Evolution, Behavior and Systematics
  • Genetics
  • Cell Biology

Cite this

Characterization and modeling of the Haemophilus influenzae core and supragenomes based on the complete genomic sequences of Rd and 12 clinical nontypeable strains. / Hogg, Justin S.; Hu, Fen Z.; Janto, Benjamin; Boissy, Robert; Hayes, Jay; Keefe, Randy; Post, J. Christopher; Ehrlich, Garth D.

In: Genome biology, Vol. 8, No. 6, R103, 05.06.2007.

@article{dc82b75e4ae64a1a8cea92768d43c516,
title = "Characterization and modeling of the Haemophilus influenzae core and supragenomes based on the complete genomic sequences of Rd and 12 clinical nontypeable strains",
author = "Hogg, {Justin S.} and Hu, {Fen Z.} and Benjamin Janto and Robert Boissy and Jay Hayes and Randy Keefe and Post, {J. Christopher} and Ehrlich, {Garth D.}",
year = "2007",
month = "6",
day = "5",
doi = "10.1186/gb-2007-8-6-r103",
language = "English (US)",
volume = "8",
journal = "Genome Biology",
issn = "1465-6906",
publisher = "BioMed Central",
number = "6",

}

TY - JOUR

T1 - Characterization and modeling of the Haemophilus influenzae core and supragenomes based on the complete genomic sequences of Rd and 12 clinical nontypeable strains

AU - Hogg, Justin S.

AU - Hu, Fen Z.

AU - Janto, Benjamin

AU - Boissy, Robert

AU - Hayes, Jay

AU - Keefe, Randy

AU - Post, J. Christopher

AU - Ehrlich, Garth D.

PY - 2007/6/5

Y1 - 2007/6/5

UR - http://www.scopus.com/inward/record.url?scp=36549029848&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=36549029848&partnerID=8YFLogxK

U2 - 10.1186/gb-2007-8-6-r103

DO - 10.1186/gb-2007-8-6-r103

M3 - Article

C2 - 17550610

AN - SCOPUS:36549029848

VL - 8

JO - Genome Biology

JF - Genome Biology

SN - 1465-6906

IS - 6

M1 - R103

ER -