Thursday, May 31, 2012

Rapid Evolution of Enormous, Multichromosomal Genomes in Flowering Plant Mitochondria with Exceptionally High Mutation Rates
Variation in genome size and complexity has been debated for decades.
In multicellular eukaryotes, genome expansion is a consequence of the proliferation of noncoding DNA [1]. Several theories have emerged to explain variation in genome size and complexity. Among them, the most generally accepted are the bulk-DNA hypothesis, followed by the selfish-DNA hypothesis [2]. However, these hypotheses only partially explain the divergent patterns observed in eukaryotes.

The mutational burden hypothesis (MBH), which is mostly based on population genetics principles, is a unifying concept that attempts to reconcile these different points of view. This hypothesis implies that “…noncoding element are generally deleterious but proliferate nonadaptively when small effective population reduce the effectiveness of selection relative to genetic drift”. In other words [3], the genome is constantly under two nonadaptive forces: random genetic drift and mutation pressure.

What was expected?
If the MBH is correct, a genome with a high mutation rate would be expected to be reduced in size and complexity.

A glimpse of plant mitochondrial genomes: what makes them special
Mitochondrial genomes exhibit a broad range of structure and diversity among eukaryotes [4]. Plant mitochondrial genomes usually contain more than 90% noncoding DNA and usually have low point mutation rates, whereas animal mitochondrial genomes seem refractory to such expansion of noncoding DNA [5]. The authors selected the genus Silene, which includes members with high mitochondrial mutation rates, while other members of the same genus have maintained low rates.

Findings and Interpretations
A massive genome expansion associated with a massive acceleration of mutation rates at the DNA level was clearly established in S. noctiflora and S. conica, as compared to S. vulgaris and S. latifolia (Figs 1, 2 and 3). However, during our round-table discussion, it was unclear how the branch lengths of the tree presented in figure 1 were computed. As no branch values were shown, were they based on pre-computed data?

These observations were correlated with neither gene nor intron content. Genome growth usually depends largely on intronic and intergenic sequences. Intronic sequences did not show significant variation among Silene species (Table 1). As expected, this massive genome expansion was mostly due to intergenic sequences, which constitute 99% of the total genome size. These intergenic sequences in S. conica and S. noctiflora lack detectable homology with other genomes. A possible explanation is that high mutation rates made them diverge significantly from their counterparts in other Silene species.

A striking feature of S. conica and S. noctiflora was the large number of imperfect repeats, which was linked to the presence of a large number of small circular-mapping chromosomes. It is also worth noting that these chromosomes shared only short repeats with other parts of the genome. In contrast to what was found in S. vulgaris and S. latifolia, the fast-evolving genomes of S. conica and S. noctiflora had a reduced recombination rate (figure 6). The underlying idea is that a high mutation rate may favor changes in the repeats that make them less efficient for recombination. However, this argument has to be considered with caution, as recombination may also favor the formation of novel sequences or chimeras, which could contribute to genome instability rather than maintenance. It is still unclear whether this impaired recombination activity in fast-evolving genomes is responsible for the expansion, but at least it would partially agree with the MBH.

The authors investigated whether biparental inheritance and heteroplasmy may play a role in genome expansion and finally claim that there is no significant impact. Even though the supporting data were not shown, their logic was quite straightforward. The same conclusion was drawn from intraspecific nucleotide polymorphism.

Although the exact origin of expanded intergenic regions is still unclear, the authors discussed several potential answers:

1.     “Intergenic content may derive from nuclear genome”
This is unlikely as no significant homology with nuclear data could be readily identified.

2.     “Intergenic content may be due to selfish element proliferation”
Selfish DNA proliferation does not explain this genome expansion at all, as no drastic change in identifiable repeated elements was found between the fast-evolving genomes and their counterparts.

3.     “Increase in intergenic content may be due to impairment of DNA repair mechanism coupled to high mutation rate?”
Since population size and environmental conditions are important for the MBH, at first glance it seems that the paper did not describe these factors sufficiently. For example, we discussed that S. vulgaris and S. latifolia are known to be invasive, whereas the fast-evolving S. conica and S. noctiflora are not. A partial answer to this question is provided by Lynch [2], who wrote “that forces driving the evolution of genomic architecture are unlikely to be a direct consequences of organisms difference in lifestyle”. Since genetic drift is important for the MBH, the accumulation of intergenic sequences may also be due to a deficiency in removal mechanisms resulting from small population size.

Aspects not covered by the paper
The epigenomic status and its potential link with genome expansion were not investigated at all in the paper. To what extent these factors affect variation in genome size and complexity remains an open question.

My take home message
The authors started with very interesting observations (i.e. massive genome expansion in two Silene species) and compared them to the predictions of the MBH. The findings described in the paper do not support the predictions derived from the MBH. Despite significant effort from the authors, there is still no clear answer about the exact origin of the overwhelming intergenic DNA in genome expansion, nor about the driving forces behind it.

Sloan, D., Alverson, A., Chuckalovcak, J., Wu, M., McCauley, D., Palmer, J., & Taylor, D. (2012). Rapid Evolution of Enormous, Multichromosomal Genomes in Flowering Plant Mitochondria with Exceptionally High Mutation Rates PLoS Biology, 10 (1) DOI: 10.1371/journal.pbio.1001241

Other References
1. Lynch M, Conery JS (2003) The origins of genome complexity. Science 302: 1401-1404.
2. Lynch M (2006) Streamlining and simplification of microbial genome architecture. Annu Rev Microbiol 60: 327-349.
3. Lynch M, Koskella B, Schaack S (2006) Mutation pressure and the evolution of organelle genomic architecture. Science 311: 1727-1730.
4. Lang BF, Gray MW, Burger G (1999) Mitochondrial genome evolution and the origin of eukaryotes. Annu Rev Genet 33: 351-397.
5. Wolfe KH, Li WH, Sharp PM (1987) Rates of nucleotide substitution vary greatly among plant mitochondrial, chloroplast, and nuclear DNAs. Proc Natl Acad Sci U S A 84: 9054-9058.

 posted by MRR for Ousmane H. Cissé

Friday, May 25, 2012

An Aboriginal Australian genome reveals separate human dispersals into Asia

This blog post concerns a much-debated topic in science, human population history, which extends into daily life as a subject of general public curiosity. So let’s see what the paper contributes.


Modern human populations seem to derive from a single African ancestral population, under the well-supported “out of Africa” hypothesis (1). For the colonization of eastern Asia in particular, a “single-dispersal” model has been hypothesized (2), which suggests that Aboriginal Australians are a lineage that diversified recently within the Asian cluster. This hypothesis can be summarized topologically, as drawn in figure 1A of the article: (Africans,(Europeans,(Asians,Australians))). Recent studies dated the split between Europeans and Asians to around 17K-43K years before the present (ybp). In addition, archaeological evidence places modern humans in Australia as far back as ~50K ybp. These inferences are incompatible with the above-mentioned hypothesis, at least in terms of timing. A second scenario can be hypothesized, with an early branching and occupation of Australia and probable later genetic exchange between Asians and Australians, described as (Africans,(Australians,(Asians,Europeans))). This possibility had not been tested so far. Using an ancient Aboriginal Australian genome free of recent admixture, SNP data from different human populations, and a background in molecular evolution and population genetic theory, this paper aims to distinguish between the competing hypotheses about the relatedness and migration history of ancient Australian populations.

The facts in brief
  • A 100-year-old lock of hair from an aboriginal Australian male (from Museum of Archaeology and Ethnology, UK)
  • 31 institutions involved, on a worldwide scale
  • 58 authors, with the same geographical extent
  • An ancient genome sequenced by Illumina technology and SNP-chip on other human populations
  • Computational analyses (PCA, clustering methods, ABBA/BABA expectations)
  • A Science podcast interview

We found the paper quite convincing in testing the two possible scenarios for human colonization of the Australian area. The next paragraphs describe and discuss the evidence and tests they used.

1. Testing the genetic clustering of Aboriginal Australian genome.

The principal component analysis illustrated in figure 1B shows the clustering pattern of SNP-chip data (449k SNPs) from 1220 individuals, covering 79 human populations. This figure reveals a close relationship between the Australian genome and the Highland Papua New Guinea (PNG), Bougainville and Aeta samples, all of them from the Australo-Melanesian region. This pattern argues against any European contamination of the sample, which would otherwise be highly probable given its long handling by Europeans. We noted the geographical tendency of a “continuous” colonization for human populations outside of Africa. I put continuous in quotes to clarify that we are not referring to a single wave of colonization, but to a geographical ordering of the populations. One confusing point was the PCA inset, which looks like a 3D box but is actually just a zoom-in on the same PCA graph. A further look at the subsequent PCA axes in the supplementary material showed a very clear differentiation of the Australo-Melanesian sequences on axis 4.

We speculated about the amount of variance explained by the first two PCA axes, which is not reported. Contrary to our expectations from experience with other types of characters (such as morphology and climatic variables), the proportion of variance explained in this plot seems to be very low, as is usual for genomic studies. We then briefly discussed the idea of a checklist of requirements for preparing a publication: if you are planning to present an analysis, have i, ii and iii at hand, and please do not forget to include them.
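
The share of variance carried by an axis is cheap to compute and report. Here is a minimal sketch, assuming a small invented genotype matrix (individuals in rows, SNPs in columns, coded 0/1/2); none of these numbers come from the paper, and the function names are ours. It uses only the Python standard library and a power iteration for the first axis; in practice a linear algebra library would give all axes at once.

```python
import random

def center_columns(matrix):
    """Subtract each column (SNP) mean from every row (individual)."""
    n_rows = len(matrix)
    means = [sum(column) / n_rows for column in zip(*matrix)]
    return [[value - mean for value, mean in zip(row, means)] for row in matrix]

def gram_matrix(rows):
    """Individuals x individuals matrix of dot products between rows."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in rows] for r1 in rows]

def multiply(matrix, vector):
    return [sum(a * v for a, v in zip(row, vector)) for row in matrix]

def top_eigenvalue(symmetric, iterations=500, seed=1):
    """Largest eigenvalue of a symmetric positive semi-definite matrix (power iteration)."""
    rng = random.Random(seed)
    vector = [rng.random() for _ in symmetric]
    for _ in range(iterations):
        product = multiply(symmetric, vector)
        norm = sum(x * x for x in product) ** 0.5
        if norm == 0.0:
            return 0.0
        vector = [x / norm for x in product]
    # Rayleigh quotient of the (unit-length) converged vector.
    return sum(p * v for p, v in zip(multiply(symmetric, vector), vector))

def pc1_variance_fraction(genotypes):
    """Fraction of the total variance captured by the first principal axis."""
    gram = gram_matrix(center_columns(genotypes))
    total = sum(gram[i][i] for i in range(len(gram)))  # trace = total variance (unscaled)
    if total == 0.0:
        raise ValueError("genotype matrix has no variance")
    return top_eigenvalue(gram) / total

# Invented toy data: 6 individuals x 5 SNPs.
toy_genotypes = [
    [0, 1, 2, 0, 1],
    [0, 1, 2, 1, 1],
    [1, 0, 2, 0, 2],
    [2, 2, 0, 1, 0],
    [2, 1, 0, 2, 0],
    [1, 2, 1, 2, 0],
]
print("PC1 explains {:.1%} of the variance".format(pc1_variance_fraction(toy_genotypes)))
```

Printing this percentage next to each axis label is the kind of item such a checklist would contain.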

2. Testing admixture between Aboriginal Australian genome and other populations

Figure 1C describes the ancestry proportions for the full set of individual SNP data, obtained by maximum likelihood estimation in the Admixture software. This clustering analysis resembles the Structure k-categories approach, in which each line in the plot corresponds to an individual and the colors represent ancestral population identities. The number of k categories is assigned a priori, and changing it can modify the ancestry proportions of certain individuals, revealing admixture between populations. At first, using k=5, the Aboriginal Australian sample appears to belong to the same ancestral population as PNG and a high proportion of the Bougainville individuals. Interestingly, the South Asian population seems to share a small proportion of SNPs with the ancestral Aboriginal Australian category. As we moved to higher k values, up to k=20, the Aboriginal Australian genome appeared more mixed with PNG, Bougainville, Aeta and South Asian populations.

We debated how accurately a single genome can represent admixture in the ancestral Aboriginal Australian population, and the unknown variability of that population at the time, which is not considered here. We asked how the admixture patterns would be affected if this Aboriginal Australian genome came from the most or the least admixed individual in the ancestral population. We also wondered why no other, recent Australian samples were included, even though Aboriginal people live in Australia today. At this point, the discussion moved to more socio-political issues about the use of samples and information; as I stated at the beginning, this topic can be of general concern and discussion for several reasons.

The evidence presented so far, together with an additional test described below, can help to distinguish between single and multiple dispersals “out of Africa”, and probably to estimate the proportion of admixture between the first established populations and the second wave of migration. By contrast, questions about how or why the second migration almost completely replaced the first are, from my point of view, largely "historical" statements and therefore difficult to derive and test from the available evidence. I think it is very difficult to go beyond the patterns and processes we are able to model and test.

3. D-test and ABBA/BABA hypothesis

We tried to identify the goal and configuration of this test for discriminating between the competing hypotheses. Complete information about the test can be found in references 3 and 4; I will try to summarize it in a nutshell. The D-test uses a four-taxon configuration (see figure) in which only biallelic sites are considered (A and B variants). Two of the four taxa have fixed states, commonly including the outgroup sequence (here the Africans, but also the Europeans), and the other two taxa differ from each other (here Aboriginals and Asians). This configuration produces either BABA or ABBA patterns. The next step is to count the number of sites supporting each pattern: D = ∑ (ABBA sites - BABA sites) / ∑ (ABBA sites + BABA sites), i.e. the difference divided by the total number of informative sites. The test was originally defined to identify admixture between populations (using the AB/BA sites), with the expectation of an equal number of the two types of sites in its absence. The D-test can be considered more robust to sequencing errors because it compares nucleotides in more than one sequence, and the same error is unlikely to have occurred twice. The authors explicitly said that the test does not allow one to distinguish between the two models of origin, nor to resolve gene flow between Asian and Australian populations. However, I consider that the D-test performed here can support the multiple-dispersal model, due to a statistically significant excess of sites grouping the African and Aboriginal Australian genomes (sites with pattern 2 in the figure).
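
The counting step described above can be sketched in a few lines. This is only an illustration under stated assumptions: each site is encoded as a four-letter string giving the allele (A or B) in four populations in a fixed order, the example counts are invented, and how the paper's populations map onto the four positions is left open here.

```python
from collections import Counter

def d_statistic(sites):
    """Compute D from an iterable of four-letter site patterns such as 'ABBA'.

    Only ABBA and BABA sites are informative; every other pattern is ignored.
    """
    counts = Counter(sites)
    abba = counts["ABBA"]
    baba = counts["BABA"]
    informative = abba + baba
    if informative == 0:
        raise ValueError("no ABBA or BABA sites to compare")
    return (abba - baba) / informative

# Invented example: 6 ABBA sites, 4 BABA sites, 10 uninformative BBAA sites.
example_sites = ["ABBA"] * 6 + ["BABA"] * 4 + ["BBAA"] * 10
print(d_statistic(example_sites))
```

A value near zero means the two patterns are about equally frequent; a clear excess of one pattern pushes D toward +1 or -1.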

Comparing expected and observed values of the D-test can help to discriminate between the hypotheses (as attempted in Table 2). However, the expected values reported for the single- and multiple-dispersal models are so close to each other (~50%), and given without credible intervals, that it is difficult to support one hypothesis or the other from the observed patterns. Finally, when applying the D-test, it is worth keeping in mind that the patterns seen in current populations as a result of hypothetical past events may have been altered by many other evolutionary processes, such as secondary gene flow, structure in the ancient population and incomplete lineage sorting, among others.
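
One common way to attach an uncertainty to a D value is a delete-one-block jackknife over genomic blocks. The sketch below is ours and makes several assumptions: per-block ABBA and BABA counts are available, blocks are treated as equally weighted (published analyses often use a weighted version), and the counts are invented. It is not a description of what the authors did.

```python
import math

def d_from_counts(abba, baba):
    """D for a given pair of ABBA and BABA counts."""
    return (abba - baba) / (abba + baba)

def jackknife_d(blocks):
    """Return (D, standard error) from a list of (abba, baba) counts per block.

    Each block is left out in turn; the spread of the leave-one-out
    estimates gives the (unweighted) jackknife standard error.
    Assumes informative sites remain after removing any single block.
    """
    total_abba = sum(abba for abba, _ in blocks)
    total_baba = sum(baba for _, baba in blocks)
    d_all = d_from_counts(total_abba, total_baba)
    leave_one_out = [
        d_from_counts(total_abba - abba, total_baba - baba) for abba, baba in blocks
    ]
    n = len(leave_one_out)
    mean = sum(leave_one_out) / n
    variance = (n - 1) / n * sum((d - mean) ** 2 for d in leave_one_out)
    return d_all, math.sqrt(variance)

# Invented per-block counts, for illustration only.
example_blocks = [(12, 8), (9, 11), (15, 10), (7, 7), (11, 6)]
d_value, standard_error = jackknife_d(example_blocks)
print("D = {:.3f} +/- {:.3f}".format(d_value, standard_error))
```

With an interval of this kind around the observed value, it would be easier to judge whether two expected values as close as those in Table 2 can be told apart.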

Figure 1. Grouping site patterns 1 and 2 used in the D-test. Note that the African and European populations have fixed states, whereas the Aboriginal Australian and Asian populations vary. This figure is a modification of figure 3 in reference 5. Although the ABBA/BABA patterns are not explicit, the grouping patterns are based on the article text describing the two models of early dispersal used to perform the test.

Rasmussen, M., Guo, X., Wang, Y., Lohmueller, K., Rasmussen, S., Albrechtsen, A., Skotte, L., Lindgreen, S., Metspalu, M., Jombart, T., Kivisild, T., Zhai, W., Eriksson, A., Manica, A., Orlando, L., De La Vega, F., Tridico, S., Metspalu, E., Nielsen, K., Avila-Arcos, M., Moreno-Mayar, J., Muller, C., Dortch, J., Gilbert, M., Lund, O., Wesolowska, A., Karmin, M., Weinert, L., Wang, B., Li, J., Tai, S., Xiao, F., Hanihara, T., van Driem, G., Jha, A., Ricaut, F., de Knijff, P., Migliano, A., Gallego Romero, I., Kristiansen, K., Lambert, D., Brunak, S., Forster, P., Brinkmann, B., Nehlich, O., Bunce, M., Richards, M., Gupta, R., Bustamante, C., Krogh, A., Foley, R., Lahr, M., Balloux, F., Sicheritz-Ponten, T., Villems, R., Nielsen, R., Wang, J., & Willerslev, E. (2011). An Aboriginal Australian Genome Reveals Separate Human Dispersals into Asia Science, 334 (6052), 94-98 DOI: 10.1126/science.1211177

Additional references

1. H. Liu, F. Prugnolle, A. Manica, F. Balloux, A geographically explicit genetic model of worldwide human-settlement history. Am. J. Hum. Genet. 79, 230 (2006)
2. HUGO Pan-Asian SNP Consortium, Mapping human genetic diversity in Asia. Science 326, 1541 (2009).
3. Green RE, Krause J, Briggs AW, Maricic T, Stenzel U, Kircher M, Patterson N, Li H, Zhai W, Fritz MH, Hansen NF, Durand EY, Malaspinas AS, Jensen JD, Marques-Bonet T, Alkan C, Prüfer K, Meyer M, Burbano HA, Good JM, Schultz R, Aximu-Petri A, Butthof A, Höber B, Höffner B, Siegemund M, Weihmann A, Nusbaum C, Lander ES, Russ C, Novod N, Affourtit J, Egholm M, Verna C, Rudan P, Brajkovic D, Kucan Z, Gusic I, Doronichev VB, Golovanova LV, Lalueza-Fox C, de la Rasilla M, Fortea J, Rosas A, Schmitz RW, Johnson PL, Eichler EE, Falush D, Birney E, Mullikin JC, Slatkin M, Nielsen R, Kelso J, Lachmann M, Reich D, Pääbo S. A draft sequence of the Neandertal genome. Science 328, 5979, 2010.
4. Durand, E., Patterson, N., Reich, D., Slatkin, M. Testing for ancient admixture between closely related populations. Mol Biol Evol, 2011.
5. The Heliconius Genome Consortium. Butterfly genome reveals promiscuous exchange of mimicry adaptations among species. Nature. 2012

posted by MRR for Martha Serrano

Friday, May 4, 2012

Distinct signatures of diversifying selection revealed by genome analysis of respiratory tract and invasive bacterial populations (Shea et al, PNAS 2011)

Diversifying selection is a form of natural selection where intermediate values of a trait become less represented within a population, in favour of extreme values; a process that may subdivide a population between specialized niches and eventually lead to speciation. For instance, it can be theorized that a pathogen colonising several sites of the human body, where it is exposed to wildly different conditions and selective pressures, would have greater chances of survival by expressing a multitude of site-appropriate phenotypes than by reaching an adaptive compromise. While this strategy could be achieved through phenotypic plasticity, it could also result from genetically distinct strains of the pathogen.

Streptococcus pyogenes, also known as the group A streptococcus, or GAS, is a Gram-positive human bacterial pathogen. It is responsible for diseases such as impetigo, a localized skin infection, or pharyngitis, the streptococcal “sore throat”, both of which are mild superficial infections. The same bacterium is involved in a wide range of “invasive” infections, i.e. infections of sterile sites such as blood, which can be severe. From an experimental standpoint, S. pyogenes is a useful model for studying bacterial clonal evolution, because its strains exhibit relatively limited amounts of horizontal transfer across portions of the core genome. This is in contrast to bacterial species that frequently exchange genetic material, thus complicating phylogenetic inference.

The authors of this paper compare S. pyogenes strains found in superficial infections, more precisely in pharyngitis cases, to strains found in invasive infections. The authors state several objectives:
  • First, they want to extend our limited knowledge about the genomes of pharyngitis strains. Greater efforts have so far been expended to dissect the molecular basis of the more health threatening invasive infections.
  • Secondly, they point out that very little is known about the precise genetic relationship between those two categories, and present their work as the first full genome analysis performed to address this issue. This analysis has been made possible thanks to high-throughput DNA sequencing technologies.
  • In particular, they want to test the widely accepted model, supported by epidemiologic studies, that most strains causing invasive infections arise from pharyngeal or other benign infections. In other words, do pharyngitis and invasive strains belong to the same genetic pool, provided they were collected from the same geographical location?
  • Finally, they try to make sense of the genetic differences between pharyngitis and invasive strains in the light of diversifying selection. Can a link be made between the genomic sequences and the selective forces expected from the host oropharynx or sterile-site environments?

On the origin of data

During the tutorial session, we discussed the notions of convenience sampling and reusing material from previous studies. The work presented in this paper is based on eighty-six serotype M3 GAS pharyngitis strains collected from six regional laboratories across Ontario from 2002 to 2010, as well as on two hundred fifteen serotype M3 GAS invasive strains collected from the same location as part of a prospective population-based surveillance study of invasive GAS infections from 1992 to 2009. Those invasive infections include unequal numbers of soft tissue infections, bacteremias, lower respiratory infections, unknown invasive infections, septic arthritides, necrotizing fasciitis, meningitides, toxic shock syndrome cases, peritonitis and other unspecified invasive infections. We were unsure as to whether the different and long time periods involved, or the high number and diversity of invasive infections when compared to pharyngitis, should be seen as strengths or weaknesses for the pertinence of the paper. Some of our concerns on the matter resurfaced while we were discussing figure 4, as will be explained later.

The DNA sequence data obtained from those strains was mapped to the genome sequence of the M3 reference strain MGAS315 (NC_004070). A different but related experiment, also described in this paper, involved strains obtained from experimentally inoculated nonhuman primates [1].

Go figure

As with previous sessions of this tutorial, we organised our discussion on a figure-by-figure basis. We found most of the figures in this paper to be in large part confusing and/or unconvincing:
  • Figure 1 shows the distribution of chi-square statistics for unique polymorphisms per gene. The corresponding P values, Bonferroni-adjusted to correct for multiple testing, are written next to the dots on the plot. The meaning of the x-axis is not indicated, making the figure difficult to understand.
  • Figure 3 shows two unrooted neighbour-joining phylogenetic trees assembled from the complete list of all core biallelic SNPs. One corresponds to the eighty-six pharyngitis strains and the other to one hundred temporally matched invasive strains. The two trees were assembled completely independently from each other. The authors claim that their remarkably similar overall structure suggests common evolutionary histories. First, we discussed whether or not it would have been possible to root the trees, and concluded that it probably would have been very difficult. We also had our doubts about the focus on SNPs that runs through this paper. But most importantly, we didn’t see any striking similarity between the shapes of those two trees.
  • Figure 4, a combined phylogenetic tree for pharyngitis and invasive strains, was much more convincing with regard to the last issue. Pharyngitis strains are not massed on one branch of the tree, nor are invasive strains. An invasive strain will often be closer to a pharyngitis strain than to another invasive strain, supporting the idea that the two kinds of strains belong to the same genetic pool. Still, we also had problems with figure 4. The meaning of “SC”, as in SC1 to SC10, is unclear. Assuming those represent ten different strain collections, it would further suggest that strains from the same geographical area are more closely related. But the figure should indicate more explicitly which strains belong to which collection and give us more information about those collections. Otherwise, we are left wondering, for example, why SC3, SC4 and SC7 are so close to each other. Another issue, resurfacing from earlier in our discussion, is that the tree was assembled from many more invasive strains than pharyngitis strains. In particular, there is no pharyngitis strain in the “SC3 & SC4” region of the tree. A reader could believe that pharyngitis strains arise from invasive strains rather than the other way around. Finally, the emm3.53 pharyngitis strains, described in the paper as recently emerged subclone lineages, are presented in a very crowded part of the figure, and there would have been space for a zoomed-in inset.
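
For readers unfamiliar with the Bonferroni adjustment mentioned for Figure 1, here is a rough sketch of the idea. It assumes chi-square statistics with one degree of freedom (the paper's actual degrees of freedom are not something we checked), and the gene labels and values are hypothetical.

```python
import math

def chi2_pvalue_1df(statistic):
    """Upper-tail P value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))

def bonferroni(p_values):
    """Multiply each P value by the number of tests, capping at 1."""
    n_tests = len(p_values)
    return [min(1.0, p * n_tests) for p in p_values]

# Hypothetical per-gene statistics, not taken from the paper.
gene_statistics = {"gene_1": 25.0, "gene_2": 3.84, "gene_3": 0.5}
raw = [chi2_pvalue_1df(value) for value in gene_statistics.values()]
for gene, raw_p, adjusted_p in zip(gene_statistics, raw, bonferroni(raw)):
    print("{}: raw P = {:.2e}, adjusted P = {:.2e}".format(gene, raw_p, adjusted_p))
```

With thousands of genes tested, only very large statistics survive the adjustment, which is why the adjusted values on the plot matter more than the raw statistics.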

We spent a lot of time on Figure 2, because it is very detailed and made of five smaller figures. The first three figures are a schematic of polymorphisms within the has operon promoter as well as the hasA, hasB and covS genes, with a distinction between polymorphisms found in pharyngitis strains and those found in invasive strains. Those genes and several others were identified by the authors as having a significant excess of allelic variation, i.e. greater than expected by chance alone, although we think they should have defined more clearly what they meant by an excess of polymorphism. Changes relative to the reference genome were predicted to either jeopardize or upregulate the expression of hasA and hasB. As the paper explains, those two genes encode proteins essential for the synthesis of the antiphagocytic hyaluronic acid capsule [2]. The rest of Figure 2 shows how the authors tested the effect of several of these polymorphisms on hasA transcript levels, hyaluronic acid production and colony morphology. It would appear that pharyngitis strains lose their antiphagocytic capsule, plausibly because they don’t need it in the host oropharynx, because it is expensive to produce and because it reduces exchanges between the cell and its environment. Invasive strains, on the other hand, need an increased resistance to the human immune system [3]. Although this is a very clear figure that supports the authors’ hypotheses, some of us still had doubts about whether S. pyogenes really was an example of diversifying selection.


The authors make several conclusions at the end of their paper:
  • It is the largest whole-genome comparative analysis of a bacterial pathogen to date.
  • It is a genome wide investigation of S. pyogenes strains involved in upper respiratory tract infection.
  • Invasive strains are genetically more similar to a given population of pharyngitis strains than to invasive strains as a whole, confirming previous morphological observations.
  • They didn’t identify a single highly prevalent genetic variant explaining the various diseases. Instead, an accumulation of rare variants would be involved, altering functions such as the CovR/S global gene regulatory system [4].

We also had questions regarding research funding, the ecology of S. pyogenes and the reason they excluded prophage sequences from their main study (although there is a paragraph on prophage content). We noticed that this paper was a direct submission with a prearranged editor, a fact that might explain how we were able to come up with so many questions and criticisms in just an hour-long tutorial session.

In my opinion, this paper was nonetheless a very interesting read. It shows the possibilities of new sequencing technologies, as well as the kind of thinking that must be done in order to understand diseases and epidemics. It made for a lively tutorial session.

  1. Virtaneva K, et al. (2005) Longitudinal analysis of the group A Streptococcus transcriptome in experimental pharyngitis in cynomolgus macaques. Proc Natl Acad Sci USA 102:9014–9019.
  2. Dougherty BA, van de Rijn I (1994) Molecular characterization of hasA from an operon required for hyaluronic acid synthesis in group A streptococci. J Biol Chem 269:169–175.
  3. Stollerman GH, Dale JB (2008) The importance of the group a streptococcus capsule in the pathogenesis of human infections: A historical perspective. Clin Infect Dis 46: 1038–1045.
  4. Federle MJ, McIver KS, Scott JR (1999) A response regulator that represses transcription of several virulence operons in the group A streptococcus. J Bacteriol 181:3649–3657.

Shea, P., Beres, S., Flores, A., Ewbank, A., Gonzalez-Lugo, J., Martagon-Rosado, A., Martinez-Gutierrez, J., Rehman, H., Serrano-Gonzalez, M., Fittipaldi, N., Ayers, S., Webb, P., Willey, B., Low, D., & Musser, J. (2011). Distinct signatures of diversifying selection revealed by genome analysis of respiratory tract and invasive bacterial populations Proceedings of the National Academy of Sciences, 108 (12), 5039-5044 DOI: 10.1073/pnas.1016282108