Friday, March 30, 2012

Cryptic genetic variation promotes rapid evolutionary adaptation in an RNA enzyme (Hayden et al, Nature, 2011)
Cryptic genetic variation (CGV) is defined as “standing genetic variation that does not contribute to the normal range of phenotypes observed in a population, but that is available to modify a phenotype that arises after environmental change or the introduction of novel alleles” [Gibson & Dworkin, 2004]. As such, CGV fills the gap between:
1.     expressed genetic variation, defined as genetic variation that contributes to the normal range of phenotypes actually present in a population;
2.     neutral genetic variation, which does not contribute to phenotypes under any likely genetic or environmental conditions; a typical example of neutral genetic variation would be synonymous substitutions in protein-coding sequences.
The necessity of the concept of CGV stems from the observation that environmental or genetic perturbations can reveal standing genetic variation that was silent or “cryptic” under standard conditions. CGV relative to a trait can thus be considered as genetic variation that conditionally affects that trait, the conditions or the trait itself being absent in the actual population and environment.
A classic example of the CGV concept is the scutellar bristle number in Drosophila. The number of bristles on the scutellum is 4, with low variation, in wild-type flies; when the scute mutation is introduced into the genetic background, the number of bristles becomes variable. Artificial selection experiments show that the underlying variation is not stochastic but genetic. In this example, a genetic perturbation (the introduction of a mutation) reveals previously hidden genetic variation [example from Gibson & Dworkin, 2004].
The concept of CGV could be more than a genetic curiosity and a further escalation in the complexity of the links between genotype and phenotype. It has notably been proposed to help explain major transitions during macroevolution [McGuigan & Sgrò, 2009]: because it is relatively protected from selection, CGV could accumulate in natural populations under stable conditions and be released upon major environmental changes, promoting the adaptation of the population and accelerating its evolution.
However, this proposed role for CGV in helping adaptation is still largely theoretical and speculative. One aim of the paper we discussed this week is to demonstrate that CGV in a very simple molecular system under Darwinian selection can help adaptation of the system to a new environment. Success in this demonstration would be a proof of principle that CGV can actually be an important factor in evolution.

An account of the discussion

1. Experimental design

The experimental system chosen by the authors of the paper is based on a bacterial group I ribozyme (the self-splicing intron of the isoleucine tRNA from the betaproteobacterium Azoarcus). Group I ribozymes have been used many times in directed in vitro evolution experiments: relatively simple molecular techniques (PCR, in vitro transcription and reverse transcription) make it easy to link the single enzymatic function encoded in the molecule (the capacity to perform a splicing reaction) to replication competence. The principle of the experimental system, inspired by Lehman & Joyce (1993), is illustrated in supplementary figure 1a of the paper. Briefly, a population of ribozymes (first generation) is left to react with an external RNA substrate; ribozymes able to react with the substrate are reverse-transcribed in vitro (this is the selection procedure for the “active fraction” of the population); the reverse-transcribed molecules are then amplified by mutagenic or non-mutagenic PCR and finally transcribed into new reactivated RNA molecules (second generation). The procedure can be repeated multiple times. The Azoarcus ribozyme was chosen over the Tetrahymena ribozyme used in the early 1990s because its enzymatic function has been shown to be, for structural reasons, particularly robust to heat and denaturation [Kuo et al., 1999].

As described during the discussion, the experimental design comprises two main steps:
1.     During the first step, genetic variation is allowed to accumulate in the population of ribozymes over 10 generations. At this step, the amplification procedure involved a strongly mutagenic PCR ensuring the rapid accumulation of mutations in the population; at the same time, selection for active molecules made sure that the variation accumulating in the population would minimally affect the activity. Consequently, after 10 generations, the fraction of the RNA molecules able to react with the substrate in the evolved populations is similar to the one in the initial ribozyme population. The in vitro evolution was carried out in parallel in two conditions (line A and line B), the second increasing the stringency of the purifying selection by addition of a denaturing agent in the reaction (formamide).
2.     During the second step, the populations of the 10th generation from step 1 and the initial homogeneous population of ribozymes were tested on a new substrate (an RNA molecule with a phosphorothioate bond instead of the original phosphodiester bond). The same selection procedure as before was carried out with standard (moderately mutagenic) PCR. The speed of adaptation (the increase in enzymatic efficiency over generations) of the initial population and of the evolved populations was then compared.
At each step, a sample of the population of molecules was sequenced and the changes in the population composition were monitored.
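
The mutation/selection cycle described above can be sketched as a toy simulation. This is only an illustration of the logic of the design, not the authors' protocol: the sequence, the mutation rate, the rule deciding whether a molecule is "active" (here, a simple cap on the number of differences from the wild type) and all function names are assumptions made up for the example.

```python
import random

ALPHABET = "ACGU"


def mutate(seq, rate, rng):
    """Copy seq, replacing each base by a different one with probability `rate`."""
    out = []
    for base in seq:
        if rng.random() < rate:
            out.append(rng.choice([b for b in ALPHABET if b != base]))
        else:
            out.append(base)
    return "".join(out)


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def is_active(seq, wild_type, max_mutations):
    # Toy selection rule (assumption): a molecule reacts with the substrate
    # only if it carries at most `max_mutations` differences from the wild type.
    return hamming(seq, wild_type) <= max_mutations


def one_generation(population, wild_type, rate, max_mutations, rng):
    """Select the active molecules, then amplify them (with mutation) back to constant size."""
    active = [s for s in population if is_active(s, wild_type, max_mutations)]
    if not active:
        raise ValueError("no active molecule left in the population")
    return [mutate(rng.choice(active), rate, rng) for _ in range(len(population))]


def evolve(wild_type, size, generations, rate, max_mutations, seed=0):
    """Run several rounds of selection and mutagenic amplification from a homogeneous start."""
    rng = random.Random(seed)
    population = [wild_type] * size
    for _ in range(generations):
        population = one_generation(population, wild_type, rate, max_mutations, rng)
    return population


if __name__ == "__main__":
    wt = "ACGU" * 5
    final = evolve(wt, size=50, generations=10, rate=0.05, max_mutations=3, seed=1)
    mean_mutations = sum(hamming(s, wt) for s in final) / len(final)
    print(f"mean mutations per molecule after 10 generations: {mean_mutations:.2f}")
```

In this sketch, step 1 corresponds to a high `rate` and step 2 to a lower `rate` combined with a different `is_active` rule standing in for the new substrate.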

The genetic variation that accumulates in the population of molecules during step 1 is considered “cryptic” by the authors, because it does not significantly affect the only trait or phenotype considered, i.e. the average “activity” of the population, estimated as the fraction of RNA molecules able to react with the substrate in given conditions (compare generation 1 and generation 10 in fig. 1a). During the discussion, it became clear that this trait is equivalent to fitness.

2. Creating genetic variation in a population of ribozymes

The discussion focused first on figure 1 and the accumulation of genetic variation in the population of ribozymes.
Fig. 1b-c shows that the average number of mutations per molecule and its variance both increase over the mutation/selection procedure from generation 1 to generation 10 in both lines A and B. The average number of mutations per molecule seems slightly higher in line B (formamide) than in line A in all generations, whereas the number of mutable positions is higher in line A (35) than in line B (19). This paradox remains unexplained, but it could point towards more extensive epistasis in line B: mutable positions are less frequent, but a larger number of positions mutated together could ensure better fitness in line B given the mutation/selection procedure.
Fig. 1a shows the change in the fraction of active molecules in the population in lines A and B over 10 generations. No significant difference exists between the average active fraction in generation 1 and the average active fraction in generation 10. However, there is a clear decrease between generation 1 and generation 4, followed by an increase back towards initial levels from generation 4 to generation 10, in both lines. We speculated about this pattern without reaching a clear conclusion:
a.     one possibility mentioned during the discussion is that the initial decrease reflects the accumulation of mutations that affect the efficiency of the enzymatic reaction in individual molecules (some of them are inactive due to deleterious mutations and cannot propagate to the following generation; others are probably only less active or less enzymatically efficient than the wild type and are less likely to pass to the next generation);
b.     the subsequent increase in efficiency could involve the appearance in the population of some kind of compensatory mutations that help the population as a whole resist the mutagenic procedure, or the appearance of fitter variants (with higher enzymatic efficiency) that progressively invade the population; a point made during our session is that the selection conditions do not change over generations, so the difference has to lie in the genetic composition of the population.

3. Testing the adaptability of populations having accumulated genetic variation

When ribozyme molecules from the initial population and from generation 10 of lines A and B are selected on a new substrate (RNA with a phosphorothioate bond), the lines that had accumulated mutations (New-A and New-B) adapted much faster than the initial homogeneous wild-type population (New-wild-type) (fig. 2a: largest difference at generation 5). The authors' overall interpretation is that the variation created in lines A and B was somehow pre-adapted to the new environment; the Darwinian orthodoxy of this concept was briefly discussed and we concluded that the authors implied no teleological meaning: “pre-adapted variation” is variation generated by chance that happens to confer higher fitness, or to lie closer to fitter genotypes, in the new environment.

Two ribozyme variants, Azo* and AzoD, increased rapidly under the new selective conditions and were discussed in more detail.

The Azo* mutant is a ribozyme variant with increased activity on the new substrate (Fig. 2c). The individual mutations composing the Azo* genotype and several combinations of them were detected in the lines A and B during step 1. Importantly, most of these genetic variants were confirmed to be cryptic in the sense that they do not affect (or affect negatively) the activity of the ribozyme in the presence of the initial RNA substrate. The scenario suggested by the authors is the following:
   Azo* individual mutations or combinations of individual mutations—more or less neutral under the selective conditions of step 1—could accumulate in the population because they were not under selection;
   When the selective conditions were changed and the phosphorothioate-bond-containing substrate was introduced, the populations of molecules containing individuals closer, in the genotype space, to Azo* could adapt more swiftly, because the number of mutational steps needed to make them more active on the new substrate was lower. Thus, populations containing cryptic genetic variation have a clear selective advantage under changing environments.
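
The idea of being "closer in genotype space" can be made concrete with a minimal sketch that counts the point mutations separating the nearest member of a population from a target genotype. The sequences below are invented for illustration (the `target` string is a hypothetical stand-in for an Azo*-like genotype, not the real sequence), and Hamming distance is used as an assumed, simplified measure of mutational distance.

```python
def hamming(a, b):
    """Number of positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def steps_to_target(population, target):
    """Smallest number of point mutations separating any molecule from the target."""
    return min(hamming(seq, target) for seq in population)


wild_type = "ACGUACGUAC"
target = "ACGAACGUUC"  # hypothetical adapted genotype, two mutations away from wild type

# A homogeneous population versus one carrying standing (cryptic) variation.
homogeneous = [wild_type] * 4
diverse = [wild_type, "ACGAACGUAC", "ACGUACGUUC", "UCGUACGUAC"]

print(steps_to_target(homogeneous, target))
print(steps_to_target(diverse, target))
```

In this toy case the diverse population already contains molecules carrying one of the two required mutations, so fewer new mutations are needed to reach the target than in the homogeneous population.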

The AzoD genotype crystallized an interesting part of the discussion, as it is a variant that, although inactive on the new substrate, could become the dominant variant in one of the populations (line New-A) during step 2 of the experiment. Apparently, AzoD is unable to perform the splicing reaction by itself (possibly because of structural problems), but can perform it in the presence of either the wild type or the Azo* ribozyme variants (fig. 2d). This could be interpreted as a form of molecular parasitism or commensalism (or, less likely, mutualism). If AzoD needs a partner to perform the reaction and prevents its partner from performing it itself (or competes with it for the substrate), it would be a form of parasitism. If AzoD needs a partner to perform the reaction and does not prevent its partner from performing it itself (or, better, uses a partner that has already reacted), it would be a form of commensalism. Mutualism is less likely, because of the observed decreased fitness of Azo* in the presence of AzoD, but mechanistic studies would be needed to settle the question definitively.

During the discussion, it was noticed that the fitness values given in figure 2b depend not only on the substrate but also on the population composition (difference in fitness for Azo* in the presence or absence of AzoD). This clearly shows that the “environment” of a given molecule comprises not only the substrate and reaction conditions but also the competing molecules.

4. Analysis of the evolution of the population of ribozymes under the new selective conditions

Sequence data from samples taken at all experimental steps made it possible to study the genetic evolution of the population of ribozymes, in particular during the second step of the experiment. To facilitate representation and visualization, a subset of the data (generations 1, 4 and 8 of the new lines) was analyzed by principal component analysis (PCA). We discussed the procedure itself: on the basis of a multiple alignment of all sequence data to be analyzed (all 3 or maybe 8 generations together), each individual ribozyme molecule was first reduced to a vector in which each aligned position takes one of 5 possible values (gap or one of the 4 nucleotides); PCA was performed on these data and the position of each molecule in the multidimensional space of genotypes was projected onto the two dimensions maximizing information, resulting in fig. 3a.
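
As a rough sketch of the kind of procedure we discussed, the snippet below encodes aligned sequences and projects them onto two principal components. The exact encoding used in the paper was not clear to us, so the one-hot encoding (five 0/1 indicators per aligned position), the toy sequences and the function names are all assumptions; the PCA itself is done with a plain NumPy singular value decomposition.

```python
import numpy as np

SYMBOLS = "ACGU-"  # the four nucleotides plus the alignment gap


def one_hot(aligned_seqs):
    """Encode equal-length aligned sequences as a (n_seqs, length * 5) matrix of 0/1 values."""
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")
    index = {symbol: i for i, symbol in enumerate(SYMBOLS)}
    encoded = np.zeros((len(aligned_seqs), length * len(SYMBOLS)))
    for row, seq in enumerate(aligned_seqs):
        for pos, symbol in enumerate(seq):
            encoded[row, pos * len(SYMBOLS) + index[symbol]] = 1.0
    return encoded


def pca_2d(matrix):
    """Project the rows of `matrix` onto its first two principal components."""
    centered = matrix - matrix.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


# Invented toy alignment: each string stands for one sequenced molecule.
seqs = ["ACGUAC", "ACGUAC", "ACGAAC", "UCGAAC", "AC-UAC"]
coords = pca_2d(one_hot(seqs))
print(coords.shape)
```

Each row of `coords` gives the position of one molecule in the two-dimensional plot; identical sequences land on the same point, and clusters correspond to groups of related genotypes.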

Fig. 3a helps to understand the evolutionary trends in the presence of the new substrate. We discussed in particular the following elements:
a.     The figure shows the spectacular accumulation of molecules related to the AzoD and Azo* genotypes in lines New-A and New-B respectively. The AzoD genotype forms a group separated from the main New-A population along the second principal component axis, with very few intermediate forms (possible disruptive selection). The Azo* genotype (yellow) is not efficiently discriminated from its original population by the PCA.
b.     From the figure, it is evident that the genetic variation is higher in the New-A and B lines than in the New-wild-type line across all generations and that, over generations, subsets from the New-wild-type line tend to invade positions closer to the original New-A and B populations (where Azo* can be found).

A personal touch

I have a few more personal comments and interrogations about this paper.

1.     I was wondering why the authors would start their paper with a sentence as strange as: “Cryptic variation is caused by the robustness of phenotypes to mutations”. My point is that variation cannot be caused by robustness if you understand these three words in their usual sense. One could meaningfully say that cryptic genetic variation is authorized or allowed by robustness of phenotypes to mutations or that cryptic genetic variation is caused by mutation assuming the robustness of the phenotype. But the sentence as it is written either contains a semantic mistake or is a meaningful paradox, in which case my natural reluctance to follow self-citation references (note 1) entirely accounts for my confusion…

2.     I am still puzzled about the argument the authors repeatedly put forward that the absence of a difference in the “active fraction” of ribozymes between the initial and the G10 populations of step 1 implies that the genetic variation present in G10 is cryptic. I do not see why this should necessarily be true, at least if the enzymatic reaction is not complete, as seems to be the case here (see supplementary figure 2: a maximum of ~30% of molecules reacted in the wild-type population). If a fraction of the ribozyme variants in the population containing genetic variation has better kinetic parameters (is more efficient) than the wild-type molecule and another fraction has worse kinetic parameters than the wild-type molecule, the apparent “active fraction” (after a 60-minute reaction) of the population could be the same as in the initial population; in this particular case, the genetic variation, as it affects the trait under selection, would not be cryptic, only unseen (owing to limitations of the experimental design). But I agree that the protocol used by the authors should by and large enrich the population for cryptic variation.

3.     As hard as I tried, I could not reconcile the data presented in figure 1a and supplementary figure 5. The initial decrease (G1 to G4) and subsequent increase (G4 to G10) in the active fraction of ribozymes over selection rounds is clear in figure 1a. As the selection conditions do not change from G1 to G10, the pattern has to be explained by the accumulation or loss of specific genetic variation in the population of molecules. I was expecting to see something happening around generation 4 in the graphs representing the fraction of molecules in the population with a mutation at a specific nucleotide as a function of time (selection rounds) (sup. fig. 5); yet nothing obvious seems to change in the population composition... Besides, regarding this supplementary figure 5, it is not clear to me how a position could be mutated in 40% (or even 20%) of the population after only 1 round of selection (lilac line), when the average number of mutations per molecule should be 1. If I am not mistaken, molecules with a mutation at a given position should be 1 out of about 200 (the length of the ribozyme) in the mutagenized population; if 30% of this population can react (suppl. fig. 2) and the molecules containing that mutation are all functional, they should make up at most 1 out of about 60 reacted molecules, or about 1.7%. My logic is most probably faulty, but I do not see where; if you find out, please tell me.

4.     I wonder what would come out of a repetition of step 2 with the same three molecular populations but in the presence of the initial RNA substrate. If the change in the “active fraction” of the population over the first selection rounds is indistinguishable for all three lines, then it would support the claim that most of the variation in the mutated populations was actually cryptic; if the pattern is the same as in the actual step 2 (differences between New-A and New-B on one side and New-wild-type on the other), then it could mean (but not necessarily) that some kind of variation accumulated in the populations from step 1 that actually affects the enzymatic activity or, alternatively, the capacity of the molecule to cope with mutations (which still occur during normal PCR). As an author, I would probably have done this experiment, even if its interpretation could have been problematic.
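
For what it is worth, the two numerical arguments in comments 2 and 3 can be written out explicitly. The sketch below makes strong simplifying assumptions that are mine and not the paper's: first-order reaction kinetics for comment 2 (fraction reacted = 1 - exp(-k t)), a hypothetical half-and-half mix of faster and slower variants, and mutations spread uniformly over about 200 positions for comment 3. Only the ~30% reacted fraction, the 60-minute reaction time and the length of about 200 nucleotides come from the text above.

```python
import math


def reacted_fraction(k, t):
    """Fraction of molecules that have reacted after time t (assumed first-order kinetics)."""
    return 1.0 - math.exp(-k * t)


def rate_for_fraction(fraction, t):
    """Rate constant that yields the given reacted fraction at time t."""
    return -math.log(1.0 - fraction) / t


# Comment 2: a mixed population can show the same apparent "active fraction".
T = 60.0  # reaction time in minutes
k_wt = rate_for_fraction(0.30, T)    # wild type: ~30% reacted after 60 minutes
k_fast = rate_for_fraction(0.45, T)  # hypothetical more efficient variants
k_slow = rate_for_fraction(0.15, T)  # hypothetical less efficient variants
mixed_fraction = 0.5 * reacted_fraction(k_fast, T) + 0.5 * reacted_fraction(k_slow, T)

# Comment 3: expected frequency of a mutation at one given position.
ribozyme_length = 200         # approximate length in nucleotides
mutations_per_molecule = 1.0  # average after one mutagenic round
wt_reacted = 0.30             # fraction of the population that reacts

freq_before_selection = mutations_per_molecule / ribozyme_length
# Upper bound: every molecule carrying the mutation is assumed to react.
max_freq_after_selection = freq_before_selection / wt_reacted

print(f"mixed population, apparent active fraction: {mixed_fraction:.2f}")
print(f"maximum frequency among reacted molecules: {max_freq_after_selection:.1%}")
```

Under these assumptions the mixed population is indistinguishable from the wild type by its active fraction alone, even though its members differ in kinetic efficiency, and the upper bound of comment 3 comes out at 1 in 60. Mutagenic PCR need not spread mutations uniformly over positions, which is one place where the reasoning of comment 3 could break down.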


Gibson G. & Dworkin I., “Uncovering cryptic genetic variation”, Nat Rev Genet. 2004 Sep;5(9):681-90.
Kuo LY, Davidson LA, Pico S., “Characterization of the Azoarcus ribozyme: tight binding to guanosine and substrate by an unusually small group I ribozyme”, Biochim Biophys Acta. 1999 Dec 23;1489(2-3):281-92.
Lehman N, Joyce GF., “Evolution in vitro of an RNA enzyme with altered metal dependence”, Nature. 1993 Jan 14;361(6408):182-5.
McGuigan K. & Sgrò CM., “Evolutionary consequences of cryptic genetic variation”, Trends Ecol Evol. 2009 Jun;24(6):305-11.

Hayden, E., Ferrada, E., & Wagner, A. (2011). Cryptic genetic variation promotes rapid evolutionary adaptation in an RNA enzyme Nature, 474 (7349), 92-95 DOI: 10.1038/nature10083

Tuesday, March 20, 2012

The genome of the green anole lizard and a comparative analysis with birds and mammals

Reptiles brought a major evolutionary novelty: the development of the amniotic egg, which enabled breeding outside of water. Until recently, the only available genomes from the reptilian lineage came from birds; this paper and its accompanying data therefore provide a very valuable resource for further analysis of amniote evolution.

The following aspects of the lizard genome were considered:
  1. transposable elements
  2. microchromosomes and synteny
  3. GC content
  4. sex determination system
  5. egg protein evolution
  6. adaptive radiation/ecology

Around 30 percent of the lizard genome consists of transposable elements. It is fascinating that, compared with mammals and birds, the lizard genome contains a much higher variety of active elements, but with a low rate of accumulation. When the authors compared mammalian conserved elements with the lizard genome, they found that several of these elements originate from transposable elements found in the lizard genome. The authors used the term exaptation to describe the process by which certain mobile elements that were active in the amniote ancestor acquired a putative function in mammals (most probably as regulatory elements). During the discussion we agreed that this term is not entirely appropriate, since it would imply that the mobile elements had a function in the genome from the beginning (the time of insertion), which is not the case.

Another surprising finding in the paper is the high synteny between chicken and lizard chromosomes: 19 out of 22 (anchored) chicken chromosomes are each syntenic to a single lizard chromosome over their entire length, whereas only 6 human chromosomes are syntenic to a single opossum chromosome. These findings contrast with what would be expected from the divergence times: 148 million years for human and opossum versus 280 million years for chicken and lizard. The authors did not discuss possible reasons for this. Moreover, microchromosomes are characteristic of reptiles. Remarkably, all lizard microchromosomes align to microchromosomes in the chicken, implying that these chromosomes probably emerged in the reptile ancestor.

The GC content of the lizard genome shows local variation. However, a comparison of syntenic regions from lizard, chicken and human makes it obvious that the lizard genome lacks GC isochores. Interestingly, the mean GC content is very similar in all three species, but its distribution is more homogeneous in the lizard.

It was previously known that this particular anole species has genetic sex determination, but not which system, XY or ZW, it uses. By performing FISH analysis, the authors could see that the X chromosome is present in two copies in females and in one copy in males, indicating an XY system. Unfortunately, they were not able to identify the Y chromosome, although they hypothesize that it exists since males and females have the same number of chromosomes. Moreover, they did not identify the lizard sex-determining gene.

The authors found 17,172 protein-coding genes in the lizard genome. The lizard has lineage-specific duplications of various egg proteins. In a comparison of chicken-lizard orthologues, the dN/dS ratio is higher for egg proteins than for non-egg proteins, suggesting reduced purifying selection and/or positive selection. This finding is not so surprising, since sex- and reproduction-related proteins were already known to be among the fastest evolving proteins. Additionally, 11 opsin genes were found, supporting the notion that lizards have excellent colour vision, which is very important in sexual selection and species recognition.

The anole lizard represents a textbook case of adaptive evolution. By designing primers based on the sequenced genome, researchers sampled 20 kb sequence datasets from protein-coding and noncoding regions of 93 anole species. The analysis of these sequences confirmed the previous notion, based on morphological and molecular data, that ecomorphs evolved independently on each island. Moreover, it shed light on the order of the colonization events that took place.

Although at the beginning of the discussion some felt that the article addressed many different topics without going into depth, the general opinion changed after further discussion. Since it is a genome paper, a great effort had already been invested in assembling the genome and conducting the basic analyses. This article went even further, adding many additional analyses and experiments that contribute to our understanding of lizards, reptiles and, consequently, amniote evolution.

Alföldi, J., Di Palma, F., Grabherr, M., Williams, C., Kong, L., Mauceli, E., Russell, P., Lowe, C., Glor, R., Jaffe, J., Ray, D., Boissinot, S., Shedlock, A., Botka, C., Castoe, T., Colbourne, J., Fujita, M., Moreno, R., ten Hallers, B., Haussler, D., Heger, A., Heiman, D., Janes, D., Johnson, J., de Jong, P., Koriabine, M., Lara, M., Novick, P., Organ, C., Peach, S., Poe, S., Pollock, D., de Queiroz, K., Sanger, T., Searle, S., Smith, J., Smith, Z., Swofford, R., Turner-Maier, J., Wade, J., Young, S., Zadissa, A., Edwards, S., Glenn, T., Schneider, C., Losos, J., Lander, E., Breen, M., Ponting, C., & Lindblad-Toh, K. (2011). The genome of the green anole lizard and a comparative analysis with birds and mammals Nature, 477 (7366), 587-591 DOI: 10.1038/nature10390