genomes ecology evolution etc: evolution


Tuesday, September 18, 2012

The evolutionary history of polar bears

The study of the Ursus lineage, which includes the brown bear (Ursus arctos), the black bear (Ursus americanus) and the polar bear (Ursus maritimus), offers an opportunity to address adaptation to extreme (salty and glacial) environments in mammals. Moreover, over the last few decades polar bears have attracted public and media attention as one of the most charismatic species endangered by global warming and the melting of Arctic ice. Two articles, published in Science and in the Proceedings of the National Academy of Sciences in April and June 2012, provide new data and insights to trace the history of polar bear innovations and to determine how polar bear populations responded to past environmental changes.

Polar bear fossils older than the late Pleistocene (circa 126,000 years ago) are lacking, and mitochondrial data suggest that polar bears are very closely related to a group of brown bears living on the Admiralty, Baranof and Chichagof (ABC) islands in Alaska. Together, these observations previously led to the belief that polar bears emerged recently from within brown bears. This hypothesis would have two consequences:

Polar bears underwent a very rapid and recent (less than 200 ky ago) adaptation to an extreme environment, at a pace not previously seen in mammals
The brown bear is a paraphyletic taxon, as the polar bear is the sister species of the ABC bears (see Fig. 1)


Fig. 1: Miller et al., Polar and brown bear genomes reveal ancient admixture and demographic footprints of past climate change, PNAS 2012

Phylogeny of the bear lineage based on mitochondrial DNA (Bayesian maximum clade credibility tree)

The blue box contains polar bear individuals from Svalbard and Alaska and an ancient sample 110 to 130 ky old, the yellow box contains ABC individuals and the pink box other brown bear individuals. The outgroup consists of black bear individuals.

Nevertheless, neither fossil data, which can be incomplete, nor mitochondrial data, which are sensitive to hybridization, are sufficient to confirm this hypothesis. The two groups therefore ran parallel projects to collect nuclear data and test whether they agree with the mitochondrial data.

Hailer et al., in their work Nuclear Genomic Sequences Reveal that Polar Bears Are an Old and Distinct Bear Lineage, published in Science, sequenced 9,116 nucleotides from 14 independent introns in 45 individuals of black, brown and polar bears. Introns were chosen because they provide more variation between individuals: given the short time since the last common ancestor of bears (estimated at 559 to 1,429 ky ago in their study), exons, whose evolution is more likely to be constrained by selection, would have yielded less information.

Using these data and several phylogenetic reconstructions (a Bayesian multilocus coalescent approach, Bayesian inference on the concatenated data and neighbour-joining on the differentiation estimates between species), which all led to the same conclusion, they recovered each of the three bear species as monophyletic and found that, in the species tree, the polar bear clade is sister to the brown bear clade. They estimated that the two species diverged around 603 ky ago (338 to 934 ky being the 99% highest credibility range), which clearly conflicts with the mitochondrial data.

The authors resolved this incongruence by proposing that the most probable scenario was a divergence between the polar and brown bear species 600 ky ago, followed by a hybridization event between polar bears and ABC bears 111 to 166 ky ago that led to the complete replacement of polar bear mtDNA by that of ABC bears. The opposite phenomenon (several severe introgression events of polar bear mtDNA into brown bears, leading to all extant mtDNA being of polar origin) is judged very unlikely by the authors given the wide distribution range of the brown bear. The lack of older polar bear fossils was explained by their constantly changing living environment.

Despite the recent hybridization event, Hailer et al. found very few nuclear haplotypes in common between polar and brown bears: out of the 35 polar and 79 brown haplotypes, only 6 were shared by both species. Nevertheless, we must bear in mind that, given the relatively small amount of nuclear data analysed, those findings might not reflect the entire picture of the nuclear DNA ancestry of polar and brown bears.

In Polar and brown bear genomes reveal ancient admixture and demographic footprints of past climate change, published in PNAS by Miller et al., genome-wide sequencing was used to tackle the same problem. In this extensive study, the authors assembled a reference genome from one polar bear individual and deeply sequenced the genomes of two ABC bears, one black bear and one non-ABC brown bear (GRZ). Finally, they produced low-coverage data from 23 other polar bear individuals, one of them an ancient specimen 110 to 130 ky old found in Svalbard.

Having aligned all reads from every sample to the polar bear reference genome, they identified 12 million of what they called "SNPs" (even though they are dealing with three different species) and constructed the following phylogeny (Fig. 2).

Fig. 2: Miller et al., Polar and brown bear genomes reveal ancient admixture and demographic footprints of past climate change, PNAS 2012

Phylogeny based on the distance matrix computed from the 12 million SNPs, using a neighbour-joining algorithm (probably chosen because of the amount of data and the computational time that more sophisticated algorithms would require)
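To make the "matrix of distances" concrete, here is a minimal sketch of how pairwise distances between individuals can be computed from SNP genotypes before handing them to a neighbour-joining routine. The sample names, the toy genotypes and the distance definition (mean difference in alternate-allele count per site) are assumptions chosen for illustration; the paper may well use a different distance.

```python
from itertools import combinations

def pairwise_distances(genotypes):
    """Return {(a, b): distance} for every pair of samples.

    genotypes maps a sample name to a list of alternate-allele counts
    (0, 1 or 2) per SNP, with None for a missing call.
    The distance is the mean absolute difference in allele count,
    scaled to lie between 0 and 1 (an assumed, simple definition).
    """
    dist = {}
    for a, b in combinations(sorted(genotypes), 2):
        # Only compare sites that were called in both samples.
        shared = [(x, y) for x, y in zip(genotypes[a], genotypes[b])
                  if x is not None and y is not None]
        if not shared:
            raise ValueError(f"no shared sites between {a} and {b}")
        dist[(a, b)] = sum(abs(x - y) for x, y in shared) / (2 * len(shared))
    return dist

# Hypothetical toy data: six SNPs in four individuals.
toy = {
    "polar": [0, 0, 2, 2, 0, None],
    "abc":   [2, 0, 0, 2, 1, 0],
    "grz":   [2, 0, 0, 1, 2, 0],
    "black": [2, 2, 0, 0, 2, 2],
}
distances = pairwise_distances(toy)
for pair, value in distances.items():
    print(pair, round(value, 3))
```

In this toy example the two brown bears (abc and grz) come out closer to each other than either is to the polar bear, which is the kind of signal a neighbour-joining tree would then turn into a topology.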

We observe that, as in the previous paper, the nuclear data do not agree with the mitochondrial data. A scenario where polar bears emerged as a sister species of the brown bear and later experienced a single, massive event of mtDNA introgression from ABC bears (as the polar bear individuals form only one group in Fig. 1) is again strongly favoured. Regarding the ancient polar bear specimen, both trees tell us that it post-dates the mtDNA introgression event and that the modern individuals living in Svalbard are actually more closely related to the modern individuals in Alaska than to the ancient one.

Though up to this point both articles seem consistent, the following findings differ radically from those of the previous study. Miller et al. used a coalescent hidden Markov model on four of their deeply covered genomes (one ABC bear, one polar bear, one brown bear and one black bear) to assess the history of the lineage. They estimated that both the split between polar bears and brown bears and the split between the common ancestor of those two species and black bears occurred around 4 to 5 My ago, as shown in Fig. 3.


Fig. 3: Miller et al., Polar and brown bear genomes reveal ancient admixture and demographic footprints of past climate change, PNAS 2012

Reconstructed evolutionary history of polar, brown and black bears

The black solid line represents the species tree and the brown dashed lines the mtDNA tree

The X represents the introgression event, and the shortened branch of the species tree the disappearance of the ancient Svalbard lineage

It is true, however, that Hailer et al. reported in their article (which pre-dates the PNAS one) that other studies hint that the 600 ky value underestimates the splitting time of the two lineages under consideration, without this weakening their own conclusion.

Nevertheless, other discrepancies arise: Hailer et al. stated that they found no evidence of ongoing gene flow between polar bears and brown bears, whereas the coalescent model used by Miller et al. estimated a time since the end of gene flow that was not significantly different from zero. Following the Science article, a comment was published reporting two very recent documented cases of polar/brown bear hybridization in the wild, one of them a second-generation hybrid. Interestingly, both crosses involved a polar bear female and a brown bear male: thus no cross leading to the introgression of brown bear mtDNA into polar bear populations has yet been described.

Besides, where Hailer et al. found relatively little nuclear variation shared between polar and brown bears, a PCA of the SNPs identified in the ABC, non-ABC and polar bear genomes indicated that 5.5% of one ABC genome and 9.4% of the other are related to the polar bear genome (Fig. 4).


Fig. 4: Miller et al., Polar and brown bear genomes reveal ancient admixture and demographic footprints of past climate change, PNAS 2012

PCA plot of SNP data for ABC1 & 2, polar and non-ABC brown bear (GRZ)

Following this PCA, it is interesting to look more closely at the differentiation of polar and brown bear populations, as ABC and GRZ appear well separated on the second component axis. Miller et al. therefore arbitrarily chose a subset of 100 SNPs identified from the genomes of all polar bear individuals and resequenced them in 118 individuals (58 polar bears, 9 ABC bears, 51 non-ABC brown bears). The PCA yielded the following plot (Fig. 5).

Fig. 5: Miller et al., Polar and brown bear genomes reveal ancient admixture and demographic footprints of past climate change, PNAS 2012

On the one hand, ABC and brown bears cluster together, even if we can still separate them into two groups. On the other hand, polar bear populations seem much more genetically heterogeneous than their sister-species counterparts. However, one must always remain careful when drawing conclusions from such a small amount of data (100 SNPs). Focusing on the polar bear populations, the authors performed a Structure analysis on these data (Fig. 6).


Fig. 6: Miller et al., Polar and brown bear genomes reveal ancient admixture and demographic footprints of past climate change, PNAS 2012

Structure analysis of 58 polar bear individuals grouped into 4 populations

The number of genetic populations was set to 3

Here again lies a very striking difference between the two papers. Whereas Miller et al. clearly identified genetic structuring between the populations of polar bears, Hailer et al. used the same type of analysis upon the nuclear variation of their 45 individuals and it led them to conclude that the polar bears were much more genetically homogeneous than the brown bears.

Given the respective data sets of the two papers, only Miller et al. were able to address adaptation to extreme environments. To do so, they aligned their deeply sequenced genomes to the dog genome, a choice resulting from a compromise between evolutionary distance and annotation quality (the panda genome has been fully sequenced but is of lower quality). Having thus preserved synteny across the bear genomes, they were able to carry out an admixture analysis for the two ABC genomes (Fig. 7).

Fig. 7: Miller et al., Polar and brown bear genomes reveal ancient admixture and demographic footprints of past climate change, PNAS 2012

Admixture map of the region of the ABC1 and ABC2 diploid genomes homologous to dog chromosome 11

Blue: polar bear origin, red: brown bear origin

In this particular example, based on the annotation of the dog genome, the authors focus on a gene (ALDH7A1) involved in salt resistance. It appears that the copies of this gene in the two ABC bears come from the polar bear. As ABC bears live in a marine environment, the idea suggested by this plot is that, during the hybridization event between polar bears and ABC bears, polar bear copies of this gene (already adapted to a salty environment) introgressed into the ABC population and were subsequently selected for, which is why they appear in modern ABC individuals.

Then, using Fst values, they identified a few other genes that might have been under selection during the evolution of polar bears, such as DAG1 (involved in muscular dystrophy) or BTN1A1 (involved in milk production).
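As a reminder of what an Fst value measures, the sketch below computes Wright's Fst for a single biallelic SNP from the allele frequencies in two populations, as the fraction of total expected heterozygosity that is due to differences between populations. The equal weighting of the two populations and the example frequencies are assumptions made for illustration; the estimator actually used in the paper may differ.

```python
def fst_two_populations(p1, p2):
    """Wright's Fst for one biallelic SNP and two equally weighted populations.

    p1 and p2 are the frequencies of the same allele in each population.
    Fst = (H_total - H_within) / H_total, where H is expected heterozygosity.
    """
    p_mean = (p1 + p2) / 2
    h_total = 2 * p_mean * (1 - p_mean)
    if h_total == 0:
        # The site is monomorphic overall, so there is nothing to partition.
        return 0.0
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return (h_total - h_within) / h_total

# Hypothetical allele frequencies in polar and brown bears.
print(fst_two_populations(0.5, 0.5))   # identical frequencies
print(fst_two_populations(0.8, 0.2))   # moderate differentiation
print(fst_two_populations(1.0, 0.0))   # fixed difference
```

A gene in a region where many SNPs show values close to 1 between polar and brown bears is a candidate for having been driven apart by selection, although drift alone can also produce high values.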

I think that, to address adaptation in the polar bear, a study of positive selection in protein-coding genes is lacking. As the authors already conducted transcriptome sequencing of polar and brown bears, an efficient way to identify genes of interest in the polar (or ABC) bears would be to annotate the genes in their genomes, select orthologous genes together with copies from other completely sequenced genomes, such as dog, panda and other mammals, and then test for positive selection with a model such as those implemented in PAML. Nevertheless, I am very well aware of the tremendous amount of work already performed in this PNAS paper.

Regarding the evolution of population size in bears, Miller et al. used a pairwise sequentially Markovian coalescent (PSMC) model, which uses the lengths of homozygous regions in a diploid genome, to reconstruct the effective population size (the number of individuals in a perfectly panmictic population that would lead to the same genetic diversity as the observed population) from the four bear genomes (Fig. 8).

Fig. 8: Miller et al., Polar and brown bear genomes reveal ancient admixture and demographic footprints of past climate change, PNAS 2012
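The link between diversity and effective population size can be illustrated with a much cruder calculation than PSMC. Under the standard neutral model, the expected per-site heterozygosity of a diploid genome is roughly 4 x Ne x mu, so a single long-term Ne can be read off the observed heterozygosity. This sketch is not the PSMC method, which infers Ne through time; the toy genotypes and the mutation rate are assumptions for illustration only.

```python
def heterozygosity(genotypes):
    """Fraction of called sites that are heterozygous in one diploid genome.

    genotypes is a list of (allele1, allele2) tuples, or None when uncalled.
    """
    called = [g for g in genotypes if g is not None]
    return sum(1 for a, b in called if a != b) / len(called)

def effective_size(het_per_site, mu_per_site_per_generation):
    """Long-term Ne from theta = 4 * Ne * mu, assuming neutrality and equilibrium."""
    return het_per_site / (4 * mu_per_site_per_generation)

# Hypothetical toy genome: five sites, one uncalled, one heterozygous.
toy_genome = [("A", "A"), ("A", "G"), ("C", "C"), None, ("T", "T")]
print(heterozygosity(toy_genome))

# Assumed values: one heterozygous site per kilobase, mu = 1e-8 per site per generation.
print(effective_size(0.001, 1e-8))
```

PSMC refines this idea by looking at how heterozygous sites are spaced along the genome: long homozygous stretches point to recent coalescence and therefore to a small population at that time.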

We observe that the two brown bear genomes follow very similar trends and that the non-polar bears declined continuously during the Early Pleistocene cooling. Conversely, the polar bear population increased during this period but seemed very sensitive to the following warming period. Two points were raised when discussing this graph:

The bump in the polar bear curve labelled the "Post Eemian increase" is not significant when looking at the 95% interval in the supplementary material
Knowing from the previous part of the article how extensive hybridization between ABC and polar bears was, would the diversity introduced during those events not affect the reconstruction of the effective population size?

Comparing those two papers side by side made us realize how difficult it is to reconcile data of different origins, in this case nuclear, mitochondrial, palaeontological and ecological. The amount of data needed to reconstruct the whole evolutionary history of such a complicated case becomes striking in the light of the work already performed here.

Hailer F, Kutschera VE, Hallström BM, Klassert D, Fain SR, Leonard JA, Arnason U, & Janke A (2012). Nuclear genomic sequences reveal that polar bears are an old and distinct bear lineage. Science (New York, N.Y.), 336 (6079), 344-347 PMID: 22517859

Miller W, Schuster SC, Welch AJ, Ratan A, Bedoya-Reina OC, Zhao F, Kim HL, Burhans RC, Drautz DI, Wittekindt NE, Tomsho LP, Ibarra-Laclette E, Herrera-Estrella L, Peacock E, Farley S, Sage GK, Rode K, Obbard M, Montiel R, Bachmann L, Ingólfsson O, Aars J, Mailund T, Wiig O, Talbot SL, & Lindqvist C (2012). Polar and brown bear genomes reveal ancient admixture and demographic footprints of past climate change. Proceedings of the National Academy of Sciences of the United States of America, 109 (36) PMID: 22826254

Friday, December 16, 2011

Hard selective sweeps do not seem to be the rule in human evolution.

by Ricardo Kanitz, based on the paper by Hernandez et al. published in Science (2011).

One of the main topics in evolution is, as it has always been, human evolution. Many new methods are applied first to humans; other methods, which are not, often come to humans at some point anyway. This is particularly true in the field of genomics, and it is no surprise since we are talking about the evolution of our own species. The study discussed here addresses an interesting general question on the subject: how has selection shaped our genomes, if at all?

More specifically, Hernandez and colleagues are interested in the classic signature of selection in genomes, the “selective sweep”. This so-called sweep is simply the reduction of measured diversity in the (genomic) surroundings of a positively selected mutation. It is observed when (1) a new beneficial mutation appears, (2) it rapidly becomes the most common variant in a population and, (3) because genomic positions are not physically independent, the variants at nearby positions that were linked to it also become more frequent. As we move further away from the positively selected position, this pattern decays because of recombination (see cartoon below).
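The decay of the sweep signal with distance can be sketched with a textbook hitchhiking approximation: right after a sweep, diversity relative to its neutral level is roughly 1 - (2N)^(-2r/s), where r is the recombination rate between the neutral site and the selected site, s the selection coefficient and N the population size. This is only a rough heuristic and not the simulation framework used by Hernandez et al.; the values of N, s and the per-base recombination rate below are assumptions chosen for illustration.

```python
def relative_diversity(r, s, n_e):
    """Approximate diversity just after a sweep, relative to neutral expectation.

    Uses the heuristic pi / pi0 ~ 1 - (2N)^(-2r/s): the closer the site
    (small r) and the stronger the selection (large s), the deeper the trough.
    """
    return 1.0 - (2.0 * n_e) ** (-2.0 * r / s)

# Assumed parameters, for illustration only.
N_E = 10_000          # effective population size
S = 0.01              # selection coefficient (1%)
REC_PER_BP = 1e-8     # recombination rate per base pair per generation

profile = []
for distance_bp in (0, 1_000, 10_000, 100_000, 1_000_000):
    r = distance_bp * REC_PER_BP
    profile.append((distance_bp, relative_diversity(r, S, N_E)))

for distance_bp, value in profile:
    print(f"{distance_bp:>9} bp from selected site: pi/pi0 = {value:.3f}")
```

With weaker selection (for example s = 0.001) the trough becomes much narrower, which is one reason why weak sweeps are harder to see in a pattern averaged over many sites.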

On functional grounds, the authors looked at different parts of the genome. They predicted that non-synonymous mutations (those which change the amino acid in the resulting protein) should show stronger sweep signals than synonymous mutations. As shown in their Figure 2 (below), there is no difference whatsoever.

However, they do see a decrease in diversity around all these positions, which is not observed in non-coding ones (see the gray area in their Figure S5A below).

To explore this discrepancy, the authors took advantage of simulations. As seen in Figure 3A below, they simulated a neutral (i.e. control) scenario and compared it to different selective scenarios in which varying proportions of human-specific amino acid fixations (α = 10%, 15% and 25%) were favored with different selection coefficients (s = 1% or 0.1%). Under such conditions, there should be power to detect selection. Since they do not detect it, the authors claim that selection was rather rare (with α < 10% and s < 0.1%). Here, I must say that I found these numbers rather high and not at all conservative.

They then proposed background (purifying) selection as an explanation for the observed pattern. In Figure 3B above, they show the fit of simulations with background selection (purple, green and orange) to the observations (dark blue, light blue and red). The fit appears to be very good, and they conclude that the observed pattern is better explained by purifying selection acting on linked sites, without any positive selection, than by recurrent positive selection.

Finally, given (1) that the observations did not fit the predictions of their (rather extreme) selection model, and (2) that a model without positive selection was able to explain the observations, the general conclusion is that classic selective sweeps resulting from strong positive selection were quite rare in recent human evolution.

Although it would be interesting to see what the results would look like with lower (and more realistic) values for α and s, this study opens an interesting discussion about the modus operandi of human adaptation. Classical examples based on phenotypes show that humans underwent recurrent adaptations in diet, immune response and skin pigmentation. The molecular mechanisms underlying these, however, might not be as simple as the “Classic Selective Sweeps”. Complex genetic architectures linking small-effect polygenic variants, for example, may lead to soft sweeps, which do not leave the same sort of signature and can easily be missed in the background noise created by the potentially overwhelming neutral evolution. Therefore, there are still many unknown features of recent human evolution, especially concerning non-neutral evolution, and the growing availability of data coupled with better analytical methods may bring new and possibly surprising results in the coming years of scientific investigation.

Wednesday, November 30, 2011

Classic Selective Sweeps Were Rare in Recent Human Evolution

With the rise of genomics and the availability of whole genome sequences, geneticists hope to be able to understand the recent adaptations humans underwent. Classic selective sweeps, where a beneficial allele arises in a population and subsequently goes to fixation, leave a specific pattern. Indeed, all variation is erased as the selected allele invades the population, and the neighboring neutral variation is also partially swept, with an intensity depending on the linkage with the selected region.

An example of classic selective sweep pattern. As the distance from the selected nucleotide increases, diversity increases. Fig. 2 from Hernandez et al. 2011.

The selective sweep pattern has been used to look for evidence of recent adaptation in humans, and many candidate genes were found. Nevertheless, the prevalence of classic selective sweeps relative to other processes shaping diversity (like background selection or recurrent, a.k.a. "soft", sweeps) is still unknown.

In this paper, the authors claim that classic selective sweeps were in fact rare events in recent human evolution. They argue that the overall pattern found in genome scan studies can be explained by nearly neutral mechanisms alone (neutral evolution plus some purifying selection), without any positive selection going on. This casts doubt on our ability to detect regions under selection from molecular data with currently available techniques.

Their evidence is based on polymorphism data from 179 human genomes from the 1000 Genomes Project (see Durbin et al. 2010). The authors identified single nucleotide polymorphisms and pooled all exons together in order to see the overall sweep pattern around each substitution. The first blow to the prevalence of classic selective sweeps comes from the fact that synonymous and non-synonymous sites show exactly the same sweep pattern. We would expect non-synonymous sites, as the likely targets of adaptation, to show a stronger sweep pattern. Another concern comes from the comparison of the genetic data with the expectation under neutral evolution. They show (see fig. 3) that if classic selective sweeps are frequent (more than 10% of human-specific substitutions), we have the statistical power to detect a difference from a purely neutral scenario. Nevertheless, we do not observe any difference between the genomic data and the neutral simulations.
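The pooling step can be pictured as follows: every diversity measurement is labelled with its distance to the nearest substitution of a given class, and measurements are then averaged per distance bin across the whole genome. The sketch below does exactly that on made-up numbers; the record format, the bin size and the values are assumptions for illustration, and the paper's actual procedure is more elaborate.

```python
from collections import defaultdict

def mean_diversity_by_distance(records, bin_size):
    """Average diversity per distance bin, pooling all focal substitutions.

    records is an iterable of (distance_bp, diversity) pairs, where the
    distance is signed (upstream or downstream of the substitution).
    Returns {bin_start_bp: mean diversity}, ordered by increasing distance.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for distance, diversity in records:
        # Fold both sides of the substitution onto the same distance axis.
        bin_start = (abs(distance) // bin_size) * bin_size
        sums[bin_start] += diversity
        counts[bin_start] += 1
    return {b: sums[b] / counts[b] for b in sorted(sums)}

# Hypothetical measurements around several substitutions.
toy_records = [(-1500, 0.0008), (200, 0.0002), (700, 0.0004),
               (1200, 0.0010), (-300, 0.0004)]
pooled = mean_diversity_by_distance(toy_records, bin_size=1000)
for bin_start, value in pooled.items():
    print(f"{bin_start}-{bin_start + 999} bp: mean diversity {value:.5f}")
```

Running the same function separately on records around synonymous and non-synonymous substitutions would give the two curves that the authors compare.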

Comparison of simulations under a neutral model, simulations under a model with selection, and the actual human genome data. What is interesting in panel A is that the power is high for all the fractions of the genome under selection that the authors tested (alpha parameter). Therefore the authors claim that if classic selective sweeps were frequent in the population, we should be able to detect a significant departure from neutrality. Panel B completes the argument, as we can see that all curves (neutral model and human genome data) overlap. Considering that we should have the power to detect a departure from neutrality, the authors claim that the neutral scenario cannot be rejected. Fig. 3 from Hernandez et al. 2011.

They conclude that classic selective sweeps should not have been the major mode of adaptation in recent human evolution.

I personally was not convinced by the relevance of using a mean pattern, over all coding regions, to show that classic sweeps were rare in human evolution. Indeed, most coding regions have not experienced a selective sweep in the past, and thus the mean pattern should not differ from a neutral or background selection model. Nevertheless, the authors anticipated this argument, as they ran simulations where only a fraction of the genome is under positive selection. And as I wrote above, they show that we should be able to discriminate between positive selection and the neutral background, even if the proportion of loci under selection is as low as 10% of human-specific substitutions.

During our discussion we raised another concern, regarding the parameter range covered in their simulations. The authors tested the power to distinguish selection from neutrality with several fractions of the genome under positive selection, but did not test a wide range of selection coefficients. A selection coefficient of 0.01 already seems very large, and the question remains whether, with weaker selection, we would still expect to see a difference in the mean pattern of diversity over all exon SNPs.

In conclusion, I believe that the authors showed that so far we can only detect classic AND very strong selective sweeps from molecular data. In my opinion, this means that we can rarely detect classic selective sweeps. The question remains whether classic but weaker selective sweeps were rare in recent human evolution.

Hernandez, R., Kelley, J., Elyashiv, E., Melton, S., Auton, A., McVean, G., 1000 Genomes Project, Sella, G., & Przeworski, M. (2011). Classic Selective Sweeps Were Rare in Recent Human Evolution. Science, 331 (6019), 920-924 DOI: 10.1126/science.1198878

Monday, November 7, 2011

RAD tagging adaptation

The threespine stickleback, Gasterosteus aculeatus, is a small fish that inhabits marine, estuarine and freshwater habitats in the Holarctic. It has previously been inferred that, in many regions, freshwater populations derived from oceanic ancestors. Provided that the freshwater populations are in different drainage systems, they can be considered independent of each other. Those natural replicates are one of the reasons why sticklebacks are a model system for studying adaptive evolution.

Sticklebacks adapt to freshwater habitats in a recurrent manner by modifying several key phenotypic traits. Many studies have focused on identifying those traits and measuring their heritability or fitness properties. At the phenotypic level, there is a striking parallelism between derived freshwater populations, but what is unclear is how much this parallelism is underlain by genome-wide patterns of parallel evolution.

That is the main question that Hohenlohe et al. tackled in their 2010 paper entitled "Population Genomics of Parallel Adaptation in Threespine Stickleback using Sequenced RAD Tags". They compared the genomes of fish originating from three lakes and two coastal saltwater habitats located along Alaska's southern coast. The three lakes were chosen in different drainage systems to provide three independent instances of adaptation to freshwater (and maybe to have an excuse to hike from one sampling point to the other?).

The approach they developed (RAD tags) makes it possible to detect single-nucleotide polymorphisms (SNPs) across the whole genome. The data processing analysis is nicely illustrated here. Such a method produces an enormous amount of data, so much that any dubious point can be discarded prior to the final analysis, keeping only the SNPs that have the highest probability of representing real polymorphism in the populations.
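The idea of discarding dubious points can be sketched as a simple filter on candidate SNP calls: keep a site only if enough reads cover it and if the rarer allele is seen more than once, since a single discordant read is more likely a sequencing error than a real variant. This is not the authors' pipeline; the record fields and both thresholds are hypothetical and chosen only to illustrate the principle.

```python
def filter_snps(calls, min_depth=10, min_minor_reads=2):
    """Keep candidate SNPs that are well covered and supported on both alleles.

    calls is a list of dicts with 'ref_reads' and 'alt_reads' counts.
    Both thresholds are hypothetical defaults, to be tuned on real data.
    """
    kept = []
    for call in calls:
        depth = call["ref_reads"] + call["alt_reads"]
        minor_reads = min(call["ref_reads"], call["alt_reads"])
        if depth >= min_depth and minor_reads >= min_minor_reads:
            kept.append(call)
    return kept

# Hypothetical candidate calls from one population.
candidates = [
    {"site": "groupI:1042", "ref_reads": 14, "alt_reads": 9},   # well supported
    {"site": "groupI:5310", "ref_reads": 30, "alt_reads": 1},   # likely an error
    {"site": "groupIV:877", "ref_reads": 3, "alt_reads": 2},    # too shallow
]
reliable = filter_snps(candidates)
print([call["site"] for call in reliable])
```

With millions of reads spread over tens of thousands of RAD sites, even strict filters of this kind still leave plenty of markers for a genome scan.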

The results first confirm the classical hypothesis of a large oceanic population giving rise to divergent freshwater populations. The authors also found many genomic regions showing signatures of balancing and divergent selection across all three freshwater populations. This suggests that phenotypic evolution occurs through parallel genetic evolution at the genome scale. Interestingly, using the annotated stickleback genome, they could identify candidate genes that are linked with phenotypic changes.

While some parts of the methods lack transparency, the results are highly convincing. The fact that the authors were able to show parallelism at the genome level and then identify candidate loci that are important in the adaptive process is really interesting, because it may motivate many in-depth studies of specific genes or pathways that have been shown to be related to adaptation. Regarding the paper itself, it took some time and attention to understand the figures clearly (mostly 6, 7 and 8). They hold tons of results and are not easy to grasp quickly. In conclusion, the correlative patterns outlined by this research are striking, but call for experiments designed to test specific hypotheses on particular genomic regions.

Hohenlohe, P., Bassham, S., Etter, P., Stiffler, N., Johnson, E., & Cresko, W. (2010). Population Genomics of Parallel Adaptation in Threespine Stickleback using Sequenced RAD Tags PLoS Genetics, 6 (2) DOI: 10.1371/journal.pgen.1000862