- Structure: The article is divided into three main parts: i) description of
genomic variants, ii) examination of the functional consequences of
allele-specific variation on transcript abundance, and iii) investigation of
the molecular nature of functional variants and their position relative to
genes.
- Experimental design: The 17 most widely used mouse strains (liver tissue)
were selected for whole-genome sequencing on the Illumina GAIIx platform. To
estimate error rates and evaluate the method, a NOD/ShiLtJ BAC clone library
was constructed; 107 BACs from seven loci on chromosomes 1, 6, 11 and 17 of
this library were shotgun cloned and capillary sequenced. SNPs, structural
variants (inversions, balanced translocations, CNVs) and transposable elements
were identified relative to the previously sequenced reference genome
(C57BL/6J). Bayesian concordance analysis was used to construct gene trees
across the genomes of M. m. musculus, M. m. domesticus and M. m. castaneus,
with M. spretus as the outgroup. Allele-specific expression was analyzed in
liver, thymus, spleen, lung, hippocampus and heart using RNA sequencing. For
this transcriptome analysis an F1 hybrid of two sequenced strains was used, and
each lane of transcriptome sequence was re-genotyped prior to downstream
analysis. To identify sequence variants that underlie quantitative traits, and
to investigate their common molecular features and their position relative to
coding genes, the complete genome sequences of eight inbred strains (founder
haplotypes of lab strains) were used. The QTLs were chosen based on previous
literature (mainly [1]). For more details on the methods, consult the
supplementary information of the article.
- Main results: The whole-genome sequences of 17 inbred laboratory mouse
strains are reported. Ten times more variants than previously known were found.
The phylogenetic history of the laboratory mouse strains could not be
completely resolved. 12% of allele-specific transcripts showed a significant
tissue-specific expression pattern. The molecular nature of functional
variants, as well as their position relative to coding genes, varies with the
effect size of the quantitative trait locus (QTL) and seems to have a
significant effect on function.
Oddities of the article
- 22 authors
- 3 really big names at the end of the author list
- 3 people sharing first authorship
- very condensed
- article represents the integrative nature of current science
Discussion among tutorial participants
- General discussion about the hiring process
a) Generally, it is good to appear several times as a middle author if both the
PI and the journal have a good reputation.
b) For first authors, it is mostly the reputation of the journal that counts.
--> The hiring process differs between positions. For a technician it is good
if a) applies; for a PhD or post-doc position it is good if b) applies. For
later career steps the criteria are more stringent.
- Experimental design
Would you repeat the experimental design of this study?
Yes) The results are influential for all kinds of studies on inbred mammals,
and even on humans; a lot of new information is produced and the study has a
high impact.
No) It might be considered a waste of money to sequence 17 lab strains if only
a subset is used in most analyses. Most of the lab strains look more or less
the same. Fewer lab strains and more wild types could have been chosen.
--> Afterwards we always know more than beforehand! In any case, behavior,
morphology and physiology are better studied in the lab strains than in any
other species, and this information can be used to explain small genetic
differences. There is also a social constraint: you want to include as many
people as possible to make the study more interesting for the whole mouse
community. The study contributes to the Collaborative Cross, a community
resource for the genetic analysis of complex traits. The Complex Trait
Consortium aims to promote the development of resources that can be used to
understand, treat and ultimately prevent pervasive human diseases [2].
- Figure 1 was difficult to understand. What is the reference genome?
(C57BL/6J.) What does "inaccessible" mean? (Mostly LINEs, chr 17, chr X.) There
is more variation in outbred strains (more color, longer distance). A lot of
people had problems with this figure, most probably because figure 1a contains
a lot of information at different levels, and it takes the reader a long time
and a good color printer to understand what the authors want to show. Figure
1b, on the other hand, is rather simple: from left to right the blue circle
increases relative to the red one. Does that represent how variation evolves in
the genome? The SNPs show a small blue circle, which could be explained by
bottlenecks or by selection acting on SNPs. Transposable elements show a large
blue circle. Are lab strains evolving faster in this class? Unfortunately, this
part of the figure is not addressed in the discussion section of the article.
- The generation and sequencing of NOD/ShiLtJ bacterial artificial chromosomes
was appreciated by most of the students. It is a nice way to estimate the error
rates of the new sequencing techniques, and it evaluates and confirms the
method used. Public databases currently contain many false negatives because
few individuals, strains or species have been fully sequenced. It is essential
to demonstrate the consistency of a new method.
- The estimate of the amount of structural variation in the laboratory mouse
strains raised some doubts among the students. Apparently, 48.4 Mb of sequence
in each strain falls into structurally variant regions of the genome. These
structural variants cluster with SNPs in each strain. That means that 1.6% of
the mouse genome is structurally variant. Is this amount common? We do not
know. What we do know is that SNPs and structural variants are of relatively
old origin in these genomes, and they seem to occur together. The authors
report that many structural variants could not be mapped, so their estimate
must be biased.
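The 1.6% figure can be checked with a few lines of arithmetic. The sketch below uses only the two numbers quoted in these notes (48.4 Mb and 1.6%) and derives the genome length they imply, about 3,025 Mb. That length is an inferred value, not one taken from the article, and the variable and function names are ours.

```python
# Numbers quoted in the notes above.
sv_span_mb = 48.4     # sequence per strain in structurally variant regions (Mb)
sv_fraction = 0.016   # 1.6% of the genome

# Genome length implied by the two numbers (inferred, not from the article).
implied_genome_mb = sv_span_mb / sv_fraction


def sv_percentage(span_mb, genome_mb):
    """Return the percentage of a genome of genome_mb that span_mb covers."""
    return 100.0 * span_mb / genome_mb


print(f"implied genome length: {implied_genome_mb:.0f} Mb")
print(f"structurally variant: {sv_percentage(sv_span_mb, implied_genome_mb):.1f}%")
```

Calling sv_percentage with a different assumed genome length shows how sensitive the percentage is to that choice.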
- Some students had trouble understanding figure 3. A simpler way to represent
the same data would be one histogram per tissue. I assume that the authors like
the "ggplot" package and Hadley Wickham's way of presenting data in one plot.
To understand figure 3 it is necessary to understand that allelic bias is
defined as the proportion of expression attributable to a particular parental
strain, ranging from 0 to 1, with a null hypothesis of 0.5 in the absence of
any bias.
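The definition of allelic bias can be made concrete with a small sketch. It computes the bias from allele-specific read counts and tests it against the null value of 0.5 with an exact two-sided binomial test. The binomial test is our assumption of a reasonable test, not necessarily the one used in the article, and the read counts are hypothetical.

```python
from math import comb


def allelic_bias(reads_a, reads_b):
    """Proportion of allele-specific reads from parental strain A (0 to 1)."""
    total = reads_a + reads_b
    if total == 0:
        raise ValueError("no allele-specific reads")
    return reads_a / total


def binomial_two_sided_p(reads_a, reads_b, p_null=0.5):
    """Exact two-sided binomial p-value against the null of no bias."""
    n = reads_a + reads_b
    pmf = [comb(n, k) * p_null**k * (1 - p_null) ** (n - k) for k in range(n + 1)]
    observed = pmf[reads_a]
    # Sum the probabilities of all outcomes at least as unlikely as the observed one.
    return min(1.0, sum(p for p in pmf if p <= observed * (1 + 1e-9)))


# Hypothetical read counts for one transcript in one tissue.
bias = allelic_bias(30, 10)
p_value = binomial_two_sided_p(30, 10)
print(f"allelic bias = {bias:.2f}, p = {p_value:.4f}")
```

A bias of 0.5 means both parental alleles contribute equally; values near 0 or 1 mean that one parental allele dominates expression.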
- The phylogenetic analysis revealed that all trees have similar
probabilities.
--> Compare with the human-chimp-gorilla relationship [3]: the human and
gorilla relationship can appear closer than the one between human and chimp
because of incomplete lineage sorting. The gene tree does not equal the species
tree, whereas the percentage of shared autosomes equals about 10%. Here the
percentage of shared autosomes is around 5% among mouse lab strains. Here we
are comparing strains and not species, so the differences are smaller and
incomplete lineage sorting might be more common due to recent shared ancestry
[4].
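Why recent shared ancestry makes incomplete lineage sorting more common can be illustrated with the textbook three-lineage coalescent result: the probability that a gene tree disagrees with the species tree is (2/3)·exp(-T), where T is the length of the internal branch in coalescent units (2·Ne generations for diploids). The sketch below assumes a neutral model with constant population size and no gene flow. It is not the Bayesian concordance analysis used in the article, and the T values are arbitrary.

```python
from math import exp


def discordance_probability(t_coalescent):
    """P(gene tree differs from species tree) for three lineages.

    t_coalescent is the internal branch length in coalescent units.
    """
    return (2.0 / 3.0) * exp(-t_coalescent)


# Arbitrary branch lengths: short branches stand for recent shared ancestry.
for t in (0.1, 0.5, 1.0, 2.0, 4.0):
    print(f"T = {t}: P(discordant) = {discordance_probability(t):.3f}")
```

The shorter the internal branch, the closer the three possible topologies come to being equally likely, which fits the observation that all trees have similar probabilities.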
- QTL: I find it interesting that wherever you go, students always find QTL
analyses difficult to understand, and it was the same here. The idea was to use
the whole genome in an attempt to identify sequence variants that underlie
quantitative traits. The authors asked whether functional variants have common
molecular features, whether they are more common within genes or outside them,
and whether they consist of structural variants, indels or SNPs. As candidate
loci, 843 QTLs identified in the literature [1,5] were selected. Two competing
models were used to answer these questions: the haplotype model, where eight
haplotypes are used (eight sequenced strains of the founder haplotypes of all
lab strains), and the SNP allele model, where two alleles at every locus were
imputed. In 85% of the cases there was at least one variant where the fit of
the allelic model was better than that of the haplotype model. It was concluded
that at these QTLs there is either a single functional variant or a series of
functional variants on the same haplotype. We questioned whether using the two
competing models is sufficient for a thorough analysis to find functional
variants. (Basic help for QTL analysis with R can be found here: [6].)
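The comparison of the two models can be sketched on toy data. The example below fits one mean per founder haplotype (eight groups) and one mean per allele (two groups) to the same phenotypes, then compares them with the Bayesian information criterion (BIC). BIC is swapped in here as a generic penalized measure of fit; the article's actual test statistic is not reproduced. All data, the allele assignment and the function names are hypothetical.

```python
from math import log


def rss_by_group(phenotypes, groups):
    """Residual sum of squares and group count after fitting one mean per group."""
    members = {}
    for y, g in zip(phenotypes, groups):
        members.setdefault(g, []).append(y)
    rss = 0.0
    for values in members.values():
        mean = sum(values) / len(values)
        rss += sum((v - mean) ** 2 for v in values)
    return rss, len(members)


def bic(rss, n, k):
    """BIC (up to a constant) for a Gaussian model with k fitted means."""
    return n * log(rss / n) + k * log(n)


# Hypothetical data: 24 animals, each carrying one of eight founder haplotypes.
haplotypes = [i % 8 for i in range(24)]
# Hypothetical variant: founders 0-3 carry allele A, founders 4-7 carry allele B.
alleles = ["A" if h < 4 else "B" for h in haplotypes]
# Hypothetical phenotype: allele B adds 2 units, plus small deterministic noise.
noise = [((i * 7) % 11 - 5) / 10 for i in range(24)]
phenotypes = [(2.0 if a == "B" else 0.0) + e for a, e in zip(alleles, noise)]

n = len(phenotypes)
rss_hap, k_hap = rss_by_group(phenotypes, haplotypes)
rss_allele, k_allele = rss_by_group(phenotypes, alleles)
bic_hap = bic(rss_hap, n, k_hap)
bic_allele = bic(rss_allele, n, k_allele)

print(f"haplotype model: RSS = {rss_hap:.3f}, BIC = {bic_hap:.2f}")
print(f"allele model:    RSS = {rss_allele:.3f}, BIC = {bic_allele:.2f}")
```

Because the allele model is nested in the haplotype model, its raw residual error can never be smaller. It can only "fit better" once the six extra parameters of the haplotype model are penalized, which is why a lower BIC for the allele model is consistent with a single two-allele variant explaining the QTL.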
Table 2 and figure 4 show the physical part of the QTL analysis.
Interestingly, the table and the figure are almost redundant. The top of figure
4 shows the importance of the position of a significant functional variant, and
the bottom shows the molecular nature of the quantitative trait variants that
influence the effect size of the QTL. Both position and molecular type are
important. At this moment it became clear to us that we still do not really
know how genes work. Five years ago it was the common belief that mostly the
flanking regions of genes are important for their regulation. It seems like we
are just advancing in the dark and feeling the tail of an elephant. A small
change in a regulatory sequence can lead to a big change in a gene. It seems
that trans regions are much less important than cis-regulatory elements. We
missed some kind of categorization of the QTLs in the collection of mouse QTLs;
there might be larger classes of QTLs that fall into different positions or
molecular types. Finally, we also missed a combined analysis in which positions
and molecular types are analyzed together.
1. Valdar W, Solberg LC, Gauguier D, Burnett S, Klenerman P, et al. (2006) Genome-wide genetic association of complex traits in heterogeneous stock mice. Nat Genet 38: 879-887.
2. Churchill GA, Airey DC, Allayee H, Angel JM, Attie AD, et al. (2004) The Collaborative Cross, a community resource for the genetic analysis of complex traits. Nat Genet 36: 1133-1137.
3. Scally A, Dutheil JY, Hillier LW, Jordan GE, Goodhead I, et al. (2012) Insights into hominid evolution from the gorilla genome sequence. Nature 483: 169-175.
4. Ane C, Larget B, Baum DA, Smith SD, Rokas A (2007) Bayesian estimation of concordance among gene trees. Mol Biol Evol 24: 412-426.
5. Yalcin B, Flint J, Mott R (2005) Using progenitor strain information to identify quantitative trait nucleotides in outbred mice. Genetics 171: 673-681.
6. Zhou Q (2010) A Guide to QTL Mapping with R/qtl. Journal of Statistical Software 32: 396.