  • Molecular Evolution of the TET Gene Family in Mammals.

    Akahori, Hiromichi   Guindon, Stephane   Yoshizaki, Sumio   Muto, Yoshinori  

    Ten-eleven translocation (TET) proteins, a family of Fe(2+)- and 2-oxoglutarate-dependent dioxygenases, are involved in DNA demethylation. They also help regulate various cellular functions. Three TET paralogs (TET1, TET2, and TET3) have been identified in humans. This study focuses on the evolution of mammalian TET genes. Codon-based tests of positive selection revealed distinct patterns in TET1 and TET2 vs. TET3. Results indicate that the TET1 and TET2 genes have experienced positive selection more frequently than the TET3 gene, and that the majority of codon sites evolved under strong negative selection. These findings imply that the selective pressure on TET3 may have been relaxed in several lineages during the course of evolution. Our analysis of convergent amino acid substitutions also supports different evolutionary dynamics among members of the TET gene subfamily. All five amino acid sites inferred to have evolved under positive selection in the catalytic domain of TET2 are located on the protein's outer surface. Adaptive changes at these positively selected sites could be associated with dynamic interactions with other TET-interacting proteins, and positive selection thus appears to shift the regulatory scheme of TET enzyme function.
  • Modelling competition and dispersal in a statistical phylogeographic framework.

    Ranjard, Louis   Welch, David   Paturel, Marie   Guindon, Stephane  

    Competition between organisms influences the processes governing the colonization of new habitats. As a consequence, species or populations arriving first at a suitable location may prevent secondary colonization. Although adaptation to environmental variables (e.g., temperature, altitude) is essential, the presence or absence of certain species at a particular location often depends on whether competing species co-occur. For example, competition is thought to play an important role in structuring mammalian community assembly. It can also explain spatial patterns of low genetic diversity following rapid colonization events, or the "progression rule" displayed by phylogenies of species found on archipelagos. Despite the potential of competition to maintain populations in isolation, past quantitative analyses have largely ignored it because adequate methods for assessing its impact are difficult to design. We present here a new model that integrates competition and dispersal into a Bayesian phylogeographic framework. Extensive simulations and analysis of real data show that our approach clearly outperforms the traditional Mantel test for detecting correlation between genetic and geographic distances. Most importantly, we demonstrate that competition can be detected with high sensitivity and specificity from the phylogenetic analysis of genetic variation in space. © The Author(s) 2014. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved.
  • Estimating Maximum Likelihood Phylogenies with PhyML

    Guindon, Stephane   Delsuc, Frederic   Dufayard, Jean-Francois   Gascuel, Olivier  

    Our understanding of the origins, functions and/or structures of biological sequences strongly depends on our ability to decipher the mechanisms of molecular evolution. These complex processes can be described through the comparison of homologous sequences in a phylogenetic framework. Moreover, phylogenetic inference provides sound statistical tools to reveal the main features of molecular evolution from the analysis of actual sequences. This chapter focuses on phylogenetic tree estimation under the maximum likelihood (ML) principle. Phylogenies inferred under this probabilistic criterion are usually reliable, and important biological hypotheses can be tested through the comparison of different models. Estimating ML phylogenies is computationally demanding, and careful examination of the results is warranted. This chapter describes PhyML, a program that implements recent ML phylogenetic methods and algorithms. We illustrate the strengths and pitfalls of this program through the analysis of a real data set. PhyML v3.0 is available from
  • Viral quasi-species evolution during hepatitis Be antigen seroconversion.

    Lim, Seng Gee   Cheng, Yan   Guindon, Stephane   Seet, Bee Leng   Lee, Lay Yong   Hu, Peizhen   Wasser, Shanthi   Peter, Frank Josef   Tan, Theresa   Goode, Matthew   Rodrigo, Allen Gerard  

    BACKGROUND & AIMS: Although viral quasi-species evolution may be related to the pathogenesis of disease, little is known about this in hepatitis B virus (HBV); we therefore aimed to evaluate the evolution of HBV quasi-species in patients with well-characterized clinical phenotypes of chronic hepatitis B. METHODS: Four cohorts with well-defined clinical phenotypes of chronic hepatitis B, namely hepatitis Be antigen (HBeAg) seroconverters (spontaneous and interferon-induced seroconverters) and nonseroconverters (controls and interferon nonresponders), were followed for 60 months on average. Serum from 4 to 5 time points was used for nested polymerase chain reaction, cloning, and sequencing of the precore/core gene (20 clones/sample). Only patients with genotype B were included. Sequences were aligned using Clustal X; serial-sample unweighted pair group method with arithmetic mean (UPGMA) phylogenetic trees were then constructed using Pebble 1.0, after which maximum likelihood estimates of pairwise distances under a GTR + I + G model were obtained. Viral diversity and substitution rates were then estimated. RESULTS: Analysis of 3386 sequences showed that HBeAg seroconverters had 2.4-fold higher preseroconversion viral sequence diversity (P = .0183) and a 10-fold higher substitution rate (P < .0001) than nonseroconverters, who had persistently low viral diversity (3.6 × 10⁻³ substitutions/site) and substitution rate (2.2 × 10⁻⁵ substitutions × site⁻¹ × month⁻¹). After seroconversion, there was a striking increase in viral diversity. Most seroconverters had viral variants that showed evidence of positive selection, which was seen mainly after seroconversion. CONCLUSIONS: The high viral diversity before a reduction in HBV DNA and before HBeAg seroconversion could be related either to the occurrence of stochastic mutations that lead to a break in immune tolerance or to increased immune reactivity that drives escape mutations.
  • Accounting for Calibration Uncertainty: Bayesian Molecular Dating as a "Doubly Intractable" Problem

    Guindon, Stephane  

    This study introduces a new Bayesian technique for molecular dating that explicitly accommodates uncertainty in the phylogenetic position of calibrated nodes derived from the analysis of fossil data. The proposed approach thus defines an adequate framework for incorporating expert knowledge and/or prior information about the way fossils were collected into the inference of node ages. Although it belongs to the class of "node-dating" approaches, this method shares interesting properties with "tip-dating" techniques, while alleviating some of the computational and modeling difficulties that hamper tip-dating approaches. The influence of fossil data on the probability distribution of trees is the crux of the matter considered here. More specifically, among all the phylogenies that a tree model (e.g., the birth-death process) generates, only a fraction "agree" with the fossil data. Bayesian inference under the new model requires taking this fraction into account, yet evaluating it is difficult in practice. A generic solution to this issue is presented here. The proposed approach relies on a recent statistical technique, the so-called exchange algorithm, dedicated to drawing samples from "doubly intractable" distributions. A small example illustrates the problem of interest and the impact of uncertainty in the placement of calibration constraints in the phylogeny given fossil data. An analysis of land plant sequences and multiple fossils further highlights the relevance of the proposed approach.
  • The Influence of Rate Heterogeneity among Sites on the Time Dependence of Molecular Rates

    Soubrier, Julien   Steel, Mike   Lee, Michael S. Y.   Sarkissian, Clio Der   Guindon, Stephane   Ho, Simon Y. W.   Cooper, Alan  

    Molecular evolutionary rate estimates have been shown to depend on the time period over which they are estimated. Factors such as demographic processes, calibration errors, purifying selection, and the heterogeneity of substitution rates among sites (RHAS) are known to affect the accuracy with which rates of evolution are estimated. We use mathematical modeling and Bayesian analyses of simulated sequence alignments to explore how mutational hotspots can lead to time-dependent rate estimates. Mathematical modeling shows that underestimation of molecular rates over increasing time scales is inevitable when RHAS is ignored. Although a gamma distribution is commonly used to model RHAS, we show that when the actual RHAS deviates from a gamma-like distribution, rates can be either under- or overestimated in a time-dependent manner. Simulations performed under different RHAS scenarios confirm the mathematical modeling and demonstrate the impact of time-dependent rates on estimates of divergence times. Most notably, erroneous rate estimates can have narrow credibility intervals, leading to false confidence in biased estimates of rates and node ages. Surprisingly, large errors in estimates of overall molecular rate do not necessarily generate large errors in divergence time estimates. Finally, we illustrate the correlation between time-dependent rate patterns and differential saturation between quickly and slowly evolving sites. Our results suggest that data partitioning or simple nonparametric mixture models of RHAS significantly improve the accuracy with which node ages and substitution rates can be estimated.
  • Identification of NF-kappaB responsive elements in follistatin related gene (FLRG) promoter

    Bartholin, Laurent   Guindon, Stephane   Martel, Sylvie   Corbo, Laura   Rimokh, Ruth  

    Follistatin related gene (FLRG) has been previously identified from a chromosomal translocation observed in a B-cell chronic lymphocytic leukemia (B-CLL). FLRG (alternative names: follistatin-related protein, FSRP/follistatin-like-3, FSTL3) is a secreted glycoprotein highly similar to follistatin. Like follistatin, FLRG is involved in the regulation of various biological effects through its binding to members of the transforming growth factor beta (TGF beta) superfamily, such as activin A and myostatin. We have previously shown that TGF beta and activin A are potent inducers of FLRG transcriptional activation through the Smad proteins. Using a biochemical approach, we investigated whether tumor necrosis factor alpha (TNF alpha) could regulate FLRG expression, since TNF alpha plays a critical role in hematopoietic malignancies. We demonstrate that TNF alpha activates FLRG expression at the transcriptional level. This activation depends on a promoter region containing four 107-108 bp DNA repeats, which are evolutionarily conserved in primates. These repeats carry a strong phylogenetic signal, which is uncommon among non-coding sequences. Each DNA repeat contains one TNF alpha responsive element (5'-GGGAGAG/TTCC-3') able to bind nuclear factor kappaB (NF-kappaB) transcription factors. We also show that TGF beta, through the Smad proteins, potentiates the effect of TNF alpha on FLRG expression. This cooperation is unexpected, since TGF beta and TNF alpha usually have opposite biological effects. Overall, this work provides new insights into the regulation of FLRG by cytokines and growth factors. It opens promising research directions that should help us better understand the role of FLRG during tumorigenesis. © 2007 Elsevier B.V. All rights reserved.
  • PartitionFinder: Combined Selection of Partitioning Schemes and Substitution Models for Phylogenetic Analyses

    Lanfear, Robert   Calcott, Brett   Ho, Simon Y. W.   Guindon, Stephane  

    In phylogenetic analyses of molecular sequence data, partitioning involves estimating independent models of molecular evolution for different sets of sites in a sequence alignment. Choosing an appropriate partitioning scheme is an important step in most analyses because it can affect the accuracy of phylogenetic reconstruction. Despite this, partitioning schemes are often chosen without explicit statistical justification. Here, we describe two new objective methods for the combined selection of best-fit partitioning schemes and nucleotide substitution models. These methods allow millions of partitioning schemes to be compared within realistic time frames and so permit the objective selection of partitioning schemes even for large multilocus DNA data sets. We demonstrate that these methods significantly outperform previous approaches, including both the ad hoc selection of partitioning schemes (e.g., partitioning by gene or codon position) and a recently proposed hierarchical clustering method. We have implemented these methods in an open-source program, PartitionFinder. This program allows users to select partitioning schemes and substitution models using a range of information-theoretic metrics (e.g., the Bayesian information criterion [BIC], the Akaike information criterion [AIC], and corrected AIC). We hope that PartitionFinder will encourage the objective selection of partitioning schemes and thus lead to improvements in phylogenetic analyses. PartitionFinder is written in Python and runs under Mac OS X 10.4 and above. The program, source code, and a detailed manual are freely available from
  • New Algorithms and Methods to Estimate Maximum-Likelihood Phylogenies: Assessing the Performance of PhyML 3.0

    Guindon, Stephane   Dufayard, Jean-Francois   Lefort, Vincent   Anisimova, Maria   Hordijk, Wim  

    PhyML is a phylogeny program based on the maximum-likelihood principle. Early PhyML versions used a fast algorithm that performs nearest neighbor interchanges to improve a reasonable starting tree topology. Since the original publication (Guindon S., Gascuel O. 2003. A simple, fast and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst. Biol. 52:696-704), PhyML has been widely used (> 2500 citations in ISI Web of Science) because of its simplicity and its fair compromise between accuracy and speed. In the meantime, research around PhyML has continued, and this article describes the new algorithms and methods implemented in the program. First, we introduce a new algorithm to search the tree space with user-defined intensity using subtree pruning and regrafting topological moves. The parsimony criterion is used here to filter out the least promising topology modifications with respect to the likelihood function. The analysis of a large collection of real nucleotide and amino acid data sets of various sizes demonstrates the good performance of this method. Second, we describe a new test to assess the support of the data for internal branches of a phylogeny. This approach extends the recently proposed approximate likelihood-ratio test and relies on a nonparametric, Shimodaira-Hasegawa-like procedure. A detailed analysis of real alignments sheds light on the links between this new approach and the more classical nonparametric bootstrap method. Overall, our tests show that the latest version (3.0) of PhyML is fast, accurate, stable, and ready to use. A Web server and binary files are available from
  • SeaView Version 4: A Multiplatform Graphical User Interface for Sequence Alignment and Phylogenetic Tree Building

    Gouy, Manolo   Guindon, Stephane   Gascuel, Olivier  

    We present SeaView version 4, a multiplatform program designed to facilitate multiple alignment and phylogenetic tree building from molecular sequence data through the use of a graphical user interface. SeaView version 4 combines all the functions of the widely used programs SeaView (in its previous versions) and Phylo_win, and extends them by adding network access to sequence databases, alignment with arbitrary algorithms, maximum-likelihood tree building with PhyML, and display, printing, and copy-to-clipboard of rooted or unrooted, binary or multifurcating phylogenetic trees. Given the wide range of tools and algorithms currently available for phylogenetic analyses, SeaView is especially useful for teaching and for occasional users of such software. SeaView is freely available at
  • Performance of standard and stochastic branch-site models for detecting positive selection among coding sequences.

    Lu, Ashley   Guindon, Stephane  

    The branch-site model is a widely popular approach that accommodates the lineage- and site-specific heterogeneity of natural selection regimes among coding sequences. This model relies on prior knowledge of the (foreground) lineage(s) evolving under positive selection at some sites. Unfortunately, such prior information is not always available in practice. A more recent technique (Guindon S, Rodrigo A, Dyer K, Huelsenbeck J. 2004. Modeling the site-specific variation of selection patterns along lineages. Proc Natl Acad Sci USA 101:12957-12962) alleviates this issue by explicitly modeling the variability of selection patterns using a stochastic process. However, the performance of this approach for deciding whether a set of homologous sequences evolved under positive selection at some point has not yet been assessed. This study compares the sensitivity and specificity of tests for positive selection derived from both the standard and the stochastic approaches using extensive simulations. We show that the two methods have low proportions of type I errors; that is, they tend to be conservative when testing the null hypothesis of no positive selection if sequences truly evolve under neutral or negative selection regimes. Also, the standard approach is more powerful than the stochastic one when the prior knowledge on foreground lineages is correct. When this prior is incorrect, however, the stochastic approach outperforms the standard model in a broad range of conditions. Additional comparisons also suggest that the stochastic branch-site method compares favorably with the recently proposed mixed-effects model of evolution of Murrell et al. (Murrell B, Wertheim JO, Moola S, Weighill T, Scheffler K, Pond SLK. 2012. Detecting individual sites subject to episodic diversifying selection. PLoS Genet. 8:e1002764). Altogether, our results show that the standard branch-site model is well suited to confirmatory analyses, whereas the stochastic approach should be preferred over the standard or the mixed-effects ones for exploratory studies.
  • Cumulative viral evolutionary changes in chronic hepatitis B virus infection precedes hepatitis B e antigen seroconversion

    Cheng, Yan   Guindon, Stephane   Rodrigo, Allen   Wee, Lin Ying   Inoue, Masafumi   Thompson, Alex J. V.   Locarnini, Stephen  

    Objective: To examine viral evolutionary changes and their relationship to hepatitis B e antigen (HBeAg) seroconversion. Design: A matched case-control study of HBeAg seroconverters (n = 8) and non-seroconverters (n = 7) with adequate stored sera before seroconversion was performed. Nested PCR, cloning and sequencing of the hepatitis B virus (HBV) precore/core gene were performed. Sequences were aligned using Clustal X2.0, followed by construction of phylogenetic trees using Pebble 1.0. Viral diversity, evolutionary rates and positive selection were then analysed. Results: Baseline HBV quasispecies viral diversity was identical in seroconverters and non-seroconverters 10 years before seroconversion but started to increase approximately 3 years later. Concurrently, precore stop codon (PSC) mutations appeared. Some 2 years later, HBV-DNA declined, together with a dramatic reduction in HBeAg titres. Just before HBeAg seroconversion, seroconverters had HBV-DNA levels 2 log lower (p = 0.008), HBeAg titres 310-fold lower (p = 0.02), PSC mutations >25% (p < 0.001), viral evolution 8.1-fold higher (p = 0.01) and viral diversity 2.9-fold higher (p < 0.001) compared with non-seroconverters, with a 9.3-fold higher viral diversity than at baseline (p = 0.011). Phylogenetic trees in seroconverters showed clustering of separate time points and longer branch lengths than in non-seroconverters (p = 0.01). Positive selection was detected in five of eight seroconverters but in none of the non-seroconverters (p = 0.026). There was a significant negative correlation between viral diversity (rs = -0.60, p < 0.001) and HBV-DNA or HBeAg (rs = -0.58, p = 0.006) levels, and a positive correlation with PSC mutations (rs = 0.38, p = 0.009). Over time, viral diversity showed a significant positive correlation (rs = 0.65, p < 0.001), whereas HBV-DNA (rs = -0.627, p < 0.001) and HBeAg levels (rs = -0.512, p = 0.015) showed significant negative correlations. Conclusions: Cumulative viral evolutionary changes that precede HBeAg seroconversion provide insights into this event that may have implications for therapy.
  • HIV-1 Full-Genome Phylogenetics of Generalized Epidemics in Sub-Saharan Africa: Impact of Missing Nucleotide Characters in Next-Generation Sequences

    Ratmann, Oliver   Wymant, Chris   Colijn, Caroline   Danaviah, Siva   Essex, Max   Frost, Simon   Gall, Astrid   Gaseitsiwe, Simani   Grabowski, Mary K.   Gray, Ronald   Guindon, Stephane   von Haeseler, Arndt   Kaleebu, Pontiano   Kendall, Michelle   Kozlov, Alexey   Manasa, Justen   Minh, Bui Quang   Moyo, Sikhulile   Novitsky, Vlad   Nsubuga, Rebecca   Pillay, Sureshnee   Quinn, Thomas C.   Serwadda, David   Ssemwanga, Deogratius   Stamatakis, Alexandros   Trifinopoulos, Jana   Wawer, Maria   Brown, Andy Leigh   de Oliveira, Tulio   Kellam, Paul   Pillay, Deenan   Fraser, Christophe  

    To characterize HIV-1 transmission dynamics in regions where the burden of HIV-1 is greatest, the Phylogenetics and Networks for Generalised HIV Epidemics in Africa consortium (PANGEA-HIV) is sequencing full-genome viral isolates from across sub-Saharan Africa. We report the first 3,985 PANGEA-HIV consensus sequences from four cohort sites (Rakai Community Cohort Study, n = 2,833; MRC/UVRI Uganda, n = 701; Mochudi Prevention Project, n = 359; Africa Health Research Institute Resistance Cohort, n = 92). Next-generation sequencing success rates varied: more than 80% of the viral genome from the gag to the nef genes could be determined for all sequences from South Africa, 75% of sequences from Mochudi, 60% of sequences from MRC/UVRI Uganda, and 22% of sequences from Rakai. Partial sequencing failure was primarily associated with low viral load, increased for amplicons closer to the 3' end of the genome, was not associated with subtype diversity except for HIV-1 subtype D, and remained significantly associated with sampling location after controlling for other factors. We assessed the impact of the missing data patterns in PANGEA-HIV sequences on phylogeny reconstruction in simulations. We found a threshold in taxon sampling below which the patchy distribution of missing characters in next-generation sequences (NGS) has an excess negative impact on the accuracy of HIV-1 phylogeny reconstruction, which is attributable to tree reconstruction artifacts that accumulate when branches in viral trees are long. The large number of PANGEA-HIV sequences provides unprecedented opportunities for evaluating HIV-1 transmission dynamics across sub-Saharan Africa and identifying prevention opportunities. Molecular epidemiological analyses of these data must proceed cautiously, because sequence sampling remains below the identified threshold and a considerable negative impact of missing characters on phylogeny reconstruction is expected.
  • Evolution of Plant MADS Box Transcription Factors: Evidence for Shifts in Selection Associated with Early Angiosperm Diversification and Concerted Gene Duplications

    Shan, Hongyan   Zahn, Laura   Guindon, Stephane   Wall, P. Kerr   Kong, Hongzhi   Ma, Hong   dePamphilis, Claude W.   Leebens-Mack, Jim  

    Phylogenomic analyses show that gene and genome duplication events have led to the diversification of transcription factor gene families throughout the evolutionary history of land plants, and that gene duplications have played an important role in shaping regulatory networks influencing key phenotypic characters, including floral development and flowering time. A molecular evolutionary investigation of the mode and tempo of selection acting on the angiosperm MADS box AP1/SQUA, AP3/PI, AG/AGL11, and SEP gene subfamilies revealed site-specific patterns of shifting evolutionary constraint throughout angiosperm history. Specific positions in the four canonical MADS box gene regions, especially the K domains and C-terminal regions of all four of these MADS box gene subfamilies, exhibited clade-specific shifts in selective constraint following concerted duplication events. Moreover, the frequency of site-specific shifts in constraint was correlated with gene duplications and early angiosperm diversification. We hypothesize that coevolution among interacting MADS box proteins may be responsible for simultaneous increases in the ratio of nonsynonymous to synonymous substitution rates (dN/dS = ω) early in angiosperm history and following concerted duplication events.
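
The PartitionFinder entry above compares partitioning schemes with information-theoretic metrics (BIC, AIC and corrected AIC). The Python sketch below illustrates, under stated assumptions, how those three standard criteria rank candidate schemes once each scheme's maximized log-likelihood and number of free parameters are known. It is not PartitionFinder's code: the `SchemeFit` class and all helper names are hypothetical, using the number of alignment sites as the sample size is an assumption, and the log-likelihood values in the usage section are made up for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class SchemeFit:
    """Hypothetical summary of one fitted partitioning scheme."""
    name: str
    log_likelihood: float  # maximized log-likelihood of the whole scheme
    n_params: int          # free parameters summed over all subsets


def aic(fit: SchemeFit) -> float:
    # Akaike information criterion: 2k - 2 lnL
    return 2 * fit.n_params - 2 * fit.log_likelihood


def aicc(fit: SchemeFit, n_sites: int) -> float:
    # Small-sample corrected AIC; assumes sample size = number of alignment sites
    k = fit.n_params
    if n_sites - k - 1 <= 0:
        raise ValueError("AICc is undefined when n_sites <= n_params + 1")
    return aic(fit) + (2 * k * (k + 1)) / (n_sites - k - 1)


def bic(fit: SchemeFit, n_sites: int) -> float:
    # Bayesian information criterion: k ln(n) - 2 lnL
    return fit.n_params * math.log(n_sites) - 2 * fit.log_likelihood


def best_scheme(fits, score):
    # Lower scores are better for all three criteria
    return min(fits, key=score)


if __name__ == "__main__":
    n_sites = 1500  # made-up alignment length
    fits = [
        SchemeFit("single_subset", -10000.0, 10),      # made-up values
        SchemeFit("by_codon_position", -9960.0, 30),   # made-up values
    ]
    print(best_scheme(fits, aic).name)
    print(best_scheme(fits, lambda f: bic(f, n_sites)).name)
```

With these made-up numbers, AIC favors `by_codon_position` while BIC favors `single_subset`: the BIC penalty k ln(n) exceeds the AIC penalty 2k whenever ln(n) > 2, so BIC tends to select schemes with fewer parameters on long alignments.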

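The competition-and-dispersal entry above benchmarks its model against the traditional Mantel test for correlation between genetic and geographic distances. For readers unfamiliar with that baseline, here is a minimal, dependency-free Python sketch of a one-sided Mantel permutation test. It is an illustration under assumptions, not code from the paper: the function names are hypothetical, the statistic is the Pearson correlation over the upper triangles of the two distance matrices, and significance is assessed by permuting the rows and columns of the geographic matrix together.

```python
import random
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def upper_triangle(m: Matrix, order: Sequence[int]) -> List[float]:
    # Off-diagonal entries (i < j) after relabelling rows/columns by `order`
    n = len(order)
    return [m[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("distances are constant; correlation undefined")
    return sxy / (sxx * syy) ** 0.5


def mantel_test(genetic: Matrix, geographic: Matrix,
                n_perm: int = 999, seed: int = 0) -> Tuple[float, float]:
    """Return (observed r, one-sided permutation p-value for r > 0)."""
    n = len(genetic)
    identity = list(range(n))
    gen = upper_triangle(genetic, identity)
    r_obs = pearson(gen, upper_triangle(geographic, identity))
    rng = random.Random(seed)
    n_extreme = 0
    for _ in range(n_perm):
        perm = identity[:]
        rng.shuffle(perm)  # relabel whole sampling locations at random
        if pearson(gen, upper_triangle(geographic, perm)) >= r_obs:
            n_extreme += 1
    # The observed labelling counts as one of the permutations
    return r_obs, (n_extreme + 1) / (n_perm + 1)
```

Permuting whole sampling locations, rather than shuffling individual matrix entries, preserves the dependence among pairwise distances that share a location, which is why the null distribution is built this way.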