Finding and testing hybridization events in Fragaria species history

Overview

This protocol was developed and employed for studying phylogeny of di- and polyploid Fragaria species using multi-locus sequence data obtained using Hyb-Seq approach (Kamneva et al. In revision) but can be applied to any group of organisms. It was developed on Mac OS but will probably work just as well in any Unix based environment.

This tutorial is still work in progress. Please feel free to contact me if you have questions, suggestions, need help or something does not work.
Custom scripts supplied with the protocol are not guarantied to work. Here is the license statement [LICENSE].

Objectives

  • Use gene trees which can be multilabeled, to identify candidate hybrydization events using consensus approach and to test them in ILS-aware maximum likelihood framework using two major programs available to the community.
  • Use set of custom scripts along with PhyloNet program.
  • Sample data

    Please download testing_hybridizations_input.tar.gz archive, that contains:
  • testing_hybridizations_input.txt file - plain text file with 273 newick format gene trees from dataset 12 in the paper. The dataset includes a number of diploid species and one octaploid F. chiloensis. Diploid individuals are included as consensus sequences while octaploids are represented by haplotypes. Only gene trees for fragments with detectable phylogenetic signal were retained. Gene trees were processed to label tips with relevant species name making them multilabeled.
  • test_hybridization_N_phylonet.nex and test_hybridization_N_phylonet.out files (N in 1, 2 or 3) which will also be generated by this tutorial. These files are not going to serve as input to the tutorial but I knclude them for illustration purposes.
  • Programs

    Make sure that you have all of those programs installed on your computer, working and their location is added to the path. Please cite these programs if using in your research:
  • PhyloNet
  • ape R package
  • Also please download a tar.gz testing_hybridizations_scripts.tar.gz file and unpack it. The archive contains following custom scripts, these will be accessed through their location in this tutorial, but they can be copied into one of the path directoies, eliminating need for specifying their location:
  • R_consensus_summary.r
  • R_functions.R
  • R_phylonet.r
  • Each of the scripts contains option list and example usage in its header.

    Main steps of the protocol

  • Find candidate hybridizations using ILS-agnostic consensus network approach
  • Test this hybridization using ILS-aware approach in PhyloNet
  • Find candidate hybridizations using ILS-agnostic consensus network approach

  • Run R script to generate consensus network from a set of gene trees, in gene trees, only clades associated with bootstrap support of 5 and stronger will be used, and only clusters observed in at least 30% of trees will be included if they generate reticulation events:

    Rscript testing_hybridizations_scripts/R_consensus_summary.r \
    testing_hybridizations_input.txt \
    testing_hybridizations_scripts/R_functions.R \
    5 \
    0.30 \
    test_hybridizations_consensus_out

    This will generate file called test_hybridizations_consensus_out which contain a network in extended newick format which can be visualized using Dendroscope to see that one hybridization was detected in this case.
    example graphic
  • Test this hybridization using ILS-aware approach in PhyloNet

  • Run R script to generate parameter files for PhyloNet. Note that second argument represents four taxa that are included in the evolutionary scenarios to be tested, in particular first taxon is a hybrid, second and third are potential parental lineages and the fourth one is the ourgroup. Please see script header for description of the rest of the agruments:

    Rscript testing_hybridizations_scripts/R_phylonet.r \
    testing_hybridizations_input.txt \
    chiloensis,vesca,iinumae,Drymocallis \
    test_hybridization \
    12345 \
    commands_phylonet \
    10

    This will generate 3 files with _phylonet.nex suffix and test_hybridization prefix, parameter files for PhyloNet, and phylonet_commands file with commands for running PhyloNet.
    Parameter files are set up to calculate likelihood of 3 elternative scenarios of species evolution. One with hybridization and two others without:
    example graphic
  • Set path to PhyloNet, make sure that commands file has correct extensions and execute it to obtain PhyloNet results.

    export PATH_PHYLONET=~/Downloads/
    chmod 777 ./commands_phylonet
    ./commands_phylonet

    This will generate 3 files with _phylonet.out suffix and _test_hybridization prefix, output files from PhyloNet. I also include these files with the inputs for this tutorial.
    They can be used to compute AIC of the 3 different evolutionary scenarios and select the one which seens to fit data (set of gene trees) best.

    olga@strawbery$ tail -n 3 test_hybridization_*_phylonet.out
    ==> test_hybridization_1_phylonet.out <==
    Species Network:
    ((((chiloensis:0.08177960134575445)#H1:0.0011774181844964955::0.7889362664736876,
    iinumae:3.2322961834617074):0.21342487771271856,(vesca:0.03756955801084463,
    #H1:0.9128026176817999::0.21106373352631236):0.8646947680550517):5.9063434149541605,
    Drymocallis:1.0);
    Total log probability: -4976.799536608153

    ==> test_hybridization_2_phylonet.out <==
    Species Network:
    (Drymocallis:1.0,(iinumae:3.299155992176229,(vesca:0.3672750518888784,chiloensis:0.13941635790686432)
    :0.0011774181844964955):5.905554510350227);
    Total log probability: -5220.087740719886

    ==> test_hybridization_3_phylonet.out <==
    Species Network:
    (Drymocallis:1.0,(vesca:0.35931455092773623,(iinumae:3.27864301747052,
    chiloensis:0.05952780259580878):0.14214091413189794):5.904547370827603);
    Total log probability: -5173.5750126386065

    Now calculate AIC for each of the three scenarios, note that there are 9 estimated parameters for scenario 1 and 5 for scenarios 2 and 3:

    olga@strawbery$ R
    > 2*9--4976.799536608153
    [1] 4994.8 ## AIC for Scenario 1
    > 2*5--5173.5750126386065
    [1] 5183.575 ## AIC for Scenario 2
    > 2*5--5220.087740719886
    [1] 5230.088 ## AIC for Scenario 3

    Scenario 1 is associated with lowest AIC and therefore is most supported by the data.
  • The end

    This is the end of the third turorial describing major parts on the analysis performed in the paper. Please cite the paper if you use information or scripts described here.