NSF Postdoc Fellowship.docx - Home | Biology...
Appendix 3 – Phylogenetic Reconstruction and statistical approaches
Phylogenetic reconstruction - For phylogenetic reconstruction, we concatenated
publicly available DNA sequences from seven genes: Calcium/Calmodulin-Dependent Protein
Kinase II (CAD), Cytochrome Oxidase I (COI), Elongation Factor 1-α copy 2 (EF1a-2), Sodium
Potassium Adenosine Triphosphatase (Nak), Long Wavelength Rhodopsin (Opsin), RNA
Polymerase II (PolII), and wingless (Wg) (Table 1). For each gene we selected the longest
available sequence, or a random one when several sequences of equal length were available. For
COI, in some cases we concatenated two sequences to obtain a longer fragment spanning the
whole gene. When sequences for a species included in our study were not available in
GenBank, we used sequences from a different species within the same genus or subfamily.
Sequences were aligned using the MUSCLE option in SeaView v.4.5.3. The final length of the
alignment was 11,035 bp. We used the Python program PartitionFinder v.1.1.1 (Lanfear et al. 2012) to
find the best partitioning scheme and models of molecular evolution for our dataset (Table 2).
The maximum-likelihood phylogeny was inferred with RAxML v.7.2+
(Stamatakis 2014) through the online portal CIPRES (Miller et al. 2010) using the GTRGAMMA
and GTRGAMMA+I models. Both evolutionary models resulted in the same tree topology and
similar branch lengths, thus only results from the simpler model, GTRGAMMA, are reported
(Fig. 1). We constrained RAxML to find the best tree given the topology found in published
phylogenies based on a larger number of genes (Cardinal et al. 2010; Johnson et al. 2013; Misof et al.
2014). Specifically, we constrained the ant lineage (Camponotus) to be the sister lineage of sphecid
wasps (Sphecius) and bees.
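The concatenation step can be illustrated with a minimal Python sketch. It is not the procedure actually used for this study: the function name, the toy sequences, and the choice to fill a missing gene with gap characters are all assumptions made for illustration. It joins per-gene alignments into one supermatrix and records the column range of each gene, which is the information a partitioning analysis needs.

```python
def concatenate_genes(alignments, gene_order, missing_char="-"):
    """Join per-gene alignments into one supermatrix.

    alignments: dict mapping gene -> {taxon: aligned sequence}
    gene_order: order in which genes are concatenated
    Returns (supermatrix, partitions); partitions maps each gene to its
    1-based, inclusive (start, end) column range in the supermatrix.
    """
    taxa = sorted({t for gene in gene_order for t in alignments[gene]})
    pieces = {t: [] for t in taxa}
    partitions = {}
    position = 0
    for gene in gene_order:
        seqs = alignments[gene]
        lengths = {len(s) for s in seqs.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences are not aligned to equal length")
        length = lengths.pop()
        for taxon in taxa:
            # Assumption: a taxon lacking this gene is padded with gaps.
            pieces[taxon].append(seqs.get(taxon, missing_char * length))
        partitions[gene] = (position + 1, position + length)
        position += length
    supermatrix = {t: "".join(parts) for t, parts in pieces.items()}
    return supermatrix, partitions


# Hypothetical toy data, not real sequences from Table 1.
toy_alignments = {
    "COI": {"Apis": "ATGGCA", "Bombus": "ATGGCT", "Eumenes": "ATGACA"},
    "Wg": {"Apis": "CCGT", "Bombus": "CCGA"},
}
supermatrix, partitions = concatenate_genes(toy_alignments, ["COI", "Wg"])
```

In this toy case the taxon without a Wg sequence receives four gap characters, so every row of the supermatrix has the same length.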
MCMCglmm Approach - MCMCglmm is an R package that fits multivariate generalized linear
mixed models in a Bayesian framework using Markov chain Monte Carlo sampling. We used this approach to
estimate the effect of multiple predictor variables (sociality, colony size, body size and
temperature) on the encapsulation response of the 11 insect lineages, while correcting for the
phylogenetic non-independence among taxa. The phylogenetic information was incorporated in the
model as a random effect through a variance-covariance matrix. The matrix was
derived after converting the tree generated from our seven-gene dataset to an ultrametric tree using
the function chronos in the R package ape (Paradis et al. 2004). We used a weak prior for the
Bayesian analysis: [list(G=list(G1=list(V=1, nu=0.02)), R=list(V=1, nu=0.02))]. We ran 1
million iterations, sampling every 100 iterations and discarding the first 10% of iterations
as burn-in. All models were run with [e.g. model2_P <- MCMCglmm(MeanGrey ~ log.MaxColonySize,
random=~Species, ginverse=list(Species=inv.phylo$Ainv),
family="gaussian", prior=prior, data=data, nitt=1000000, burnin=100000, thin=100)] and without
[e.g. model2_NP <- MCMCglmm(MeanGrey ~ log.MaxColonySize, random=~Species,
family="gaussian", prior=prior, data=data, nitt=1000000, burnin=100000, thin=100)]
phylogenetic correction. Model selection was based on the Deviance Information Criterion (DIC)
reported by the package MCMCglmm.
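As a quick check on the sampling schedule, the sketch below (Python, for illustration only) computes how many posterior samples remain after burn-in and thinning. It also shows the textbook definition of DIC as the mean deviance plus the effective number of parameters. The function names and the toy deviance values are hypothetical, and the package's internal DIC calculation may differ in detail.

```python
from statistics import mean


def retained_samples(nitt, burnin, thin):
    """Number of stored samples for a chain of nitt iterations,
    discarding the first `burnin` and keeping every `thin`-th one."""
    if not 0 <= burnin < nitt or thin < 1:
        raise ValueError("need 0 <= burnin < nitt and thin >= 1")
    return (nitt - burnin) // thin


def dic(deviance_samples, deviance_at_posterior_mean):
    """Textbook DIC: mean deviance plus effective parameter count pD,
    where pD = mean deviance - deviance at the posterior mean."""
    mean_deviance = mean(deviance_samples)
    p_d = mean_deviance - deviance_at_posterior_mean
    return mean_deviance + p_d


# Settings quoted in the text: nitt=1000000, burnin=100000, thin=100.
n_kept = retained_samples(1_000_000, 100_000, 100)  # 9000 samples

# Hypothetical deviance values, only to show the arithmetic.
toy_dic = dic([10.0, 12.0, 14.0], 11.0)  # 12.0 + (12.0 - 11.0) = 13.0
```

With the settings above, each model is summarized from 9,000 stored samples; lower DIC indicates the preferred model.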
PGLS approach - We tested the effect of sociality and colony size on the encapsulation
response using phylogenetic generalized least squares (PGLS). Body size was included as a
covariate. In the PGLS method, the phylogenetic correlation between lineages is incorporated as
a variance-covariance matrix. PGLS was used to estimate the regression coefficients between
traits by maximum likelihood. We estimated the weighting parameter lambda (λ)
using the function corPagel in the R package ape (Paradis et al. 2004) to correct for the
phylogenetic effect in all the tested linear models. The Akaike Information Criterion corrected
for small sample size (AICc), as implemented in the function model.sel of the package MuMIn
(Barton 2011), was used to compare all tested models.
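Two ingredients of this approach can be sketched in a few lines of Python. The sketch is illustrative only and does not reproduce the ape or MuMIn code. First, Pagel's λ multiplies the off-diagonal (shared-history) elements of the phylogenetic variance-covariance matrix, so λ = 0 removes the phylogenetic correlation and λ = 1 leaves the matrix unchanged. Second, AICc adds a small-sample penalty to AIC. The function names, the toy matrix, and the log-likelihood values below are hypothetical.

```python
def pagel_lambda_transform(vcv, lam):
    """Scale the off-diagonal elements of a variance-covariance matrix
    by lambda, leaving the diagonal (tip variances) untouched."""
    n = len(vcv)
    return [
        [vcv[i][j] if i == j else lam * vcv[i][j] for j in range(n)]
        for i in range(n)
    ]


def aicc(log_lik, k, n):
    """AIC corrected for small sample size.
    log_lik: maximized log-likelihood; k: number of estimated
    parameters; n: number of observations (here, lineages)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined when n <= k + 1")
    aic = -2.0 * log_lik + 2.0 * k
    return aic + (2.0 * k * (k + 1)) / (n - k - 1)


def rank_by_aicc(models, n):
    """models: {name: (log_lik, k)}. Returns (name, AICc, delta AICc)
    tuples sorted from best (lowest AICc) to worst."""
    scores = {name: aicc(ll, k, n) for name, (ll, k) in models.items()}
    best = min(scores.values())
    return sorted(
        ((name, s, s - best) for name, s in scores.items()),
        key=lambda row: row[1],
    )


# Hypothetical two-taxon matrix and model fits, for illustration only.
toy_vcv = [[1.0, 0.5], [0.5, 1.0]]
no_signal = pagel_lambda_transform(toy_vcv, 0.0)  # identity-like matrix
ranking = rank_by_aicc(
    {"colony_size": (-10.0, 3), "sociality": (-12.0, 3)}, n=11
)
```

With only 11 lineages the correction term is not negligible, which is why AICc rather than plain AIC is the appropriate criterion here.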
Table 1. GenBank accession numbers for the sequences used in this study.
Taxon CAD COI EF1a-2 Nak Opsin PolII Wg
Agapostemon virescens JQ279238 FJ581961+JQ266376 AF140320 GU320102 AY227940 AY945089 JQ266620
Apis mellifera DQ067178 KC135895+AY114458 EU163208 EU184750 U26026 XM623278 AY703618
Blattella germanica GQ265596 KC407709+EU854321 - - - GQ265663 HE965017
Bombus impatiens KF936682 HQ978604+JF799027 DQ788304 EU184743 AY485301 XM012391960 EU184707
Camponotus castaneus XM011268571 AY334393 JN134514 - - XM011256959 JN134721
Eumenes fraternus - EU649440 - - - EF190805 EF473959
Halictus ligatus JQ279314 HQ978593+AF438426 AF140299 EF646388 AY227956 AY654510 JQ266703
Polistes fuscatus - EF136438 KP255916 EU367308 EF190796 EF473960
Sphecius speciosus DQ067116 EF203743 AY585168 GU320099 JN374897 AY945172 EU367333
Xylocopa virginica EU122079 EU271670+AY005231 AY208290 GU245220 - GU245500 GU245676
Zootermopsis nevadensis - KJ958410 - - AB596915 FJ802921
Table 2. Partition scheme used for the Maximum Likelihood phylogenetic reconstruction of the seven-gene dataset.
Set1 Set2 Set3 Set4 Set5 Set6 Best Model
Partition 1 Nak-3pos Wg-3pos GTR+G
Partition 2 Nak-1pos PolII-1pos Wg-1pos Wg-2pos CAD-1pos EF1a-1pos GTR+G
Partition 3 Nak-2pos PolII-2pos CAD-2pos EF1a-2pos GTR+G
Partition 4 Opsin-2pos COI-2pos GTR+G
Partition 5 Opsin-3pos CAD-3pos GTR+G
Partition 6 Opsin-1pos COI-1pos GTR+G+I
Partition 7 PolII-3pos EF1a-3pos GTR+G
Partition 8 COI-3pos GTR+G+I
Figure 1. Maximum likelihood estimation of phylogenetic relationships between the focal taxa
of this study. Numbers above branches indicate bootstrap percentage. Numbers below branches
represent branch lengths.
References
Barton K. MuMIn: Multi-model inference. R package version 1.0.0. Vienna, Austria: R
Foundation for Statistical Computing; 2011. Available from:
http://CRAN.R-project.org/package=MuMIn.
Cardinal S, Straka J, Danforth BN. Comprehensive phylogeny of apid bees reveals the
evolutionary origins and antiquity of cleptoparasitism. Proc. Natl. Acad. Sci. U.S.A.
2010;107:16207-11.
Johnson BR, Borowiec ML, Chiu JC, Lee EK, Atallah J, Ward PS. Phylogenomics resolves
evolutionary relationships among ants, bees, and wasps. Curr. Biol. 2013;23:2058-62.
Lanfear R, Calcott B, Ho SY, Guindon S. PartitionFinder: combined selection of partitioning
schemes and substitution models for phylogenetic analyses. Mol. Biol. Evol. 2012;29:1695-701.
Miller MA, Pfeiffer W, Schwartz T. Creating the CIPRES Science Gateway for inference of
large phylogenetic trees. Proceedings of the Gateway Computing Environments Workshop
(GCE); 2010 Nov 14; New Orleans, LA, USA; p. 1-8.
Misof B, Liu S, Meusemann K, Peters RS, Donath A, Mayer C, Frandsen PB, Ware J, Flouri T,
Beutel RG. Phylogenomics resolves the timing and pattern of insect evolution. Science.
2014;346:763-67.
Paradis E, Claude J, Strimmer K. APE: analyses of phylogenetics and evolution in R language.
Bioinformatics. 2004;20:289-90.
Stamatakis A. RAxML Version 8: A tool for phylogenetic analysis and post analysis of large
phylogenies. Bioinformatics. 2014;30:1312-13.