NSF Postdoc Fellowship.docx - Home | Biology...

Page 1: NSF Postdoc Fellowship.docx - Home | Biology Lettersrsbl.royalsocietypublishing.org/.../2/rsbl20150984supp3.docx · Web viewMCMCglmm is a Markov Chain Monte Carlo sampler for a Bayesian

Appendix 3 – Phylogenetic Reconstruction and statistical approaches

Phylogenetic reconstruction - For phylogenetic reconstruction, we concatenated publicly available DNA sequences from seven genes: Calcium/Calmodulin-Dependent Protein Kinase II (CAD), Cytochrome Oxidase I (COI), Elongation Factor 1-α copy 2 (EF1a-2), Sodium Potassium Adenosine Triphosphatase (Nak), Long Wavelength Rhodopsin (Opsin), RNA Polymerase II (PolII), and wingless (Wg) (Table 1). For each gene, we selected the longest available sequence, or a random one if several sequences of equal length were available. For COI, in some cases we concatenated two sequences to obtain a longer fragment spanning the whole gene. When sequences for a species included in our study were not available in GenBank, we used sequences from a different species within the same genus or subfamily. Sequences were aligned using the MUSCLE option in SeaView v.4.5.3. The final length of the alignment was 11,035 bp. We used the program PartitionFinder v.1.1.1 (Lanfear et al. 2012) to find the best partition scheme and models of molecular evolution for our dataset (Table 2). Maximum-likelihood phylogenetic reconstruction was performed with RAxML v.7.2+ (Stamatakis 2014) through the online portal CIPRES (Miller et al. 2010) using the GTRGAMMA and GTRGAMMA+I models. Both evolutionary models resulted in the same tree topology and similar branch lengths, so only results from the simpler model, GTRGAMMA, are reported (Fig. 1). We constrained RAxML to find the best tree given the topology found in published phylogenies based on a larger number of genes (Cardinal et al. 2010; Johnson et al. 2013; Misof et al. 2014). Specifically, we constrained the ant lineage (Camponotus) to be the sister lineage of sphecid wasps (Sphecius) and bees.
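The sequence-selection rule described above (longest sequence, random choice among ties) can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the data frame, its column names, the placeholder accession labels, and the helper pick_sequence are all hypothetical and are not taken from the study.

```r
# Hypothetical candidate sequences for one gene in one taxon.
# The accession labels are placeholders, not real GenBank records.
candidates <- data.frame(
  accession = c("acc_A", "acc_B", "acc_C"),
  length_bp = c(658, 1041, 1041)
)

# Keep the longest sequence; if several share the maximum length,
# pick one of them at random.
pick_sequence <- function(df) {
  longest <- df[df$length_bp == max(df$length_bp), , drop = FALSE]
  # sample.int() on the row count also behaves correctly when only one row remains
  longest[sample.int(nrow(longest), 1), , drop = FALSE]
}

set.seed(42)  # fixed seed so the random tie-break is reproducible
chosen <- pick_sequence(candidates)
print(chosen)
```

Fixing the seed is a design choice for reproducibility; the text does not say how the random choice was made.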

MCMCglmm Approach - MCMCglmm is an R package that fits multivariate generalized linear mixed models in a Bayesian framework using a Markov chain Monte Carlo sampler. We used this approach to estimate the effect of multiple predictor variables (sociality, colony size, body size and temperature) on the encapsulation response of the 11 insect lineages, while correcting for the phylogenetic non-independence among taxa. The phylogenetic information was incorporated into the model as a random effect, through a variance-covariance matrix. The matrix was derived after converting the tree generated from our seven-gene dataset to an ultrametric tree using the function chronos in the R package ape (Paradis et al. 2004). We used a weak prior for the Bayesian analysis: [list(G=list(G1=list(V=1, nu=0.02)), R=list(V=1, nu=0.02))]. We ran 1 million iterations, sampling every 100 iterations and discarding the first 10% as burn-in. All models were run with [e.g. model2_P<-MCMCglmm(MeanGrey~log.MaxColonySize, random=~Species, ginverse=list(Species=inv.phylo$Ainv), family="gaussian", prior=prior, data=data, nitt=1000000, burnin=100000, thin=100)] and without [e.g. model2_NP<-MCMCglmm(MeanGrey~log.MaxColonySize, random=~Species, family="gaussian", prior=prior, data=data, nitt=1000000, burnin=100000, thin=100)] phylogenetic correction. Model selection was based on the Deviance Information Criterion (DIC) reported by the package MCMCglmm.
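The model calls quoted above can be assembled into a small end-to-end sketch. Everything that is not quoted in the text is an assumption made for illustration: the tree is a random 11-tip tree from ape::rtree standing in for the seven-gene tree, the trait values are simulated with rnorm, the tip labels are placeholders, and the chain is shortened (nitt=13000, burnin=3000, thin=10) so that it finishes quickly. Resetting the class of the chronos output to "phylo" is a precaution, not something the text describes.

```r
library(ape)
library(MCMCglmm)

set.seed(1)

# Placeholder for the seven-gene ML tree: a random rooted tree with 11 tips.
tree <- rtree(11, tip.label = paste0("sp", 1:11))

# Convert to an ultrametric tree, as described in the text.
ultra <- chronos(tree)
class(ultra) <- "phylo"  # precaution: keep a single class before calling inverseA()

# Inverse of the phylogenetic variance-covariance matrix among tips.
inv.phylo <- inverseA(ultra, nodes = "TIPS", scale = TRUE)

# Simulated placeholder data; column names follow the calls quoted in the text.
dat <- data.frame(
  Species = factor(ultra$tip.label),
  MeanGrey = rnorm(11),
  log.MaxColonySize = rnorm(11)
)

# Weak prior quoted in the text.
prior <- list(G = list(G1 = list(V = 1, nu = 0.02)),
              R = list(V = 1, nu = 0.02))

# With phylogenetic correction (short chain, for illustration only).
model_P <- MCMCglmm(MeanGrey ~ log.MaxColonySize, random = ~Species,
                    ginverse = list(Species = inv.phylo$Ainv),
                    family = "gaussian", prior = prior, data = dat,
                    nitt = 13000, burnin = 3000, thin = 10, verbose = FALSE)

# Without phylogenetic correction.
model_NP <- MCMCglmm(MeanGrey ~ log.MaxColonySize, random = ~Species,
                     family = "gaussian", prior = prior, data = dat,
                     nitt = 13000, burnin = 3000, thin = 10, verbose = FALSE)

# Lower DIC indicates the preferred model.
print(c(phylo = model_P$DIC, non_phylo = model_NP$DIC))
```

With the settings quoted in the text, the number of retained posterior samples is (1,000,000 - 100,000) / 100 = 9,000 per model; the shortened chain here retains (13,000 - 3,000) / 10 = 1,000.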

PGLS approach - We tested the effect of sociality and colony size on the encapsulation response using phylogenetic generalized least squares (PGLS). Body size was included as a covariate. In the PGLS method, the phylogenetic correlation between lineages is incorporated as a variance-covariance matrix. PGLS was used to estimate the regression coefficients between traits by maximum likelihood. We estimated the weighting parameter lambda (λ) using the function corPagel in the R package ape (Paradis et al. 2004) to correct for the phylogenetic effect in all the tested linear models. The Akaike Information Criterion corrected for small sample size (AICc), as implemented in the function model.sel of the package MuMIn, was used to compare all tested models.
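The matrix algebra behind this approach can be sketched in base R. This is not the ape or MuMIn code path used in the study: λ is fixed by hand here rather than estimated by maximum likelihood, and the four-taxon covariance matrix, the trait values, and the helper names pagel_vcv, gls_coef and aicc are all made up for illustration.

```r
# Hypothetical variance-covariance matrix for four taxa: the diagonal is the
# root-to-tip distance, the off-diagonal the branch length shared by each pair.
V <- matrix(c(1.0, 0.6, 0.2, 0.2,
              0.6, 1.0, 0.2, 0.2,
              0.2, 0.2, 1.0, 0.7,
              0.2, 0.2, 0.7, 1.0),
            nrow = 4, byrow = TRUE)

# Pagel's lambda rescales only the off-diagonal (shared-history) terms.
pagel_vcv <- function(V, lambda) {
  Vl <- lambda * V
  diag(Vl) <- diag(V)
  Vl
}

# GLS coefficients: beta = (X' V^-1 X)^-1 X' V^-1 y
gls_coef <- function(X, y, V) {
  Vi <- solve(V)
  drop(solve(t(X) %*% Vi %*% X, t(X) %*% Vi %*% y))
}

# Small-sample correction: AICc = AIC + 2k(k + 1) / (n - k - 1)
aicc <- function(logLik, k, n) {
  -2 * logLik + 2 * k + (2 * k * (k + 1)) / (n - k - 1)
}

x <- c(0.1, 0.4, 1.2, 1.5)   # made-up predictor values
y <- c(1.0, 1.3, 2.1, 2.6)   # made-up response values
X <- cbind(Intercept = 1, x = x)

# lambda = 0 removes the phylogenetic covariance; because diag(V) is constant
# here, the fit coincides with ordinary least squares.
print(gls_coef(X, y, pagel_vcv(V, 0)))

# lambda = 1 uses the full covariance implied by the tree.
print(gls_coef(X, y, pagel_vcv(V, 1)))
```

The correction term matters at this sample size: with n = 11 lineages and k = 3 estimated parameters, AICc exceeds AIC by 2·3·4 / (11 - 3 - 1) = 24/7, about 3.4 units.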

Table 1. GenBank accession numbers for sequences used in this study.

Taxon CAD COI EF1a-2 Nak Opsin PolII Wg
Agapostemon virescens JQ279238 FJ581961+JQ266376 AF140320 GU320102 AY227940 AY945089 JQ266620
Apis mellifera DQ067178 KC135895+AY114458 EU163208 EU184750 U26026 XM623278 AY703618
Blattella germanica GQ265596 KC407709+EU854321 - - - GQ265663 HE965017
Bombus impatiens KF936682 HQ978604+JF799027 DQ788304 EU184743 AY485301 XM012391960 EU184707
Camponotus castaneus XM011268571 AY334393 JN134514 - - XM011256959 JN134721
Eumenes fraternus - EU649440 - - - EF190805 EF473959
Halictus ligatus JQ279314 HQ978593+AF438426 AF140299 EF646388 AY227956 AY654510 JQ266703
Polistes fuscatus - EF136438 KP255916 EU367308 EF190796 EF473960
Sphecius speciosus DQ067116 EF203743 AY585168 GU320099 JN374897 AY945172 EU367333
Xylocopa virginica EU122079 EU271670+AY005231 AY208290 GU245220 - GU245500 GU245676
Zootermopsis nevadensis - KJ958410 - - AB596915 FJ802921

Table 2. Partition scheme used for the Maximum Likelihood phylogenetic reconstruction of the seven-gene dataset.

             Set1        Set2        Set3        Set4        Set5        Set6        Best Model
Partition 1  Nak-3pos    Wg-3pos                                                     GTR+G
Partition 2  Nak-1pos    PolII-1pos  Wg-1pos     Wg-2pos     CAD-1pos    EF1a-1pos   GTR+G
Partition 3  Nak-2pos    PolII-2pos  CAD-2pos    EF1a-2pos                           GTR+G
Partition 4  Opsin-2pos  COI-2pos                                                    GTR+G
Partition 5  Opsin-3pos  CAD-3pos                                                    GTR+G
Partition 6  Opsin-1pos  COI-1pos                                                    GTR+G+I
Partition 7  PolII-3pos  EF1a-3pos                                                   GTR+G
Partition 8  COI-3pos                                                                GTR+G+I
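Each subset in Table 2 is a set of alignment columns defined by gene and codon position. A minimal base R sketch of how such column sets can be built follows. The gene names and boundaries are hypothetical (the real coordinates within the 11,035 bp alignment are not given here), the helper codon_sites is invented for illustration, and each gene is assumed to start in frame at codon position 1.

```r
# Hypothetical gene boundaries within a concatenated alignment
# (1-based, inclusive). These are NOT the study's coordinates.
genes <- data.frame(
  gene  = c("geneA", "geneB"),
  start = c(1, 901),
  end   = c(900, 1500)
)

# Alignment columns belonging to one codon position of one gene,
# assuming the gene starts in frame at codon position 1.
codon_sites <- function(start, end, pos) {
  seq(start + pos - 1, end, by = 3)
}

# A multi-gene subset is the union of the per-gene column sets,
# here the third codon positions of both hypothetical genes.
subset3 <- unlist(Map(codon_sites, genes$start, genes$end, 3))

print(head(subset3))
print(length(subset3))
```

Grouping positions across genes in this way lets sites with similar substitution dynamics share one model, which is how the multi-gene rows of Table 2 can be read.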

Figure 1. Maximum likelihood estimate of phylogenetic relationships among the focal taxa of this study. Numbers above branches indicate bootstrap percentages. Numbers below branches represent branch lengths.

References

Barton K. MuMIn: Multi-model inference. R package version 1.0.0. Vienna, Austria: R Foundation for Statistical Computing; 2011. Available from: http://CRAN.R-project.org/package=MuMIn.

Cardinal S, Straka J, Danforth BN. Comprehensive phylogeny of apid bees reveals the evolutionary origins and antiquity of cleptoparasitism. Proc. Natl. Acad. Sci. U.S.A. 2010;107:16207-11.

Johnson BR, Borowiec ML, Chiu JC, Lee EK, Atallah J, Ward PS. Phylogenomics resolves evolutionary relationships among ants, bees, and wasps. Curr. Biol. 2013;23:2058-62.

Lanfear R, Calcott B, Ho SY, Guindon S. PartitionFinder: combined selection of partitioning schemes and substitution models for phylogenetic analyses. Mol. Biol. Evol. 2012;29:1695-701.

Miller MA, Pfeiffer W, Schwartz T. Creating the CIPRES Science Gateway for inference of large phylogenetic trees. Proceedings of the Gateway Computing Environments Workshop (GCE); 2010 Nov 14; New Orleans, LA, USA; p. 1-8.

Misof B, Liu S, Meusemann K, Peters RS, Donath A, Mayer C, Frandsen PB, Ware J, Flouri T, Beutel RG. Phylogenomics resolves the timing and pattern of insect evolution. Science. 2014;346:763-67.

Paradis E, Claude J, Strimmer K. APE: analyses of phylogenetics and evolution in R language. Bioinformatics. 2004;20:289-90.

Stamatakis A. RAxML Version 8: A tool for phylogenetic analysis and post analysis of large phylogenies. Bioinformatics. 2014;30:1312-13.
