Online Materials for
Mutation in ST6GALNAC5 identified in family with coronary artery disease
Kolsoum InanlooRahatloo1,2α, Amir Farhang Zand Parsa3, Klaus Huse2, Paniz Rasooli1, Saeid Davaran4,
Matthias Platzer2, Marcel Kramer5, Jian-Bing Fan6, Casey Turk6, Sasan Amini6, Frank Steemers6, Kevin
Gunderson6, Mostafa Ronaghi6, Elahe Elahi1,7*
correspondence to: [email protected] ; [email protected]
This file includes:
Materials and Methods
Figures S1 to S7
Tables S1 to S10
α, Present address: Dept. of Cardiology & Radiology, Stanford School of Medicine, Stanford, CA 94305-5454
Methods in detail
Genome-wide linkage analysis
Genome-wide SNP genotyping was carried out on DNA samples of eight individuals of the CAD-105
pedigree using HumanCytoSNP-12v1-0_D BeadChips and the iScan reader (Illumina; www.illumina.com)
(GEO accession no.: GSE42137). The individuals included six CAD affected and two CAD unaffected
individuals (Fig. 1). SNPs that had not been genotyped in one or more individuals were removed from the
analysis with the appropriate options in the GenomeStudio_Genotyping_Module_V1.0 (Illumina). MERLIN
was used to remove SNPs that exhibited Mendelian errors and subsequently to obtain parametric and
nonparametric logarithm of odds (LOD) scores under two sets of criteria: (i) disease allele frequency of
0.001, penetrance of 90%, and 10% phenocopies; and (ii) disease allele frequency of 0.0001, penetrance of 99%,
and 1% phenocopies (1).
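As a hedged illustration of what a LOD score measures, the sketch below evaluates the textbook two-point formula for phase-known, fully informative meioses. MERLIN computes multipoint scores on the whole pedigree under the penetrance models listed above, so this is not the calculation used in the study; the function name, the dictionary keys, and the example counts are assumptions made only for illustration.

```python
import math

# The two parametric models quoted in the text (key names are illustrative).
MODELS = [
    {"disease_allele_freq": 0.001, "penetrance": 0.90, "phenocopy_rate": 0.10},
    {"disease_allele_freq": 0.0001, "penetrance": 0.99, "phenocopy_rate": 0.01},
]


def two_point_lod(recombinants: int, meioses: int, theta: float) -> float:
    """LOD(theta) = log10[L(theta) / L(0.5)] for phase-known meioses."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie in [0, meioses]")
    non_recombinants = meioses - recombinants
    if theta == 0.0 and recombinants > 0:
        # Any recombinant is impossible under complete linkage.
        return float("-inf")
    log_likelihood = 0.0
    if recombinants:
        log_likelihood += recombinants * math.log10(theta)
    if non_recombinants:
        log_likelihood += non_recombinants * math.log10(1.0 - theta)
    return log_likelihood - meioses * math.log10(0.5)


# Hypothetical example: 8 informative meioses, no recombinants, theta = 0.
example_lod = two_point_lod(0, 8, 0.0)
```

With no recombinants the score grows by log10(2), about 0.3, per informative meiosis, which is why a small pedigree can only reach a modest maximum LOD.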
Genome wide exome sequencing
CAD affected individuals III-1 and III-2 were selected for exome sequencing. Genomic DNA was
isolated from blood samples by standard methods. DNA libraries were enriched using the TruSeq® Exome
Enrichment kit (Illumina, San Diego, CA, USA) and subsequently sequenced on an Illumina HiSeq® 2000
system (Illumina). The TruSeq Exome assay targets 62 Mb of protein coding and regulatory untranslated
regions of the genome. Base calling was performed by the Illumina pipeline with default parameters. Over
eight gigabases of high quality sequences for each subject were generated. Sequence reads were mapped to
the human reference genome UCSC NCBI37/hg19 using ELANDv2 software (Illumina). Variant detection
was performed with CASAVA software (version 1.8.1; Illumina), and candidate variants were filtered to
have a CASAVA quality threshold of 10. CASAVA filtered out duplicate reads and reads without matched
pairs. In addition to CASAVA, variants were analyzed using Enlis Genomics (http://www.enlis.com) and
NextBio (http://www.nextbio.com/b/nextbio.nb) analysis software, again with reference to human genome
reference sequence NCBI37/hg19. Absence of the variants from 60 whole-exome sequences available
within the Enlis Genomics data set (http://www.enlis.com/sample_genomes.html) and from 15 other exome
sequences sequenced along with the CAD-105 patients but derived from healthy Iranians or Iranians
affected with unrelated disorders was also verified. The variants were finally systematically filtered to
identify those that were positioned within the linked loci, that affected splicing or amino acid changes, and
that were present in both patients. Development protocols and features of the TruSeq® Exome Enrichment
kit are described for the first time below.
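The filtering cascade described above can be sketched as follows. This is a minimal illustration, not the CASAVA, Enlis, or NextBio implementation: the `Variant` class, the effect labels, and the quality values are hypothetical, while the linked intervals are copied from Table S2 and the two example positions from Table S4.

```python
from dataclasses import dataclass

# Linked intervals (hg19) as listed in Table S2.
LINKED_REGIONS = [
    ("1", 66179203, 68588299),
    ("1", 75178360, 79595720),
    ("19", 54042830, 55879872),
]
# Assumed labels for variants that affect splicing or the amino acid sequence.
PROTEIN_EFFECTS = {"missense", "nonsense", "stop_loss", "splice_site"}


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str
    effect: str
    quality: int


def key(v: Variant) -> tuple:
    return (v.chrom, v.pos, v.ref, v.alt)


def in_linked_region(v: Variant) -> bool:
    return any(c == v.chrom and lo <= v.pos <= hi for c, lo, hi in LINKED_REGIONS)


def filter_candidates(patient_a, patient_b, controls, min_quality=10):
    """Keep variants that pass quality, are shared by both patients, are
    absent from controls, change splicing or protein, and lie in a linked locus."""
    shared = {key(v) for v in patient_b if v.quality >= min_quality}
    in_controls = {key(v) for v in controls}
    return [
        v for v in patient_a
        if v.quality >= min_quality
        and key(v) in shared
        and key(v) not in in_controls
        and v.effect in PROTEIN_EFFECTS
        and in_linked_region(v)
    ]


# Positions from Table S4; quality scores are made up for the example.
st6 = Variant("1", 77509922, "G", "A", "missense", 30)
ttn = Variant("2", 179634421, "T", "G", "missense", 30)
candidates = filter_candidates([st6, ttn], [st6, ttn], controls=[])
```

In this toy input only the ST6GALNAC5 variant survives, because the TTN variant lies outside the linked intervals.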
TruSeq exome content
The TruSeq Exome enrichment kit includes 340,427 probes, each designed against the human
NCBI37/hg19 reference genome. The probe set was designed to enrich 201,121 exons spanning
20,794 genes, targeting a total complexity of around 62 Mb. For exons larger than 150 bases, the probes are
uniformly spaced roughly every 150 bases. Each 95-mer probe targets libraries of 300-400 bp (insert size
180-280 bp), enriching 265-465 bases centered symmetrically on the midpoint of the probe. This means that,
in addition to comprehensive coverage of the major exon databases, the kit also provides broad coverage
of non-coding DNA in exon flanking regions, including promoters and UTRs. Databases covered by the kit
are CCDS coding exons (31.3 Mb, hg19, 97.2% covered), RefSeq (33.2 Mb, hg19, 96.4% covered), RefSeq
(regGene) exons plus (67.8 Mb, hg19, 88.3% covered), Encode/Gencode coding exons (25.6 Mb, hg19,
93.2% covered), and predicted microRNA targets (9.0 Mb, hg19, 77.6% covered).
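The 265-465 base range quoted above is consistent with a simple geometric assumption, which is ours and is not stated in the text: an insert is captured when it contains the whole 95-base probe. Under that assumption the captured inserts span twice the insert size minus the probe length, as the short sketch below shows.

```python
PROBE_LENGTH = 95  # bases, from the text


def enriched_footprint(insert_size: int, probe_length: int = PROBE_LENGTH) -> int:
    """Bases spanned by all inserts of one size that fully contain the probe.

    The leftmost such insert ends at the probe's right edge and the rightmost
    starts at the probe's left edge, so the span is 2 * insert - probe.
    """
    if insert_size < probe_length:
        raise ValueError("insert must be at least as long as the probe")
    return 2 * insert_size - probe_length


footprint_range = (enriched_footprint(180), enriched_footprint(280))
```

The two ends of the insert-size range (180 and 280 bp) give 265 and 465 bases, matching the figures in the text.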
TruSeq exome enrichment
Protocols, workflows, sequencing library preparation and pooling, and details regarding the TruSeq
Exome enrichment assay can be found in the “TruSeq Exome guide” and “TruSeq Exome enrichment kit
data sheet”:
http://support.illumina.com/sequencing/sequencing_kits/truseq_exome_enrichment_kit.ilmn
http://www.illumina.com/documents//products/datasheets/datasheet_truseq_exome_enrichment_kit.pdf
The method consists of a 2.5-3 day workflow. An indexing solution is supplied for each DNA sample,
which makes multi-sample pooling of up to twelve samples feasible in a single enrichment reaction. This is
a key feature of the technology that enables an automation-friendly workflow and allows processing of
many samples simultaneously with minimal hands-on time. An overview of the enrichment scheme is
shown in Figure S2. In summary, the enrichment workflow steps are: (A) preparation of indexed libraries,
quantitation, and pooling of indexed libraries (see Preparation of sequencing libraries below); (B)
denaturing of libraries; (C) solution-phase hybridization of biotinylated oligonucleotide probes; (D) affinity
pull-down of targeted regions using magnetic streptavidin beads and high stringency washing; (E) elution
of captured regions of interest and repeat of the pull-down steps starting at (C); and (F) PCR amplification
of the final eluted targeted libraries using universal primers (P5 and P7 in Figure S2) (not shown).
The enrichment method is unique in the sense that standard biotin-oligonucleotides are used, and that
non-stringent high concentration capture probe annealing is combined with two rounds of affinity
purification using Tm-normalizing stringency washes. The method is enabled by the high quality, relatively
uniform representation, and low preparation cost of the biotin-oligonucleotides, making this method
economically attractive for fixed sets with large sample volumes (2).
Implementation of two rounds of biotin-oligonucleotide capture and release resulted in an increased
enrichment specificity (>80%) and overall assay robustness compared to a single round (40-50%). Two key
reagents, Cot-1 DNA (KREAcot DNA, Kreatech) and sequencing primer blockers (SBS3 and SBS12) in
step C (Figure S2) have a significant effect on the enrichment specificity. For example, 100, 50, 10, 5, 1,
and 0 ng/uL Cot-1 DNA yields 82.5%, 70.7%, 33.2%, 20.3%, 6.3%, and 2.1% enrichment specificity,
respectively. In the second enrichment step, the effect of cross-hybridizing repeat elements is effectively
blocked by Cot-1 DNA since fewer non-targeted libraries are present after the first enrichment step.
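To make the Cot-1 titration easier to read, the values quoted above can be tabulated and expressed relative to the reaction without Cot-1 DNA. The sketch below only restates the numbers in the text; the variable and function names are illustrative.

```python
# Cot-1 DNA concentration (ng/uL) -> enrichment specificity (%), from the text.
COT1_TITRATION = {100: 82.5, 50: 70.7, 10: 33.2, 5: 20.3, 1: 6.3, 0: 2.1}


def fold_gain(concentration: int, baseline: int = 0) -> float:
    """Specificity at a given Cot-1 concentration relative to the baseline."""
    return COT1_TITRATION[concentration] / COT1_TITRATION[baseline]


for conc in sorted(COT1_TITRATION, reverse=True):
    print(f"{conc:>3} ng/uL: {COT1_TITRATION[conc]:5.1f}%  "
          f"({fold_gain(conc):.1f}x vs no Cot-1)")
```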
Typical enrichment specificities are >80% and are relatively uniform across library insert sizes ranging
from 150-500 bp. The protocol results in a relatively uniform read count distribution across the targeted
regions. For example, >90% of the targeted bases are covered at a depth of at least 0.2X the mean coverage. This is
accomplished by a relatively non-stringent overnight hybridization at sub-nM concentrations of
capture probes to drive hybridization capture, and a highly optimized Tm-normalizing stringency wash to
reduce off-target enrichment.
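The uniformity figure above can be computed from per-base depths as sketched below. This is a minimal illustration of the metric as we read it (fraction of targeted bases with depth of at least 0.2 times the mean); the function name and the depth values are hypothetical.

```python
def fraction_covered(depths, fraction_of_mean=0.2):
    """Fraction of targeted bases whose depth is >= fraction_of_mean * mean depth."""
    if not depths:
        raise ValueError("no depths supplied")
    mean_depth = sum(depths) / len(depths)
    threshold = fraction_of_mean * mean_depth
    return sum(1 for d in depths if d >= threshold) / len(depths)


# Hypothetical per-base depths over ten targeted positions (mean depth 40).
toy_depths = [0, 5, 40, 50, 60, 45, 55, 50, 48, 47]
toy_uniformity = fraction_covered(toy_depths)
```

In the toy data the threshold is 8 reads, so the two shallow positions fail and the metric is 0.8.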
Preparation of sequencing libraries
Sequencing libraries were prepared using the TruSeq® DNA sample preparation kit v2
(http://www.illumina.com/products/truseq_dna_sample_prep_kit_v2.ilmn)
using 1 ug of gDNA input. We also validated the Nextera® Exome enrichment kit with a few selected
samples using only 50 ng of gDNA input
(http://www.illumina.com/documents/products/datasheets/datasheet_nextera_exome_enrichment.pdf).
Preparation of biotinylated capture probes
The synthesis and purification of the biotinylated oligonucleotides for the Exome Capture Target Oligo
(CTO) pool are described elsewhere (2).
Effect of probe-target DNA mismatches on enrichment efficiency
We designed a set of oligonucleotide probes that enabled evaluation of the efficiency of target DNA
enrichment with various types and degrees of mismatches with respect to the probes. Capture probes were
designed to span a range of variant lesions consisting of deletions, insertions, consecutive substitutions, or
staggered substitutions relative to the hg19 reference genome. Each “variant” category was covered by 100
capture probes chosen from the TruSeq® Exome capture probe set with the following selection rules: (1)
randomly selected from regions with the average number of reads of a typical exome enrichment experiment,
(2) performing consistently well, and (3) positioned within regions covered by only a single probe. The
variant probes carried alterations of 0, 1, 2, 3, 4, 5, 7, 9, 12, or 15 bp, where the alterations
consisted of insertions, deletions, and consecutive and staggered substitutions. For insertions, a randomly
generated sequence of the specified length was inserted into the middle of the probe, and the probe was then
truncated symmetrically so that probe length was maintained (e.g. for a 2 base insertion:
ATAT->ATGGAT->TGGA). For deletions, a deletion of the specified length was introduced in the middle
of the probe, with flanking sequence added symmetrically to maintain probe length (e.g.
GGATATGG->NGGATGGN). For substitutions, homo-mismatches were used as a worst case scenario. Half of the
designs had consecutive mismatches (substitutions), and the other half had uniformly spaced mismatches
(staggered substitutions) (e.g. ATGATGAC->ATGTAGAC or ATGTTCAC). We also assessed the effect
of probe tiling in cases where more than one probe covers a given genomic region. This was accomplished
by designing a second set of probes such that three probes, rather than one probe, annealed to each target
region. This trio of probes included the original probe and two flanking probes of equal length. The latter
set is useful in determining how flanking probes can help recover the targeted region in cases of inefficient
affinity enrichment with a single central probe.
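The probe alteration rules above can be sketched in a few functions. This is an illustrative reconstruction that reproduces the short examples given in the text, not the design software actually used. It assumes that a "homo-mismatch" means replacing a base with its complement (which is what the worked examples show); the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def insertion_variant(probe: str, insert: str) -> str:
    """Insert a sequence at the probe midpoint, then trim both ends equally."""
    mid = len(probe) // 2
    extended = probe[:mid] + insert + probe[mid:]
    trim_left = len(insert) // 2
    trim_right = len(insert) - trim_left
    return extended[trim_left:len(extended) - trim_right]


def deletion_variant(probe: str, n: int, left_flank: str, right_flank: str) -> str:
    """Delete n central bases and pad with genomic flank to keep the length."""
    start = (len(probe) - n) // 2
    core = probe[:start] + probe[start + n:]
    add_left = n // 2
    add_right = n - add_left
    if len(left_flank) < add_left or len(right_flank) < add_right:
        raise ValueError("not enough flanking sequence supplied")
    return left_flank[len(left_flank) - add_left:] + core + right_flank[:add_right]


def substitution_variant(probe: str, positions) -> str:
    """Replace the bases at the given 0-based positions with their complements."""
    bases = list(probe)
    for i in positions:
        bases[i] = bases[i].translate(COMPLEMENT)
    return "".join(bases)


# The worked examples from the text.
ins = insertion_variant("ATAT", "GG")
dele = deletion_variant("GGATATGG", 2, "N", "N")
consecutive = substitution_variant("ATGATGAC", [3, 4])
staggered = substitution_variant("ATGATGAC", [3, 5])
```

The positions for substitutions are passed explicitly because the text does not give the spacing rule used for staggered mismatches in full-length probes.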
Variant capture probe preparation
Capture probes were prepared by PCR amplification of a 90K pool of “in situ” array-synthesized
oligonucleotides from CustomArray (Bothell, WA). Common amplification primers (Primer 1:
AGTCCGCGCAATCAG, and Primer 2: TGCAAGGATCACTCG) were included in the probe sequence
and flanked the 80 base capture probe sequence for a final array probe length of 110 bases. The PCR
reaction was performed with 1X Titanium Taq buffer (Clontech, Mountain View, CA), 1 uM Biotin-Primer 1
(Integrated DNA Technologies (IDT), Coralville, Iowa), 1 uM Primer 2 (IDT), 200 uM dNTPs (Roche,
Indianapolis, IN), 1 uL Titanium Taq (Clontech, Mountain View, CA), 0.1 ng template 90K oligonucleotide
pool (CustomArray, 110 bp), and H2O to 100 uL. PCR cycling conditions were: 95 °C (5 min); 30 cycles of
95 °C (30 s), 55 °C (30 s), and 72 °C (60 s); 72 °C (5 min); and 10 °C (hold). Sera-Mag Magnetic Streptavidin
beads (MPB; 100 uL; Thermo Scientific, IN) were pre-washed with hybridization buffer (HB: 1 M NaCl, 0.5 M phosphate
buffer, 0.1% Tween-20). The biotinylated PCR products (8 uL, 60 uM) were incubated in hybridization
buffer with the pre-washed MPB for 30 min at room temperature. The beads were subsequently washed first
with 1X HB, then with 0.2X HB, then with NaOH (0.1 N), and finally with 0.2X HB. The biotin-oligonucleotides
were eluted from the MPB using water (100 uL) and heat (95 °C for 10 min) and obtained at a final
concentration of ~2 uM (3, 4).
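The 110-base array oligonucleotide length follows from two 15-base primer sites flanking the 80-base capture sequence. The sketch below checks that arithmetic. The orientation of the Primer 2 site (written here as its reverse complement on the synthesized strand) is an assumption on our part, since the text does not state it, and the function names are illustrative.

```python
PRIMER_1 = "AGTCCGCGCAATCAG"  # from the text
PRIMER_2 = "TGCAAGGATCACTCG"  # from the text
CAPTURE_LENGTH = 80

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def array_oligo(capture: str) -> str:
    """Primer 1 site + capture sequence + Primer 2 binding site (assumed layout)."""
    if len(capture) != CAPTURE_LENGTH:
        raise ValueError(f"capture sequence must be {CAPTURE_LENGTH} bases")
    return PRIMER_1 + capture + reverse_complement(PRIMER_2)


# Placeholder capture sequence used only to check the length arithmetic.
toy_oligo = array_oligo("ACGT" * 20)
```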
Variant probe enrichment assays
Enrichment assays were tested with various wash temperatures (32, 42, and 52 °C) in step D (Figure S2)
in order to evaluate the effect of wash stringency on specificity, uniformity, and coverage of the targeted
regions across the probe variant categories. Not unexpectedly, higher stringency wash temperatures generated
higher enrichment efficiencies and lowered the uniformity across all probes. In Figures S3-S5, the average
read count per category is shown for the single probe and three probe designs at 32, 42, and
52 °C, respectively. The current protocol requires a 42 °C stringency wash temperature. At this stringency, targeted
libraries with up to 11 bp staggered substitutions, 15 bp consecutive substitutions, and 15 bp indels
relative to the probes are efficiently enriched without much loss of coverage in the targeted regions. With
the longer 95-mer probes in the current TruSeq® Exome kit, it should be possible to detect even larger
variations. Larger variations can still be detected without loss in coverage using the three probe design. This
strongly indicates that flanking probes can mitigate losses of capture efficiency of the mismatched
probe centered in the middle.
Sanger sequencing
Genomic DNA fragments containing each of the 12 candidate exomic variations distributed in 11 genes
and considered to be candidate CAD causing variations based on results of linkage analysis and exome
sequence data were amplified by PCR and sequenced. Reference sequences used for design of primers were
as follows: VPS13D: NC_000001.10, NM_015378.2; CRYZ: NC_000001.10, NM_001130042;
ST6GALNAC5: NC_000001.10, NM_030965; LPHN2: NC_000001.10, NM_012302; TTN:
NC_000002.11, NM_001256850; HSPD1: NC_000002.11, NM_002156; IRS1: NC_000002.11,
NM_005544; GPR35: NC_000002.11, NM_001195381; IL7R: NC_000005.9, NM_002185; LILRA2:
NC_000019.9, NM_001130917; NLRP2: NC_000019.9, NM_001174081. DNAs from the seven affected
and three unaffected members of pedigree CAD-105 were analyzed with respect to these variations. The
amplicon containing the p.Val99Met causing variation in ST6GALNAC5 was also sequenced in 800
ethnically matched control Iranians not affected with cardiac disorders, and the amplicon containing the
p.*337Qext*20 variant was also sequenced in 800 controls. Finally all exons and adjacent intronic
sequences of ST6GALNAC5 were amplified and sequenced in 100 of the control individuals and in 160
Iranian CAD patients unrelated to each other and to pedigree CAD-105. Amplified DNA fragments were
sequenced using ABI Big Dye terminator chemistry and an ABI 3730XL genetic analyzer instrument
(Applied Biosystems, Foster City, CA). Sequences were analyzed with the Sequencher 4.8 software (Gene
Codes Corporation, Ann Arbor, MI). ST6GALNAC5 sequences derived from whole genome sequencing
performed in the United States on 150 CAD affected individuals and 800 individuals diagnosed not to be
affected with CAD were kindly provided by Dr. Leslie G. Biesecker (Genetic Diseases Research Branch,
National Human Genome Research Institute, USA). In this study, CAD diagnosis was based on having
experienced MI, had a stent placed, received a bypass, or presented with more than 50% occlusion upon
computed tomography angiography (CTA) or cardiac catheterization. In addition to reference sequences
described above, effects of nucleotide sequence variations observed were analyzed using protein reference
sequences NP_056193.2 (VPS13D), NP_001123514.1 (CRYZ), NP_112227.1 (ST6GALNAC5),
NP_036434.1 (LPHN2), NP_001243779.1 (TTN), NP_002147.2 (HSPD1), NP_005535.1 (IRS1),
NP_001182310.1 (GPR35), NP_002176.2 (IL7R), NP_001124389.1 (LILRA2), and NP_001167552.1
(NLRP2). The sequences of all primers used are presented in Table S5.
Creation of plasmids pcDNA3.3- ST6GALNAC5, pcDNA3.3- ST6GALNAC5- p.Val99Met, and
pcDNA3.3- ST6GALNAC5- p.*337Qext*20
COOH-terminal FLAG-tagged ST6GALNAC5 cDNA was PCR amplified from a human heart cDNA
panel (Clontech, Mountain View, CA, USA) using forward primer 5'-AAAATGAAGACCCTGATGCGC-
3' and reverse primer 5′-TTACTTATCATCATCATCCTTATAATCGAACACAGGTTTATTCTCAGGA-
3′. The amplicon was cloned into pcDNA3.3 (Invitrogen, Karlsruhe, Germany) using the Topo TA Cloning
system (Invitrogen), and pcDNA3.3- ST6GALNAC5 was created. The plasmid was transformed into One
Shot® TOP10 Chemically Competent E. coli cells (Invitrogen, Karlsruhe, Germany), and the sequence of
the insert in plasmids isolated from ampicillin resistant cells was confirmed by direct sequencing.
Subsequently, the c.G295A mutation that causes p.Val99Met was introduced using the QuikChange
site-directed mutagenesis kit (Agilent Technologies, Karlsruhe, Germany) according to the manufacturer’s
instructions, thus creating pcDNA3.3- ST6GALNAC5- p.Val99Met. Primers used contained the
sequence 5′- GGGACTGTGCCCTGATGACCAGCTCAG-3′ (nucleotide causing the mutation is
underlined) and the reverse complement of this sequence. Briefly, these primers, which are complementary
to opposite DNA strands of pcDNA3.3- ST6GALNAC5, are extended during a cycling reaction to create
mutated plasmids with staggered nicks. DNA strands in plasmids without nicks are removed by digestion
with DpnI. Nicks in surviving plasmids are repaired in vivo after transformation into TOP10 E. coli cells.
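As a small worked check of the nomenclature above, coding position 295 falls on the first base of codon 99, which is consistent with c.G295A producing p.Val99Met. The sketch below also derives the second mutagenic primer as the reverse complement of the quoted sequence. Function names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
MUTAGENIC_PRIMER = "GGGACTGTGCCCTGATGACCAGCTCAG"  # forward primer from the text


def codon_of(cds_position: int) -> tuple:
    """Return (codon number, position within codon), both 1-based."""
    if cds_position < 1:
        raise ValueError("coding positions are 1-based")
    return (cds_position - 1) // 3 + 1, (cds_position - 1) % 3 + 1


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


codon_number, base_in_codon = codon_of(295)
partner_primer = reverse_complement(MUTAGENIC_PRIMER)
```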
Plasmid pcDNA3.3- ST6GALNAC5- p.*337Qext*20 was created by overlap extension PCR using
forward primer 5′- AAAATGAAGACCCTGATGCGC-3′ and three reverse primers:
5′- TGGGATTACAGTCTGGCATGCTCATTCCTTGGAACACAG-3′,
5′- GTGTCTCGGTGTCTGATGCAGTGAATACCTGGGATTACAG-3′, and
5′- TTACTTATCATCATCATCCTTATAATCGTGTCTCGGTGTCTGA-3′. The third reverse primer
contained the FLAG sequence. Presence of the mutations was confirmed in isolated plasmids by sequencing.
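The three reverse primers listed above can be checked computationally: the 3′ end of each later primer should match the 5′ end of the previous one so that successive PCR rounds extend the product, and the 5′ part of the third primer should encode the FLAG epitope on the opposite strand. The sketch below is an illustrative check using the standard genetic code; the helper names are ours.

```python
REVERSE_PRIMERS = [
    "TGGGATTACAGTCTGGCATGCTCATTCCTTGGAACACAG",
    "GTGTCTCGGTGTCTGATGCAGTGAATACCTGGGATTACAG",
    "TTACTTATCATCATCATCCTTATAATCGTGTCTCGGTGTCTGA",
]

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built in TCAG order.
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def overlap(later: str, earlier: str) -> int:
    """Length of the longest 3' end of `later` that equals the 5' start of `earlier`."""
    for k in range(min(len(later), len(earlier)), 0, -1):
        if later.endswith(earlier[:k]):
            return k
    return 0


overlaps = [overlap(REVERSE_PRIMERS[i + 1], REVERSE_PRIMERS[i]) for i in range(2)]
flag_peptide = translate(reverse_complement(REVERSE_PRIMERS[2][:27]))
```

The first 27 bases of the third primer read, on the coding strand, as the FLAG octapeptide followed by a stop codon.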
ST6GALNAC5 expression in COS-7 cells
Transfection was performed using Lipofectamine 2000 (Invitrogen) according to the manufacturer’s
instructions. African green monkey kidney COS-7 cells (ATCC, Rockville, MD, USA) were seeded at a
density of 2 × 10^4/100 μl in a 12-well plate (for RT-PCR and Western blotting analyses) or onto
poly-L-lysine coated coverslips (for immunofluorescent microscopy) and cultured in Dulbecco’s MEM medium
GlutaMAX™ (Invitrogen) supplemented with 10% fetal calf serum and 1% antibiotic
(Penicillin-Streptomycin; Sigma, Hamburg, Germany). Growth was in an atmosphere with 5% CO2 and 95% humidity
at 37°C. Cells were transfected with pcDNA3.3- ST6GALNAC5, pcDNA3.3- ST6GALNAC5- p.Val99Met,
and pcDNA3.3- ST6GALNAC5- p.*337Qext*20 after 24 hours. ST6GALNAC5 sequences in plasmids
isolated from COS-7 cells were confirmed by sequencing. Expression analyses were performed 24 hours
post transfection.
For RT-PCR, RNA was isolated using the RNeasy Mini Kit (Qiagen, Hilden, Germany) and cDNA
synthesis was performed with Sprint RT Complete-Random Hexamer first-strand cDNA synthesis kit
(Clontech-Takara Bio Europe, Saint-Germain-en-Laye, France). Five micrograms of total RNA was used
for reverse transcription.
For Western blotting, PBS-washed COS-7 cells were lysed in 1% NP-40 (v/v), 20 mM Tris-HCl, pH 7.6,
150 mM NaCl, 1:1000 protease inhibitor cocktail, and 1 mM EDTA (5). Total protein content in supernatants
recovered after centrifugation was determined using the BCA assay (Bio-Rad, Munich, Germany). Aliquots
containing 20 micrograms of protein were size-fractionated in NuPAGE 10% Bis-Tris gels (Invitrogen)
using a buffer that contained 50 mM MOPS, 50 mM Tris base, 0.1% SDS, and 1 mM EDTA, pH 7.7. Proteins
were then transferred onto nitrocellulose membranes and probed with mouse monoclonal M2 anti-FLAG
(1:1000; F3165, Sigma-Aldrich, Munich, Germany), rabbit anti-human sialyltransferase 7e (1:1000;
ab69855, Abcam, Cambridge, MA, USA), or goat anti-human lamin B (1:6000; sc-6216, Santa Cruz
Biotechnology, Santa Cruz, CA, USA) primary antibody, followed by the appropriate secondary anti-IgG
antibody coupled to horseradish peroxidase (anti-mouse IgG, 1:2500, W4021, Promega, Mannheim, Germany;
anti-rabbit IgG, 1:4500, 65-6120, Invitrogen; or anti-goat IgG, 1:50000, sc-2768, Santa Cruz Biotechnology).
Lamin B served as internal control. Detection was performed using the enhanced chemiluminescence
(ECL) Western blotting detection system (Invitrogen). Exposure times were 5 to 15 seconds.
Sialyltransferase enzyme assay
Sialyltransferase enzyme activity in protein extracts of untransfected COS-7 cells and COS-7 cells
transfected with pcDNA3.3- ST6GALNAC5, pcDNA3.3- ST6GALNAC5- p.Val99Met, and pcDNA3.3-
ST6GALNAC5- p.*337Qext*20 was assayed using the Sialyltransferase Activity Kit (R&D Systems,
Wiesbaden-Nordenstadt, Germany), according to the manufacturer’s instructions. This kit utilizes the
5’-nucleotidase CD73 as a coupling phosphatase to remove inorganic phosphate quantitatively from the
leaving nucleotide cytidine 5’-monophosphate that is generated during sialyltransferase reactions (7). Assays
were performed on protein extracts isolated from cells 24 hours after transfection; three independent
transfections with each vector were performed, and protein extractions and enzyme assays were done on
cells of each transfection experiment. Varying amounts of protein were incubated with 25 nmol of
CMP-Neu5Ac (C8271; Sigma), 1 mg of asialofetuin (A4781; Sigma), and 50 ng of Coupling Phosphatase 2 in 1X
Assay Buffer for 20 minutes at 37 °C. Released inorganic phosphate was measured by spectrophotometry.
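Because the coupling phosphatase releases one phosphate per CMP produced, the released phosphate can be converted into a rate per unit of extract protein. The sketch below is a generic calculation under that one-to-one assumption, not the kit's own worksheet; it assumes absorbance has already been converted to pmol of phosphate, and all input values are hypothetical. Only the 20 minute incubation time comes from the text.

```python
def specific_activity(phosphate_pmol: float, protein_ug: float,
                      minutes: float = 20.0, background_pmol: float = 0.0) -> float:
    """Sialyltransferase activity in pmol phosphate per minute per ug protein."""
    if protein_ug <= 0 or minutes <= 0:
        raise ValueError("protein amount and incubation time must be positive")
    return (phosphate_pmol - background_pmol) / (minutes * protein_ug)


# Hypothetical reading: 400 pmol phosphate released by 2 ug of extract protein.
toy_activity = specific_activity(400.0, 2.0)
```

The optional background term is where a reading from untransfected cells or a no-acceptor control could be subtracted.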
ST6GALNAC5 ELISA assay
ST6GALNAC5 protein concentrations in the extracts of untransfected COS-7 cells and COS-7 cells
transfected with pcDNA3.3- ST6GALNAC5, pcDNA3.3- ST6GALNAC5- p.Val99Met, and pcDNA3.3-
ST6GALNAC5- p.*337Qext*20 that were used for sialyltransferase enzyme assays were measured using an
ST6GALNAC5 ELISA kit (antibodies-online Inc., Atlanta, GA, USA) according to the instructions of the
manufacturer. The assay employs a quantitative sandwich enzyme immunoassay technique. Briefly,
purified standard ST6GALNAC5 provided with the kit and protein extracts of cultured cells were added to
wells of microplates previously coated with antibody specific for ST6GALNAC5. After incubation to allow
binding of ST6GALNAC5 to the immobilized antibodies and removal of unbound substances, biotin-
conjugated antibody specific for ST6GALNAC5, avidin-conjugated horseradish peroxidase, and
horseradish peroxidase substrate 3,3′,5,5′-tetramethylbenzidine (TMB) were sequentially added to the
wells. Appropriate incubations, removal of liquids from the wells, and washings were performed.
Ultimately, a stop solution was added to prevent further enzymatic activity and optical densities were
measured at 450 nm. The amount of ST6GALNAC5 protein per nanogram total protein in the extracts was
calculated based on a standard curve derived from readings of the standard ST6GALNAC5. All samples
were assayed in triplicate.
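Reading a concentration off a standard curve can be sketched as a piecewise-linear interpolation between the standards. The kit instructions may prescribe a different curve fit, so this is only one simple way to do it; the standard concentrations, optical densities, and function name below are hypothetical.

```python
def interpolate_concentration(od: float, standards) -> float:
    """Interpolate a concentration from (concentration, optical density) standards."""
    points = sorted(standards, key=lambda p: p[1])
    if not points[0][1] <= od <= points[-1][1]:
        raise ValueError("optical density lies outside the standard curve")
    for (c0, od0), (c1, od1) in zip(points, points[1:]):
        if od0 <= od <= od1:
            if od1 == od0:
                return c0
            return c0 + (c1 - c0) * (od - od0) / (od1 - od0)
    raise ValueError("no bracketing standards found")


# Hypothetical standards: (concentration in ng/ml, OD at 450 nm).
TOY_STANDARDS = [(0.0, 0.05), (1.0, 0.25), (2.0, 0.45), (4.0, 0.85)]
sample_replicates = [0.34, 0.35, 0.36]  # triplicate readings, made up
mean_od = sum(sample_replicates) / len(sample_replicates)
sample_concentration = interpolate_concentration(mean_od, TOY_STANDARDS)
```

Dividing the interpolated value by the total protein loaded per well would then give the amount per nanogram of total protein described in the text.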
1. Abecasis, G.R., Cherny, S.S., Cookson, W.O., Cardon, L.R. (2002). Merlin rapid analysis of dense
genetic maps using sparse gene flow trees. Nat. Genet. 30, 97–101.
2. York, K.T., et al. (2011). Highly parallel oligonucleotide purification and functionalization using
reversible chemistry. Nucleic Acids Res. doi:10.1093/nar/gkr910.
3. Maurer, K., et al. (2006). Electrochemically generated acid and its containment to 100 micron reaction
areas for the production of DNA microarrays. PLoS ONE. 1, e34. doi:10.1371/journal.pone.0000034.
4. Holmberg, A., Blomstergren, A., Nord, O., Lukacs, M., Lundeberg, J., Uhlén, M. (2005). The biotin-
streptavidin interaction can be reversibly broken using water at elevated temperatures. Electrophoresis. 26,
501-510.
5. Hosoya, K.I., Kim, K.J., Lee, V.H.L. (1996). Age-dependent expression of P-glycoprotein gp170 in Caco-2
cell monolayers. Pharm. Res. (NY). 13, 885–890.
6. Wacker, I., Kaether, C., Krömer, A., et al. (1997). Microtubule-dependent transport of secretory vesicles
visualized in real time with a GFP-tagged secretory protein. J. Cell Sci. 110(pt 13), 1453-63.
7. Wu, Z.L., Ethen, C.M., Prather, B., Machacek, M., Jiang, W. (2010). Universal phosphatase-coupled
glycosyltransferase assay. Glycobiology. doi:10.1093/glycob/cwq187.
Figure S1
Figure S1- LOD plots of chromosomal regions showing best linkage to CAD status in pedigree CAD-105. A, Plot of chromosome 1. B, Plot of chromosome 19. The
LOD plots are under assumption of autosomal dominant model of inheritance. Two close peaks
on chromosome 1 and one peak on chromosome 19 are evident.
Figure S2
Figure S2- The TruSeq® Exome enrichment workflow. Indexed libraries can be
prepared with the TruSeq DNA sample preparation kit v2 or with the Nextera® sample preparation
solution (as shown).
Figure S3
Figure S3- Average read count of the targeted regions using the one probe design (blue) vs. the three probe design (green) across the categories: deletions, insertions, consecutive- and staggered substitutions at the 32 °C wash temperature.
Figure S4
Figure S4- Average read count of the targeted regions using the one probe design (blue) vs. the three probe design (green) across the categories: deletions, insertions, consecutive- and staggered substitutions at the 42 °C wash temperature.
Figure S5
Figure S5- Average read count of the targeted regions using the one probe design (blue) vs. the three probe design (green) across the categories: insertions, consecutive- and staggered substitutions at the 52 °C wash temperature.
Figure S6
Figure S6-DNA sequence chromatograms. A, Chromatograms verifying presence of
12 exome sequence-based candidate CAD causing mutations in affected individual III-1. B,
Chromatogram showing stop-loss mutation in CAD patient.
Figure S7
Figure S7- Confirmation of expression of ST6GALNAC5 in COS-7 transfected cells. A,
Confirmation by RT-PCR on RNA extracted from COS-7 culture transfected with pcDNA3.3-
ST6GALNAC5 (Wt), culture transfected with pcDNA3.3- ST6GALNAC5- p.Val99Met (Mut1), and
culture transfected with pcDNA3.3- ST6GALNAC5- p.*337Qext*20 (Mut2). NT: RT-PCR on RNA
extracted from non-transfected cells. B-D, Confirmation by Western blotting on protein extracted
from COS-7 culture transfected with pcDNA3.3- ST6GALNAC5 (Wt), culture transfected with
pcDNA3.3- ST6GALNAC5- p.Val99Met (Mut1) and culture transfected with pcDNA3.3-
ST6GALNAC5- p.*337Qext*20 (Mut2). The primary antibodies used were either against
FLAG (B), sialyltransferase 7e (C), or lamin B (D). A less intensely stained band in
addition to the major protein band was consistently observed when antibodies against
FLAG or sialyltransferase 7e were used. This may represent a modified form of the
encoded protein. Lamin B served as internal loading control. NT: Western blot on protein
extracted from non-transfected cells.
Table S1- Phenotypic and genotypic data on families with mutations in ST6GALNAC5
A: CAD-105 pedigree with p.V99M mutation
ID | CAD status¥ | Sex | Age at diagnosis (yrs) | Present age (yrs) | ST6GALNAC5 genotype | LDL (mg/dl)δ | Triglycerides (mg/dl)δ | HDL (mg/dl)δ | Fasting blood glucose (mg/dl)δ | Systolic BP (mm Hg)β | Diastolic BP (mm Hg)β | BMI (kg/m2)
II-1 | +: CABG | M | 55 | 65 | Mut/Wt | 180 | 142 | 42 | 105 | 120 | 80 | 23
II-2 | - | F | | 75 | Wt/Wt | 163 | 92 | 41 | 99 | 160 | 90 | 26
II-3 | - | M | | 72 | Wt/Wt | 145 | 82 | 42 | 75 | 140 | 90 | 23
II-4 | - | M | | 78 | Wt/Wt | 154 | 85 | 40 | 110 | 130 | 90 | 25
II-5 | +: MI | M | 55 | dead | | | | | | | |
II-6 | +: MI | F | 68 | dead | | | | | | | |
II-12 | +: MI | F | 52 | dead | | | | | | | |
II-13 | +: MI | F | 50 | dead | | | | | | | |
II-21 | +: MI | M | 61 | dead | | | | | | | |
III-1 | +: MI/CABG | F | 38 | 40 | Mut/Wt | 190 | 156 | 42 | 76 | 110 | 90 | 25
III-2 | +: CABG | M | 43 | 47 | Mut/Wt | 183 | 107 | 47 | 78 | 120 | 90 | 26
III-3 | +: MI/CABG | M | 47 | 51 | Mut/Mut | 192 | 80 | 41 | 81 | 120 | 80 | 25
III-4 | +: CABG | M | 50 | 62 | Mut/Wt | 174 | 151 | 40 | 92 | 120 | 90 | 25
III-5 | +: CABG | M | 43 | 55 | Mut/Wt | 185 | 199 | 44 | 107 | 110 | 80 | 24
III-6 | +: Angina | F | 50 | 54 | Mut/Wt | 186 | 126 | 41 | 71 | 120 | 90 | 26
III-7 | +: MI | M | 42 | dead | | | | | | | |
III-9 | - | M | | 35 | Mut/Mut | 83 | 202 | 49 | 148 | | | 25
III-10 | - | M | | 47 | Mut/Mut | 173 | 143 | 44 | 98 | 120 | 80 | 24
III-11 | - | F | | 36 | Mut/Mut | 209 | 158 | 41 | 110 | 130 | 90 | 26
III-21 | - | F | | 47 | Wt/Wt | 262 | 198 | 31 | 79 | 130 | 80 | 25
III-25 | +: MI | M | 50 | dead | | | | | | | |
IV-3 | - | M | | 28 | Mut/Wt | 122 | 218 | 43 | 95 | 100 | 60 | 24
IV-4 | - | M | | 25 | Wt/Wt | 127 | 68 | 36 | 73 | 110 | 70 | 25
IV-6 | - | F | | 20 | Mut/Wt | 200 | 59 | 66 | 88 | 100 | 70 | 21
IV-7 | - | M | | 30 | Mut/Wt | 279 | 117 | 43 | 92 | 118 | 70 | 25
IV-8 | - | M | | 21 | Mut/Wt | 52 | 74 | 50 | 91 | 100 | 60 | 22
IV-9 | - | M | | 19 | Mut/Wt | 175 | 77 | 63 | 87 | 120 | 90 | 24
IV-10 | - | F | | 25 | Mut/Wt | 85 | 72 | 69 | 87 | 110 | 90 | 25
P-valuesϒ | | | | | | 0.017 | 0.087 | 0.348 | 0.569 | 0.01 | 0.199 | 0.905
B: Family with p.*337Qext*20 mutation
ID | CAD status¥ | Sex | Age at diagnosis (yrs) | Present age (yrs) | ST6GALNAC5 genotype | LDL (mg/dl)δ | Triglycerides (mg/dl)δ | HDL (mg/dl)δ | Fasting blood glucose (mg/dl)δ | Systolic BP (mm Hg)β | Diastolic BP (mm Hg)β | BMI (kg/m2)
II-1 | - | F | | 62 | Wt/Wt | 173 | 91 | 42 | 190 | 170 | 90 | 23
II-2 | - | M | | 61 | Wt/Wt | 138 | 106 | 41 | 145 | 160 | 90 | 24
II-3 | +: MI/CABG | M | 51 | 53 | Mut/Wt | 181 | 132 | 39 | 203 | 160 | 90 | 24
II-4 | +: CABG | M | 54 | 58 | Mut/Wt | 125 | 110 | 40 | 189 | 150 | 90 | 25
Tables A & B: ¥: +, CAD affected and -, CAD unaffected; δMeasured after 12 hour fast; β average of four measurements taken at five minute intervals in the lying position using a mercury sphygmomanometer; ϒ derived using Mann-Whitney test which is appropriate for small sample sizes. P-values were calculated using available data on individual III-7 and individuals listed above him in the table. The remaining data which pertain to individuals who are relatively young and who may later experience CAD were not used in the calculations; CABG, coronary artery bypass graft; MI, myocardial infarction; M, male; F, female; Mut, mutated allele; Wt, wild type allele; LDL, low- density lipoprotein cholesterol; HDL, high-density lipoprotein cholesterol; BP, blood pressure; BMI, body mass index.
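As a hedged check of the Mann-Whitney comparison described in the footnote, the sketch below runs an exact two-sided test on the LDL values from part A, comparing the seven affected individuals with measurements against the three unaffected individuals listed above III-7. The grouping is our reading of the footnote, and the source does not say whether an exact or asymptotic test was used. The implementation enumerates all group assignments and is only practical for small samples like these.

```python
from itertools import combinations


def u_statistic(a, b) -> float:
    """Mann-Whitney U for group a: pairs where a > b, counting ties as 0.5."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)


def mann_whitney_exact(a, b) -> float:
    """Exact two-sided p-value by enumerating every split of the pooled data."""
    pooled = list(a) + list(b)
    centre = len(a) * len(b) / 2.0
    observed = abs(u_statistic(a, b) - centre)
    hits = total = 0
    for idx in combinations(range(len(pooled)), len(a)):
        chosen = set(idx)
        group_a = [pooled[i] for i in idx]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if abs(u_statistic(group_a, group_b) - centre) >= observed - 1e-12:
            hits += 1
    return hits / total


# LDL (mg/dl) from Table S1A.
LDL_AFFECTED = [180, 190, 183, 192, 174, 185, 186]   # II-1, III-1 to III-6
LDL_UNAFFECTED = [163, 145, 154]                     # II-2, II-3, II-4
ldl_p_value = mann_whitney_exact(LDL_AFFECTED, LDL_UNAFFECTED)
```

Every affected LDL value exceeds every unaffected value, so only 2 of the 120 possible splits are as extreme, giving p = 2/120, which rounds to the 0.017 shown in the table.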
C: Averages of data of subgroups of CAD-105 individuals grouped on basis of CAD status or mutation status¥
Measure | CAD (7)* | Without CAD (>70 yrs) (3)* | With mut (16)* | With mut - het (12)* | With mut - homo (4)* | Without mut (5)*
LDL (mg/dl) | 184.3 (±5.6) | 154.0 (±7.3) | 166.8 (±54.1) | 167.6 (±55.8) | 164.3 (±48.6) | 170.2 (±47.4)
Triglycerides (mg/dl) | 137.3 (±35.2) | 86.3 (±4.2) | 130.1 (±48.4) | 124.8 (±48.7) | 145.8 (±43.7) | 105.0 (±47.1)
HDL (mg/dl) | 42.4 (±2.2) | 41.0 (±0.8) | 47.8 (±9.2) | 49.2 (±10.1) | 43.8 (±3.3) | 38.0 (±4.0)
Glucose (mg/dl) | 87.1 (±13.3) | 94.7 (±14.6) | 94.1 (±17.5) | 89.1 (±10.2) | 109.2 (±24.6) | 87.2 (±14.7)
Systolic BP (mm Hg) | 117.1 (±4.5) | 143.3 (±12.5) | 114.5 (±8.8)β | 112.3 (±8.1) | 123.3 (±4.7)δ | 134.0 (±16.2)
Diastolic BP (mm Hg) | 85.7 (±4.9) | 90.0 (±0.0) | 80.7 (±10.6)β | 80 (±11.5) | 83.3 (±4.7)δ | 84.0 (±8.0)
BMI (kg/m2) | 24.9 (±1.0) | 24.7 (±1.2) | 24.4 (±1.4) | 24.2 (±1.5) | 25.0 (±0.7) | 24.8 (±1.0)
Table C: ¥Averages based on data of part A of this table. *Number in parenthesis indicates number of individuals in the group for whom data is available; β, data on 15 individuals available; δ, data on 3 individuals available. With mut = with mutation p.V99M; Without mut = without mutation p.V99M; het, heterozygous for the mutation; homo, homozygous for the mutation.
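The group summaries in part C can be recomputed from part A. The sketch below does this for LDL in two groups. Using the population standard deviation reproduces the printed values, which suggests (but does not prove) that this is the dispersion measure reported; the variable and function names are illustrative.

```python
from statistics import mean, pstdev

# LDL (mg/dl) from Table S1A.
LDL_CAD = [180, 190, 183, 192, 174, 185, 186]   # affected individuals with data
LDL_UNAFFECTED_OVER_70 = [163, 145, 154]        # II-2, II-3, II-4


def summarize(values) -> tuple:
    """Return (mean, population standard deviation), each rounded to one decimal."""
    return round(mean(values), 1), round(pstdev(values), 1)


cad_summary = summarize(LDL_CAD)
unaffected_summary = summarize(LDL_UNAFFECTED_OVER_70)
```

The same function applied to the other columns and groupings of part A should reproduce the remaining cells, subject to the missing blood pressure values noted in the table footnote.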
Table S2- Genomic regions showing maximum linkage to CAD status in pedigree CAD-105*
Chromosome | Chromosomal band | SNPs bordering linked regions | Nucleotide positions bordering linked region (bp) | Size of linked region (cM) | LOD scoreβ | No. annotated protein coding genesψ
1 | 1p31.3 | rs11208724 & rs1430751 | 66179203-68588299 | 3.49 | 2.2 | 21
1 | 1p22.3-1p31.1 | rs7522428 & rs6690294 | 75178360-79595720 | 3.94 | 2.2 | 47
19 | 19q13.33 | rs6509789 & rs1126757 | 54042830-55879872 | 8.96 | 2.2 | 136
* Six CAD affected and two CAD unaffected individuals were included in the analysis; β analysis with MERLIN under autosomal dominant model; ψ based on human reference genome UCSC NCBI37/hg19.
Table S3- Summary of exome sequencing data
ID             Target covered at 10X (%)*   Total no. variantsδ   No. novel variantsα   No. novel variants affecting splicing   No. novel variants affecting amino acid changesψ
CD-105-III-1   92                           149274                7402                  8                                       576
CD-105-III-2   92                           167597                8282                  10                                      581
* target size was 62 Mb; δ variants detected with CASAVA (V1.8.1; Illumina), reference sequence UCSC NCBI37/hg19; α variants that were absent in the NCBI dbSNP v130 and 1000 genomes databases and in the control exome sequences; ψ 283 of these were common to both patients.
Table S4- Candidate CAD causing mutations in pedigree CAD-105 based on linkage analysis and exome sequence data analysis
Chromosome   Chromosomal position of variation (bp)*   Variation   Gene         Effect on protein sequence     Bioinformatics prediction of effect on protein functionψ   Segregation with CAD status in CAD-105 pedigree
1            12336823                                  G>A         VPS13D       p.D1060N                       Tolerated                                                  -
1**          75172042                                  C>T         CRYZ         p.E310K, p.E173K, p.E276Kβ     Tolerated                                                  -
1**          77509922                                  G>A         ST6GALNAC5   p.V99M                         Damaging                                                   +
1**          82437577                                  T>C         LPHN2        p.I982T                        Damaging                                                   -
2            179634421                                 T>G         TTN          p.T2963P, p.T2917Pβ            Tolerated                                                  -
2            198358148                                 T>C         HSPD1        p.I257V                        Tolerated                                                  -
2            227660612                                 G>C         IRS1         p.P948R                        Tolerated                                                  -
2            241569440                                 A>T         GPR35        p.Y55F                         Tolerated                                                  -
5            35871197                                  G>A         IL7R         p.R140Q                        Tolerated                                                  -
5            59284478                                  G>T         PDE4D        p.L37I                         Tolerated                                                  -
19           14200072                                  T>C         SAMD1        p.R247G                        Tolerated                                                  -
19           15734118                                  A>T         CYP4F8       p.N284Y                        Tolerated                                                  -
19**         55087463                                  G>C         LILRA2       p.R381P                        Tolerated                                                  -
19**         55087468                                  G>A         LILRA2       p.G383S                        Tolerated                                                  -
19**         55493047                                  C>G         NLRP2        p.P154A, p.P131Aβ              Tolerated                                                  -
* human reference genome UCSC NCBI37/hg19; ** variations within peak linked loci; β in different isoforms of encoded protein; ψ both SIFT and PolyPhen predictions.
Table S5- Primer sequences used for amplification and sequencing of ST6GALNAC5 exons and amplicons containing candidate CAD causing variations
Gene/exon no. Forward Reverse
VPS13D/19* 5'-CTGGTGGATACCATGCAGACA-3' 5'-TGGGCACTCTGAACTCACAAA-3'
CRYZ/9* 5'-CACATAAGAAGCTCATGGAATCG-3' 5'-TTTTCTTAGGAGGAATTTCAGCA-3'
ST6GALNAC5/1 5'-GTGGGTACACTGGCTCGGTTA-3' 5'-GGAGCGGGTAGAAAGTTGTCC-3'
ST6GALNAC5/2 5'-TCTCCCCCACTAGAGTGACCA-3' 5'-AATGCGCTGGAGAGATCAGAG-3'
ST6GALNAC5/3 5'-CTGACATGATGGGGAGGAGAG-3' 5'-CCCACAACAAAAGCCTGTAGC-3'
ST6GALNAC5/4 5'-CAAATGGAGGAGAGAGGGAGA-3' 5'-GAATGAGAACTTGGGACATGC-3'
ST6GALNAC5/5aα 5'-CCTCAAAACCTCTCCACTTCC-3' 5'-ACTGGCCCAGATTGCACTAAA-3'
ST6GALNAC5/5bα 5'-ATGATGGTTGGAAATGGCCTA-3' 5'-GCCAGGTCATGTCTGTAAAACAA-3'
ST6GALNAC5/5cα 5'-CAAATTTGAAGGCACCAGCA-3' 5'-TGATGACACAGCTGGGAGAGA-3'
LPHN2/15* 5'-TAAGGATTCAGTGGTGGAGTGGT-3' 5'-TGGCCTATATGTAAGCATGTTTTT-3'
TTN/37* 5'-ATATCGAGGTGCCTGAGACCA-3' 5'-TTGGTTCTCATCTGGCACTTGT-3'
HSPD1/7* 5'-TGAAGGCATGAAGTTTGATCG-3' 5'-CCAAACACTGCACCACCAGTA-3'
IRS1/1* 5'-TCCACAGCTCACCTTCTGTCA-3' 5'-CTGTTCGCATGTCAGCATAGC-3'
GPR35/6* 5'-CCTGCTCACTCTCTGCTGACC-3' 5'-CGGCGTGTCTGAGGTGTCT-3'
IL7R/4* 5'-TGGAGGTAAAGTGCCTGAATTT-3' 5'-TGATCAGGGATGGATCGAACT-3'
LILRA2/6* 5'-CCCTCACCCATCCTTCTTCTC-3' 5'-GGTCCCTGCCTATTTCCACTC-3'
NLRP2/5* 5'-GCCGGAAAACACATTTGTAGC-3' 5'-CAGGCAGATCACCTGACATTG-3'
*Amplified fragments contain position of candidate disease causing variation; αAll exons of ST6GALNAC5 amplified, long exon 5 encoding 3'UTR amplified in multiple reactions.
Table S6- Conservation of p.99Val in human sialyltransferase 7e at corresponding positions in paralogous and orthologous proteins.
Paralogous sialyltransferases p.V99 Sequence ID*
hST6GALNAC5 CRDCALVTSSGHLLHSR gi|21759442|ref|NP_112227.1
hST6GALNAC1 CITCAVVGNGGILNNSH gi|18490673|ref|NP_060884.1
hST6GALNAC2 CIRCAVVGNGGILNGSR gi|26996844|ref|NP_006447.2
hST6GALNAC3 CDLCAIVSNSGQMVGQ gi|48428651|ref|NP_001153483.1
hST6GALNAC4 CRSCAVVSSSGQMLGSG gi|21759443|ref|NP_778204.1
hST6GALNAC6 CHQCVIVSSSSHLLGTKL gi|48146753|emb|CAG33599.1
hST3GAL1 CRRCAVVGNSGNLRESS gi|1705559|ref|NP_003024.1
hST3GAL2 CRRCAVVGNSGNLRGSG gi|21759433|ref|NP_008858.1
hST3GAL3 RCRRCIIVGNGGVLANKS gi|284055255|ref|NP_777623.2
hST3GAL4 CRRCVVVGNGHRLRNSS gi|14714972|ref|gb|AAH10645.1
hST3GAL5 CRRCVVIGSGGILHGLEL gi|109633044|ref|NP_003887.3
hST3GAL6 CKKCVVVGNGGVLKNKT gi|403225034|ref|NP_001258075.1
hST8SIA1 LKKCAVVGNGGILKKSG gi|28279804|gb|AAH46158.1
hST8SIA2 FGTCAIVGNSGVLLNSG gi|64654294|gb|AAH96205.1
hST8SIA3 YNICAVVGNSGILTGSQC gi|110815855|ref|NP_056963.2
hST8SIA4 -KTCAVVGNSGILLDSEC gi|2494834|ref|NP_005659.1
hST8SIA5 KKCAVVGNGGILKNSR gi|80478739|gb|AAI08911.1
Orthologous proteins
Species
Human KMHCRDCALVTSSGHLLH gi|21759442|ref|NP_112227.1
Chimpanzee KMHCRDCALVTSSGHLLH gi|397472622|ref|XP_003807839.1
Cow KMHCRDCALVTSSGQLLR gi|300796298|ref|NP_001179733.1
Mouse KMHCKDCALVTSSGHLLR gi|148540137|ref|NP_036158.3
Chicken KMHCKSCALVTSSGHLLG gi|46425420|emb|CAG26706.1
Xenopus KMHCKTCALVTSSGHLLG gi|288562694|ref|NP_001165748.1
Zebrafish LKTHCRSCALVTSSGHMT gi|112182837|emb|CAL18608.1
*Sequence ID number from NCBI (http://www.ncbi.nlm.nih.gov/)
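The conservation shown for the orthologous proteins can be checked programmatically. The following is a minimal sketch, not the authors' procedure: it assumes that the four residues TSSG that follow the valine in the human segment can serve as an anchor for locating the corresponding column in each orthologous segment. The names ANCHOR and residue_before_anchor are hypothetical, and the segments are copied from the table above.

```python
# Orthologous ST6GALNAC5 segments copied from Table S6.
ORTHOLOGS = {
    "Human": "KMHCRDCALVTSSGHLLH",
    "Chimpanzee": "KMHCRDCALVTSSGHLLH",
    "Cow": "KMHCRDCALVTSSGQLLR",
    "Mouse": "KMHCKDCALVTSSGHLLR",
    "Chicken": "KMHCKSCALVTSSGHLLG",
    "Xenopus": "KMHCKTCALVTSSGHLLG",
    "Zebrafish": "LKTHCRSCALVTSSGHMT",
}

# Assumption: the residues immediately after p.V99 in the human segment
# are used as an anchor; the residue just before the anchor is the one
# that corresponds to human p.V99.
ANCHOR = "TSSG"


def residue_before_anchor(segment, anchor=ANCHOR):
    """Return the residue preceding the anchor, or None if it cannot be located."""
    start = segment.find(anchor)
    if start <= 0:
        return None
    return segment[start - 1]


residues = {species: residue_before_anchor(seq) for species, seq in ORTHOLOGS.items()}
conserved = set(residues.values()) == {"V"}

for species, residue in residues.items():
    print(f"{species}: {residue}")
print("Valine conserved in all listed orthologs:", conserved)
```

This simple anchor lookup is only suitable for the orthologous segments, which share the flanking residues; the paralogous sialyltransferases differ around this position and would need a proper alignment.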
Table S7- ST6GALNAC5 sequence variations* observed in 160 CAD patients and 100 control individuals from Iran
Variation                No. controls with variation   No. patients with variation   Allele frequency in ENSEMBL**   Predicted effect***   rs number
In controls & patients
C>T/5' NC                2                             3                             0.03                            Benign                rs76305167
c.141G>A/p.Q47Q          2                             3                             0.167                           Benign                rs62637703
c.381C>A/p.P127P         3                             8                             0.02                            Benign                rs35763299
c.711A>G/p.T237T         3                             5                             0.004                           Benign                rs143357043
Only in controls
c.357T>C/p.C119C         1                                                           0.003                           Benign                rs147610766
Only in patients
c.492G>C/p.Q164H                                       1                             0.003                           Benign                rs189362082
c.536G>A/p.R179Q                                       2                             <0.001                          Benign                rs200875685
c.759G>A/p.D257N                                       1                                                             Benign                Novel
p.*337Qext*20                                          2                                                             Damaging              Novel
* All observed in heterozygous state; **http://www.ensembl.org; ***SIFT and PolyPhen predictions
Table S8- ST6GALNAC5 sequence variations* observed in 150 CAD patients and 800 control individuals from the USA
In controls   In patients
Variation              No. individuals with variation   Allele frequency in ENSEMBL**   Predicted effect***   rs number
c.474C>A/p.D158E       1                                                                Benign                Novel
c.607A>G/p.M203V       1                                0.001                           Benign                rs151200060
c.616C>T/p.R206C       1                                                                Damaging              Novel
c.492G>C/p.Q164H       1                                < 0.01                          Benign                rs189362082
c.301A>G/p.N101D       7                                                                Damaging              Novel
c.19C>G/p.H7D          1                                0                               Damaging              rs200073725
c.1000C>A/p.P334T      1                                                                Benign                Novel
* All observed in heterozygous state; **http://www.ensembl.org; ***SIFT and PolyPhen predictions
Table S9- Statistical analysis on numbers of control and CAD affected individuals with rare sequence variations in ST6GALNAC5 that cause amino acid changes
Rare ST6GALNAC5 variations that cause amino acid change
                                                        Iranian cohort   US cohort¥                   Combined Iranian & US cohorts
No. controls with the variations/total no. controls     0/100            4/800                        4/900
No. patients with the variations/total no. patients     6/160            2/150                        8/310
P value*                                                0.085            0.242                        0.003β
Odds Ratio (OR)                                         -δ               2.69 (95% CI: 0.49-14.8)     5.93 (95% CI: 1.77-19.85)β
Rare ST6GALNAC5 variations that cause amino acid change predicted to be damagingβ
                                                        Iranian cohort   US cohort¥                   Combined
No. controls with the variations/total no. controls     0/100            1/800                        1/900
No. patients with the variations/total no. patients     2/160            1/150                        3/310
P value*                                                0.525            0.291                        0.054
Odds Ratio (OR)                                         -δ               5.36 (95% CI: 0.33-86.21)    8.79 (95% CI: 0.91-84.77)
¥ Individuals with the p.N101D variation not included in the analysis because it appears to be a common polymorphism in the population studied; *, Fisher exact test; δ, OR cannot be calculated because of the zero value; β, the P value and the OR change, respectively, to 0.022 and 4.42 (95% CI: 1.24-15.77) if the p.R179Q mutation, which was not tested for enzyme activity and which was observed in two Iranian patients, is not included in the analysis. Figures consistent with ST6GALNAC5 having a role in CAD are shown in bold.
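The statistics in Table S9 can be recomputed from the counts alone. The sketch below is illustrative and makes two assumptions that the source does not state: that the odds ratio is the simple sample odds ratio with a log-based (Woolf) 95% confidence interval, and that the Fisher exact test is two-sided and sums all tables no more probable than the observed one. The function names odds_ratio_ci and fisher_two_sided are hypothetical.

```python
import math


def odds_ratio_ci(case_pos, case_total, ctrl_pos, ctrl_total, z=1.96):
    """Sample odds ratio with a Woolf (log) confidence interval.

    Returns None when any cell of the 2x2 table is zero, in which case
    the odds ratio is undefined (the "-" entries in Table S9).
    """
    a, b = case_pos, case_total - case_pos
    c, d = ctrl_pos, ctrl_total - ctrl_pos
    if 0 in (a, b, c, d):
        return None
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


def fisher_two_sided(case_pos, case_total, ctrl_pos, ctrl_total):
    """Two-sided Fisher exact P value from hypergeometric probabilities."""
    n = case_total + ctrl_total      # all individuals
    carriers = case_pos + ctrl_pos   # all individuals with a variation
    denom = math.comb(n, case_total)

    def prob(x):
        # Probability that x of the carriers fall among the cases.
        return math.comb(carriers, x) * math.comb(n - carriers, case_total - x) / denom

    p_obs = prob(case_pos)
    x_min = max(0, case_total - (n - carriers))
    x_max = min(carriers, case_total)
    probs = [prob(x) for x in range(x_min, x_max + 1)]
    # Sum every table that is no more probable than the observed one.
    return sum(p for p in probs if p <= p_obs * (1 + 1e-7))


# Counts from the first half of Table S9: (patients, controls).
cohorts = {
    "Iranian": (6, 160, 0, 100),
    "US": (2, 150, 4, 800),
    "Combined": (8, 310, 4, 900),
}
for name, counts in cohorts.items():
    print(name, "P =", fisher_two_sided(*counts), "OR/CI =", odds_ratio_ci(*counts))
```

Under these assumptions the recomputed odds ratios and intervals can be compared directly with the tabulated values, including the alternative figures given in footnote β when the two p.R179Q carriers are left out (6/310 patients versus 4/900 controls).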
Table S10- ST6GALNAC5 protein concentrations in extracts of untransfected COS-7 cells and cells transfected with vectors expressing wild type and mutated ST6GALNAC5 proteins as measured by an ELISA assay
Source of extractsβ                                                  pg ST6GALNAC5/ ng total protein
Untransfected cells                                                  4.3 ± 0.1
Cells expressing wild type protein (transfection 1)                  16.4 ± 2.4
Cells expressing mutated protein: p.Val99Met (transfection 1)        18.0 ± 0.42
Cells expressing mutated protein: p.Val99Met (transfection 2)        16.8 ± 3.7
Cells expressing mutated protein: p.Val99Met (transfection 3)        18.3 ± 1.7
Cells expressing wild type protein (transfection 2)                  19.5 ± 0.23
Cells expressing mutated protein: p.*337Qext*20 (transfection 1)     18.10 ± 0.07
Cells expressing mutated protein: p.*337Qext*20 (transfection 2)     17.6 ± 0.99
Cells expressing mutated protein: p.*337Qext*20 (transfection 3)     18.2 ± 0.28
β: Each extract was assayed in triplicate.