Sequence of a cDNA encoding the β4 subunit of murine integrin



Gene, 130 (1993) 209-216 © 1993 Elsevier Science Publishers B.V. All rights reserved. 0378-1119/93/$06.00

GENE 07227

Sequence of a cDNA encoding the β4 subunit of murine integrin

(Lung carcinoma; splice variants; gene map)

Stephen J. Kennelᵃ, Linda J. Footeᵃ, Letizia Ciminoᵇ, Maria Giulia Rizzoᵇ, Lan-Yang Changᵃ and Ada Sacchiᵇ

ᵃ Biology Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831-8077, USA; and ᵇ Molecular Oncogenesis Laboratory, Regina Elena Cancer Institute, Rome, Italy. Tel. (39-6) 498-5529

Received by R.W. Davies: 29 December 1992; Accepted: 2 March 1993; Received at publishers: 15 April 1993

SUMMARY

A cDNA coding for the β4 subunit of murine integrin (mβ4) has been cloned and sequenced using mRNA from a murine lung carcinoma as the template. The 5' sequence contains two AUG codons, the second of which initiates synthesis of the mature protein. The cDNA sequence has an open reading frame coding for 1748 amino acids (aa), including a signal peptide, cysteine-rich region, serine- and threonine-rich region, transmembrane domain, and a cytoplasmic domain of over 1000 aa. Overall, the deduced mβ4 aa sequence has 88% identity with the human β4 subunit (hβ4) sequence deduced from the sequence of placental mRNA. Reverse transcriptase-polymerase chain reaction using primers flanking splice sites for two variant forms of hβ4 transcripts provided evidence for alternate splicing of RNA in the murine spleen and to a lesser extent in the skin, uterus, and thymus, but splicing was found at only one of the two alternative sites. Five potential glycosylation sites present in the extracellular domain of hβ4 are conserved in mβ4. One tyrosine in the terminal region of the cytoplasmic domain (position 1600) is conserved between mβ4 and hβ4 and has the consensus sequence for tyrosine phosphorylation. Finally, a genomic restriction map of mβ4 shows that the gene is about 40 kb in length. No restriction-fragment length polymorphisms were detected between BALB/c liver and BALB/c lung carcinoma DNA.

INTRODUCTION

The integrins are a large family of heterodimeric (αβ) cell surface glycoproteins which function in cell adhesion (Hemler, 1990; Springer, 1990; Hynes, 1992). The β4 subunit is unique among all of the 20 or so integrin subunits studied in that it has an extremely large (>1000 aa) cytoplasmic domain (Hogervorst et al., 1990; Suzuki and Naitoh, 1990; Tamura et al., 1990). This domain can associate with cytoplasmic proteins, bullous pemphigoid antigen and possibly cytokeratins, to form hemidesmosomal structures promoting adhesion of squamous basal epithelium to the basement membrane via the α6β4 complex (Carter et al., 1990; Stepp et al., 1990; Jones et al., 1991; Sonnenberg et al., 1991). The β4 subunit is also found on subsets of endothelial cells (Kennel et al., 1992), on developing thymocytes (Phillips et al., 1991), and in large amounts on carcinomas of diverse origin (Kajiji et al., 1987; Kennel et al., 1989), none of which displays obvious hemidesmosomal structures. Proteolytic cleavage of the cytoplasmic domain results in at least three discrete truncated forms of the molecule which can all interact with α6 to form the α6β4 complex (Hemler et al., 1989; Kennel et al., 1989; 1990). The unique size of the cytoplasmic domain and the complexity of the molecules, cellular distribution, and apparent binding function have stimulated several studies on β4 structure. The human cDNA has been cloned from a keratinocyte cDNA library (Hogervorst et al., 1990), by PCR from an RPE cell line and placental mRNA (Suzuki and Naitoh, 1990), and from a cDNA library from pancreatic carcinoma cells (Tamura et al., 1990). The extracellular domains from these sources are essentially identical, but the cytoplasmic domains of β4 from the pancreatic cell and keratinocyte cDNA contain inserts of 70 and 53 aa, respectively, not present in the sequence from RPE and placental RNA.

In this paper, we report the sequence of cDNA from a murine lung carcinoma cell, Line 1. The deduced murine aa sequence has almost 90% homology with the human sequence, but only one of the two human splice variants was detected. The mβ4 gene spans approximately 40 kb and does not show evidence for rearrangement in Line-1 tumor cells, where it is expressed in very high amounts.

Correspondence to: Dr. S.J. Kennel, Biology Division, Oak Ridge National Laboratory, P.O. Box 2009, MS-8077, Oak Ridge, TN 37831-8077, USA. Tel.: (1-615) 574-0825; Fax (1-615) 574-1274.

Abbreviations: aa, amino acid(s); Ab, antibody(ies); β4, beta 4 integrin subunit; mβ4, murine β4; hβ4, human β4; bp, base pair(s); EGF, epidermal growth factor; GCG, Genetics Computer Group (Madison, WI, USA); kb, kilobase or 1000 bp; mAb, monoclonal Ab; NML, normal mouse lung fibroblasts; nt, nucleotide(s); ORF, open reading frame; PCR, polymerase chain reaction; RFLP, restriction-fragment length polymorphism; RPE, retinal pigment epithelium; RT, reverse transcriptase; UTR, untranslated region.

RESULTS AND DISCUSSION

(a) Nucleotide sequence encoding murine β4 integrin

The cDNA fragments cloned for sequence analysis are shown in Fig. 1B. Relatively long (2.1-3.0 kb) clones were identified in oligo(dT)-primed libraries. In general, small clones for the extracellular domain were found with Ab screening of λgt11 5'-STRETCH libraries (Fig. 1B). Inserts from clones positive for Ab screening were amplified by asymmetric PCR using commercial λgt11 primers flanking the cloning site. Partial sequences of these single-stranded amplifications were obtained using nested λgt11 primers (see Table I). Clones with β4 sequence were used for subcloning into pGEM-7Z for sequencing. Two gaps were cloned by RT-PCR amplification from mRNA (Table I), and the final 5' sequence was determined from clones identified in the λgt11 STRETCH library with DNA probes 150 and 350. The cDNA sequence and deduced aa sequence are shown in Fig. 2.

[Fig. 1. Panel A: genomic DNA restriction map, scale 0-50 kb, with cDNA probes 300, 350, 150, 34, H4, H3 and 37-1 positioned beneath. Panel B: cDNA, scale 0-5 kb, with the cloned segments grouped by isolation method (Ab probed; PCR; probe BP9.3; probes 150 and 350).]

Fig. 1. Murine genomic DNA for the β4 subunit of integrin. (A) Restriction map. High Mr DNA was isolated from Line-1 cells or BALB/c liver and cleaved with BamHI (B), EcoRI (E), HindIII (H), XbaI (X). Boxed letters indicate restriction sites present in the cDNA. Probes were labeled by random hexamer priming (Oligolabelling kit, Pharmacia, Uppsala, Sweden) and used for Southern blots. The cDNA probes used to generate the map and their respective locations (see Fig. 2) are: 300, cDNA 1-299; 350, cDNA 602-938; 192, cDNA 933-1124; 450, cDNA 933-1349; 150, cDNA 1254-1389; 34, cDNA 1588-1941; H4, cDNA 3457-3698; H3, cDNA 3799-4083; 37-1, cDNA 5460-5757. Hybridization was in 50% formamide (~nchik et al., 1989) at 42°C, and blots were washed finally in 0.1 x SSC (Sambrook et al., 1989) at WC. Genomic DNAs cut with two enzymes were analyzed in the same way. Bars indicate the limits within which the ends of the probes could be positioned. SSC is 0.15 M NaCl/0.015 M Na citrate pH 7.0. (B) Schematic diagram of cDNA for β4, the cloned segments used to determine the sequence, and the method used to isolate these clones from cDNA libraries. Also shown are the positions of the probes used to establish the genomic map and the restriction sites (boxed) that occur in cDNA.
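The four enzymes named in the Fig. 1 legend cut at fixed palindromic recognition sequences, so candidate sites in a sequence can be located by plain string search. The following is a minimal sketch of that idea, not the procedure used here (the map was built from Southern blots). The recognition sequences are the standard ones for these enzymes; the function names and the toy sequence are assumptions made for illustration and are not taken from the mβ4 cDNA.

```python
# Standard recognition sequences (5'->3') for the enzymes in the Fig. 1 legend.
RECOGNITION_SITES = {
    "BamHI": "GGATCC",
    "EcoRI": "GAATTC",
    "HindIII": "AAGCTT",
    "XbaI": "TCTAGA",
}


def find_sites(seq, site):
    """Return the 1-based start position of every occurrence of `site` in `seq`."""
    seq = seq.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + 1)
        start = seq.find(site, start + 1)
    return positions


def map_sites(seq):
    """Map each enzyme name to the positions of its recognition sequence."""
    return {name: find_sites(seq, site) for name, site in RECOGNITION_SITES.items()}


# Hypothetical toy sequence (illustration only, not from this paper).
toy = "AAGGATCCTTTGAATTCAAAAAGCTTGGGATCCA"
site_map = map_sites(toy)
```

Sites found this way in a cDNA correspond to the boxed letters of Fig. 1 only if no intron interrupts the recognition sequence in the genomic DNA.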

The cDNA sequence of mβ4 has 84% identity to the hβ4 cDNA sequence. The extent of identity is much lower in the 5'-UTR (55%) and in the 3'-UTR (48%). The 5'-UTR of the murine cDNA contains two traditional start codons (AUG). The first, at position 117, is not conserved in the human sequence (Tamura et al., 1990) and does not appear to function as a start codon. Although this codon has an A at position -3, it does not have a G at position +4, as is usually found in consensus start signals (Kozak, 1991). Furthermore, this AUG has a TGA stop signal, in-frame, five codons downstream. The putative start codon for the mβ4 mRNA translation is found at position 288-290, is conserved in the human sequence, has A at position -3 and G at position +4, and starts an ORF of 1748 aa. The use of the second AUG is unusual in that over 90% of eukaryotic messages begin translation at the first AUG (Kozak, 1991). The presence of the AUG codon at position 117 would tend to inhibit translational efficiency as would the extremely high G+C content (>70%) of the 5'-UTR.
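The two criteria applied to each AUG above (a purine at position -3 and a G at position +4, with the A of the AUG counted as +1) and the check for a nearby in-frame stop can be expressed compactly. The sketch below is only an illustration under stated assumptions: the function name, the returned keys, and the toy sequence are hypothetical, and the toy sequence is not the mβ4 5'-UTR.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def kozak_context(cdna, atg_pos):
    """Describe the context of the ATG whose A lies at 1-based position atg_pos."""
    seq = cdna.upper()
    i = atg_pos - 1  # 0-based index of the A, i.e. position +1
    if seq[i:i + 3] != "ATG":
        raise ValueError("no ATG at the given position")
    # Position -3 is three nt upstream of the A; position +4 follows the G of ATG.
    minus3 = seq[i - 3] if i >= 3 else None
    plus4 = seq[i + 3] if i + 3 < len(seq) else None
    # Walk the reading frame and count codons (ATG included) before the first stop.
    codons_before_stop = None
    for k in range(i, len(seq) - 2, 3):
        if seq[k:k + 3] in STOP_CODONS:
            codons_before_stop = (k - i) // 3
            break
    return {
        "purine_at_-3": minus3 in ("A", "G"),
        "G_at_+4": plus4 == "G",
        "codons_before_stop": codons_before_stop,
    }


# Hypothetical toy leader with an upstream ATG at nt 7 and a second ATG at nt 27.
toy = "GGAACCATGTCTCCCTGACCGCCACCATGGCAGGGTAA"
upstream = kozak_context(toy, 7)
downstream = kozak_context(toy, 27)
```

In this toy case the upstream ATG lacks the G at +4 while the downstream one has both features, which mirrors the qualitative argument made for positions 117 and 288 without reproducing the actual sequence.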


(b) The aa sequence of murine β4 integrin

Comparison of the aa sequence (Fig. 3) with that deduced from human cDNA (Hogervorst et al., 1990) demonstrates 88% identity and 93% similarity, as defined by the GCG software package (Devereux et al., 1984). Five sites are noted where 1 aa has been deleted or added in comparing the human and mouse deduced sequences. In addition, the mouse sequence is missing 3 aa starting at position 1104. This deletion was found in only one of two clones sequenced in this region. Every one of the 48 Cys residues in the Cys-rich region is conserved, indicating tertiary structural homology as well. Cys481 and Cys564 mark the beginning of two EGF-like repeats in the protein structure, characteristic of integrin β subunits.
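Percent identity figures of this kind are computed from a pairwise alignment. A minimal sketch follows, assuming that identity is counted over alignment columns in which neither sequence has a gap; the exact convention used by the GCG package is not stated here, so that denominator is an assumption, as are the function name and the toy sequences. Similarity additionally requires a substitution scoring scheme and is not shown.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # gapped columns are excluded from the denominator
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Hypothetical aligned peptides (illustration only): 7 matches in 9 ungapped columns.
identity = percent_identity("ACDEFG-HIK", "ACDQFGLHIR")
```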

TABLE I

DNA primers

Primer No.  Sequenceᵃ                              cDNA positionsᵇ  Useᶜ
321         5'-GCGAATTC]GCTTCTCTGTCAGCTTCAGCA      2999-2980        PCR-1
328         5'-TACTCGAG]AACTTCCAGTGTCCCCGGACC      1911-1926        PCR-1
423         5'-GACTCGAG]GCTGCTGAAGCTGACAGAGA       2978-2997        PCR-2
424         5'-GCGAATTC]TATAGGAGACCTGGGACTTGCC     3366-3345        PCR-2
89          5'-TCAGTATC GGCGGA                     λgt11            Sequencing
90          5'-GTAGCGAC CGGCGCTCAG                 λgt11            Sequencing
619         5'-CGAGAGAGAGGCTATCATCAA               4201-4222        Test for insert A
680         5'-GGACGTGC TCAGCACTCGGT               4520-4500        Test for insert A
702         5'-GACTCGAGCAGCGTTTCTGATGAC            4367-4382        Nested insert A
703         5'-G~GAATTCGGC~ATT~A~CAGGTGCTC         4404-4386        Nested insert A
681         5'-CCACTCTC TGACACGCACAG               4541-4560        Test for insert B
682         5'-TAGGGATGTTGAGCCGATGCA               4800-4780        Test for insert B
700         5'-GCGAATTC]TGTGTCAGGCACACCCACAGC      4655-4635        Nested insert B
701         5'-GACTCGAG]CTACTCCACCCTCACTTC         4592-4609        Nested insert B
386         5'-TGCCAGGC CTGGGGCACCGGGGAGA          2193-2217        Sequencing
387         5'-TCTTCCTT AAAGCCCACCAT               2566-2547        Sequencing
687         5'-GTGGATCT GTATATCC                   678-693          Sequencing
688         5'-GCTGCAAC ACCCAGG                    469-483          Sequencing

ᵃ Symbol ] indicates restriction site sequence added to cDNA sequence to aid in cloning. Methods: Oligodeoxyribonucleotide primers were synthesized on an Applied Biosystems PCR-Mate Model 391 DNA synthesizer. Primers were prepared on 0.2 µM cartridges with the 'trityl OR' program, cleaved, and deblocked with concentrated NH4OH, hydrolyzed, dried under vacuum, redissolved in 0.5 M ammonium acetate, precipitated with ethanol, redissolved in 10 mM Tris·HCl/1 mM Na2EDTA pH 8.0, quantitated by absorbance at 260 nm, and used without further purification.
ᵇ See Fig. 2 for cDNA position numbers.
ᶜ The primers described in this table were used for RT-PCR cloning for segments 1 and 2 (PCR-1, PCR-2), to extend nt sequence determination in long cDNA clones, to test for inserts A and B using RT-PCR, and to reamplify insert A RT-PCR with nested primers for subcloning.

All five Asn glycosylation sites (Miletich and Broze, 1990) in the extracellular domain are conserved, as are the first 2 flanking aa on each side. A potential site for the addition of glucosaminoglycans (Bourdon et al., 1987) is present in both the murine and human sequences midway through the extracellular domain (position 515). There are at least 10 possible sites for Ser or Thr phosphorylation; however, the cytoplasmic domain has only one consensus sequence for Tyr phosphorylation (Tyr1600). This is consistent with Tyr phosphorylation occurring only in the most C-terminal regions of the β4 protein (Kennel et al., 1989). Phosphorylation of β4 can be insulin-induced (Falcioni et al., 1989), and the positioning at the C-terminal region of the protein may allow regulation through selective protease cleavage (see section c below).
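Potential Asn glycosylation sites are usually located by scanning for the sequon Asn-X-Ser/Thr with X other than Pro. The sketch below assumes that common definition; the criteria actually applied in this paper are not spelled out, so the pattern, the function names, and the toy peptides are all assumptions made for illustration.

```python
import re

# N-X-S/T with X != P; the lookahead allows overlapping sequons to be reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(protein):
    """Return 1-based positions of Asn residues that start an N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


def shared_sequons(seq_a, seq_b):
    """Positions carrying a sequon in both sequences.

    Assumes seq_a and seq_b are ungapped and already share the same numbering.
    """
    return sorted(set(find_sequons(seq_a)) & set(find_sequons(seq_b)))


# Hypothetical peptides (illustration only, not the integrin sequences).
peptide_a = "MKNATLLNPSGGNCTA"
peptide_b = "MRNVSLLNPSGGNCAA"
sites_a = find_sequons(peptide_a)
conserved = shared_sequons(peptide_a, peptide_b)
```

For two real sequences of different length, positions would first have to be mapped through the alignment before the sets are compared.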

(c) Structure of the β4 protein

The β4 protein can exist in several truncated forms while remaining associated with α6 (Kennel et al., 1990). Work with mAb has shown that the shorter forms, X83 and I: 50 kDa, are missing the extreme C-terminal portion of the β4 chain (Kennel et al., 1990). Northern blot analyses (Kennel et al., 1992; and data not shown) indicate that little, if any, variation in mRNA length can be detected to account for the different sizes of the β4 proteins. Thus, the alternate forms are likely due to selective protease cleavage of β4 in situ. Several clusters of basic aa are present as potential protease cleavage sites in the cytoplasmic domain, as has also been suggested for the human protein (Hogervorst et al., 1990). It is possible that protease action on the cytoplasmic domain alters hemidesmosomal formation (Carter et al., 1990; Sonnenberg et al., 1991), removes the phosphorylation site, or even may alter the conformation and specificity of the extracellular binding domain (Hynes, 1992). Experiments are in progress to test these hypotheses.
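Clusters of basic residues of the kind mentioned above can be listed by scanning a protein sequence for runs of Lys and Arg. This is a minimal sketch only: the text does not define what counts as a cluster, so the minimum run length of two, the function name, and the toy peptide are assumptions.

```python
import re


def basic_clusters(protein, min_run=2):
    """Return (1-based start, run) for each run of at least min_run K/R residues."""
    pattern = re.compile(r"[KR]{%d,}" % min_run)
    return [(m.start() + 1, m.group()) for m in pattern.finditer(protein.upper())]


# Hypothetical peptide (illustration only); the lone K at position 13 is not reported.
clusters = basic_clusters("AGKRLLSRRKDEKAG")
```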

(d) RT-PCR

The large (70-aa) insert (position 1366, designated position A; Fig. 3) found in human β4 from pancreatic carcinoma (Tamura et al., 1990) was not present in four independent clones isolated from the human keratinocyte cDNA library (Hogervorst et al., 1990), nor was it found in any of the six independent clones sequenced from Line-1 cDNA libraries. A second insert designated B (Fig. 3) was also identified (Hogervorst et al., 1990). Primers for RT-PCR (Table I) were synthesized which flanked the sites of the two insertions (A and B; Fig. 3) to test for the presence of variant mRNA species which had the inserts. Samples of RNA from uterus, thymus, liver, spleen, intestine, Line-1 tumor, skin, lung, and two cell lines were tested. The major reaction products detected by ethidium bromide staining (Fig. 4A) from RT-PCR for site A had the size (319 bp) predicted by the mouse cDNA sequence in Fig. 2. Only small amounts of reaction product were obtained for uterus (lane 2) and lung (lane 12); however, repeats of the RT-PCR reactions verified that the predicted 319-bp fragment was present (data not shown). PCR with nested primers showed only the nonspliced product (data not shown); thus, the insert A was not found in mouse tissue and may be unique to human tumor cells (Tamura et al., 1990).

RT-PCR products from splice site B (Fig. 4B) demonstrated mostly the predicted 259-bp fragments with the exception of spleen (lane 5), where a larger fragment of about 400 bp was present. Closer examination of the stained gel and Southern blot analyses (Fig. 4C) showed that reaction products of RNA from uterus, thymus, and skin also contained small amounts of the 400-bp product. Nested primers containing restriction sites for cloning were used in PCR to amplify the internal DNA segment, using the gel purified 400-bp piece as template. The resultant fragment was subcloned and sequenced. The deduced aa sequence of the 159-bp insert (Fig. 4) shows 80% homology with the human insert at the same site (Hogervorst et al., 1990). It is interesting to note that the insert interrupts a codon, altering the flanking aa on each side of the inserted sequence. Although no attempt was made to quantitate the RT-PCR products, the alternatively spliced product appears to be a minor form in all tissues except spleen. Expression of β4 protein in mouse spleen is chiefly in vascular endothelial cells (Kennel et al., 1992). It is possible that this form has some specialized function in the endothelium, although it is clearly produced in a human keratinocyte cell line as well.
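Two points in this paragraph can be checked with simple arithmetic. A 159-bp insert added to the 259-bp product gives 418 bp, in line with the band of about 400 bp, and 159 bp is a whole number of codons (53), so the reading frame is kept. An in-frame insert that lands inside a codon nevertheless changes the aa on both sides of the junction. The sketch below illustrates this with a standard codon table; the function names and the toy sequences are hypothetical and are not the mβ4 sequence or its insert.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS)
)


def translate(dna):
    """Translate complete codons of `dna`, starting at its first nucleotide."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def insert_segment(dna, insert, after_nt):
    """Return `dna` with `insert` placed after its first `after_nt` nucleotides."""
    return dna[:after_nt] + insert + dna[after_nt:]


# Sizes quoted in the text: unspliced product plus insert.
expected_variant_bp = 259 + 159
insert_codons, remainder = divmod(159, 3)

# Hypothetical toy example: a 6-nt insert placed inside the second codon (GAA).
host = "GCTGAAACC"
spliced = insert_segment(host, "CTGGCA", 4)
before = translate(host)
after = translate(spliced)
```

In the toy case the interrupted codon GAA (Glu) is lost, and the residues on either side of the junction are encoded by new codons assembled from host and insert nucleotides, while the downstream frame is unchanged.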

Genomic DNA digests were analyzed by Southern blotting using eight independent fragments of cDNA. A

Fig. 2. Complete nt sequence of cDNA and deduced aa sequence for mβ4 integrin. Methods: Oligo(dT) and 5'-STRETCH libraries (specially designed for longer, full-length clones) from Line-1 cell mRNA were custom prepared in λgt11 by Clontech Laboratories (Palo Alto, CA, USA). Libraries were screened with DNA probe BP93 (hβ4 cDNA, Fig. 1) provided by Dr. Martin Hemler (Boston, MA) or with rabbit polyclonal Ab to mβ4 and mAb 346-11A as previously described (Kennel et al., 1992). Relevant clones were identified by sequencing λgt11 inserts using nested primers after asymmetric amplification to produce single-stranded DNA as previously described (Kennel et al., 1992). The inserts were subcloned into Bluescript (Stratagene, La Jolla, CA, USA) or pGEM-7Z (Promega, Madison, WI, USA) vectors for final sequencing. RT-PCR from mRNA was done with a Perkin-Elmer Cetus GeneAmp RNA PCR kit as recommended by the manufacturer, in a Perkin-Elmer Cetus thermocycler. The RT reaction was conducted at 37°C for 20 min or 1 h. Resultant cDNA was amplified according to the program: 1 cycle, 2 min at 95°C; 35 cycles, 95°C, 1 min, 60°C, 13 min; and 1 cycle, 7 min at 60°C. PCR products from reactions primed with primers containing restriction sites were digested with EcoRI + XhoI for force cloning in pGEM-7Z. The nt sequence was determined with a Sequenase Version 2.0 kit (US Biochemical, Cleveland, OH, USA) according to the manufacturer's recommendations. Sequencing was done in both directions, with most areas confirmed by data from overlapping clones. Sequence analyses and alignments were done using the GCG software package (Devereux et al., 1984). The two AUG codons in the 5' leader sequence are boxed. The signal sequence at the N-terminal end is underlined heavily. Dash/dot underlining shows the transmembrane region. A potential site for glucosaminoglycan addition is underlined with a broken line. Cys at the beginning of the EGF-like repeats are circled, and potential sites for carbohydrate attachment in the aa sequence are boxed. The nt sequence reported in this paper appears in the EMBL, GenBank and DDBJ Nucleotide Sequence Databases under accession No. L04678.

[Fig. 2 sequence panel: nt 1-5757 of the mβ4 cDNA, with the deduced aa sequence shown beneath each line of nt sequence.]

[Fig. 3 alignment panel: deduced mβ4 aa sequence (upper lines) aligned with the hβ4 aa sequence (lower lines).]

.ff YGTHLSPHL;HRVLSTSSTiTRDYNSLTR;EHSHSGTLPI;DYSlLTSLS~GDSRGAVGV~DTPTRLVFSALGRTSLKVS;PEPQ~RlLiGYSVEYQLL~ tltllltl. Itttlttttltttt.llIl IllIf tlIttIIttli.ll.ltt II tIllltltlllt tlt.llltlt 1.t t tltltttll tltltlll"lllltttlllllIl'tItt'ltltl'tltlltllttl'lt~ttt "Ii t tltlltltlll~~ttt*tlttlt*l-t*t tttltlllt~

RLTAG~TPTRLVFSALGPTSLRV~PRCERPLGGYS~YGLLN

"l‘tltllllt'llttllltlllt~t-ttlllltltltttltllttltlllltllltl~lllllltllltllllttlltltlltlll 11111111"~ l.llllllll.lltlltllltll I lllllllllttllttlltlllltllllllllllllllllllllllllttlllllllllll Itllllll

GGELHRLNIPNPAGTSVWEDLLPNHSYVFRVRAGSGEGWGREREGVITIESPVHPQSPLCPLPGSAFTLSTP~PGPLVFTALSPDSLGLSUERPRRPN

1098

1098

1195

1198

1295

1298

1394

1398

1494

1498

1594

1598

1694

1698

TTTEPFLWD;;LTLGTGRLE;GGSLTRHVT~EFVTRTLTA~GSLSTH~~FFGT* 1749 MOUSE tItll.lt Ill l*ltltllltlltlltlt IIII II IilltlItIlttl

"ltlll'll Ill-l'IIIIIlllllllllll'llll'll'lltlllltltltl SATEPFLVOGPTLGAGHLEAGGSLTRHVTOEFVSRTLTTSGTLSTH~~FFGT* 1753 HUMAN

Fig. 3. Comparison of deduced aa sequences for mouse (top line) and human (bottom line) β4 integrin. Arrowhead A shows the position of the 70-aa insert (Tamura et al., 1990) and arrowhead B, the position of the 53-aa insert (Hogervorst et al., 1990) in the human sequences. Tyr (Y) residues that are sites for phosphorylation are framed. Short vertical lines indicate homology between mouse and human sequences; colons indicate conservative substitutions; dots indicate nonconservative substitutions. (Conventions are consistent with the GCG software package.)
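The percent identity summarized by an alignment of this kind can be recomputed from any pair of aligned sequences. The following is a minimal Python sketch, not the GCG procedure used for the figure. The function name `percent_identity` and the two short sequences are hypothetical placeholders rather than residues read from Fig. 3, and gaps are assumed to be written as `.` and excluded from the denominator.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = ".") -> float:
    """Percent of aligned, ungapped columns in which both residues are identical."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # skip any column that contains a gap
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Hypothetical toy alignment (NOT taken from Fig. 3)
top = "ACDEFGHIK.LM"
bottom = "ACDQFGHLKWLM"
print(round(percent_identity(top, bottom), 1))
```

Excluding gapped columns is one common convention; counting them as mismatches would give a lower value, so the convention should be stated alongside any reported figure.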


[Fig. 4 gel images. Panel A: splice site A (lanes L, C, 1-7, L, 8-13). Panel B: splice site B (lanes L, 1-7, L, 8-13). Panel C: splice site B (lanes 1-7, L, 8-13).]

1 gcctccctcctatctgggaagatgggaggagcaggcttccgctgtcctggactcttgggtccttgagccgggctcacatg 80
  LPPIWEDGRSRLPLSWTLGSLSRAHM

81 aagggtgtgcccgcatccaggggttcaccagactctataatcctggccgggcagtcagcagcaccctcctggggtacag 159
  KGVPASRGSPDSIILAGQSAAPSWG Q

Fig. 4. Analyses for splice variant sequences. Primers flanking insert A (panel A) or insert B (panels B and C) were used to amplify sequences found in RNA. Lanes: 1, Line-1 cell mRNA; 2, uterus mRNA; 3, thymus mRNA; 4, liver mRNA; 5, spleen mRNA; 6, intestine mRNA; 7, Line-1 tumor mRNA; 8, intestine mRNA; 9, skin mRNA; 10, intestine total RNA; 11, skin total RNA; 12, lung total RNA; 13, NML cell mRNA. Lane C contains the reaction product from a known 308-nt RNA template included in the kit as a control for RT-PCR. Lanes L contain a DNA ladder as size marker. Panels A and B depict ethidium bromide-stained gels, and panel C is an autoradiogram of a Southern blot of the gel in panel B probed with random-hexamer-primed clone 3AB insert (Fig. 2). The nt sequence and deduced aa sequence of insert B from the mouse spleen are presented below panel C.

restriction map (four enzymes) was constructed after analysis of 38 specific fragments (Fig. 1A). Six of 24 restriction sites were present in the cDNA (exon sequences), and the gene was approximately 40 kb in length. At least five exons could be demonstrated, although it is likely that many more exist since the average exon size in eukaryotes is less than 300 bp. The data are consistent with the results of pulsed field gel analyses in which the smallest fragment, which contained sites for probes 34 and 37-1, was 50 kb (data not shown).
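The argument that many more than five exons are likely can be made concrete with a rough lower bound. The Python sketch below is illustrative only. It uses the 1748-aa ORF reported for the cDNA and treats the 300-bp figure as an assumed upper limit on average exon size. The function name `min_exon_count` is hypothetical, and untranslated regions and the stop codon are ignored.

```python
import math


def min_exon_count(orf_aa: int, mean_exon_bp: int = 300) -> int:
    """Lower bound on coding exons if the average exon is at most mean_exon_bp."""
    coding_bp = orf_aa * 3  # three nt per codon
    return math.ceil(coding_bp / mean_exon_bp)


# 1748 aa -> 5244 bp of coding sequence
print(min_exon_count(1748))
```

Under these assumptions the coding sequence alone would span at least 18 exons, which is in line with the expectation that the five exons demonstrated are only a subset.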

(e) Conclusions

(1) The mβ4 aa sequence deduced from the cDNA sequence is nearly 90% homologous to the hβ4 sequence.

(2) One alternative splice variant was identified by RT-PCR as a 159-bp insert homologous to one of two previously identified splice variants in hβ4.

(3) Five sites for glycosylation are conserved, as is one site for Tyr phosphorylation.

(4) The mβ4 gene is about 40 kb long, and analyses of the gene structure revealed no RFLP between liver and tumor DNA.

ACKNOWLEDGEMENTS

We thank Drs. S. Mitra and F. Larimer for advice on cDNA cloning and use of computer software and Trish Lankford and Dr. Ginny Ford for technical support. Nette Crowe and Sylvia Allen helped in manuscript preparation, and Drs. S. Niyogi and F. Larimer provided helpful comments on the text. Research sponsored by the Office of Health and Environmental Research, US Department of Energy, under contract DE-AC05-84OR21400 with Martin Marietta Energy Systems, Inc. The submitted manuscript has been co-authored by a contractor of the US Government under contract DE-AC05-84OR21400.

REFERENCES

Bourdon, M.A., Krusius, T., Campbell, S. and Schwarz, N.B.: Identification and synthesis of a recognition signal for the attachment of glycosaminoglycans to proteins. Proc. Natl. Acad. Sci. USA 84 (1987) 3194-3198.

Carter, W.G., Kaur, P., Gil, S.G., Gahr, P.J. and Wayner, E.A.: Distinct functions for integrins α3β1 in focal adhesions and α6β4/bullous pemphigoid antigen in a new stable anchoring contact (SAC) of keratinocytes: relation to hemidesmosomes. J. Cell Biol. 111 (1990) 3141-3154.

Devereux, J., Haeberli, P. and Smithies, O.: A comprehensive set of sequence analysis programs for the VAX. Nucleic Acids Res. 12 (1984) 387-395.

Falcioni, R., Perrotti, N., Piaggio, G., Kennel, S.J. and Sacchi, A.: Insulin-induced phosphorylation of the beta-4 integrin subunit expressed on murine metastatic carcinoma cells. Mol. Carcinogenesis 2 (1989) 361-368.

Hemler, M.E.: VLA proteins in the integrin family: structures, functions, and their role on leukocytes. Annu. Rev. Immunol. 8 (1990) 365-400.

Hemler, M.E., Crouse, C. and Sonnenberg, A.: Association of the VLA α6 subunit with a novel protein. A possible alternative to the common VLA β1 subunit on certain cell lines. J. Biol. Chem. 264 (1989) 6529-6535.

Hogervorst, F., Kuikman, I., von dem Borne Jr., A.E.G. and Sonnenberg, A.: Cloning and sequence analysis of beta-4 cDNA: an integrin subunit that contains a unique 118 kd cytoplasmic domain. EMBO J. 9 (1990) 765-770.

Hynes, R.O.: Integrins: a family of cell surface receptors. Cell 48 (1987) 549-554.

Jones, J.C.R., Kurpakus, M.A., Cooper, H.M. and Quaranta, V.: A function for the integrin α6β4 in the hemidesmosome. Cell Regulation 2 (1991) 427-438.

Kajiji, S.M., Davceva, B. and Quaranta, V.: Six monoclonal antibodies to human pancreatic cancer antigens. Cancer Res. 47 (1987) 1367-1376.

Kennel, S.J., Foote, L.J., Falcioni, R., Sonnenberg, A., Stringer, C.D., Crouse, C. and Hemler, M.E.: Analysis of the tumor-associated antigen TSP-180. Identity with α6β4 in the integrin superfamily. J. Biol. Chem. 264 (1989) 15515-15521.

Kennel, S.J., Epler, R.G., Lankford, T.K., Foote, L.J., Dickas, V., Canamucio, M., Cavalierie, R., Costimelli, M., Venturo, I., Falcioni, R. and Sacchi, A.: Second generation monoclonal antibodies to the human integrin α6β4. Hybridoma 9 (1990) 243-255.

Kennel, S.J., Godfrey, V., Ch'ang, L.Y., Lankford, T.K., Foote, L.J. and Makkinje, A.: The β4 subunit of the integrin family is displayed on a restricted subset of endothelium in mice. J. Cell Biol. 101 (1992) 145-150.

Kozak, M.: An analysis of vertebrate mRNA sequences: intimations of translational control. J. Cell Biol. 115 (1991) 887-903.

Miletich, J.P. and Broze Jr., G.J.: β Protein C is not glycosylated at asparagine 329. J. Biol. Chem. 265 (1990) 11397-11404.

Phillips, J.H., McKinney, L., Azuma, M., Spits, H. and Lanier, L.L.: A novel β4,α6 integrin-associated epithelial cell antigen involved in natural killer cell and antigen-specific cytotoxic T lymphocyte cytotoxicity. J. Exp. Med. 174 (1991) 1571-1581.

Rinchik, E.M., Machanoff, R., Cummings, C.C. and Johnson, D.K.: Molecular cloning of the ecotropic leukemia provirus Emv-23 provides molecular access to the albino-deletion complex in mouse chromosome 7. Genomics 4 (1989) 251-258.

Sambrook, J., Fritsch, E.F. and Maniatis, T.: Molecular Cloning. A Laboratory Manual, 2nd ed. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, 1989.

Sonnenberg, A., Calafat, J., Janssen, H., Daams, H., Van der Raaij-Helmer, L.M.H., Falcioni, R., Kennel, S.J., Aplin, J.D., Baker, J., Loizidou, M. and Garrod, D.: Integrin α6/β4 complex is located in hemidesmosomes, suggesting a major role in epidermal cell-basement membrane adhesion. J. Cell Biol. 113 (1991) 907-917.

Springer, T.A.: Adhesion receptors of the immune system. Nature 346 (1990) 425-434.

Stepp, M.A., Spurr-Michaud, S., Tisdale, A., Elwell, J. and Gipson, I.K.: α6β4 integrin heterodimer is a component of hemidesmosomes. Proc. Natl. Acad. Sci. USA 87 (1990) 8970-8974.

Suzuki, S. and Naitoh, Y.: Amino acid sequence of a novel integrin β4 subunit and primary expression of the mRNA in epithelial cells. EMBO J. 9 (1990) 757-763.

Tamura, R.N., Rozzo, C., Starr, L., Chambers, J., Reichardt, L.F., Cooper, H.M. and Quaranta, V.: Epithelial integrin α6β4: complete primary structure of α6 and variant forms of β4. J. Cell Biol. 111 (1990) 1593-1604.