Characterization of the 3′ half of the human type IV collagen α5 gene that is affected in the...
GENOMICS 9, 1-9 (1991)
Characterization of the 3′ Half of the Human Type IV Collagen α5 Gene That Is Affected in the Alport Syndrome
JING ZHOU, SIRKKA LIISA HOSTIKKA, LOUISE T. CHOW,* AND KARL TRYGGVASON
Biocenter and Department of Biochemistry, University of Oulu, SF-90570 Oulu, Finland; and *Department of Biochemistry, University of Rochester School of Medicine and Dentistry, Rochester, New York 14642
Received July 20, 1990; revised October 4, 1990
We have determined the exon-intron structure of the 3′ half of the gene for the human type IV collagen α5 chain that is affected in X-chromosome-linked Alport syndrome. Six overlapping λ phage genomic clones containing exons 1-14 (as counted from the 3′ end) and two additional overlapping genomic clones containing exons 16-19 spanned a total of 60 kb, 9.5 kb of which was the 3′ flanking region. The exon-intron structure was elucidated by restriction enzyme mapping, nucleotide sequencing, and heteroduplex analyses. The sequences of all of the 19 most 3′ exons and their flanking sequences were determined from the genomic clones, with the exception of exon 15, which was sequenced after amplification from genomic DNA with the polymerase chain reaction. The results show that the genes for the α5(IV) and α1(IV) chains have an almost identical exon size pattern in the 3′ half. In contrast, there is no clear conservation of intron sizes between the two genes, although both genes may have a similar total size. The current results have allowed the identification of three mutations in the α5(IV) gene in three kindreds with Alport syndrome, and the gene structure and sequencing data presented should facilitate the analysis of other as yet unidentified mutations in this heterogeneous genetic disease. © 1991 Academic Press, Inc.
INTRODUCTION
Type IV collagen is the major structural component of basement membranes. This collagen type has been shown to contain at least five distinct component chains, termed α1(IV), α2(IV), α3(IV), α4(IV), and α5(IV), demonstrating that type IV collagen exists in multiple forms (Timpl, 1989; Butkowski et al., 1987; Saus et al., 1988; Hostikka et al., 1990). The α1(IV) and α2(IV) chains are the primary collagen IV components that are usually present in the same molecule in a 2:1 ratio (Timpl, 1989). The human collagen α1(IV) and α2(IV) chains contain a highly conserved carboxyl-terminal noncollagenous domain of 229 and 227 amino acid residues, respectively, and a collagenous domain of 1398 and 1428 residues, respectively (Soininen et al., 1987; Hostikka and Tryggvason, 1988). Unlike fibrillar collagens, the type IV collagen α chains contain numerous imperfections in the otherwise continuous Gly-Xaa-Yaa repeat sequence. Comparison of the α1(IV) and α2(IV) chains has shown that many of the interruptions in the Gly-Xaa-Yaa sequence do not coincide (Hostikka and Tryggvason, 1988), a fact that may be responsible for the kinky structure of the molecules observed by rotary shadowing (see Timpl, 1989). In contrast, the noncollagenous (NC) domains of the α1(IV) and α2(IV) chains are almost equal in size, with 63% sequence identity that includes conservation of all 12 cysteine residues (Hostikka and Tryggvason, 1988). The α1(IV) and α2(IV) genes are structurally related and are located at the end of the long arm of chromosome 13 (Boyd et al., 1986; Griffin et al., 1987; Cutting et al., 1988). The genes are located head-to-head and are divergently transcribed from an overlapping promoter region (Pöschl et al., 1988; Soininen et al., 1988). Comparison of the available structure of the α1(IV) and α2(IV) genes has demonstrated extensive divergence (Hostikka and Tryggvason, 1987). At present, there are only minimal sequence data available for the α3(IV) and α4(IV) chains (Butkowski et al., 1987; Saus et al., 1988; Gunwar et al., 1990) and little is known about their molecular properties.
We have isolated and characterized cDNA clones that code for about 50% of the sequence of the human α5(IV) chain from the carboxyl-terminal end (Hostikka et al., 1990). Amino acid sequence comparison demonstrated that the α5(IV) chain is more closely related to the α1(IV) chain than to the α2(IV) chain. In the NC domain, the sequence identity between the α5(IV) chain and the α1(IV) or α2(IV) chain is 83 or 63%, respectively. For the collagenous domain, the sequence identity is 58 and 46%, respectively. All the interruptions in the collagenous domain coincide in
their location between the α1(IV) and the α5(IV) chains. Immunofluorescence studies with antiserum raised against a synthetic α5(IV) chain peptide showed that the location of the α5(IV) chain is highly restricted in the kidney to the glomerular basement membrane. The α5(IV) collagen gene (COL4A5) has been assigned to the Xq22 locus (Hostikka et al., 1990), a region where the Alport syndrome gene has been located (Atkin et al., 1988b; Flinter et al., 1988; Brunner et al., 1988). This X-linked disease is characterized primarily by hematuria and patchy splitting of the glomerular basement membrane (GBM) (Atkin et al., 1988a). We have recently identified three different mutations in the α5(IV) gene in 3 of 18 kindreds with Alport syndrome (Barker et al., 1990). One of these mutations involves the deletion of an approximately 15-kb segment containing exons 5 through 10, as counted from the 3′ end of the gene (Barker et al., 1990; this study). This mutation results in the synthesis of an α5(IV) collagen chain that lacks 240 amino acids from the carboxyl-terminal region. A second mutation involves a single base change in exon 3, converting a G to a C (Zhou et al., 1991). This mutation changes the TGT codon of cysteine in the NC domain of the chain to the TCT codon for serine. The mutation also generates new restriction sites for PstI and BglII and, consequently, a restriction fragment length polymorphism (RFLP) that is diagnostic for the disease in this kindred (Barker et al., 1990; Zhou et al., 1991). In a third kindred the absence of a TaqI fragment has been observed (Barker et al., 1990), but the exact nature of this variant has not yet been elucidated.
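The single base change described above can be illustrated with a minimal sketch. The two-entry codon table and the helper `substitute` are hypothetical names introduced only for this illustration; the codon assignments (TGT for cysteine, TCT for serine) are those given in the text and in the standard genetic code.

```python
# Codon lookup restricted to the two codons discussed in the text.
CODON_TO_AA = {"TGT": "Cys", "TCT": "Ser"}

def substitute(codon, position, new_base):
    """Return `codon` with the base at 0-based `position` replaced by `new_base`."""
    return codon[:position] + new_base + codon[position + 1:]

wild_type = "TGT"
# G -> C at the second position of the codon.
mutant = substitute(wild_type, 1, "C")

print(wild_type, CODON_TO_AA[wild_type], "->", mutant, CODON_TO_AA[mutant])
```

The sketch covers only the codon itself; the new PstI and BglII sites depend on the neighbouring bases, which are not reproduced here.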
It is apparent that X-chromosome-linked Alport syndrome is a heterogeneous disease caused by several different mutations in the α5(IV) collagen gene. Therefore, detailed characterization of this locus is crucial for future studies of the nature of mutations in different individuals with Alport syndrome. In the present study we describe the exon-intron structure of the 3′ end of the α5(IV) gene containing the last 19 exons that code for about one-half of the α5(IV) chain. The sequence of the exons and the flanking intron sequences are provided. The gene structure is also compared with the corresponding region of the previously characterized α1(IV) and α2(IV) genes.
MATERIALS AND METHODS
Genomic Clones
Human genomic libraries in λ Charon 4A (kindly provided by Dr. Eric Fritsch; generated by partial AluI/HaeIII digestion), in EMBL-3 (Clontech No. HL 1067J), and in λ Fix (Stratagene No. 944201) were screened with 32P-labeled cDNA inserts coding for the human α5(IV) gene (Hostikka et al., 1990) according to standard procedures (Maniatis et al., 1982). Isolated genomic clones were characterized by restriction mapping and by hybridization with different cDNA inserts or sequence-specific oligonucleotide probes. Suitable restriction fragments were subcloned into pUC and/or M13 vectors for further restriction mapping and sequence analysis.
Nucleotide Sequencing
Exon-containing genomic DNA subclones were sequenced in M13 (Messing, 1983) and/or pUC vectors. Exons were sequenced with the Sanger dideoxynucleotide chain termination method (Sanger et al., 1977) using Sequenase (U.S. Biochemicals). M13 "universal primer" or sequence-specific oligonucleotides (made with an Applied Biosystems DNA synthesizer) were used.
Heteroduplex Analysis
cDNA clones PL-31 and MD-6 (Hostikka et al., 1990) were linearized with ScaI. Each was separately hybridized to the λ genomic DNA clones in 10 mM Tris, 1 mM EDTA, pH 8.5, 50% formamide. The reaction mixture was mounted onto electron microscopic grids in the presence of formamide as described previously (Chow et al., 1979; Lopata et al., 1983) and examined in a Zeiss EM10 CA electron microscope. The heteroduplexes were photographed and the negative was enlarged and traced. Single-strand circular φX174 and renatured cDNA molecules in the same negative served as single- and double-strand length standards, respectively, in the determination of intron and exon lengths.
Polymerase Chain Reaction Amplification and Cloning of Exon 15
On the basis of structural similarities with the type IV collagen α1 chain gene, it could be anticipated that the 127 bp of coding sequence that was not present in the genomic clones was in one exon. To verify this, the polymerase chain reaction (Saiki et al., 1988) was carried out from total human DNA. Oligonucleotide primers P1 (5′-CTAGAATTCGGTGAGCCTGGTCTGCCT-3′) and P2 (5′-CCGAAGCTTCTGGGAATCCAGGAAGGC-3′) for the 5′ and 3′ ends, respectively, of the putative exon 15 were designed on the basis of the sequence known from the cDNA clones (Hostikka et al., 1990; Williams, 1989). The primers were synthesized in an Applied Biosystems, Inc., DNA synthesizer. The primers were designed to contain EcoRI (P1) and HindIII (P2) restriction sites at the 5′ end for subsequent cloning of the amplified fragment into the sequencing vector. The polymerase chain reaction was performed with a commercial polymerase chain reaction kit (Perkin-Elmer/Cetus) using 1 μg of human lymphocyte DNA as template and primers at 25 pmol concentration. The initial denaturation was performed at 94°C for 10 min followed by cooling on ice for 3 min. Taq polymerase (0.5 μl) was then added and 25 cycles were carried out as follows: denaturation at 94°C for 1.5 min, annealing at 55°C for 2 min, and extension at 72°C for 2 min. The amplified product was extracted with phenol/chloroform, digested with EcoRI and HindIII, and electrophoresed in NuSieve GTG low-melting-point agarose (FMC BioProducts) gel. A band of about the expected 145-bp size based on comparison with the α1(IV) gene (includes 18 extra bp added for cloning of the fragment) was excised and ligated into an EcoRI/HindIII-cut M13mp19 vector in the presence of 1 mM hexamine cobalt chloride (Murray, 1986). Positive clones were isolated and sequenced using the M13 universal primer and Sequenase.
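The primer design described above can be checked with a short sketch. It assumes the two primer sequences as printed in this section, the standard EcoRI (GAATTC) and HindIII (AAGCTT) recognition sequences, and that the 127 bp of exon 15 lie between and including the exon-specific parts of the two primers. The helper `tail_length` is a hypothetical name used only here.

```python
ECORI = "GAATTC"    # EcoRI recognition sequence
HINDIII = "AAGCTT"  # HindIII recognition sequence

P1 = "CTAGAATTCGGTGAGCCTGGTCTGCCT"  # 5' primer, written 5'->3'
P2 = "CCGAAGCTTCTGGGAATCCAGGAAGGC"  # 3' primer, written 5'->3'

def tail_length(primer, site):
    """Bases from the primer's 5' end up to and including the restriction site."""
    start = primer.find(site)
    if start < 0:
        raise ValueError("restriction site not found in primer")
    return start + len(site)

p1_tail = tail_length(P1, ECORI)
p2_tail = tail_length(P2, HINDIII)

# Assumption: the remainder of each primer matches exon 15 sequence.
EXON_15_BP = 127
extra_bp = p1_tail + p2_tail
expected_product = EXON_15_BP + extra_bp

print(p1_tail, p2_tail, extra_bp, expected_product)
```

Under these assumptions each primer carries a 9-nt non-exon tail, which accounts for the difference between the 127-bp exon and the 145-bp band size quoted above.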
RESULTS
Identification of Genomic Clones
Eight λ phage clones that together spanned about 60 kb (Fig. 1) were isolated using the cDNA clones MD-6 and PL-31 that contain about 50% of the coding sequence from the 3′ end of the α5(IV) gene (Hostikka et al., 1990). The genomic clones were purified and characterized by restriction enzyme mapping and Southern analysis. Furthermore, the exon-intron size pattern was determined by electron microscopy of heteroduplexes formed between the genomic and the cDNA clones. Six overlapping clones (ML-5, MG-2, EB-4, FM-13, F-2, F-7) covered about 9.5 kb of the 3′ flanking region and about 36 kb of the gene, including the 14 most 3′ exons. Two additional clones, F-8 and ML-2, that did not overlap with the clone F-7 were shown to contain exons 16-19 as counted from the 3′ end of the gene.
Exon-Intron Structure
Restriction fragments of the genomic clones were subcloned and sequenced to determine the exon sizes. For sequencing, either universal primers or primers designed on the basis of the nucleotide sequence known from the cDNA were used. Furthermore, the sizes of both exons and introns were determined from heteroduplexes between the cDNA and the genomic clones. The sizes of exons (data not shown) were in good agreement with those obtained from DNA sequencing, thus validating the assignment of introns.
The sizes of exons and introns are summarized in Table 1. The data demonstrate that the eight genomic clones studied contain 18 exons, i.e., 1-14 and 16-19, as counted from the 3′ end of the gene (Fig. 1). The size of the missing exon 15 (127 bp) was predicted by comparison of the exon size patterns in the α5(IV) and α1(IV) collagen genes (Table 1). The sequence of exon 15 (Fig. 2) was verified after its amplification from genomic DNA with the polymerase chain reaction and subsequent DNA sequencing. The complete sequence of the 19 most 3′ exons of the gene, together with the derived amino acid sequences, is shown in Fig. 2. With the exception of exon 15, 30 bp of intron sequences flanking each exon are also provided in Fig. 2.

TABLE 1
Sizes (bp) of Exons and Introns in the 3′ End of the Human Type IV Collagen α5 and α1 Chain Genes

Gene segment        α5(IV)     α1(IV)^a
Exon 19 (34)        150        153
Intron 18 (34)      1500       161
Exon 18 (35)        99         99
Intron 17 (35)      1450       115
Exon 17 (36)        90         90
Intron 16 (36)      300        1100
Exon 16 (37)        140        140
Intron 15 (37)      >5000      300
Exon 15 (38)        127        127
Intron 14 (38)      >4500      97
Exon 14 (39)        81         81
Intron 13 (39)      1210       600
Exon 13 (40)        99         99
Intron 12 (40)      560        1500
Exon 12 (41)        51         51
Intron 11 (41)      940        2000
Exon 11 (42)        186        186
Intron 10 (42)      7000       800
Exon 10 (43)        134        134
Intron 9 (43)       3100       2670
Exon 9 (44)         73         73
Intron 8 (44)       133^b      960
Exon 8 (45)         72         72
Intron 7 (45)       870        1390
Exon 7 (46)         129        129
Intron 6 (46)       4480       1510
Exon 6 (47)         99         99
Intron 5 (47)       1400       1210
Exon 5 (48)         213        213
Intron 4 (48)       5000       960
Exon 4 (49)         178        178
Intron 3 (49)       2050       ~3000
Exon 3 (50)         115        115
Intron 2 (50)       345^b      2900
Exon 2 (51)         173        173
Intron 1 (51)       900        1900
Exon 1^c (52)^d     1245       1383

a Ref. (28).
b Sequenced (this study).
c The number as counted from the 3′ end.
d The number in parentheses refers to the actual number in the previously determined human α1(IV) gene as counted from the 5′ end.
Exon 1, the most 3′ exon, is 1245 bp long and codes for 79 bp of the translated sequence and 1166 bp of a 3′ untranslated sequence (Fig. 2). The 3′ untranslated sequence contains one typical polyadenylation signal, AATAAA. However, we have, as yet, not isolated cDNA clones that indicate the use of this signal sequence. In contrast, there are two closely spaced potential polyadenylation signals, AATATA and AATTAAA (Hostikka et al., 1990), present further downstream. One of those is most likely a functional polyadenylation signal sequence since three cDNA
FIG. 2 (continued). Nucleotide sequences of the exons, flanking intron sequences, and derived amino acid sequences.
clones, including the previously described PC-4 (Hostikka et al., 1990), contain a poly(A) tail at 26 and 14 nucleotides, respectively, downstream from these two putative polyadenylation signals (Fig. 2). We have previously described a cDNA clone (PL-35) containing a 3′ untranslated sequence reaching 34 nucleotides further downstream of PC-4 but not having a poly(A) tail (Hostikka et al., 1990). The present analysis of the gene implies that the mentioned 34-nucleotide sequence is a cloning artifact since it is not present in the gene, as verified by sequencing of two separate genomic clones (data not shown). We sequenced 282 bp downstream from the AATATA sequence and did not find the 34-nucleotide sequence or any additional polyadenylation signals (Fig. 2). We conclude that the 3′ end of the structural α5(IV) collagen gene is located 1166 bp downstream from the first base of the translation termination codon.
Exons 1-5 code for the NC domain. Exon 5 is a "junction exon" with 142 bp coding for the NC domain and 71 bp coding for the Gly-Xaa-Yaa repeat-containing collagenous domain. Exons 6-19 that code for the collagenous domain have sizes varying between 51 and 186 bp. None of these exons has 54 bp or multiples thereof, the sizes typically found in the genes for fibrillar collagens (Boedtker et al., 1985; de Crombrugghe et al., 1985). Twelve of the fourteen exons coding for the collagenous domain start with a split codon, i.e., the second base of the codon for glycine (Fig. 2).
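The exon size statements in this paragraph can be verified against the α5(IV) column of Table 1 with a short sketch. The dictionary below is a transcription of the Table 1 values for exons 6-19 (keyed by exon number counted from the 3′ end); the value of 72 bp for exon 8 is an assumption based on the matching α1(IV) exon, since the printed digit is damaged.

```python
# alpha5(IV) exon sizes in bp, exons 6-19 counted from the 3' end (Table 1).
COLLAGENOUS_EXONS = {
    6: 99, 7: 129, 8: 72, 9: 73, 10: 134, 11: 186, 12: 51,
    13: 99, 14: 81, 15: 127, 16: 140, 17: 90, 18: 99, 19: 150,
}

# Junction exon 5: NC-coding part plus collagenous-coding part.
EXON_5_NC_BP = 142
EXON_5_COLLAGENOUS_BP = 71
exon_5_total = EXON_5_NC_BP + EXON_5_COLLAGENOUS_BP

smallest = min(COLLAGENOUS_EXONS.values())
largest = max(COLLAGENOUS_EXONS.values())

# Exons whose size is 54 bp or a multiple of it (the fibrillar collagen pattern).
multiples_of_54 = [n for n, size in COLLAGENOUS_EXONS.items() if size % 54 == 0]

print(exon_5_total, smallest, largest, multiples_of_54)
```

With these values the size range is 51 to 186 bp and the list of 54-bp multiples is empty, in agreement with the text.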
The intron sizes analyzed vary between 133 and 7000 bp. The sizes of introns 14 and 15 could not be determined because genomic clones containing complete introns were not isolated. It can be calculated from the available data, however, that intron 14 is over 4500 bp and intron 15 more than 5000 bp. Consequently, the 19 most 3′ exons that contain about 50% of the human type IV collagen α5 chain gene are located in 50 kb of genomic DNA.
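As a rough cross-check of the 50-kb figure, the Table 1 values for the α5(IV) gene can be summed. This is only a sketch of a lower bound: introns 14 and 15 enter with their minimum sizes (4500 and 5000 bp), exon 8 is taken as 72 bp by analogy with the α1(IV) gene, and the list names are introduced here for illustration.

```python
# alpha5(IV) exon sizes in bp, listed from exon 19 down to exon 1 (Table 1).
EXONS_BP = [150, 99, 90, 140, 127, 81, 99, 51, 186, 134,
            73, 72, 129, 99, 213, 178, 115, 173, 1245]

# alpha5(IV) intron sizes in bp, listed from intron 18 down to intron 1.
# Introns 15 and 14 are lower bounds (>5000 and >4500 bp).
INTRONS_BP = [1500, 1450, 300, 5000, 4500, 1210, 560, 940, 7000,
              3100, 133, 870, 4480, 1400, 5000, 2050, 345, 900]

exon_total = sum(EXONS_BP)
intron_total = sum(INTRONS_BP)
minimum_span = exon_total + intron_total

print(exon_total, intron_total, minimum_span)
```

The minimum span obtained this way is about 44 kb, which is compatible with the stated 50 kb once the undetermined parts of introns 14 and 15 are taken into account.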
Comparison of the Exon-Intron Structures of the α5(IV), α1(IV), and α2(IV) Genes
We compared the structure of the gene for the type IV collagen α5 chain with those of the α1(IV) and α2(IV) genes. Analysis of primary structure has previously demonstrated that the α5, α1, and α2 chains of type IV collagen are all structurally closely related (Hostikka et al., 1990; Soininen et al., 1987; Hostikka and Tryggvason, 1988). However, the α5(IV) and α1(IV) chains are considerably more closely related to each other than to the α2(IV) chain (Hostikka et al., 1990). We have shown (Hostikka and Tryggvason, 1987; Soininen et al., 1989) that the genes for the α1(IV) and α2(IV) chains, which are located adjacent to each other on chromosome 13 (Boyd et al., 1986; Griffin et al., 1987; Pöschl et al., 1988; Soininen et al., 1988), have very different exon sizes. Interestingly, the present results demonstrate that the genes for the α5(IV) and α1(IV) chains have practically identical exon structures, at least in the 3′ end of the genes (Table 1; Fig. 3). The only difference is exon 19 in the α5(IV) gene, which has 150 bp, whereas the corresponding exon in the α1(IV) gene (exon 34) has 153 bp, due to the deletion of one amino acid in the α5(IV) chain (see Hostikka et al., 1990). This homology is in sharp contrast to the differences with the α2(IV) gene, which has previously been reported to vary considerably from the α1(IV) gene (Hostikka and Tryggvason, 1987). Thus, the NC domain is encoded by three exons in the α2(IV) gene but by five exons in the α5(IV) and α1(IV) genes. However, all the intron locations in this region of the genes have been conserved. Another interesting feature is that the exon-intron pattern of the α5(IV) and α1(IV) genes, on the one hand, and that of the α2(IV) gene, on the other, differ considerably in the collagenous domain coding region. Here, only intron 4 in the α2(IV) gene coincides with an intron location in the other two genes (Fig. 3). Furthermore, the sizes of all exons in this region differ between the two groups.
Comparison of sizes of introns in the human α5(IV) and α1(IV) genes shows that regardless of the conservation of coding sequences in the 3′ end of the genes, the intron sizes do not show any conservation of individual introns (Table 1). However, the sizes of the corresponding 3′ ends of the genes appear to be similar; the 19 exons of the α5(IV) and α1(IV) genes spanned about 50 and 40 kb of their respective genomic DNA. Exact comparison cannot be made since the sizes of all introns have not been determined.
DISCUSSION
The present results provide the exon-intron structure of the 3′ end of the human type IV collagen α5 chain gene located on Xq22. Analysis of the sequence of the 19 most 3′ exons demonstrated that the exon size profile is almost identical to that of the α1(IV) gene, at least in this part of the gene. If this similarity holds true for the rest of the gene, the entire α5(IV) gene should contain 52 exons, as does the α1(IV) gene (Soininen et al., 1989). The fact that the 19 exons studied here are contained in about 50 kb of genomic DNA indicates that the gene may have a size of about 100 kb, which is similar to that of the human α1(IV) gene (Soininen et al., 1989).
The conservation of exon sizes in the α5(IV) and α1(IV) genes indicates that there has been a selective pressure to maintain this structure. This is somewhat
FIG. 3. Illustration of the exon structure of the 3′ ends of the human genes for type IV collagen α5, α1, and α2 chains. Exons are indicated by boxes (key: Gly-X-Y coding, NC domain, 3′ untranslated) and introns by interconnecting lines (not to scale). All intron locations in the α5(IV) and α1(IV) genes coincide. Interrupted vertical lines depict intron locations that are identical in all three genes.
surprising considering the fact that exon sizes in the closely related α2(IV) gene are very different. The difference in exon structure of the α1(IV) and α2(IV) genes and in the primary structure of the corresponding proteins led to the suggestion that there was not a severe functional constraint to maintain exact length of type IV collagen chains with the exception of the NC domains (Hostikka and Tryggvason, 1988). Therefore, divergence of gene structure with minor deletions or insertions of coding sequences could be tolerated. In contrast, an exactly equal length of collagen molecules is essential for the formation of collagen fibrils, and this is thought to explain the strict conservation of exon size pattern between the genes for fibrillar collagens (Boedtker et al., 1985; de Crombrugghe et al., 1985). With regard to α5(IV) and α1(IV), the present results imply that the genes have evolved from a common ancestor gene after branching from the α2(IV) gene and that the exons were present prior to gene duplication since the exon-intron locations are completely conserved.
The most practical significance of the present work is doubtless in the detailed structural and nucleotide sequence data from a large portion of a gene that is affected in human disease, the Alport syndrome. This X-chromosome-linked disease is a major cause of terminal renal failure in males, with a gene frequency estimated to be about 1:5000 (Atkin et al., 1988a). The major clinical symptoms are loss of kidney function accompanied by hematuria and progressive sensorial deafness (Atkin et al., 1988a,b; Flinter et al., 1988; Brunner et al., 1988). Ultrastructural abnormalities such as patchy splitting of the lamina densa in the GBM together with biochemical analyses led to the hypothesis that type IV collagen, the major structural component of the GBM, was defective (Spear, 1973; Kleppel et al., 1987). However, the affected chains in Alport syndrome could not be α1(IV) or α2(IV) since their genes are on chromosome 13 (Boyd et al., 1986; Griffin et al., 1987). Our recent identification of the unique α5(IV) chain and the subsequent localization of the gene to the locus of the
Alport syndrome on chromosome X strongly suggested it to be the gene affected in the disease (Hostikka et al., 1990). Importantly, this has been shown to be the case, since at least three different mutations were found in three kindreds (Barker et al., 1990). In Utah Kindred EP (Hasstedt et al., 1986) the mutation involved the deletion of about a 15-kb segment of the gene spanning exons 5-10. As can be concluded from the present data, this deletion would result in the synthesis of an α5(IV) chain missing 240 amino acid residues. The variant chain would lack 193 residues from the carboxyl-terminal end of the collagenous domain and a 47-residue segment from the NC domain that contains 1 of the 12 evolutionarily conserved cysteine residues. The loss of this segment alone could not only interfere with the normal folding of the globular NC domain but also hinder the formation of the triple helix and end-to-end assembly of type IV collagen molecules (see Timpl, 1989). The additional loss of 193 residues from the collagenous domain would obviously result in a totally nonfunctional α chain. A second mutation, found in large Utah Kindred P (Atkin et al., 1988a), involved the generation of a new PstI restriction site by a point mutation in exon 3, converting a TGT codon for cysteine to the TCT codon for serine (Zhou et al., 1991). In the third kindred, there was an absence of a TaqI fragment but the characterization of this mutation is still in progress.
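The residue counts quoted for the exon 5-10 deletion follow from the exon sizes in Table 1, as the sketch below shows. It assumes that all six exons are lost in full, that exon 8 is 72 bp, and that the 142/71-bp split of junction exon 5 is as stated in the Results; the variable names are illustrative.

```python
# alpha5(IV) exon sizes in bp for the deleted exons 5-10 (Table 1).
DELETED_EXONS = {5: 213, 6: 99, 7: 129, 8: 72, 9: 73, 10: 134}

# Junction exon 5: 142 bp encode NC domain, 71 bp encode collagenous domain.
NC_BP = 142
collagenous_bp = sum(DELETED_EXONS.values()) - NC_BP

deleted_bp = sum(DELETED_EXONS.values())
in_frame = deleted_bp % 3 == 0
deleted_residues = deleted_bp // 3

# The domain boundary falls inside a codon, so the per-domain counts
# are rounded to the nearest whole residue.
nc_residues = round(NC_BP / 3)
collagenous_residues = round(collagenous_bp / 3)

print(deleted_bp, in_frame, deleted_residues, nc_residues, collagenous_residues)
```

The total of 720 bp is divisible by three, which is consistent with a shortened chain lacking 240 residues (47 from the NC domain and 193 from the collagenous domain) rather than a frameshifted product.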
To date, there are no protein data available on the α5(IV) chain. Thus, it is not known whether it forms a homotrimer or whether it can assemble into heterotrimers together with the α1(IV) and α2(IV) chains or the as yet poorly characterized α3(IV) and α4(IV) chains. In the latter case, the presence of a nonfunctional variant α5(IV) chain could interfere with the normal formation of triple-helical type IV collagen molecules. However, because the α5(IV) chain is a minor component of type IV collagen in comparison with the α1(IV) and α2(IV) chains, the damage to the GBM might be slow, which could explain the usually slow progression of the disease.
Recent data indicate that X-chromosome-linked Alport syndrome is a heterogeneous disease involving several different mutations in the α5(IV) gene. It cannot be excluded that some forms of the disease are due to defects in another adjacent type IV collagen gene. It is obvious that detailed analysis of new mutations in the α5(IV) gene requires extensive knowledge about the structure of the normal gene. The sequence data provided here allow the design of oligonucleotide primers for the amplification and sequencing of almost one-half of the exons and their immediate flanking regions of DNA from patients with Alport syndrome.
ACKNOWLEDGMENTS
This work was supported in part by grants from the Academy of Finland and The Sigrid Juselius Foundation, by National Institutes of Health Grant 36200, and by a grant from the Council for Tobacco Research-USA (2550).
REFERENCES
1. ATKIN, C. L., GREGORY, M. C., AND BORDER, W. A. (1988a). Alport syndrome. In "Diseases of the Kidney" (R. W. Schrier and C. W. Gottschalk, Eds.), 4th ed., Chap. 19, pp. 617-641, Little, Brown, Boston.
2. ATKIN, C. L., HASSTEDT, S. J., MENLOVE, L., CANNON, L., KIRSCHNER, N., SCHWARTZ, C., NGUYEN, K., AND SKOLNICK, M. (1988b). Mapping of Alport syndrome to the long arm of the X chromosome. Amer. J. Hum. Genet. 42: 249-255.
3. BARKER, D., HOSTIKKA, S. L., ZHOU, J., CHOW, L. T., OLIPHANT, A. R., GERKEN, S. C., GREGORY, M. C., SKOLNICK, M. H., ATKIN, C. L., AND TRYGGVASON, K. (1990). Identification of mutations in the COL4A5 collagen gene in Alport syndrome. Science 248: 1224-1227.
4. BOEDTKER, H., FINER, M., AND AHO, S. (1985). The structure of the chicken α2 collagen gene. Ann. N. Y. Acad. Sci. 460: 85-116.
5. BOYD, C. D., WELIKY, K., TOTH-FEJEL, S. E., DEAK, S. B., CHRISTIANO, A. M., MACKENZIE, J. W., SANDELL, L. J., TRYGGVASON, K., AND MAGENIS, E. (1986). The single copy gene coding for human α1(IV) procollagen is located at the terminal end of the long arm of chromosome 13. Hum. Genet. 74: 121-125.
6. BRUNNER, H., SCHRÖDER, C., VAN BENNEKOM, C., LAMBERMON, E., TUERLINGS, J., MENZEL, D., OLBING, H., MONNENS, L., WIERINGA, B., AND ROPERS, H.-H. (1988). Localization of the gene for X-linked Alport's syndrome. Kidney Int. 34: 507-510.
7. BUTKOWSKI, R. J., LANGEVELD, J. P. M., WIESLANDER, J., HAMILTON, J., AND HUDSON, B. G. (1987). Localization of the Goodpasture epitope to a novel chain of basement membrane collagen. J. Biol. Chem. 262: 7874-7877.
8. CHOW, L. T., BROKER, T. R., AND LEWIS, J. B. (1979). Complex splicing patterns of RNAs from the early regions of Adenovirus-2. J. Mol. Biol. 134: 265-303.
9. DE CROMBRUGGHE, B., SCHMIDT, A., LIAU, G., SETOYAMA, C., MUDRYJ, M., YAMADA, Y., AND MCKEON, C. (1985). Structural and functional analysis of the genes for α2(I) and α1(III) collagens. Ann. N. Y. Acad. Sci. 460: 154-162.
10. CUTTING, G. R., KAZAZIAN, H. H., JR., ANTONARAKIS, S. E., KILLEN, P. D., AND YAMADA, Y. (1988). Macrorestriction mapping of COL4A1 and COL4A2 collagen genes on human chromosome 13q34. Genomics 3: 256-263.
11. FLINTER, F. A., CAMERON, J. S., CHANTLER, C., HOUSTON, I., AND BOBROW, M. (1988). Genetics of classic Alport's syndrome. Lancet 2: 1005-1007.
12. GRIFFIN, C. A., EMANUEL, B. S., HANSEN, J. R., CAVENEE, W. K., AND MYERS, J. C. (1987). Human collagen genes encoding basement membrane α1(IV) and α2(IV) chains map to the distal long arm of chromosome 13. Proc. Natl. Acad. Sci. USA 84: 512-516.
13. GUNWAR, S., SAUS, J., NOELKEN, M. E., AND HUDSON, B. G. (1990). Glomerular basement membrane. Identification of a fourth chain, α4, of type IV collagen. J. Biol. Chem. 265: 5466-5469.
14. HASSTEDT, S. J., ATKIN, C. L., AND SAN JUAN, A. C., JR. (1986). Genetic heterogeneity among kindreds with Alport syndrome. Amer. J. Hum. Genet. 38: 940-953.
15. HOSTIKKA, S. L., EDDY, R. L., BYERS, M. G., HÖYHTYÄ, M., SHOWS, T. B., AND TRYGGVASON, K. (1990). Identification of a distinct type IV collagen α chain with restricted kidney distribution and assignment of the gene to the locus of X chromosome-linked Alport syndrome. Proc. Natl. Acad. Sci. USA 87: 1606-1610.
16. HOSTIKKA, S. L., AND TRYGGVASON, K. (1987). Extensive structural differences between genes for the α1 and α2 chains of type IV collagen despite conservation of coding sequences. FEBS Lett. 224: 297-305.
17. HOSTIKKA, S. L., AND TRYGGVASON, K. (1988). The complete primary structure of the α2 chain of human type IV collagen and comparison with the α1(IV) chain. J. Biol. Chem. 263: 19488-19493.
18. KLEPPEL, M. M., KASHTAN, C. E., BUTKOWSKI, R. J., FISH, A. J., AND MICHAEL, A. F. (1987). Absence of 28 kilodalton non-collagenous monomers of type IV collagen in glomerular basement membrane. J. Clin. Invest. 80: 263-266.
19. LOPATA, M. A., HAVERCROFT, J. C., CHOW, L. T., AND CLEVELAND, D. W. (1983). Four unique genes required for β tubulin expression in vertebrates. Cell 32: 713-724.
20. MANIATIS, T., FRITSCH, E., AND SAMBROOK, J. (1982). "Molecular Cloning: A Laboratory Manual," Cold Spring Harbor Laboratory, Cold Spring Harbor, NY.
21. MESSING, J. (1983). New M13 vectors for cloning. In "Methods in Enzymology" (R. Wu, L. Grossman, and K. Moldave, Eds.), Vol. 101, pp. 20-78, Academic Press, San Diego.
22. MURRAY, J. A. H. (1986). HCC ligation: Rapid and specific DNA construction with blunt ended DNA fragments. Nucleic Acids Res. 14: 10,118.
23. PÖSCHL, E., POLLNER, R., AND KÜHN, K. (1988). The genes for the α1(IV) and α2(IV) chains of human basement membrane collagen type IV are arranged head-to-head and separated by a bidirectional promoter of unique structure. EMBO J. 7: 2687-2695.
24. SAIKI, R. K., GELFAND, D. H., STOFFEL, S., SCHARF, S. J., HIGUCHI, R., HORN, G. T., MULLIS, K. B., AND ERLICH, H. A. (1988). Primer-directed enzymatic amplification of DNA with thermostable DNA polymerase. Science 239: 487-491.
25. SANGER, F., NICKLEN, S., AND COULSON, A. R. (1977). DNA sequencing with chain terminating inhibitors. Proc. Natl. Acad. Sci. USA 74: 5463-5467.
26. SAUS, J., WIESLANDER, J., LANGEVELD, J. P. M., QUINONES, S., AND HUDSON, B. G. (1988). Identification of the Goodpasture antigen as the α3(IV) chain of collagen IV. J. Biol. Chem. 263: 13,374-13,380.
27. SOININEN, R., HAKA-RISKU, T., PROCKOP, D. J., AND TRYGGVASON, K. (1987). Complete primary structure of the α1 chain of human basement membrane (type IV) collagen. FEBS Lett. 225: 188-194.
28. SOININEN, R., HUOTARI, M., GANGULY, A., PROCKOP, D. J., AND TRYGGVASON, K. (1989). Structural organization of the gene for the α1(IV) chain of human type IV collagen. J. Biol. Chem. 264: 13,565-13,571.
29. SOININEN, R., HUOTARI, M., HOSTIKKA, S. L., PROCKOP, D. J., AND TRYGGVASON, K. (1988). The structural genes for α1 and α2 chains of human type IV collagen are divergently encoded on opposite DNA strands and have an overlapping promoter region. J. Biol. Chem. 263: 17,217-17,220.
30. SPEAR, G. S. (1973). Alport's syndrome: A consideration of pathogenesis. Clin. Nephrol. 1: 336-337.
31. TIMPL, R. (1989). Structure and biological activity of basement membrane proteins. Eur. J. Biochem. 180: 487-502.
32. WILLIAMS, J. F. (1989). Optimization strategies for the polymerase chain reaction. BioTechniques 7: 762-768.
33. ZHOU, J., BARKER, D. F., HOSTIKKA, S. L., GREGORY, M. C., ATKIN, C. L., AND TRYGGVASON, K. (1991). Single base mutation in α5(IV) collagen chain gene converting a conserved cysteine to serine in Alport syndrome. Genomics 9: 10-18.