Characterization of the 3′ half of the human type IV collagen α5 gene that is affected in the...

GENOMICS 9, 1-9 (1991)

Characterization of the 3′ Half of the Human Type IV Collagen α5 Gene That Is Affected in the Alport Syndrome

JING ZHOU, SIRKKA LIISA HOSTIKKA, LOUISE T. CHOW,* AND KARL TRYGGVASON

Biocenter and Department of Biochemistry, University of Oulu, SF-90570 Oulu, Finland; and *Department of Biochemistry, University of Rochester School of Medicine and Dentistry, Rochester, New York 14642

Received July 20, 1990; revised October 4, 1990

We have determined the exon-intron structure of the 3′ half of the gene for the human type IV collagen α5 chain that is affected in X-chromosome-linked Alport syndrome. Six overlapping λ phage genomic clones containing exons 1-14 (as counted from the 3′ end) and two additional overlapping genomic clones containing exons 16-19 spanned a total of 60 kb, 9.5 kb of which were the 3′ flanking region. The exon-intron structure was elucidated by restriction enzyme mapping, nucleotide sequencing, and heteroduplex analyses. The sequences of all of the 19 most 3′ exons and their flanking sequences were determined from the genomic clones, with the exception of exon 15, which was sequenced after amplification from genomic DNA with the polymerase chain reaction. The results show that the genes for the α5(IV) and α1(IV) chains have an almost identical exon size pattern in the 3′ half. In contrast, there is not a clear conservation of intron sizes between the two genes, although both genes may have a similar total size. The current results have allowed the identification of three mutations in the α5(IV) gene in three kindreds with Alport syndrome, and the gene structure and sequencing data presented should facilitate the analysis of other as yet unidentified mutations in this heterogeneous genetic disease. © 1991 Academic Press, Inc.

INTRODUCTION

Type IV collagen is the major structural component of basement membranes. This collagen type has been shown to contain at least five distinct component chains, termed α1(IV), α2(IV), α3(IV), α4(IV), and α5(IV), demonstrating that type IV collagen exists in multiple forms (Timpl, 1989; Butkowski et al., 1987; Saus et al., 1988; Hostikka et al., 1990). The α1(IV) and α2(IV) chains are the primary collagen IV components that are usually present in the same molecule in a 2:1 ratio (Timpl, 1989). The human collagen α1(IV) and α2(IV) chains contain a highly conserved carboxyl-terminal noncollagenous domain of 229 and 227 amino acid residues, respectively, and a collagenous domain of 1398 and 1428 residues, respectively (Soininen et al., 1987; Hostikka and Tryggvason, 1988). Unlike fibrillar collagens, the type IV collagen α chains contain numerous imperfections in the otherwise continuous Gly-Xaa-Yaa repeat sequence. Comparison of the α1(IV) and α2(IV) chains has shown that many of the interruptions in the Gly-Xaa-Yaa sequence do not coincide (Hostikka and Tryggvason, 1988), a fact that may be responsible for the kinky structure of the molecules observed by rotary shadowing (see Timpl, 1989). In contrast, the noncollagenous (NC) domains of the α1(IV) and α2(IV) chains are almost equal in size, with 63% sequence identity that includes conservation of all 12 cysteine residues (Hostikka and Tryggvason, 1988). The α1(IV) and α2(IV) genes are structurally related and are located at the end of the long arm of chromosome 13 (Boyd et al., 1986; Griffin et al., 1987; Cutting et al., 1988). The genes are located head-to-head and are divergently transcribed from an overlapping promoter region (Pöschl et al., 1988; Soininen et al., 1988). Comparison of the available structure of the α1(IV) and α2(IV) genes has demonstrated extensive divergence (Hostikka and Tryggvason, 1987). At present, there are only minimal sequence data available for the α3(IV) and α4(IV) chains (Butkowski et al., 1987; Saus et al., 1988; Gunwar et al., 1990) and little is known about their molecular properties.

0888-7543/91 $3.00 Copyright © 1991 by Academic Press, Inc. All rights of reproduction in any form reserved.

We have isolated and characterized cDNA clones that code for about 50% of the sequence of the human α5(IV) chain from the carboxyl-terminal end (Hostikka et al., 1990). Amino acid sequence comparison demonstrated that the α5(IV) chain is more closely related to the α1(IV) chain than to the α2(IV) chain. In the NC domain, the sequence identity between the α5(IV) chain and the α1(IV) or α2(IV) chain is 83 or 63%, respectively. For the collagenous domain, the sequence identity is 58 and 46%, respectively. All the interruptions in the collagenous domain coincide in their location between the α1(IV) and the α5(IV) chains. Immunofluorescence studies with antiserum raised against a synthetic α5(IV) chain peptide showed that the location of the α5(IV) chain is highly restricted in the kidney to the glomerular basement membrane. The α5(IV) collagen gene (COL4A5) has been assigned to the Xq22 locus (Hostikka et al., 1990), a region where the Alport syndrome gene has been located (Atkin et al., 1988b; Flinter et al., 1988; Brunner et al., 1988). This X-linked disease is characterized primarily by hematuria and patchy splitting of the glomerular basement membrane (GBM) (Atkin et al., 1988a). We have recently identified three different mutations in the α5(IV) gene in 3 of 18 kindreds with Alport syndrome (Barker et al., 1990). One of these mutations involves the deletion of an approximately 15-kb segment containing exons 5 through 10, as counted from the 3′ end of the gene (Barker et al., 1990; this study). This mutation results in the synthesis of an α5(IV) collagen chain that lacks 240 amino acids from the carboxyl-terminal region. A second mutation involved a single base change in exon 3, converting a G to a C (Zhou et al., 1991). This mutation changes the TGT codon of cysteine in the NC domain of the chain to the TCT codon for serine. The mutation also generates new restriction sites for PstI and BglII and, consequently, a restriction fragment length polymorphism (RFLP) that is diagnostic for the disease in this kindred (Barker et al., 1990; Zhou et al., 1991). In a third kindred the absence of a TaqI fragment has been observed (Barker et al., 1990) but the exact nature of this variant has not yet been elucidated.
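How a single G-to-C change can alter a codon and create two restriction sites at once may be easier to see in a small sketch. The 10-nucleotide context below is an illustrative assumption chosen so that the TGT codon sits next to a GCAG stretch; it is not offered as an authoritative reading of the gene sequence. The recognition sequences for PstI (CTGCAG) and BglII (AGATCT) are the standard ones, and the helper `new_sites` is hypothetical.

```python
# Standard recognition sequences of the two enzymes named in the text.
SITES = {"PstI": "CTGCAG", "BglII": "AGATCT"}

# Minimal codon lookup covering only the two codons discussed in the text.
CODONS = {"TGT": "Cys", "TCT": "Ser"}


def new_sites(wild_type: str, mutant: str) -> list[str]:
    """Return enzymes whose site occurs in the mutant but not in the wild type."""
    return sorted(
        name
        for name, site in SITES.items()
        if site in mutant and site not in wild_type
    )


# Illustrative context (an assumption): the codon TGT occupies positions 3-5.
wild_type = "AGATGTGCAG"
# G -> C at the middle position of the codon (index 4).
mutant = wild_type[:4] + "C" + wild_type[5:]

print(CODONS[wild_type[3:6]], "->", CODONS[mutant[3:6]])
print("Sites created by the mutation:", new_sites(wild_type, mutant))
```

In this toy context the substituted C is shared by both new sites, which is one way a single base change could produce both RFLPs.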

It is apparent that X-chromosome-linked Alport syndrome is a heterogeneous disease caused by several different mutations in the α5(IV) collagen gene. Therefore, detailed characterization of this locus is crucial for future studies of the nature of mutations in different individuals with Alport syndrome. In the present study we describe the exon-intron structure of the 3′ end of the α5(IV) gene containing the last 19 exons that code for about one-half of the α5(IV) chain. The sequence of the exons and the flanking intron sequences are provided. The gene structure is also compared with the corresponding region of the previously characterized α1(IV) and α2(IV) genes.

MATERIALS AND METHODS

Genomic Clones

Human genomic libraries in λ Charon 4A (kindly provided by Dr. Eric Fritsch; generated by partial AluI/HaeIII digestion), in EMBL-3 (Clontech No. HL 1067J), and in λ Fix (Stratagene No. 944201) were screened with ³²P-labeled cDNA inserts coding for the human α5(IV) gene (Hostikka et al., 1990) according to standard procedures (Maniatis et al., 1982). Isolated genomic clones were characterized by restriction mapping and by hybridization with different cDNA inserts or sequence-specific oligonucleotide probes. Suitable restriction fragments were subcloned into pUC and/or M13 vectors for further restriction mapping and sequence analysis.

Nucleotide Sequencing

Exon-containing genomic DNA subclones were sequenced in M13 (Messing, 1983) and/or pUC vectors. Exons were sequenced with the Sanger dideoxynucleotide chain termination method (Sanger et al., 1977) using Sequenase (U.S. Biochemicals). M13 “universal primer” or sequence-specific oligonucleotides (made with an Applied Biosystems DNA synthesizer) were used.

Heteroduplex Analysis

cDNA clones PL-31 and MD-6 (Hostikka et al., 1990) were linearized with ScaI. Each was separately hybridized to the λ genomic DNA clones in 10 mM Tris, 1 mM EDTA, pH 8.5, 50% formamide. The reaction mixture was mounted onto electron microscopic grids in the presence of formamide as described previously (Chow et al., 1979; Lopata et al., 1983) and examined in a Zeiss EM10 CA electron microscope. The heteroduplexes were photographed and the negative was enlarged and traced. Single-strand circular φX174 and renatured cDNA molecules in the same negative served as single- and double-strand length standards, respectively, in the determination of intron and exon lengths.
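The use of internal length standards can be sketched as a simple proportional conversion: each traced contour length is scaled by the known size of the standard of the same strandedness. The φX174 genome length of 5386 nucleotides is well established; every traced length below, and the 1800-bp size of the duplex standard, is a hypothetical value in arbitrary units chosen only for illustration.

```python
PHIX174_NT = 5386  # nucleotides in the circular single-stranded phiX174 genome


def scale_factor(known_size: float, traced_length: float) -> float:
    """Nucleotides (or base pairs) represented by one unit of traced length."""
    return known_size / traced_length


def estimate_size(traced_length: float, factor: float) -> int:
    """Convert a traced contour length into a rounded size estimate."""
    return round(traced_length * factor)


# Hypothetical tracings (arbitrary units) of the two standards.
ss_factor = scale_factor(PHIX174_NT, 53.86)  # single-strand standard
ds_factor = scale_factor(1800, 20.0)         # assumed 1800-bp duplex standard

# Single-stranded intron loops use the single-strand factor,
# duplex exon segments use the double-strand factor.
intron_nt = estimate_size(9.4, ss_factor)
exon_bp = estimate_size(1.1, ds_factor)
print(intron_nt, exon_bp)
```

Keeping separate factors for single- and double-stranded segments matters because the two forms do not have the same length per nucleotide on the grid.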

Polymerase Chain Reaction Amplification and Cloning of Exon 15

On the basis of structural similarities with the type IV collagen α1 chain gene, it could be anticipated that the 127 bp of coding sequence that was not present in the genomic clones was in one exon. To verify this, the polymerase chain reaction (Saiki et al., 1988) was carried out from total human DNA. Oligonucleotide primers P1 (5′-CTAGAATTCGGTGAGCCTGGTCTGCCT-3′) and P2 (5′-CCGAAGCTTCTGGGAATCCAGGAAGGC-3′) for the 5′ and 3′ ends, respectively, of the putative exon 15 were designed on the basis of the sequence known from the cDNA clones (Hostikka et al., 1990; Williams, 1989). The primers were synthesized in an Applied Biosystems, Inc., DNA synthesizer. The primers were designed to contain EcoRI (P1) and HindIII (P2) restriction sites at the 5′ end for subsequent cloning of the amplified fragment into the sequencing vector. The polymerase chain reaction was performed by using a commercial polymerase chain reaction kit (Perkin-Elmer/Cetus) using 1 µg of human lymphocyte DNA as template and primers at 25 pmol concentration. The initial denaturation was performed at 94°C for 10 min followed by cooling on ice for 3 min. Taq polymerase (0.5 µl) was then added and 25 cycles were carried out as follows: denaturation at 94°C for 1.5 min, annealing at 55°C for 2 min, and extension at 72°C for 2 min. The amplified product was extracted with phenol/chloroform, digested with EcoRI and HindIII, and electrophoresed in NuSieve GTG low-melting-point agarose (FMC BioProducts) gel. A band of about the expected 145-bp size based on comparison with the α1(IV) gene (includes 28 extra bp added for cloning of the fragment) was excised and ligated into an EcoRI/HindIII-cut M13mp19 vector in the presence of 1 mM hexamine cobalt chloride (Murray, 1986). Positive clones were isolated and sequenced using the M13 universal primer and Sequenase.
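The primer design described above can be checked with a short sketch that locates the cloning site in each primer and separates the 5′ tail from the gene-specific portion. The primer sequences are those given in the text and the recognition sequences for EcoRI (GAATTC) and HindIII (AAGCTT) are the standard ones; the helper names `split_primer` and `reverse_complement` are hypothetical.

```python
SITES = {"EcoRI": "GAATTC", "HindIII": "AAGCTT"}

P1 = "CTAGAATTCGGTGAGCCTGGTCTGCCT"  # 5' primer, as given in the text
P2 = "CCGAAGCTTCTGGGAATCCAGGAAGGC"  # 3' primer, as given in the text

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def split_primer(primer: str):
    """Return (enzyme, 5' tail including the site, gene-specific part)."""
    for enzyme, site in SITES.items():
        position = primer.find(site)
        if position >= 0:
            tail_end = position + len(site)
            return enzyme, primer[:tail_end], primer[tail_end:]
    return None, "", primer


def reverse_complement(sequence: str) -> str:
    """Reverse complement, giving the sense-strand reading of a 3' primer."""
    return "".join(COMPLEMENT[base] for base in reversed(sequence))


for name, primer in (("P1", P1), ("P2", P2)):
    enzyme, tail, specific = split_primer(primer)
    print(name, enzyme, tail, specific, len(specific))

# Sense-strand sequence matched by the gene-specific part of the 3' primer.
print(reverse_complement(split_primer(P2)[2]))
```

On this reading each primer consists of a 9-nucleotide tail ending in the restriction site, followed by 18 gene-specific nucleotides.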

RESULTS

Identification of Genomic Clones

Eight λ phage clones that together spanned about 60 kb (Fig. 1) were isolated using the cDNA clones MD-6 and PL-31 that contain about 50% of the coding sequence from the 3′ end of the α5(IV) gene (Hostikka et al., 1990). The genomic clones were purified and characterized by restriction enzyme mapping and Southern analysis. Furthermore, the exon-intron size pattern was determined by electron microscopy of heteroduplexes formed between the genomic and the cDNA clones. Six overlapping clones (ML-5, MG-2, EB-4, FM-13, F-2, F-7) covered about 9.5 kb of the 3′ flanking region and about 36 kb of the gene, including the 14 most 3′ exons. Two additional clones, F-8 and ML-2, that did not overlap with the clone F-7 were shown to contain exons 16-19 as counted from the 3′ end of the gene.

Exon-Intron Structure

Restriction fragments of the genomic clones were subcloned and sequenced to determine the exon sizes. For sequencing, either universal primers or primers designed on the basis of the nucleotide sequence known from the cDNA were used. Furthermore, the sizes of both exons and introns were determined from heteroduplexes between the cDNA and the genomic clones. The sizes of exons (data not shown) were in good agreement with those obtained from DNA sequencing, thus validating the assignment of introns. The sizes of exons and introns are summarized in Table 1. The data demonstrate that the eight genomic clones studied contain 18 exons, i.e., 1-14 and 16-19, as counted from the 3′ end of the gene (Fig. 1). The size of the missing exon 15 (127 bp) was predicted by comparison of the exon size patterns in the α5(IV) and α1(IV) collagen genes (Table 1). The sequence of exon 15 (Fig. 2) was verified after its amplification from genomic DNA with the polymerase chain reaction and subsequent DNA sequencing. The complete sequence of the 19 most 3′ exons of the gene, together with the derived amino acid sequences, is shown in Fig. 2. With the exception of exon 15, 30 bp of intron sequences flanking each exon are also provided in Fig. 2.

TABLE 1

Sizes (bp) of Exons and Introns in the 3′ End of the Human Type IV Collagen α5 and α1 Chain Genes

Gene segment         α5(IV)    α1(IV)ᵃ
Exon 19 (34)         150       153
Intron 18 (34)       1500
Exon 18 (35)         99        99
Intron 17 (35)       1450
Exon 17 (36)         90        90
Intron 16 (36)       300
Exon 16 (37)         140       140
Intron 15 (37)       >5000
Exon 15 (38)         127       127
Intron 14 (38)       >4500
Exon 14 (39)         81        81
Intron 13 (39)       1210
Exon 13 (40)         99        99
Intron 12 (40)       560
Exon 12 (41)         51        51
Intron 11 (41)       940
Exon 11 (42)         186       186
Intron 10 (42)       7000
Exon 10 (43)         134       134
Intron 9 (43)        3100
Exon 9 (44)          73        73
Intron 8 (44)        133ᵇ
Exon 8 (45)          72        72
Intron 7 (45)        870
Exon 7 (46)          129       129
Intron 6 (46)        4480
Exon 6 (47)          99        99
Intron 5 (47)        1400
Exon 5 (48)          213       213
Intron 4 (48)        5000
Exon 4 (49)          178       178
Intron 3 (49)        2050
Exon 3 (50)          115       115
Intron 2 (50)        345ᵇ
Exon 2 (51)          173       173
Intron 1 (51)        900
Exon 1ᶜ (52)ᵈ        1245      1383

α1(IV) intron sizes (bp), in the order printed in the α1(IV) column (row alignment not recoverable from this transcript): 161, 115, 1100, 300, 97, 600, 1500, 2000, 800, 2670, 960, 1390, 1510, 1210, 960, ~3000, 2900, 1900.

ᵃ Ref. (28).
ᵇ Sequenced (this study).
ᶜ The number as counted from the 3′ end.
ᵈ The number in parentheses refers to the actual number in the previously determined human α1(IV) gene as counted from the 5′ end.

FIG. 2 (continued). Nucleotide sequences of the exons with the derived amino acid sequences and flanking intron sequences. [The sequence listing is not legible in this transcript.]

Exon 1, the most 3′ exon, is 1245 bp long and codes for 79 bp of the translated sequence and 1166 bp of a 3′ untranslated sequence (Fig. 2). The 3′ untranslated sequence contains one typical polyadenylation signal, AATAAA. However, we have, as yet, not isolated cDNA clones that indicate the use of this signal sequence. In contrast, there are two closely spaced potential polyadenylation signals, AATATA and AATTAAA (Hostikka et al., 1990), present further downstream. One of those is most likely a functional polyadenylation signal sequence since three cDNA clones, including the previously described PC-4 (Hostikka et al., 1990), contain a poly(A) tail at 26 and 14 nucleotides, respectively, downstream from these two putative polyadenylation signals (Fig. 2). We have previously described a cDNA clone (PL-35) containing a 3′ untranslated sequence reaching 34 nucleotides further downstream of PC-4 but not having a poly(A) tail (Hostikka et al., 1990). The present analysis of the gene implies that the mentioned 34-nucleotide sequence is a cloning artifact since it is not present in the gene, as verified by sequencing of two separate genomic clones (data not shown). We sequenced 282 bp downstream from the AATATA sequence and did not find the 34-nucleotide sequence or any additional polyadenylation signals (Fig. 2). We conclude that the 3′ end of the structural α5(IV) collagen gene is located 1166 bp downstream from the first base of the translation termination codon.
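Scanning a 3′ untranslated sequence for the three hexamer and heptamer signals named above amounts to a simple motif search. The sketch below runs on a short made-up sequence (`toy_utr` is purely hypothetical and is not part of the gene) and also confirms the exon 1 arithmetic stated in the text; the function name `find_signals` is an assumption.

```python
import re

# Candidate polyadenylation signals named in the text.
SIGNALS = ["AATAAA", "AATATA", "AATTAAA"]


def find_signals(sequence: str) -> list[tuple[int, str]]:
    """Return (0-based position, motif) for every signal occurrence."""
    hits = []
    for motif in SIGNALS:
        # A lookahead keeps overlapping occurrences from being skipped.
        for match in re.finditer(f"(?={motif})", sequence):
            hits.append((match.start(), motif))
    return sorted(hits)


# Hypothetical sequence constructed only to exercise the search.
toy_utr = "GCTAATAAAGGCTTCAATATACCGAATTAAAGT"
for position, motif in find_signals(toy_utr):
    print(position, motif)

# Exon 1 composition as stated in the text: coding plus 3' untranslated.
CODING_BP, UTR_BP, EXON1_BP = 79, 1166, 1245
print(CODING_BP + UTR_BP == EXON1_BP)
```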

Exons 1-5 code for the NC domain. Exon 5 is a “junction exon” with 142 bp coding for the NC domain and 71 bp coding for the Gly-Xaa-Yaa repeat-containing collagenous domain. Exons 6-19 that code for the collagenous domain have sizes varying between 51 and 186 bp. None of these exons has 54 bp or multiples thereof, the sizes typically found in the genes for fibrillar collagens (Boedtker et al., 1985; de Crombrugghe et al., 1985). Twelve of the fourteen exons coding for the collagenous domain start with a split codon, i.e., the second base of the codon for glycine (Fig. 2).
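The statement about 54-bp exons can be verified directly from the exon sizes. The sketch below uses the α5(IV) sizes for exons 6-19 as read from Table 1; the value of 72 bp for exon 8 is an assumed reading, taken to match the corresponding α1(IV) exon.

```python
# Sizes (bp) of alpha5(IV) exons 6-19, numbered from the 3' end (Table 1).
# Exon 8 = 72 bp is an assumed reading.
COLLAGENOUS_EXONS = {
    6: 99, 7: 129, 8: 72, 9: 73, 10: 134, 11: 186, 12: 51,
    13: 99, 14: 81, 15: 127, 16: 140, 17: 90, 18: 99, 19: 150,
}
FIBRILLAR_UNIT_BP = 54  # unit size typical of fibrillar collagen exons

multiples_of_54 = [
    exon for exon, size in COLLAGENOUS_EXONS.items()
    if size % FIBRILLAR_UNIT_BP == 0
]
smallest = min(COLLAGENOUS_EXONS.values())
largest = max(COLLAGENOUS_EXONS.values())

# Junction exon 5: NC-coding part plus collagenous-coding part.
EXON5_NC_BP, EXON5_COLLAGENOUS_BP = 142, 71

print("Exons that are multiples of 54 bp:", multiples_of_54)
print("Size range (bp):", smallest, "-", largest)
print("Exon 5 total (bp):", EXON5_NC_BP + EXON5_COLLAGENOUS_BP)
```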

The intron sizes analyzed vary between 133 and 7000 bp. The sizes of introns 14 and 15 could not be determined because genomic clones containing complete introns were not isolated. It can be calculated from the available data, however, that intron 14 is over 4500 bp and intron 15 more than 5000 bp. Consequently, the 19 most 3′ exons that contain about 50% of the human type IV collagen α5 chain gene are located in 50 kb of genomic DNA.
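A rough check of the 50-kb figure is to add the exon and intron sizes from Table 1. Because introns 14 and 15 are known only as lower bounds, the sum below is a minimum span, not an exact size. The values are as read from Table 1 (exon 8 = 72 bp is an assumed reading).

```python
# alpha5(IV) exon sizes (bp), exons 19 down to 1 (Table 1; exon 8 assumed 72).
EXONS = [150, 99, 90, 140, 127, 81, 99, 51, 186, 134,
         73, 72, 129, 99, 213, 178, 115, 173, 1245]

# alpha5(IV) intron sizes (bp), introns 18 down to 1 (Table 1).
# Introns 15 and 14 are entered at their stated lower bounds.
INTRONS = [1500, 1450, 300, 5000, 4500, 1210, 560, 940, 7000,
           3100, 133, 870, 4480, 1400, 5000, 2050, 345, 900]

exon_total = sum(EXONS)
intron_total = sum(INTRONS)
minimum_span = exon_total + intron_total

print("Exons:", exon_total, "bp")
print("Introns (minimum):", intron_total, "bp")
print("Minimum span:", round(minimum_span / 1000, 1), "kb")
```

The minimum comes out somewhat below 50 kb, which is compatible with the stated figure given that two introns are entered only at their lower bounds.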

Comparison of the Exon-Intron Structures of the α5(IV), α1(IV), and α2(IV) Genes

We compared the structure of the gene for the type IV collagen α5 chain with those of the α1(IV) and α2(IV) genes. Analysis of primary structure has previously demonstrated that the α5, α1, and α2 chains of type IV collagen are all structurally closely related (Hostikka et al., 1990; Soininen et al., 1987; Hostikka and Tryggvason, 1988). However, the α5(IV) and α1(IV) chains are considerably more closely related to each other than to the α2(IV) chain (Hostikka et al., 1990). We have shown (Hostikka and Tryggvason, 1987; Soininen et al., 1989) that the genes for the α1(IV) and α2(IV) chains, which are located adjacent to each other on chromosome 13 (Boyd et al., 1986; Griffin et al., 1987; Pöschl et al., 1988; Soininen et al., 1988), have very different exon sizes. Interestingly, the present results demonstrate that the genes for the α5(IV) and α1(IV) chains have practically identical exon structures, at least in the 3′ end of the genes (Table 1; Fig. 3). The only difference is exon 19 in the α5(IV) gene, which has 150 bp when the corresponding exon in the α1(IV) gene (exon 34) has 153 bp due to the deletion of one amino acid in the α5(IV) chain (see Hostikka et al., 1990). This homology is in sharp contrast to the differences with the α2(IV) gene, which has previously been reported to vary considerably from the α1(IV) gene (Hostikka and Tryggvason, 1987). Thus, the NC domain is encoded by three exons in the α2(IV) gene but by five exons in the α5(IV) and α1(IV) genes. However, all the intron locations in this region of the genes have been conserved. Another interesting feature is that the exon-intron pattern of the α5(IV) and α1(IV) genes, on the one hand, and that of the α2(IV) gene, on the other, differ considerably in the collagenous domain coding region. Here, only intron 4 in the α2(IV) gene coincides with an intron location in the other two genes (Fig. 3). Furthermore, the sizes of all exons in this region differ between the two groups.
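The exon-by-exon comparison of the α5(IV) and α1(IV) genes can be sketched as a pairwise check of the Table 1 values. Exon 1 is left out of the comparison here because its size includes 3′ untranslated sequence, which differs in length between the two genes; that exclusion, the assumed reading of 72 bp for exon 8, and the helper `alpha1_number` are choices made for this illustration.

```python
# Exons 19 down to 2, numbered from the 3' end (sizes in bp, from Table 1).
EXON_NUMBERS = list(range(19, 1, -1))
ALPHA5 = [150, 99, 90, 140, 127, 81, 99, 51, 186,
          134, 73, 72, 129, 99, 213, 178, 115, 173]
ALPHA1 = [153, 99, 90, 140, 127, 81, 99, 51, 186,
          134, 73, 72, 129, 99, 213, 178, 115, 173]


def alpha1_number(exon_from_3_prime: int, total_exons: int = 52) -> int:
    """Convert 3'-end numbering to the 5'-end numbering of the alpha1(IV) gene."""
    return total_exons + 1 - exon_from_3_prime


differences = [
    (number, a5, a1)
    for number, a5, a1 in zip(EXON_NUMBERS, ALPHA5, ALPHA1)
    if a5 != a1
]

for number, a5, a1 in differences:
    codons = (a1 - a5) // 3
    print(f"Exon {number} (alpha1 exon {alpha1_number(number)}): "
          f"{a5} vs {a1} bp, {codons} codon(s) fewer in alpha5")
```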

Comparison of sizes of introns in the human α5(IV) and α1(IV) genes shows that regardless of the conservation of coding sequences in the 3′ end of the genes, the intron sizes do not show any conservation of individual introns (Table 1). However, the sizes of the corresponding 3′ ends of the genes appear to be similar; the 19 exons of the α5(IV) and α1(IV) genes spanned about 50 and 40 kb of their respective genomic DNA. Exact comparison cannot be made since the sizes of all introns have not been determined.

DISCUSSION

The present results provide the exon-intron structure of the 3′ end of the human type IV collagen α5 chain gene located on Xq22. Analysis of the sequence of the 19 most 3′ exons demonstrated that the exon size profile is almost identical to that of the α1(IV) gene, at least in this part of the gene. If this similarity holds true for the rest of the gene, the entire α5(IV) gene should contain 52 exons, as does the α1(IV) gene (Soininen et al., 1989). The fact that the 19 exons studied here are contained in about 50 kb of genomic DNA indicates that the gene may have a size of about 100 kb, which is similar to that of the human α1(IV) gene (Soininen et al., 1989).

The conservation of exon sizes in the α5(IV) and α1(IV) genes indicates that there has been a selective pressure to maintain this structure. This is somewhat surprising considering the fact that exon sizes in the closely related α2(IV) gene are very different. The difference in exon structure of the α1(IV) and α2(IV) genes and in the primary structure of the corresponding proteins led to the suggestion that there was not a severe functional constraint to maintain exact length of type IV collagen chains with the exception of the NC domains (Hostikka and Tryggvason, 1988). Therefore, divergence of gene structure with minor deletions or insertions of coding sequences could be tolerated. In contrast, an exactly equal length of collagen molecules is essential for the formation of collagen fibrils, and this is thought to explain the strict conservation of exon size pattern between the genes for fibrillar collagens (Boedtker et al., 1985; de Crombrugghe et al., 1985). With regard to α5(IV) and α1(IV), the present results imply that the genes have evolved from a common ancestor gene after branching from the α2(IV) gene and that the exons were present prior to gene duplication since the exon-intron locations are completely conserved.

FIG. 3. Illustration of the exon structure of the 3′ ends of the human genes for type IV collagen α5, α1, and α2 chains. Exons are indicated by boxes and introns by interconnecting lines (not in scale). All intron locations in the α5(IV) and α1(IV) genes coincide. Interrupted vertical lines depict intron locations that are identical in all three genes. Box key: Gly-X-Y coding; NC domain; 3′ untranslated.

The most practical significance of the present work is doubtless in the detailed structural and nucleotide sequence data from a large portion of a gene that is affected in human disease, the Alport syndrome. This X-chromosome-linked disease is a major cause of terminal renal failure in males, with a gene frequency estimated to be about 1:5000 (Atkin et al., 1988a). The major clinical symptoms are loss of kidney function accompanied by hematuria and progressive sensorial deafness (Atkin et al., 1988a,b; Flinter et al., 1988; Brunner et al., 1988). Ultrastructural abnormalities such as patchy splitting of the lamina densa in the GBM together with biochemical analyses led to the hypothesis that type IV collagen, the major structural component of the GBM, was defective (Spear, 1973; Kleppel et al., 1987). However, the affected chains in Alport syndrome could not be α1(IV) or α2(IV) since their genes are on chromosome 13 (Boyd et al., 1986; Griffin et al., 1987). Our recent identification of the unique α5(IV) chain and the subsequent localization of the gene to the locus of the Alport syndrome on chromosome X strongly suggested it to be the gene affected in the disease (Hostikka et al., 1990). Importantly, this has been shown to be the case, since at least three different mutations were found in three kindreds (Barker et al., 1990). In Utah Kindred EP (Hasstedt et al., 1986) the mutation involved the deletion of about a 15-kb segment of the gene spanning exons 5-10. As can be concluded from the present data, this deletion would result in the synthesis of an α5(IV) chain missing 240 amino acid residues. The variant chain would lack 193 residues from the carboxyl-terminal end of the collagenous domain and a 47-residue segment from the NC domain that contains 1 of the 12 evolutionarily conserved cysteine residues. The loss of this segment alone could not only interfere with the normal folding of the globular NC domain but also hinder the formation of the triple helix and end-to-end assembly of type IV collagen molecules (see Timpl, 1989). The additional loss of 193 residues from the collagenous domain would obviously result in a totally nonfunctional α chain. A second mutation, found in large Utah Kindred P (Atkin et al., 1988a), involved the generation of a new PstI restriction site by a point mutation in exon 3, converting a TGT codon for cysteine to the TCT codon for serine (Zhou et al., 1991). In the third kindred, there was an absence of a TaqI fragment but the characterization of this mutation is still in progress.
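The residue counts given for the exon 5-10 deletion follow from the exon sizes, as the sketch below illustrates. Exon sizes are as read from Table 1 (exon 8 = 72 bp is an assumed reading) and the 142-bp NC-coding part of exon 5 is taken from the Results. Because exon boundaries can fall within codons, the split between the two domains is rounded to whole residues.

```python
# Sizes (bp) of the deleted alpha5(IV) exons, numbered from the 3' end.
DELETED_EXONS = {5: 213, 6: 99, 7: 129, 8: 72, 9: 73, 10: 134}
EXON5_NC_BP = 142  # part of junction exon 5 that codes for the NC domain

deleted_bp = sum(DELETED_EXONS.values())
in_frame = deleted_bp % 3 == 0           # a multiple of 3 keeps the reading frame
deleted_residues = deleted_bp // 3

nc_residues = round(EXON5_NC_BP / 3)
collagenous_residues = round((deleted_bp - EXON5_NC_BP) / 3)

print("Deleted coding sequence:", deleted_bp, "bp; in frame:", in_frame)
print("Residues lost:", deleted_residues)
print("NC domain:", nc_residues, "collagenous domain:", collagenous_residues)
```

That the deleted coding length is a multiple of three is consistent with the synthesis of a shortened chain and not a frameshifted one.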

To date, there are no protein data available on the α5(IV) chain. Thus, it is not known whether it forms a homotrimer or whether it can assemble into heterotrimers together with the α1(IV) and α2(IV) chains or the as yet poorly characterized α3(IV) and α4(IV) chains. In the latter case, the presence of a nonfunctional variant α5(IV) chain could interfere with the normal formation of triple-helical type IV collagen molecules. However, because the α5(IV) chain is a minor component of type IV collagen in comparison with the α1(IV) and α2(IV) chains, the damage to the GBM might be slow, which could explain the usually slow progression of the disease.

Recent data indicate that X-chromosome-linked Alport syndrome is a heterogeneous disease involving several different mutations in the α5(IV) gene. It cannot be excluded that some forms of the disease are due to defects in another adjacent type IV collagen gene. It is obvious that detailed analysis of new mutations in the α5(IV) gene requires extensive knowledge about the structure of the normal gene. The sequence data provided here allow the design of oligonucleotide primers for the amplification and sequencing of almost one-half of the exons and their immediate flanking regions of DNA from patients with Alport syndrome.

ACKNOWLEDGMENTS

This work was supported in part by grants from the Academy of Finland and The Sigrid Juselius Foundation, by National Institutes of Health Grant 36200, and by a grant from the Council for Tobacco Research-USA (2550).


REFERENCES

ATKIN, C. L., GREGORY, M. C., AND BORDER, W. A. (1988a). Alport syndrome. In “Diseases of the Kidney” (R. W. Schrier and C. W. Gottschalk, Eds.), 4th ed., Chap. 19, pp. 617-641, Little, Brown, Boston.

ATKIN, C. L., HASSTEDT, S. J., MENLOVE, L., CANNON, L., KIRSCHNER, N., SCHWARTZ, C., NGUYEN, K., AND SKOLNICK, M. (1988b). Mapping of Alport syndrome to the long arm of the X chromosome. Amer. J. Hum. Genet. 42: 249-255.

BARKER, D., HOSTIKKA, S. L., ZHOU, J., CHOW, L. T., OLIPHANT, A. R., GERKEN, S. C., GREGORY, M. C., SKOLNICK, M. H., ATKIN, C. L., AND TRYGGVASON, K. (1990). Identification of mutations in the COL4A5 collagen gene in Alport syndrome. Science 248: 1224-1227.

BOEDTKER, H., FINER, M., AND AHO, S. (1985). The structure of the chicken α2 collagen gene. Ann. N. Y. Acad. Sci. 460: 85-116.

BOYD, C. D., WELIKY, K., TOTH-FEJEL, S. E., DEAK, S. B., CHRISTIANO, A. M., MACKENZIE, J. W., SANDELL, L. J., TRYGGVASON, K., AND MAGENIS, E. (1986). The single copy gene coding for human α1(IV) procollagen is located at the terminal end of the long arm of chromosome 13. Hum. Genet. 74: 121-125.

BRUNNER, H., SCHRÖDER, C., VAN BENNEKOM, C., LAMBERMON, E., TUERLINGS, J., MENZEL, D., OLBING, H., MONNENS, L., WIERINGA, B., AND ROPERS, H.-H. (1988). Localization of the gene for X-linked Alport’s syndrome. Kidney Int. 34: 507-510.

BUTKOWSKI, R. J., LANGEVELD, J. P. M., WIESLANDER, J., HAMILTON, J., AND HUDSON, B. G. (1987). Localization of the Goodpasture epitope to a novel chain of basement membrane collagen. J. Biol. Chem. 262: 7874-7877.

CHOW, L. T., BROKER, T. R., AND LEWIS, J. B. (1979). Complex splicing patterns of RNAs from the early regions of Adenovirus-2. J. Mol. Biol. 134: 265-303.

DE CROMBRUGGHE, B., SCHMIDT, A., LIAU, G., SETOYAMA, C., MUDRYJ, M., YAMADA, Y., AND MCKEON, C. (1985). Structural and functional analysis of the genes for α-2(I) and α-1(III) collagens. Ann. N. Y. Acad. Sci. 460: 154-162.


CUTTING, G. R., KAZAZIAN, H. H., JR., ANTONARAKIS, S. E., KILLEN, P. D., AND YAMADA, Y. (1988). Macrorestriction mapping of COL4A1 and COL4A2 collagen genes on human chromosome 13q34. Genomics 3: 256-263.

FLINTER, F. A., CAMERON, J. S., CHANTLER, C., HOUSTON, I., AND BOBROW, M. (1988). Genetics of classic Alport’s syndrome. Lancet 2: 1005-1007.

GRIFFIN, C. A., EMANUEL, B. S., HANSEN, J. R., CAVENEE, W. K., AND MYERS, J. C. (1987). Human collagen genes encoding basement membrane α1(IV) and α2(IV) chains map to the distal long arm of chromosome 13. Proc. Natl. Acad. Sci. USA 84: 512-516.

GUNWAR, S., SAUS, J., NOELKEN, M. E., AND HUDSON, B. G. (1990). Glomerular basement membrane. Identification of a fourth chain, α4, of type IV collagen. J. Biol. Chem. 265: 5466-5469.

HASSTEDT, S. J., ATKIN, C. L., AND SAN JUAN, A. C., JR. (1986). Genetic heterogeneity among kindreds with Alport syndrome. Amer. J. Hum. Genet. 38: 940-953.

HOSTIKKA, S. L., EDDY, R. L., BYERS, M. G., HÖYHTYÄ, M., SHOWS, T. B., AND TRYGGVASON, K. (1990). Identification of a distinct type IV collagen α chain with restricted kidney distribution and assignment of the gene to the locus of X chromosome-linked Alport syndrome. Proc. Natl. Acad. Sci. USA 87: 1606-1610.

HOSTIKKA, S. L., AND TRYGGVASON, K. (1987). Extensive structural differences between genes for the α1 and α2 chains of type IV collagen despite conservation of coding sequences. FEBS Lett. 224: 297-305.

HOSTIKKA, S. L., AND TRYGGVASON, K. (1988). The complete primary structure of the α2 chain of human type IV collagen and comparison with the α1(IV) chain. J. Biol. Chem. 263: 19488-19493.

KLEPPEL, M. M., KASHTAN, C. E., BUTKOWSKI, R. J., FISH, A. J., AND MICHAEL, A. F. (1987). Absence of 28 kilodalton non-collagenous monomers of type IV collagen in glomerular basement membrane. J. Clin. Invest. 80: 263-266.

LOPATA, M. A., HAVERCROFT, J. C., CHOW, L. T., AND CLEVELAND, D. W. (1983). Four unique genes required for β tubulin expression in vertebrates. Cell 32: 713-724.

MANIATIS, T., FRITSCH, E., AND SAMBROOK, J. (1982). “Molecular Cloning: A Laboratory Manual,” Cold Spring Harbor Laboratory, Cold Spring Harbor, NY.

MESSING, J. (1983). New M13 vectors for cloning. In “Methods in Enzymology” (R. Wu, L. Grossman, and K. Moldave, Eds.), Vol. 101, pp. 20-78, Academic Press, San Diego.

MURRAY, J. A. H. (1986). HCC ligation: Rapid and specific DNA construction with blunt ended DNA fragments. Nucleic Acids Res. 14: 10,118.

PÖSCHL, E., POLLNER, R., AND KÜHN, K. (1988). The genes for the α1(IV) and α2(IV) chains of human basement membrane collagen type IV are arranged head-to-head and separated by a bidirectional promoter of unique structure. EMBO J. 7: 2687-2695.

SAIKI, R. K., GELFAND, D. H., STOFFEL, S., SCHARF, S. J., HIGUCHI, R., HORN, G. T., MULLIS, K. B., AND ERLICH, H. A. (1988). Primer-directed enzymatic amplification of DNA with thermostable DNA polymerase. Science 239: 487-491.

SANGER, F., NICKLEN, S., AND COULSON, A. R. (1977). DNA sequencing with chain terminating inhibitors. Proc. Natl. Acad. Sci. USA 74: 5463-5467.


SAUS, J., WIESLANDER, J., LANGEVELD, J. P. M., QUINONES, S., AND HUDSON, B. G. (1988). Identification of the Goodpasture antigen as the α3(IV) chain of collagen IV. J. Biol. Chem. 263: 13,374-13,380.

SOININEN, R., HAKA-RISKU, T., PROCKOP, D. J., AND TRYGGVASON, K. (1987). Complete primary structure of the α1 chain of human basement membrane (type IV) collagen. FEBS Lett. 225: 188-194.


SOININEN, R., HUOTARI, M., GANGULY, A., PROCKOP, D. J., AND TRYGGVASON, K. (1989). Structural organization of the gene for the α1(IV) chain of human type IV collagen. J. Biol. Chem. 264: 13,565-13,571.

SOININEN, R., HUOTARI, M., HOSTIKKA, S. L., PROCKOP, D. J., AND TRYGGVASON, K. (1988). The structural genes for α1 and α2 chains of human type IV collagen are divergently encoded on opposite DNA strands and have an overlapping promoter region. J. Biol. Chem. 263: 17,217-17,220.

SPEAR, G. S. (1973). Alport’s syndrome: A consideration of pathogenesis. Clin. Nephrol. 1: 336-337.

TIMPL, R. (1989). Structure and biological activity of basement membrane proteins. Eur. J. Biochem. 180: 487-502.

WILLIAMS, J. F. (1989). Optimization strategies for the polymerase chain reaction. BioTechniques 7: 762-768.

ZHOU, J., BARKER, D. F., HOSTIKKA, S. L., GREGORY, M. C., ATKIN, C. L., AND TRYGGVASON, K. (1991). Single base mutation in α5(IV) collagen chain gene converting a conserved cysteine to serine in Alport syndrome. Genomics 9: 10-18.