Molecular cloning and nucleotide sequence of human α1 acid glycoprotein cDNA



Gene, 44 (1986) 127-131

Elsevier

GENE 1622


Molecular cloning and nucleotide sequence of human α1 acid glycoprotein cDNA

(Recombinant DNA; pEX vectors; expression library; orosomucoid; antiserum; plasmid vector)

P.G. Board a,*, I.M. Jones b and A.K. Bentley b

a The John Curtin School of Medical Research, Australian National University, Canberra A.C.T., 2601 (Australia) Tel. 624925.50, and b Sir William Dunn School of Pathology, University of Oxford, South Parks Road, Oxford OX1 3RE (U.K.) Tel. 0865 5 732 1

(Received June 25th, 1985)

(Revision received February 20th, 1986)

(Accepted March 14th, 1986)

SUMMARY

A cDNA clone that encodes α1 acid glycoprotein has been isolated from a pEX expression library. We present the complete nucleotide sequence encoding this protein and compare the derived amino acid sequence with pre-existing data.

INTRODUCTION

Human α1AGP, also known as orosomucoid, is a major acute-phase plasma protein. The primary structure of α1AGP has been determined and the protein has an Mr of about 40000, of which 45% is carbohydrate (Schmid et al., 1973). Although α1AGP is thought to be a single-chain polypeptide synthesised in the liver, recent immunological studies by Ochi et al. (1984) have suggested that it might be a component of CEA, a large glycoprotein found in many tumors.

* To whom correspondence and reprint requests should be addressed.

Abbreviations: aa, amino acid(s); α1AGP, α1 acid glycoprotein or orosomucoid; AMV, avian myeloblastosis virus; bp, base pair(s); CEA, carcinoembryonic antigen; CNBr, cyanogen bromide; EGF, epidermal growth factor; nt, nucleotide(s).

The level of α1AGP in the circulation becomes elevated during inflammation and, although the role of this protein has not been clearly defined, there is evidence that α1AGP has some immunosuppressive functions (Bennett and Schmid, 1980; Cheresh et al., 1984).

Very little is known about the genetic control of human α1AGP. The aa sequence studies of Schmid et al. (1973) have suggested that considerable heterogeneity may exist, and Johnson et al. (1969) have reported a genetically determined electrophoretic polymorphism of α1AGP. However, the molecular basis for this variation has yet to be determined.

We have used the pEX expression vector system (Stanley and Luzio, 1984) to construct a human liver cDNA library and to identify clones expressing α1AGP antigenic determinants. A cDNA clone containing the complete coding sequence for the α1AGP precursor polypeptide has been sequenced.

0378-1119/86/$03.50 © 1986 Elsevier Science Publishers B.V. (Biomedical Division)


EXPERIMENTAL AND DISCUSSION

(a) Methods

RNA was extracted from human liver in guanidine thiocyanate and purified on a CsCl step gradient by methods previously described (Chirgwin et al., 1979; Glisin et al., 1974). Polyadenylated RNA was selected by oligo(dT) chromatography (Aviv and Leder, 1972). Double-stranded cDNA was synthesised using AMV reverse transcriptase and T4 DNA polymerase as described by Crabtree and Kant (1981).

All restriction digests, DNA ligations and plasmid transformations were done by standard methods (Maniatis et al., 1982).

The cDNA library prepared in the pEX vectors was used to transform Escherichia coli MC1061 (Casadaban and Cohen, 1980), which contained the plasmid pcI857 (Remaut et al., 1983) as the source of the temperature-sensitive repressor. The antiserum used to detect α1AGP antigenic determinants was a gift from Mr. T. Suzuki.

(b) Construction and screening of a human liver cDNA fragment library

Double-stranded cDNA prepared from human liver mRNA was digested with Sau3A, ligated into an equimolar mixture of all three reading frame pEX vectors, and the mixture used to transform E. coli MC1061 (pcI857). A library of approx. 30000 transformants was obtained and subsequently screened by the colony blot procedure of Stanley and Luzio (1984). Cross-reacting antigen was detected using a rabbit primary antiserum and a goat anti-rabbit IgG second antibody coupled to alkaline phosphatase. Several immunologically positive clones were detected and the clone with the largest cDNA insert (115 bp) was termed pα1AGP-1.

(c) Identification and structure of a full-length cDNA clone

The insert from pα1AGP-1 was used to screen an additional human liver cDNA library known to contain cDNA inserts large enough to encompass the expected full-length α1AGP cDNA (Reid et al., 1984). Approx. 50000 clones were screened with end-labeled insert, revealing approx. 500 positive colonies. This finding suggested that α1AGP mRNA was extremely abundant in the liver sample used for the preparation of this full-length cDNA library. A total of 15 positive clones were selected randomly, and further characterization indicated that they contained inserts ranging in size from 400 to 1160 bp. The plasmid with the largest insert was designated pα1AGP-2 and the insert was sequenced as shown in Fig. 1. The identity of the cDNA was confirmed by comparison of the translated cDNA sequence with the aa sequence published by Schmid et al. (1973). The insert in pα1AGP-2 was found to contain the entire protein coding region for α1AGP and included 140 bp of 3' non-coding and 15 bp of 5' non-coding sequence. In addition, at the 5' end, the pα1AGP-2 insert contained an inverted 407-bp repeat extending from nt 351 to the terminal A at nt 758. It is likely that this inverted repeat was generated during the synthesis of the cDNA and does not reflect the structure of the mRNA. The 3' non-coding sequence was found to contain the normal poly(A) addition signal AAUAAA (Proudfoot and Brownlee, 1976), 17 nt upstream from the poly(A) region (Fig. 2).

Fig. 1. Structure and sequencing strategy of the cDNA insert in pα1AGP-2. The open box represents the coding sequence, the solid line represents the non-coding regions and the hatched zones represent the vector. The open zone at the 5' end represents the inverted repeat of the sequence from nt 351 to 758. S, Sau3A; H, HaeIII; C, &I. Arrows represent the extent of each individual sequence determination. Sequencing was done as described by Maxam and Gilbert (1980). Scale bar, 100 bp.
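The spacing between the poly(A) addition signal and the poly(A) tract can be checked directly against the 3'-terminal sequence printed in Fig. 2. The following is a minimal sketch, not part of the original analysis: the function name `polya_signal_spacing` is hypothetical, the sequence is the last 98 nt (nt 661-758) as read from the figure, and the poly(A) tract is assumed to be 21 residues long so that the sequence ends at nt 758, the terminal position given in the text. The cDNA strand carries AATAAA where the mRNA carries AAUAAA.

```python
# 3'-terminal region of the cDNA (nt 661-758) as read from Fig. 2.
# Assumption: the poly(A) tract is written as 21 A residues so that the
# sequence ends at nt 758, the terminal position quoted in the text.
TAIL_START_NT = 661
tail = (
    "ATCCTGCCCCTCCAACCCGACATGTGTACCTCAGCTTTTTCCCTCACTTGCATCAATAAA"
    "GCTTCTGTGTTTGGAAC" + "A" * 21
)


def polya_signal_spacing(seq, signal="AATAAA"):
    """Return (signal_index, polya_index, gap) as 0-based indices into seq.

    gap is the number of nt between the last base of the signal and the
    first A of the terminal run of A residues.
    """
    body = seq.rstrip("A")           # sequence without the terminal A run
    polya_index = len(body)          # index of the first A of that run
    signal_index = body.rfind(signal)
    if signal_index < 0:
        raise ValueError("signal not found upstream of the poly(A) tract")
    gap = polya_index - (signal_index + len(signal))
    return signal_index, polya_index, gap


signal_index, polya_index, gap = polya_signal_spacing(tail)
print("signal starts at nt", TAIL_START_NT + signal_index)   # nt 715
print("poly(A) starts at nt", TAIL_START_NT + polya_index)   # nt 738
print("spacing:", gap, "nt")                                 # 17 nt
```

Under these assumptions the signal occupies nt 715-720 and the poly(A) tract begins at nt 738, which reproduces the 17-nt spacing stated above.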

(d) Nucleotide sequence of pα1AGP-2 cDNA

There was excellent agreement between the aa sequence deduced from the cDNA nt sequence (Fig. 2) and one of the aa sequences previously published by Schmid et al. (1973). The sequence

               M  A  L  S  W  V  L  T  V  L  S  L  L  P  L
CCTCCPGGTCTCAGTATGGCGCTGTCCTGGGTTCTTACAGTCCTGAGCCTCCTACCT~TG
        10        20        30        40        50        60

L  E  A  Q  I  P  L  C  A  N  L  V  P  V  P  I  T  N  A  T
CTGGAAGCCCAGATCCCATTGTGTGCCAACCTAGTACCGGTGCCCATCACCAACGCCACC
        70        80        90       100       110       120

L  D  Q  I  T  G  K  W  F  Y  I  A  S  A  F  R  N  E  E  Y
CTGGACCAGATCACTGGCAAGTGGTTTTATATCGCATCGGCCTTTCGAAACGAGGAGTAC
       130       140       150       160       170       180

N  K  S  V  Q  E  I  Q  A  T  F  F  Y  F  T  P  N  K  T  E
AATAAGTCGGTTCAGGAGATCCAAGCAACCTTCTTTTACTTCACCCCCAA~AAGACAGAG
       190       200       210       220       230       240

D  T  I  F  L  R  E  Y  Q  T  R  Q  D  Q  C  I  Y  N  T  T
GACACGATCTTTCTCAGAGAGTACCAGACCCGACAGGACCAGTGCATCTATAACACCACC
       250       260       270       280       290       300

Y  L  N  V  Q  R  E  N  G  T  I  S  R  Y  V  G  G  Q  E  H
TACCTGAATGTCCAGCGGGAAAATGGGACCATCTCCAGATACGTGGGAGGCCAAGAGCAT
       310       320       330       340       350       360

F  A  H  L  L  I  L  R  D  T  K  T  Y  M  L  A  F  D  V  N
TTCGCTCACTTGCTGATCCT~AGGGACAC~AAGACCTA~ATGCTTGCTTTTGACGTGAAC
       370       380       390       400       410       420

D  E  K  N  W  G  L  S  V  Y  A  D  K  P  E  T  T  K  E  Q
GATGAGAAGAACTGGGGGCTGTCTGTCTATGCTGACAAOCCAGAGACGACCAAGGAGCAA
       430       440       450       460       470       480

L  G  E  F  Y  E  A  L  D  C  L  R  I  P  K  S  D  V  V  Y
CTGGGAGAGTTCTACGAAGCTCTCGACTGCTTGCGCATTCCCAAGTCAGATGTCGTGTAC
       490       500       510       520       530       540

T  D  W  K  K  D  K  C  E  P  L  E  K  Q  H  E  K  E  R  K
ACCGATTGGAAAAAGGATAA~TGTGAGCCACTGGAGAAGCAGCACGAGAAGGAGAGGAAA
       550       560       570       580       590       600

Q  E  E  G  E  S  *
CAGGAGGAGGGGGAATCCTAOCAGQACACAGCCTTGGATCAGGACAGAGACTTGGGGGCC
       610       620       630       640       650       660

ATCCTGCCCCTCCAACCCGACATGTGTACCTCAGCTTTTTCCCTCACTTGCATCAATAAA
       670       680       690       700       710       720

GCTTCTGTGTTTGGAACAAAAAAAAAAAAAAAAAAAAA
       730       740       750

Fig. 2. The nt sequence of the complete human α1AGP cDNA included in pα1AGP-2. The proposed leader sequence, and those aa differing from the previously published aa sequences, are underlined. The asterisk marks the stop codon.

deduced from the cDNA differed in only four places. Ikenaka et al. (1966) reported that the N-terminal aa residue of plasma α1AGP was pyroglutamic acid, ultimately derived from glutamic acid. In comparison, the cDNA sequence indicates that the pyroglutamic acid residue is derived from glutamine. The aa sequence published by Schmid et al. (1973) also contains a glutamic acid residue rather than a glutamine residue at position 42, and the deduced aa sequence contains an additional lysine at position 164 and an additional glutamic acid residue at position 175. The deduced aa sequence, therefore, contains 183 aa compared with 181 aa for the previously published sequence. Since the nearest initiation methionine codon lies 57 nt 5' to the apparent N-terminal glutamine codon, it is likely that the α1AGP precursor contains a 19 aa leader sequence. As is commonly found, this predicted leader peptide is rich in hydrophobic aa.
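The glutamine assignments at the N terminus and at position 42 can be illustrated by translating the relevant stretches of Fig. 2 with the standard genetic code. This is a minimal sketch rather than the procedure used in the study: the helper `translate` and the variable names are hypothetical, and the two DNA segments are the first nine codons of the mature protein and the ten codons covering residues 38-47, as counted from the N-terminal glutamine in Fig. 2.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna):
    """Translate a DNA string in frame 1, ignoring any trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Segments copied from Fig. 2 (assumed to be read correctly from the figure).
n_terminal = "CAGATCCCATTGTGTGCCAACCTAGTA"        # mature residues 1-9
residues_38_47 = "AATAAGTCGGTTCAGGAGATCCAAGCAACC"  # mature residues 38-47

print(translate(n_terminal))       # QIPLCANLV
print(translate(residues_38_47))   # NKSVQEIQAT

# Residue 1 and residue 42 are both encoded by CAG (glutamine), not by a
# glutamic acid codon (GAA or GAG).
print(n_terminal[:3], residues_38_47[12:15])  # CAG CAG
```

In both positions the codon is CAG, which differs from the glutamic acid codon GAG by a single base.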

(e) Conclusions

The use of Sau3A fragments simplifies the construction of cDNA libraries as it eliminates the additional manipulations required to add synthetic DNA linkers or C + G tails. The major drawback of this procedure results from the small average size of the resulting cDNA inserts. In this case pα1AGP-1, which generated a crossreacting fusion protein, contained the 69-bp fragment extending from nt 129 to 198 (see Fig. 2) plus further sequence which does not appear to be related to α1AGP. It is most likely that the additional sequence was included by chance in the ligation procedure. Additional subclones in the pEX vectors of Sau3A fragments from the insert in pα1AGP-2 have shown that the fragment extending from nt 246 to 637 can also generate a crossreacting fusion protein. These results indicate that polyclonal antisera are capable of detecting protein sequences generated from small cDNA fragments, and suggest that Sau3A fragment expression libraries may be a rapid and efficient means of initially identifying cDNA clones of interest.
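The coordinates of the 69-bp fragment carried by pα1AGP-1 can be reproduced by locating Sau3A recognition sites (GATC) in the corresponding part of Fig. 2. The sketch below is illustrative only: `site_positions` is a hypothetical helper, the segment is nt 121-210 as read from the figure, and the base at nt 148, which is illegible in the scanned figure, is taken as T because that is the only base consistent with the tyrosine codon at that position.

```python
SAU3A_SITE = "GATC"  # Sau3A recognition sequence


def site_positions(seq, first_nt, site=SAU3A_SITE):
    """Return the cDNA coordinate of the first base of every site in seq.

    first_nt is the cDNA coordinate (1-based) of seq[0].
    """
    positions = []
    index = seq.find(site)
    while index != -1:
        positions.append(first_nt + index)
        index = seq.find(site, index + 1)
    return positions


# nt 121-210 of the pα1AGP-2 insert, copied from Fig. 2.
# Assumption: nt 148 is T (see the note above).
segment = (
    "CTGGACCAGATCACTGGCAAGTGGTTTTATATCGCATCGGCCTTTCGAAACGAGGAGTAC"
    "AATAAGTCGGTTCAGGAGATCCAAGCAACC"
)

sites = site_positions(segment, first_nt=121)
print(sites)                # [129, 198]
print(sites[1] - sites[0])  # 69
```

The two sites found in this segment lie at nt 129 and nt 198, matching the ends of the 69-bp fragment described above.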

A sequence homology at the aa level has been identified between α1AGP and the EGF receptor (Toh et al., 1985) but, as a result of second and third position codon changes, this homology is only poorly apparent when the α1AGP cDNA sequence is compared with the EGF receptor nt sequence (not shown). The aa sequence of plasma α1AGP reported by Schmid et al. (1973) contains alternate aa at 21 positions. The origin of this variation is not clear. Because of the observed homology between IgG and α1AGP it has been suggested that α1AGP may have constant and variable regions similar to those of the immunoglobulins (Emura et al., 1971). Alternatively, if a genetic explanation is considered, it seems unlikely that the 21 substituted aa are the products of allelic variants at a single locus at frequencies that would allow their equivalent representation in α1AGP purified from pooled plasma.

The interrelationship of some of the substitutions can be observed by examining the way the different sequences were obtained. The aa sequence was originally determined after the initial fractionation of four CNBr fragments. CNBr fragment I contained the N-terminal sequence and CNBr fragment II contained the C-terminal sequence, and both are in good agreement with the sequence deduced from the cDNA (Schmid et al., 1971; Ikenaka et al., 1972). In contrast, CNBr III and CNBr IV were found in lower yields and appeared to be derived from the N and C ends of CNBr fragment II but contained several aa substitutions (Emura et al., 1971). All the substitutions in the C-terminal portion of the α1AGP sequence appear to be linked to the presence of a substituted methionine, which results in the cleavage of CNBr II to give fragments III and IV. This linkage makes it highly unlikely that the different sequences are the products of alleles, and suggests instead that plasma α1AGP may be the product of multiple gene loci that differ in several positions. Preliminary analysis of genomic DNA digests, using several different restriction enzymes and pα1AGP-2 as a probe, has only revealed a small number of DNA fragments containing α1AGP sequences. This suggests that there are only a small number of α1AGP genes. As CNBr fragments III and IV are found in relatively reduced yield (Emura et al., 1971), it seems likely that the different α1AGP genes may be expressed in different amounts, and will, therefore, be represented with varying frequencies in cloned cDNA libraries. The identification of clones with sequences that differ from the one described here will be necessary to confirm the presence of multiple α1AGP genes.


ACKNOWLEDGEMENTS

We thank Keith K. Stanley for the gift of the pEX expression vectors, George G. Brownlee for advice and laboratory space and the Medical Research Council for funding.

REFERENCES

Aviv, H. and Leder, P.: Purification of biologically active globin messenger RNA by chromatography on oligothymidylic acid-cellulose. Proc. Natl. Acad. Sci. USA 69 (1972) 1408-1412.

Bennett, M. and Schmid, K.: Immunosuppression by human plasma α1-acid glycoprotein: importance of the carbohydrate moiety. Proc. Natl. Acad. Sci. USA 77 (1980) 6109-6113.

Casadaban, M.J. and Cohen, S.N.: Analysis of gene control signals by DNA fusion and cloning in Escherichia coli. J. Mol. Biol. 138 (1980) 179-207.

Cheresh, D.A., Haynes, D.H. and Distasio, J.A.: Interaction of an acute phase reactant, α1 acid glycoprotein (orosomucoid), with lymphoid cell surface: a model for non-specific immune suppression. Immunology 51 (1984) 541-548.

Chirgwin, J.M., Przybyla, A.E., MacDonald, R.J. and Rutter, W.J.: Isolation of biologically active ribonucleic acid from sources enriched in ribonuclease. Biochemistry 18 (1979) 5294-5299.

Costello, M.J., Gewurz, H. and Siegel, J.N.: Inhibition of neutrophil activation by α1-acid glycoprotein. Clin. Exp. Immunol. 55 (1984) 465-472.

Crabtree, G.R. and Kant, J.A.: Molecular cloning of cDNA for the α, β, and γ chains of rat fibrinogen. J. Biol. Chem. 256 (1981) 9718-9723.

Dente, L., Ciliberto, G. and Cortese, R.: Structure of human α1 acid glycoprotein gene: sequence homology with other human acute phase protein genes. Nucl. Acids Res. 13 (1985) 3941-3952.

Emura, J., Ikenaka, T., Collins, J.H. and Schmid, K.: The constant and variable regions of the carboxy-terminal CNBr fragment of α1 acid glycoprotein. J. Biol. Chem. 246 (1971) 7821-7823.

Glisin, V., Crkvenjakov, R. and Byus, C.: Ribonucleic acid isolated by cesium chloride centrifugation. Biochemistry 13 (1974) 2633-2637.

Ikenaka, T., Bammerlin, H., Kaufmann, H. and Schmid, K.: The aminoterminal peptide of α1 acid glycoprotein. J. Biol. Chem. 241 (1966) 5560-5563.

Ikenaka, T., Ishiguro, M., Emura, J., Kaufmann, H., Isemura, S., Bauer, W. and Schmid, K.: Isolation and partial characterization of the cyanogen bromide fragments of α1 acid glycoprotein and the elucidation of the amino acid sequence of the carboxy-terminal cyanogen bromide fragment. Biochemistry 11 (1972) 3817-3829.

Johnson, A.M., Schmid, K., Alper, C.A. and Bissett, L.: Inheritance of human α1 acid glycoprotein (orosomucoid) variants. J. Clin. Invest. 48 (1969) 2293-2299.

Maniatis, T., Fritsch, E. and Sambrook, J.: Molecular Cloning. A Laboratory Manual. Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 1982.

Maxam, A.M. and Gilbert, W.: Sequencing end labeled DNA with base-specific chemical cleavages. Methods Enzymol. 65 (1980) 499-560.

Ochi, Y., Ura, Y., Hamazu, M., Fujiyama, Y., Kajita, Y., Ishida, M., Miyazaki, T. and Tamura, K.: Immunochemical identification of an α1-acid glycoprotein - antigenic determinant on carcinoembryonic antigen (CEA) and non-specific cross-reacting antigen (NCA). Clin. Chim. Acta 138 (1984) 9-19.

Proudfoot, N.J. and Brownlee, G.G.: 3' Non-coding region sequences in eukaryotic messenger RNA. Nature 263 (1976) 211-214.

Reid, K.B.M., Bentley, D.R. and Wood, K.J.: Cloning and characterisation of the cDNA for the B-chain of normal serum C1q. Phil. Trans. R. Soc. Lond. 306 (1984) 345-354.

Remaut, E., Tsao, H. and Fiers, W.: Improved plasmid vectors with thermoinducible expression and temperature-regulated runaway replication. Gene 22 (1983) 103-113.

Schmid, K., Kaufmann, H., Isemura, S., Bauer, F., Emura, J., Motoyama, T., Ishiguro, M. and Nanno, S.: Structure of α1 acid glycoprotein. The complete amino acid sequence, multiple amino acid substitutions and homology with the immunoglobulins. Biochemistry 12 (1973) 2711-2724.

Stanley, K.K. and Luzio, J.P.: Construction of a new family of high efficiency bacterial expression vectors: identification of cDNA clones coding for human liver proteins. EMBO J. 3 (1984) 1429-1434.

Toh, H., Hayashida, H., Kikuno, R., Yasunaga, T. and Miyata, T.: Sequence similarity between EGF receptor and α1-acid glycoprotein. Nature 314 (1985) 199.

Communicated by R.W. Davies.

NOTE ADDED IN PROOF

Since the submission of this manuscript the sequence of α1AGP has also been published elsewhere (Dente et al., 1985, Nucl. Acids Res. 13, 3941-3952).