Molecular cloning and nucleotide sequence of human α1 acid glycoprotein cDNA
Gene, 44 (1986) 127-131
Elsevier
GENE 1622
Molecular cloning and nucleotide sequence of human α1 acid glycoprotein cDNA
(Recombinant DNA; pEX vectors; expression library; orosomucoid; antiserum; plasmid vector)
P.G. Board a,*, I.M. Jones b and A.K. Bentley b
a The John Curtin School of Medical Research, Australian National University, Canberra A.C.T., 2601 (Australia) Tel. 624925.50, and b Sir William Dunn School of Pathology, University of Oxford, South Parks Road, Oxford OX1 3RE (U.K.) Tel. 0865 5 732 1
(Received June 25th, 1985)
(Revision received February 20th, 1986)
(Accepted March 14th, 1986)
SUMMARY
A cDNA clone that encodes α1 acid glycoprotein has been isolated from a pEX expression library. We present the complete nucleotide sequence encoding this protein and compare the derived amino acid sequence with pre-existing data.
INTRODUCTION
Human α1AGP, also known as orosomucoid, is a major acute-phase plasma protein. The primary structure of α1AGP has been determined and the protein has an Mr of about 40000, of which 45% is carbohydrate (Schmid et al., 1973). Although α1AGP is thought to be a single-chain polypeptide synthesised in the liver, recent immunological studies by Ochi et al. (1984) have suggested that it might be a component of CEA, a large glycoprotein found in many tumors.
* To whom correspondence and reprint requests should be addressed.
Abbreviations: aa, amino acid(s); α1AGP, α1 acid glycoprotein or orosomucoid; AMV, avian myeloblastosis virus; bp, base pair(s); CEA, carcinoembryonic antigen; CNBr, cyanogen bromide; EGF, epidermal growth factor; nt, nucleotide(s).
The level of α1AGP in the circulation becomes elevated during inflammation and, although the role of this protein has not been clearly defined, there is evidence that α1AGP has some immunosuppressive functions (Bennett and Schmid, 1980; Cheresh et al., 1984).
Very little is known about the genetic control of human α1AGP. The aa sequence studies of Schmid et al. (1973) have suggested that considerable heterogeneity may exist and Johnson et al. (1969) have reported a genetically determined electrophoretic polymorphism of α1AGP. However, the molecular basis for this variation has yet to be determined.
We have used the pEX expression vector system (Stanley and Luzio, 1984) to construct a human liver cDNA library and to identify clones expressing α1AGP antigenic determinants. A cDNA clone containing the complete coding sequence for the α1AGP precursor polypeptide has been sequenced.
0378-1119/86/$03.50 © 1986 Elsevier Science Publishers B.V. (Biomedical Division)
EXPERIMENTAL AND DISCUSSION
(a) Methods
RNA was extracted from human liver in guanidine thiocyanate and purified on a CsCl step gradient by methods previously described (Chirgwin et al., 1979; Glisin et al., 1974). Polyadenylated RNA was selected by oligo(dT) chromatography (Aviv and Leder, 1972). Double-stranded cDNA was synthesised using AMV reverse transcriptase and T4 DNA polymerase as described by Crabtree and Kant (1981).
All restriction digests, DNA ligations and plasmid transformations were done by standard methods (Maniatis et al., 1982).
The cDNA library prepared in the pEX vectors was used to transform Escherichia coli MC1061 (Casadaban and Cohen, 1980), which contained the plasmid pcI857 (Remaut et al., 1983) as the source of the temperature-sensitive repressor. The antiserum used to detect α1AGP antigenic determinants was a gift from Mr. T. Suzuki.
(b) Construction and screening of a human liver cDNA fragment library
Double-stranded cDNA prepared from human liver mRNA was digested with Sau3A, ligated into an equimolar mixture of all three reading-frame pEX vectors, and the mixture used to transform E. coli MC1061 (pcI857). A library of approx. 30000 transformants was obtained and subsequently screened by the colony blot procedure of Stanley and Luzio (1984). Cross-reacting antigen was detected using a rabbit primary antiserum and a goat anti-rabbit IgG second antibody coupled to alkaline phosphatase. Several immunologically positive clones were detected and the clone with the largest cDNA insert (115 bp) was termed pα1AGP-1.
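The reason for ligating the Sau3A fragments into a mixture of all three reading-frame vectors can be sketched in a few lines of Python. The sketch assumes a simplified model that is not taken from the pEX description: a vector variant with shift s places s nt between the carrier gene and the insert, and the fusion is read in frame only when s equals the codon phase of the first nt of the fragment. Insert orientation is ignored, and the function names and coordinates are hypothetical.

```python
def codon_phase(fragment_start: int, cds_start: int) -> int:
    """Position (0, 1 or 2) of the fragment's first nt within its codon.

    Both arguments are 1-based nt coordinates on the same cDNA strand;
    cds_start is the first nt of the initiation codon.
    """
    return (fragment_start - cds_start) % 3


def vectors_in_frame(fragment_start: int, cds_start: int) -> list[int]:
    """Vector shifts (0, 1, 2) that give an in-frame fusion.

    Simplified assumption: a vector with shift s supplies s nt ahead of
    the insert, so the insert is read in frame only when s == phase.
    """
    phase = codon_phase(fragment_start, cds_start)
    return [shift for shift in (0, 1, 2) if shift == phase]


if __name__ == "__main__":
    cds_start = 16  # hypothetical position of the initiation codon
    for start in (40, 41, 42):  # hypothetical fragment start positions
        print(start, vectors_in_frame(start, cds_start))
```

Under this model every fragment matches exactly one of the three vectors, so an equimolar mixture gives each fragment the same chance of being expressed as a fusion protein.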
(c) Identification and structure of a full-length cDNA clone
The insert from pα1AGP-1 was used to screen an additional human liver cDNA library known to contain cDNA inserts large enough to encompass the expected full-length α1AGP cDNA (Reid et al., 1984). Approx. 50000 clones were screened with end-labeled insert, revealing approx. 500 positive colonies (about 1% of the clones screened). This finding suggested that α1AGP mRNA was extremely abundant in the liver sample used for the preparation of this full-length cDNA library. A total of 15 positive clones were selected randomly, and further characterization indicated that they contained inserts ranging in size from 400 to 1160 bp. The plasmid with the largest insert was designated pα1AGP-2 and the insert was sequenced as shown in Fig. 1. The identity of the cDNA was confirmed by comparison of the translated cDNA sequence with the aa sequence published by Schmid et al. (1973). The insert in pα1AGP-2 was found to contain the entire protein coding region for α1AGP and included 140 bp of 3' non-coding and 15 bp of 5' non-coding sequence. In addition, at the 5' end, the pα1AGP-2 insert contained an inverted 407-bp repeat extending from nt 351 to the terminal A at nt 758. It is likely that this inverted repeat was generated during the synthesis of the cDNA and does not reflect the structure of the mRNA. The 3' non-coding sequence was found to contain the normal poly(A) addition signal AAUAAA (Proudfoot and Brownlee, 1976), 17 nt upstream from the poly(A) region (Fig. 2).
Fig. 1. Structure and sequencing strategy of the cDNA insert in pα1AGP-2. The open box represents the coding sequence, the solid line represents the non-coding regions and the hatched zones represent the vector. The open zone at the 5' end represents the inverted repeat of the sequence from nt 351 to 758. S, Sau3A; H, HaeIII; C, &I. Arrows represent the extent of each individual sequence determination. Sequencing was done as described by Maxam and Gilbert (1980).
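An inverted repeat of the kind described for the 5' end of the insert can be recognised computationally: the extra 5' segment is the reverse complement of a stretch found further downstream in the same insert. The following is a minimal sketch of that check on a short invented sequence; the function names and the toy sequence are hypothetical, and the real insert sequence is not used.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def inverted_repeat_length(insert: str, min_len: int = 4) -> int:
    """Length of the longest 5' prefix whose reverse complement occurs
    downstream of that prefix in the same insert (0 if none is found)."""
    for k in range(len(insert) // 2, min_len - 1, -1):
        prefix = insert[:k]
        if reverse_complement(prefix) in insert[k:]:
            return k
    return 0


if __name__ == "__main__":
    body = "ATGGCGCTGTCCTGGGTTAAAA"            # invented cDNA body
    artefact = reverse_complement(body[5:15])  # invented 10-nt inverted copy
    insert = artefact + body
    print(inverted_repeat_length(insert))
```

A non-zero result for a long prefix suggests that the 5' segment is a copy of downstream sequence in the opposite orientation rather than genuine mRNA sequence.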
(d) Nucleotide sequence of pα1AGP-2 cDNA
There was excellent agreement between the aa sequence deduced from the cDNA nt sequence (Fig. 2) and one of the aa sequences previously published by Schmid et al. (1973). The sequence deduced from the cDNA differed in only four places. Ikenaka et al. (1966) reported that the N-terminal aa residue of plasma α1AGP was pyroglutamic acid, ultimately derived from glutamic acid. In comparison, the cDNA sequence indicates that the pyroglutamic acid residue is derived from glutamine. The aa sequence published by Schmid et al. (1973) also contains a glutamic acid residue rather than a glutamine residue at position 42, and the deduced aa sequence contains an additional lysine at position 164 and an additional glutamic acid residue at position 175. The deduced aa sequence, therefore, contains 183 aa compared with 181 aa for the previously published sequence. Since the nearest initiation methionine codon lies 57 nt 5' to the apparent N-terminal glutamine codon, it is likely that the α1AGP precursor contains a 19-aa leader sequence. As is commonly found, this predicted leader peptide is rich in hydrophobic aa.
n A L SWULTUL S L L P L
CCTCCPGGTCTCAGTATGGCGCTGTCCTOGGGTT~TTACAGTCCTGAGCCTCCTACCT~TG
10 20 30 40 50 60
L E A&I PLCANLV P v P I T N CI T
CTGGAAGCCCAGATCCCATTGTGTGCCAACCTAGTA~CGGTGC~CATCACCAACGCCACC
70 80 90 100 110 120
L D 0 I TGKWFYI A S A FRNEEY
CTGGACCAGATCACTGGCAAGTGGTTT~ATATCGCATCGGCCTTTCGAAACGAGGAGTAC
130 140 150 160 170 180
N K S U_QE I QATFFYF TPNKTE
AATAAGTCGGTTCAGGAGATCCAAGCAACCTTCTTTTACTTCACCC~CAA~AAGA~AGAG
190 200 210 220 230 240
D T IFLREYQ TRQDQCI Y N T T
GACACGATCTTTCTCAGAGAGTACCAGACCCGACAGGACCAGTGCATCTATAACACCACC
250 260 270 280 290 300
YLNVQRENGT ISRYVGGQEH
TACCTGAATGTCCAGCGGGAAAATGGGACCATCTCCAGATACGTGGGAGGCCAAGAGCAT
310 320 330 340 350 360
F A H L L ILRDTK T Y M L A F 0 V N
TTCGCTCACTTGCTGATCCT~AGGGACA~~AAGACCTA~ATG~TTGCTTTTGACGTGAAC
370 380 390 400 410 420
D E K NWGLSUYADK PETTKEQ
GATGAGAAGAACTGGGGGCTGTCTGTCTRTGCTGACAAOCCAGAGA~GACCAAGGAGCAA
430 440 450 460 470 480
L G EFYEALDCLRIPK S D U U Y
CTGGGAGAGTTCTACGAAGCTCTCGACTGCTTGCGCGCATTCCCAAGT~AGATGTCGTGTAC
490 500 510 520 530 540
T D W K K DKCEPLE K 0 H E K&R K
ACcGATTGGAAAAAGGATA~TGTGAGCCACTGGAGAAGCAGCACGAGAAGGAGAGGAAA
550 560 570 580 590 600
QEEGES*
CAGGAGGAGGGGGAATCCTAOCAGQACACAGCCTTGGATCAGGACAGAGACTTGGGGGCC
610 620 630 640 650 660
ATCCTGCCCCTCCAACCCGACATGTGTACCTCAGCTTTTTCCCTCACTTGCATCAATAAA
670 680 690 700 710 720
GCTTCTGTGTTTGGAACAAAAAAAAAAAAAAAAAAAA~
730 740 750
Fig. 2. The nt sequence of the complete human α1AGP cDNA included in pα1AGP-2. The proposed leader sequence, and those aa differing from the previously published aa sequences, are underlined. The asterisk marks the stop codon.
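The leader-length estimate follows from simple codon arithmetic: 57 nt between the initiation codon and the first codon of the mature protein correspond to 57 / 3 = 19 aa. The sketch below shows this calculation together with a conceptual translation from the first ATG, using the standard genetic code. It is only an illustration: the function names and the short test sequence are invented, and the input is assumed to be uppercase A, C, G and T only.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_from_first_atg(cdna: str) -> str:
    """Translate from the first ATG up to (not including) the first stop codon."""
    start = cdna.find("ATG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(cdna) - 2, 3):
        aa = CODON_TABLE[cdna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def leader_length(nt_from_atg_to_mature_codon: int) -> int:
    """Number of leader aa encoded between the ATG and the mature N terminus."""
    if nt_from_atg_to_mature_codon % 3:
        raise ValueError("distance is not a whole number of codons")
    return nt_from_atg_to_mature_codon // 3


if __name__ == "__main__":
    print(leader_length(57))                                    # 57 nt / 3
    print(translate_from_first_atg("CCTATGGCGCTGCAGATCTAGAA"))  # invented cDNA
```

The same arithmetic accounts for the length difference reported above: two additional residues raise the published 181 aa to 183 aa.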
(e) Conclusions
The use of Sau3A fragments simplifies the construction of cDNA libraries as it eliminates the additional manipulations required to add synthetic DNA linkers or C + G tails. The major drawback of this procedure results from the small average size of the resulting cDNA inserts. In this case pα1AGP-1, which generated a crossreacting fusion protein, contained the 69-bp fragment extending from nt 129 to 198 (see Fig. 2) plus further sequence which does not appear to be related to α1AGP. It is most likely that the additional sequence was included by chance in the ligation procedure. Additional subclones, in the pEX vectors, of Sau3A fragments from the insert in pα1AGP-2 have shown that the fragment extending from nt 246 to 637 can also generate a crossreacting fusion protein. These results indicate that polyclonal antisera are capable of detecting protein sequences generated from small cDNA fragments and suggest that Sau3A fragment expression libraries may be a rapid and efficient means of initially identifying cDNA clones of interest.
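The fragment sizes quoted above can be related to the positions of Sau3A recognition sites (GATC) in a sequence. The sketch below locates the sites and lists the fragments between consecutive sites, counting each fragment from the G of one site to the G of the next; on that convention a fragment running from nt 129 to nt 198 is 198 - 129 = 69 bp long. The function names and the toy sequence are hypothetical, and the published cDNA sequence is not used.

```python
def sau3a_sites(seq: str) -> list[int]:
    """Return the 1-based position of the G of every GATC site in seq."""
    sites = []
    pos = seq.find("GATC")
    while pos != -1:
        sites.append(pos + 1)  # convert 0-based index to 1-based coordinate
        pos = seq.find("GATC", pos + 1)
    return sites


def internal_fragments(seq: str) -> list[tuple[int, int, int]]:
    """Return (start, next_site, length) for fragments between adjacent sites.

    Length is counted from the G of one site to the G of the next, because
    Sau3A cuts immediately 5' of the G on each strand.
    """
    sites = sau3a_sites(seq)
    return [(a, b, b - a) for a, b in zip(sites, sites[1:])]


if __name__ == "__main__":
    toy = "AAGATCTTTTGATCCC"  # invented sequence with two GATC sites
    print(sau3a_sites(toy))
    print(internal_fragments(toy))
```

Because Sau3A recognises only four bases, sites are frequent and the resulting fragments are short, which is the drawback noted in the text.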
A sequence homology at the aa level has been identified between α1AGP and the EGF receptor (Toh et al., 1985) but, as a result of second and third position codon changes, this homology is only poorly apparent when the α1AGP cDNA sequence is compared with the EGF receptor nt sequence (not shown). The aa sequence of plasma α1AGP reported by Schmid et al. (1973) contains alternate aa at 21 positions. The origin of this variation is not clear. Because of the observed homology between IgG and α1AGP it has been suggested that α1AGP may have constant and variable regions similar to those of the immunoglobulins (Emura et al., 1971). Alternatively, if a genetic explanation is considered, it seems unlikely that the 21 substituted aa are the products of allelic variants at a single locus at frequencies that would allow their equivalent representation in α1AGP purified from pooled plasma.
The interrelationship of some of the substitutions can be observed by examining the way the different sequences were obtained. The aa sequence was originally determined after the initial fractionation of four CNBr fragments. CNBr fragment I contained the N-terminal sequence and CNBr fragment II contained the C-terminal sequence, and both are in good agreement with the sequence deduced from the cDNA (Schmid et al., 1971; Ikenaka et al., 1972). In contrast, CNBr III and CNBr IV were found in lower yields and appeared to be derived from the N and C ends of CNBr fragment II but contained several aa substitutions (Emura et al., 1971). All the substitutions in the C-terminal portion of the α1AGP sequence appear to be linked to the presence of a substituted methionine, which results in the cleavage of CNBr II to give fragments III and IV. This linkage makes it highly unlikely that the different sequences are the products of alleles and suggests instead that plasma α1AGP may be the product of multiple gene loci that differ in several positions. Preliminary analysis of genomic DNA digests, using several different restriction enzymes and pα1AGP-2 as a probe, has revealed only a small number of DNA fragments containing α1AGP sequences. This suggests that there are only a small number of α1AGP genes. As CNBr fragments III and IV are found in relatively reduced yield (Emura et al., 1971), it seems likely that the different α1AGP genes may be expressed in different amounts, and will, therefore, be represented with varying frequencies in cloned cDNA libraries. The identification of clones with sequences that differ from the one described here will be necessary to confirm the presence of multiple α1AGP genes.
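The argument about CNBr fragments III and IV rests on the fact that CNBr cleaves a polypeptide on the C-terminal side of methionine, so a variant chain that carries an extra methionine yields additional, smaller fragments. The following sketch illustrates this with invented one-letter sequences. It is a simplification (the chemical conversion of methionine during cleavage is ignored), and the function name and both sequences are hypothetical.

```python
def cnbr_fragments(protein: str) -> list[str]:
    """Split a one-letter aa sequence after every methionine (M)."""
    fragments = []
    current = ""
    for residue in protein:
        current += residue
        if residue == "M":
            fragments.append(current)
            current = ""
    if current:  # C-terminal fragment without a terminal methionine
        fragments.append(current)
    return fragments


if __name__ == "__main__":
    major_form = "AAMKKKCC"  # invented chain with a single methionine
    variant = "AAMKMKCC"     # invented variant with a substituted methionine
    print(cnbr_fragments(major_form))
    print(cnbr_fragments(variant))
```

In the toy example the C-terminal fragment of the major form is split in two in the variant, which parallels the way fragments III and IV arise from fragment II.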
ACKNOWLEDGEMENTS
We thank Keith K. Stanley for the gift of the pEX expression vectors, George G. Brownlee for advice and laboratory space and the Medical Research Council for funding.
REFERENCES
Aviv, H. and Leder, P.: Purification of biologically active globin messenger RNA by chromatography on oligothymidylic acid cellulose. Proc. Natl. Acad. Sci. USA 69 (1972) 1408-1412.
Bennett, M. and Schmid, K.: Immunosuppression by human plasma α1-acid glycoprotein: importance of the carbohydrate moiety. Proc. Natl. Acad. Sci. USA 77 (1980) 6109-6113.
Casadaban, M.J. and Cohen, S.N.: Analysis of gene control signals by DNA fusion and cloning in Escherichia coli. J. Mol. Biol. 138 (1980) 179-207.
Cheresh, D.A., Haynes, D.H. and Distasio, J.A.: Interaction of an acute phase reactant, α1 acid glycoprotein (orosomucoid), with lymphoid cell surface: a model for non-specific immune suppression. Immunology 51 (1984) 541-548.
Chirgwin, J.M., Przybyla, A.E., MacDonald, R.J. and Rutter, W.J.: Isolation of biologically active ribonucleic acid from sources enriched in ribonuclease. Biochemistry 18 (1979) 5294-5299.
Costello, M.J., Gewurz, H. and Siegel, J.N.: Inhibition of neutrophil activation by α1-acid glycoprotein. Clin. Exp. Immunol. 55 (1984) 465-472.
Crabtree, G.R. and Kant, J.A.: Molecular cloning of cDNA for the α, β, and γ chains of rat fibrinogen. J. Biol. Chem. 256 (1981) 9718-9723.
Dente, L., Ciliberto, G. and Cortese, R.: Structure of human α1 acid glycoprotein gene: sequence homology with other human acute phase protein genes. Nucl. Acids Res. 13 (1985) 3941-3952.
Emura, J., Ikenaka, T., Collins, J.H. and Schmid, K.: The constant and variable regions of the carboxy-terminal CNBr fragment of α1 acid glycoprotein. J. Biol. Chem. 246 (1971) 7821-7823.
Glisin, V., Crkvenjakov, R. and Byus, C.: Ribonucleic acid isolated by cesium chloride centrifugation. Biochemistry 13 (1974) 2633-2637.
Ikenaka, T., Bammerlin, H., Kaufmann, H. and Schmid, K.: The aminoterminal peptide of α1 acid glycoprotein. J. Biol. Chem. 241 (1966) 5560-5563.
Ikenaka, T., Ishiguro, M., Emura, J., Kaufmann, H., Isemura, S., Bauer, W. and Schmid, K.: Isolation and partial characterization of the cyanogen bromide fragments of α1 acid glycoprotein and the elucidation of the amino acid sequence of the carboxy-terminal cyanogen bromide fragment. Biochemistry 11 (1972) 3817-3829.
Johnson, A.M., Schmid, K., Alper, C.A. and Bissett, L.: Inheritance of human α1 acid glycoprotein (orosomucoid) variants. J. Clin. Invest. 48 (1969) 2293-2299.
Maniatis, T., Fritsch, E. and Sambrook, J.: Molecular Cloning. A Laboratory Manual. Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 1982.
Maxam, A.M. and Gilbert, W.: Sequencing end labeled DNA with base-specific chemical cleavages. Methods Enzymol. 65 (1980) 499-560.
Ochi, Y., Ura, Y., Hamazu, M., Fujiyama, Y., Kajita, Y., Ishida, M., Miyazaki, T. and Tamura, K.: Immunochemical identification of an α1-acid glycoprotein - antigenic determinant on carcinoembryonic antigen (CEA) and non-specific cross-reacting antigen (NCA). Clin. Chim. Acta 138 (1984) 9-19.
Proudfoot, N.J. and Brownlee, G.G.: 3' Non-coding region sequences in eukaryotic messenger RNA. Nature 263 (1976) 211-214.
Reid, K.B.M., Bentley, D.R. and Wood, K.J.: Cloning and characterisation of the cDNA for the B-chain of normal serum C1q. Phil. Trans. R. Soc. Lond. 306 (1984) 345-354.
Remaut, E., Tsao, H. and Fiers, W.: Improved plasmid vectors with thermoinducible expression and temperature-regulated runaway replication. Gene 22 (1983) 103-113.
Schmid, K., Kaufmann, H., Isemura, S., Bauer, F., Emura, J., Motoyama, T., Ishiguro, M. and Nanno, S.: Structure of α1 acid glycoprotein. The complete amino acid sequence, multiple amino acid substitutions and homology with the immunoglobulins. Biochemistry 12 (1973) 2711-2724.
Stanley, K.K. and Luzio, J.P.: Construction of a new family of high efficiency bacterial expression vectors: identification of cDNA clones coding for human liver proteins. EMBO J. 3 (1984) 1429-1434.
Toh, H., Hayashida, H., Kikuno, R., Yasunaga, T. and Miyata, T.: Sequence similarity between EGF receptor and α1-acid glycoprotein. Nature 314 (1985) 199.
Communicated by R.W. Davies.
NOTE ADDED IN PROOF
Since the submission of this manuscript the sequence of α1AGP has also been published elsewhere (Dente et al., 1985, Nucl. Acids Res. 13, 3941-3952).