Molecular cloning of a cDNA for human pregnancy-specific β1-glycoprotein: homology with human...
Gene, 71 (1988) 439-449
Elsevier 439
GEN 02699
Molecular cloning of a cDNA for human pregnancy-specific β1-glycoprotein: homology with human carcinoembryonic antigen and related proteins
(Placental proteins; carcinogenesis; trophoblastic tumors; λ phage vectors; immunoglobulin domains; nucleotide sequence)
Bernadette C. Rooney a,b, C.H. Wilson Horne b and Norman Hardman a,c
a Department of Biochemistry, University of Aberdeen, Marischal College, Aberdeen AB9 1AS (U.K.) Tel. 0224 742848; b University Department of Pathology, Royal Victoria Infirmary, Newcastle-upon-Tyne NE1 4LP (U.K.) Tel. 0912328511; and c Department of Molecular Biology, Biotechnology Section, CIBA-GEIGY AG, CH-4002 Basel (Switzerland)
Received 8 June 1988 Accepted 12 July 1988 Received by publisher 15 August 1988
SUMMARY
Human pregnancy-specific β1-glycoprotein (SP1) is thought to play an essential role in normal pregnancy. It is also a well-characterized oncodevelopmental antigen, expressed aberrantly by trophoblastic tumors and some other malignant cell types. Here we report the identification of a human placental cDNA encoding the SP1 polypeptide sequence. The coding sequence shows 95% identity at the nucleotide level with a distinct, recently published SP1 cDNA sequence (PSG16). Unexpectedly, the sequence is also highly homologous to the published sequence of human carcinoembryonic antigen (CEA). SP1, CEA and CEA-related nonspecific cross-reacting species thus belong to a group of closely related though antigenically diverse tumor-associated glycoproteins. Comparison of the deduced amino acid sequence of the SP1 cDNA with that of CEA provides insight into the modular nature of these related proteins. This may have implications for the genomic organization and evolution of the CEA gene family.
Correspondence to: Dr. N. Hardman, Biotechnology K-681.4.43, CIBA-GEIGY AG, CH-4002 Basel (Switzerland) Tel. 0616967061.
Abbreviations: aa, amino acid(s); bp, base pair(s); CEA, carcinoembryonic antigen; kb, kilobase or 1000 bp; NCA, nonspecific CEA cross-reacting antigen; N-CAM, neural cell adhesion molecule; nt, nucleotide(s); oligo, oligodeoxynucleotide; ORF, open reading frame; PAPP-C, PSβG and SP1, pregnancy-specific β1-glycoprotein; PAGE, polyacrylamide gel electrophoresis; SDS, sodium dodecyl sulfate; SP1, see INTRODUCTION; SSC, 150 mM NaCl, Na3·citrate, pH 6.8.
0378-1119/88/$03.50 © 1988 Elsevier Science Publishers B.V. (Biomedical Division)
INTRODUCTION
Human pregnancy-specific glycoprotein, referred to by different groups as SP1, PSβG or PAPP-C (Bohn, 1971; Tatarinov and Masyukevich, 1970; Lin et al., 1974), is detectable in maternal serum as early as seven days after conception, and its level increases in an almost linear manner thereafter until the 36th week of pregnancy, when its serum concentration ranges between 95 and 315 µg/ml (Grudzinskas et al., 1977). The serum half-life of SP1 is approx. 30 h, based on the rate of its disappearance from maternal blood following delivery (Towler et al., 1976; Dati et al., 1982). While the function of SP1 in normal pregnancy is not known, it is thought to be essential since low serum levels are associated with threatening abortion (Jandial et al., 1978). The protein is generally supposed to be a product of the placenta (Sorensen, 1984; Horne et al., 1976b), from which it can be extracted and purified (Inaba et al., 1981). Placental SP1 was previously shown to be an 80-90-kDa glycoprotein, about 30% of which is accounted for by carbohydrate (Bohn, 1971) N-linked to Asn residues (Engvall, 1980). This material is now referred to as SP1-β following the discovery of a high-Mr variant form (SP1-α; 430 kDa) in late-pregnancy serum (Teisner et al., 1978). Additional SP1-related antigens of various Mrs have also been reported, such as a urinary form and the species ~~-1, ~~-2, and y (reviewed by Sorensen, 1984).
SP1 is also a product of trophoblastic tumors and has been used extensively as a tumor marker in this form of malignant disease (Tatarinov et al., 1974; Seppala et al., 1978; Lee et al., 1982; Than et al., 1982). SP1 is expressed by some non-trophoblastic tumors, although its diagnostic value as a marker for these other forms of malignancy remains uncertain. Conflicting results on the expression of SP1 may have arisen because of different assay methods, since in some cases SP1-related antigens may be expressed but not secreted into the blood (Sorensen, 1984). SP1 expression, determined by immunohistochemical staining of tumor biopsy material in patients with breast cancer, correlates with poor prognosis (Kuhajda et al., 1984; Horne et al., 1976a).
The present study reports the molecular cloning of the cDNA for SP1 from a library of sequences derived from human placental mRNA. The cloned SP1-cDNA nucleotide sequence includes a single, long ORF coding for a polypeptide of 428 aa (Mr 54543), with several potential sites for N-linked glycosylation. Comparison with available protein sequences shows that the predicted SP1 polypeptide is unique, though it is closely related to a second recently published SP1 sequence (Watanabe and Chou, 1988). In addition, the sequence shows considerable homology (50% amino acid sequence identity) with that of human CEA and nonspecific cross-reacting antigens, NCA (Paxton et al., 1987; Zimmerman et al., 1987; Beauchemin et al., 1987; Oikawa et al., 1987). Implications of these observations for the structure of the genes for SP1, CEA and related proteins are discussed.
MATERIALS AND METHODS
(a) Amino acid sequence analysis of tryptic fragments, synthesis of probe oligodeoxynucleotides and sequence analysis of SP1 cDNA
Purified placental SP1-β was kindly provided by Dr. Bohn, Behringwerke AG, Marburg (F.R.G.). The purity of the protein was confirmed by SDS-PAGE, Western blotting using rabbit anti-SP1 (Dako) and by amino acid composition analysis (Rooney, 1987).
SP1-β (2 mg) was subjected to reduction and carboxymethylation, followed by trypsinization using methods described by Russell et al. (1986). Tryptic peptides were separated on a Waters Associates µBondapak column (30 cm × 3.4 mm) using 0.1% trifluoroacetic acid as aqueous phase and 100% propan-2-ol as solvent phase. High-pressure liquid chromatography was carried out over a 90-min period using a 0-40% propan-2-ol gradient and a flow rate of 1.5 ml/min. Amino acid sequencing of purified SP1 tryptic peptides was carried out by automated Edman degradation with an Applied Biosystems Model 470A gas-phase sequencer (SERC Protein Sequencing Facility, University of Aberdeen) using methods described elsewhere (Russell et al., 1986).
Based on amino acid sequence data of tryptic peptides, two mixed oligo probes were synthesized using the procedure described by Rink et al. (1984):
Probe 1: 5'-(C/T)TT(G/A)TACCA(T/A/G)AT(G/A)TA(T/A/G/C)CC-3'
Probe 2: 5'-CCA(T/A/G)AT(G/A)TA(T/A/G/C)GT(A/G)TA(G/A)TT-3'
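The logic of a mixed probe can be checked by back-translation. The sketch below is illustrative only and is not the authors' procedure: it expands the degenerate positions of probe 2, takes the reverse complement of every variant and translates it with the standard genetic code. The names `PROBE_2`, `degeneracy` and `encoded_peptides` are hypothetical helpers introduced for this example.

```python
from itertools import product

# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Probe 2, 5' to 3', one entry per position; a multi-letter entry lists
# the bases allowed at a degenerate position.
PROBE_2 = ["C", "C", "A", "TAG", "A", "T", "GA", "T", "A",
           "TAGC", "G", "T", "AG", "T", "A", "GA", "T", "T"]


def degeneracy(probe):
    """Number of distinct oligonucleotides in the mixed probe."""
    total = 1
    for position in probe:
        total *= len(position)
    return total


def reverse_complement(probe):
    """Reverse complement of a degenerate probe, position by position."""
    return ["".join(COMPLEMENT[b] for b in position) for position in reversed(probe)]


def encoded_peptides(probe):
    """Peptides encoded by the strand the probe is designed to hybridize with."""
    target = reverse_complement(probe)
    peptides = set()
    for variant in product(*target):
        seq = "".join(variant)
        usable = len(seq) - len(seq) % 3
        peptides.add("".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3)))
    return peptides


print(len(PROBE_2))               # 18
print(degeneracy(PROBE_2))        # 96
print(encoded_peptides(PROBE_2))  # {'NYTYIW'}
```

All 96 variants of the 18-mer correspond to the single hexapeptide Asn-Tyr-Thr-Tyr-Ile-Trp, which is the point of placing the mixed bases at the codon wobble positions.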
The predicted melting temperature of probe 1 was 45.5°C and of probe 2, 43°C (assuming the lowest possible G + C contents). The probes were used for hybridization screening (Benton and Davis, 1977) of a human placental cDNA library in λgt11 (HL1008, Clontech Laboratories Inc., Palo Alto, CA) after plating phage and transferring plaques as described by Huynh et al. (1985). Hybridization and washing were performed at 37°C in 6 × SSC. Positive plaques were plaque-purified and re-screened as described above. Phage DNA was isolated as described by Blattner et al. (1978).
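The text does not say how the probe melting temperatures were predicted. As a rough illustration only, the sketch below applies the Wallace rule, Tm ≈ 2(A+T) + 4(G+C) °C, a common rule of thumb for short oligos, across the degenerate positions of probe 2. The choice of this rule and the names `PROBE_2` and `wallace_tm_range` are assumptions made for the example, not the authors' method.

```python
# Probe 2, 5' to 3'; a multi-letter entry lists the bases allowed at that position.
PROBE_2 = ["C", "C", "A", "TAG", "A", "T", "GA", "T", "A",
           "TAGC", "G", "T", "AG", "T", "A", "GA", "T", "T"]


def wallace_tm_range(probe):
    """Return (lowest, highest) Wallace-rule Tm in degrees C for a mixed probe.

    Each A or T contributes 2 degrees and each G or C contributes 4 degrees.
    At a degenerate position the lowest estimate takes the weakest allowed
    base and the highest estimate takes the strongest one.
    """
    low = high = 0
    for position in probe:
        contributions = [4 if base in "GC" else 2 for base in position]
        low += min(contributions)
        high += max(contributions)
    return low, high


print(wallace_tm_range(PROBE_2))  # (42, 52)
```

Under this rule the variant of probe 2 with the lowest G + C content comes out at 42°C, close to but not identical with the value quoted above, so a somewhat different formula was presumably used. Either estimate is consistent with hybridizing and washing at 37°C, a few degrees below the least stable member of the mixture.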
Nucleotide sequence analysis of the pSP1-i cDNA insert was carried out by the M13 cloning/sequencing procedure (Sanger et al., 1977) using sequencing kits (Amersham International, U.K., Cat. No. 4502).
Computer manipulations and comparisons of nucleotide sequences were carried out using the University of Wisconsin Genetics Computer Group program package (Devereux and Haeberli, 1986).
(b) Southern- and Northern-blot analysis of human placental DNA and RNA
Samples of human placental DNA (5 µg) were digested to completion and electrophoresed on 0.5% agarose gels in EDTA/Tris·acetate buffer using standard procedures. DNA blotting was performed using an adaptation of the Southern (1975) procedure as described by Wahl et al. (1979). The pSP1-i DNA insert was radioactively labeled by nick-translation (Rigby et al., 1977). Final washes of the filters were performed using 1 × SSC at 68°C.
For Northern-blot analysis, placental RNA was prepared using the procedure described by Ullrich et al. (1977). RNA (3 µg) was fractionated in 1% formaldehyde-1% agarose gels, transferred to nitrocellulose membrane and probed using nick-translated pSP1-i insert DNA prepared as described above. Methods used are those described by Maniatis et al. (1982).
RESULTS AND DISCUSSION
(a) Identification of a human placental SP1 cDNA
The polypeptide sequences of tryptic fragments of purified SP1 protein were used to predict the nucleotide sequences of two potentially useful oligo probes for hybridization selection of candidate SP1-cDNA clones from a λgt11 human placental cDNA library (oligo-1 and oligo-2, described in MATERIALS AND METHODS, section a). From 1 × 10⁴ independent plaques screened by hybridization with a mixture of oligo-1 and oligo-2, 135 positively hybridizing phage were identified. Ten were picked at random and re-screened with oligo-1 and oligo-2. One phage failed to rehybridize, and two other phage hybridized only to oligo-1 and had small cDNA inserts (250 and 500 bp). The remaining recombinants hybridized with both probes and contained single cDNA inserts ranging in size from 1000 to 2000 bp. DNA from one recombinant phage containing a cDNA of 2 kb was recovered from the vector by cleavage with EcoRI, and cloned into the plasmid vector pUC12. The recombinant plasmid was designated pSP1-i. A restriction map of the insert DNA fragment of pSP1-i was constructed and the information used to generate appropriate overlapping DNA segments by restriction enzyme cleavage, which were subcloned into M13 vectors for nucleotide sequence analysis. The nucleotide sequence of the 2-kb EcoRI cDNA insert of pSP1-i is shown in Fig. 1.
The cDNA insert of pSP1-i contains a single, long ORF. Three in-frame methionine residues are located in the N-terminal region of the predicted amino acid sequence. Of these, the most 3' ATG (located at nt 238) was chosen as being the most likely start codon based on Kozak (1984) rules. Comparison of the cDNA sequence with that of CEA similarly predicts the use of this translational start site (see RESULTS AND DISCUSSION, section b). The sequence between nt 238-1520 encodes a 428-aa polypeptide with Mr of 54543 and an aa composition similar to placental SP1 (Rooney, 1987).
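The situation described above, several ATG codons in one reading frame upstream of a chosen start codon, can be sketched on a toy sequence. The sequence `TOY_CDNA` and the helper names below are hypothetical and do not reproduce the pSP1-i insert; the code only shows how in-frame upstream ATGs and the length of the ORF are found once a start position is chosen.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

# Hypothetical toy cDNA: three in-frame ATGs followed by two sense codons and a stop.
TOY_CDNA = "GAATGACGATGGCCATGGGTCCCTAAGC"


def upstream_in_frame_atgs(seq, start):
    """0-based positions of ATGs 5' of `start`, in the same frame,
    that are not separated from it by a stop codon."""
    found = []
    pos = start - 3
    while pos >= 0:
        codon = seq[pos:pos + 3]
        if codon in STOP_CODONS:
            break
        if codon == "ATG":
            found.append(pos)
        pos -= 3
    return sorted(found)


def orf_codons(seq, start):
    """Number of sense codons from `start` up to the first in-frame stop,
    or None if no stop codon is reached."""
    for pos in range(start, len(seq) - 2, 3):
        if seq[pos:pos + 3] in STOP_CODONS:
            return (pos - start) // 3
    return None


chosen_start = TOY_CDNA.rfind("ATG")                   # most 3' ATG, here index 14
print(chosen_start)                                    # 14
print(upstream_in_frame_atgs(TOY_CDNA, chosen_start))  # [2, 8]
print(orf_codons(TOY_CDNA, chosen_start))              # 3
```

In a real analysis the choice among the candidate ATGs would rest on additional evidence, as in the text, where sequence context and comparison with CEA are used.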
(b) Predicted amino acid sequence of SP1 and comparison with related proteins
A search through the protein sequence data banks showed that the predicted sequence of SP1 is unique. However, the cDNA sequence shows extensive homology with a distinct, recently published 1.9-kb SP1 cDNA sequence (PSG16; Watanabe and Chou, 1988). PSG16, and a second cDNA clone, PSG93 (2.1 kb), encode virtually identical SP1 polypeptides of 46.9 and 47.2 kDa, which appear to differ only in their C-terminal amino acids because of an 86-bp insertion in the 3'-end of the coding sequence of PSG93 (Watanabe and Chou, 1988). Additional 5'-noncoding sequence in PSG93 accounts for the remaining size difference between these two cDNA species. The coding sequence of pSP1-i shows 94.7% identity with that of PSG16 at the nucleotide level (Fig. 1). Sequence differences are distributed along the entire length of the coding regions, with marked divergence at their C-termini. Amino acid changes in the SP1-i-coded protein lead to loss of three of the seven potential sites of N-linked glycosylation compared with the PSG16 polypeptide (Watanabe and Chou, 1988), but potentially critical Cys residues are conserved. PSG peptides sequenced by Watanabe and Chou (1988) contain some residues that are at variance with the predicted cDNA sequence determined here (Val-393; Lys-401; Glu-412). Peptides sequenced in the present study correspond with the cDNA sequence shown, but differ from the polypeptide sequence predicted from PSG16 at three locations: Ala-29 reads Thr-29; Val-67 reads Ala-67; and Asp-127 reads Thr-127, all for PSG16. Re-examination of our data raises the possibility of an Asp/Thr mixed sequence at aa position 127. These conflicts may reflect differences in abundance of protein isoforms in the separate preparations, or differential recovery of peptides, or both. In contrast to the coding regions, the 5' and 3' noncoding segments of the two cDNAs show little homology: 42% and 40%, respectively, using the alignment program 'GAP'. These results show that the pSP1-i and PSG16 mRNAs are almost certainly derived from distinct SP1 genes.
Fig. 1. Nucleotide sequence of the cDNA insert of pSP1-i. The sequence is terminated by EcoRI cloning sites which link the cDNA to the λgt11 cloning vector. Two in-frame ATGs, located 5' to the supposed translational start codon, and a potential polyadenylation site (AATAAA) are boxed. Dots and brackets, respectively, show the location of the seven Cys residues and four consensus sites for N-linked glycosylation. The sequences of SP1 tryptic peptides are underlined and the positions of oligo-1 and oligo-2 homologous sequences indicated by dashed-line boxes. As expected for tryptic peptides, each sequence is preceded either by Lys or Arg. The downward arrowhead indicates the possible cleavage site for removal of a leader peptide based on structural analogy with CEA and NCA (see Fig. 2 and RESULTS AND DISCUSSION, section b). Only the coding segment is compared with that of PSG16 (Watanabe and Chou, 1988) and for clarity identical nucleotides/amino acids are not indicated. Homologies in the 5'- and 3'-noncoding regions are much less significant (around 40%) and are not shown here.
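Potential N-linked glycosylation sites of the kind counted above are conventionally recognized as the sequon Asn-X-Ser/Thr, where X is any residue except Pro. The sketch below is an illustration under that standard definition, not the authors' software. It scans a 20-residue fragment of the deduced SP1 sequence and then a hypothetical variant in which one Thr is replaced by Ala, to show how a single amino acid change removes a site. The names `sequon_positions`, `SP1_FRAGMENT` and `VARIANT` are introduced for the example.

```python
import re

# Asn, any residue except Pro, then Ser or Thr. The lookahead allows
# overlapping sequons to be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def sequon_positions(protein):
    """1-based positions of the Asn residue of each N-X-S/T sequon."""
    return [match.start() + 1 for match in SEQUON.finditer(protein)]


# Fragment of the deduced SP1 polypeptide (Tyr-Ser-Asn-Ala-Ser ... Asn-Val-Thr).
SP1_FRAGMENT = "VYSNASLLIQNVTREDAGSY"
# Hypothetical variant, for illustration only: Thr at position 13 changed to Ala.
VARIANT = "VYSNASLLIQNVAREDAGSY"

print(sequon_positions(SP1_FRAGMENT))  # [4, 11]
print(sequon_positions(VARIANT))       # [4]
```

A sequon is only a potential site; whether it is actually glycosylated cannot be read from the sequence alone.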
The predicted SP1-i-coded polypeptide sequence shows a hitherto undetected significant homology with the published sequence of human carcinoembryonic antigen and related tumor antigens (Paxton et al., 1987; Zimmerman et al., 1987; Beauchemin et al., 1987; Oikawa et al., 1987). The sequence relationship between SP1, CEA and NCA is illustrated in Fig. 2. The methionine at nt position 238 marks the beginning of a 202-aa region of amino acid sequence homology with the N-terminal segment of CEA. The N-terminal sequence of NCA is also shown, extending up to the limit of the published data (residue 109; Paxton et al., 1987). The partial nucleotide sequence of a genomic NCA gene (Thompson et al., 1987) extends the region of homology with CEA and SP1 to include an additional N-terminal 12 aa starting with serine, after which the genomic NCA sequence shows no obvious homology with SP1 or CEA. This is thought to be the position where an intron interrupts the coding region of the NCA gene (Thompson et al., 1987). Taken together, this information suggests that the N-terminal 34 aa of SP1 constitute a leader peptide similar to that of CEA and NCA, which is probably located on at least two exons. The protein is predicted to be cleaved to produce a mature polypeptide with an N-terminal glutamine (amino acid residue 1, Fig. 2). We were unable to obtain the N-terminal sequence of the mature SP1 protein to confirm this possibility. From these data SP1, CEA and NCA are approx. 50% homologous at the amino acid sequence level.
SP1 and CEA maintain a similar level of homology in their C-terminal domains (domain C, Fig. 2) extending from aa 203 in SP1 to the end of the coding sequence. A gap of 14 aa residues occurs in the homology between SP1 and CEA immediately before the C-terminus of the two proteins, and homology resumes for the remaining 9 C-terminal aa. This is included within the hydrophobic terminal 27-aa segment of CEA that is believed to anchor the protein to the plasma membrane (Zimmerman et al., 1987; Beauchemin et al., 1987). The BESTFIT nucleotide sequence homology between the SP1 and CEA cDNAs shows that the coding sequences terminate at equivalent nt positions, and that a similar level of homology (approx. 80% using the GAP program) is maintained in both the 5'- and the 3'-noncoding segments of the two cDNAs, up to the point at which the one for CEA contains an AluI-family repeated sequence (Zimmerman et al., 1987). The repeated element was presumably inserted into the 3'-untranslated region of the CEA gene after it diverged from that of SP1. It is intriguing that, although the predicted coding sequence of pSP1-i is closely related to that of PSG16, the noncoding segments of pSP1-i are much more homologous to those of the CEA cDNA. This underlines the close evolutionary relationship between the SP1 and CEA family of genes.
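Fig. 2 displays the comparison by printing the SP1 sequence in full and marking, in the CEA and NCA rows, residues identical to SP1 with asterisks and gaps with dashes. The following sketch reproduces that notation for the first 16 residues of the mature SP1 and CEA polypeptides as given in Fig. 2. The function names are hypothetical, and the identity figure applies to this short window only, not to the overall value of approx. 50% quoted above.

```python
def mask_identities(reference, other):
    """Render `other` in the Fig. 2 style: '*' where it matches `reference`,
    the residue itself where it differs, '-' kept for gaps."""
    if len(reference) != len(other):
        raise ValueError("aligned sequences must have equal length")
    return "".join(
        "*" if a == b and a != "-" else b
        for a, b in zip(reference, other)
    )


def percent_identity(reference, other):
    """Percentage of identical residues over aligned columns without gaps."""
    columns = [(a, b) for a, b in zip(reference, other) if a != "-" and b != "-"]
    identical = sum(a == b for a, b in columns)
    return 100.0 * identical / len(columns)


# Residues 1-16 of the mature polypeptides, already aligned without gaps.
SP1_N_TERMINUS = "QVTIEAEPTKVSKGKD"
CEA_N_TERMINUS = "KLTIESTPFNVAEGKE"

print(mask_identities(SP1_N_TERMINUS, CEA_N_TERMINUS))   # KL***ST*FN*AE**E
print(percent_identity(SP1_N_TERMINUS, CEA_N_TERMINUS))  # 43.75
```

The masked string matches the CEA row of the figure for these positions, which makes the display easy to verify against the printed alignment.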
An additional interesting feature of the SP1 protein sequence is the presence of internal repeats in the C-terminal domain (Figs. 2 and 3; segments 1a, 2a and 2b). Similar repeats have been described previously in the CEA polypeptide sequence (Zimmerman et al., 1987; Beauchemin et al., 1987). These regions are highlighted in Fig. 2 and shown schematically in Fig. 3. The C-terminal portion of CEA contains three related repeats of 178 aa residues (1-3; Fig. 2 and Beauchemin et al., 1987). Based on a comparison with SP1, each of these domains can be subdivided further into two segments (a and b; Fig. 2). Segments a and b each contain two conserved Cys residues which flank regions of additional internal sequence homology and contain a number of specifically placed amino acid residues observed in the immunoglobulin-like protein superfamily (L----FT--D-G-Y-C; Hunkapiller and Hood, 1976; Beauchemin et al., 1987). Interestingly, the segment 1b-2a-2b of CEA has no counterpart in the SP1 polypeptide sequence.
Fig. 2. Comparison of the deduced SP1 amino acid sequence with those of human CEA and NCA. The predicted SP1 amino acid sequence from Fig. 1 is aligned with those of CEA and NCA, and aa positions numbered according to the N termini of the mature CEA and NCA polypeptides (Paxton et al., 1987). The CEA-coding sequence was derived from the cDNA sequence reported by Beauchemin et al. (1987). Met residues (aa position -34) are those located at the start of the SP1 and CEA ORFs. The SP1 amino acid sequence is divided into an N-terminal domain (N) and C-terminal domain (C), the latter containing three segments (1a, 2a and 2b; large numerals), two of which (1a and 2a) are highly homologous tandemly repeated sequences of 93 aa residues. The CEA sequence includes six C-terminal segments forming three highly homologous tandem repeats (1a and 1b, 2a and 2b, 3a and 3b; small numerals). The regions corresponding to segments 1b, 2a and 2b of CEA are missing from the SP1 sequence, as is a small 14-aa region near to the predicted C-terminus of the SP1 polypeptide. Arrowheads indicate the positions of conserved Cys residues in CEA and SP1. Identical aa are indicated with asterisks, and gaps by dashes.
Fig. 3. Structure of SP1 and CEA. (a) Diagrammatic representation of the homologies between SP1 and CEA polypeptide sequences shown in Fig. 2. M represents the supposed C-terminal segment of CEA not present in the SP1 sequence, and L represents the N-terminal 34-aa leader peptide segment of SP1 and CEA which is absent from the mature CEA and NCA polypeptides. The segments 1a, 1b etc. are referred to in Fig. 2. (b) Schematic drawing showing the possible arrangement of the SP1 structural domains, according to the scheme devised for CEA by Beauchemin et al. (1987). Each includes two Cys residues (closed circles). The remaining Cys in SP1 is predicted to be lost from the mature protein on removal (arrowhead) of the leader peptide. Open circles denote potential sites of N-linked glycosylation.
(c) Comparison of SP1 with other known protein sequences
Only weak, but possibly interesting, homologies were detected using computer-assisted searches taking into account the presence of amino acids that are chemically similar or identical. Regions surrounding the conserved Cys residues in the repeated segments show some similarity with the repeating units of the neural cell adhesion molecule N-CAM (Hemperly et al., 1986). A 35-aa segment of the N-domain of SP1 (aa residues 34-69) is weakly related to the human vitronectin amino acid sequence (aa residues 239-254; Suzuki et al., 1985), and two segments of the C-terminal domain encompassing virtually the entire SP1-1a and 1b regions (aa residues 120-203 and 213-296) can both be aligned with one of sixteen 90-aa-long Type-III cell/DNA/heparin binding domains of human fibronectin (nt residues 1215-1298; Kornblihtt et al., 1985). The possible significance of these homologies has not been investigated further.
(d) Southern- and Northern-blot analysis of human placental DNA and RNA
Human placental DNA, digested to completion using a variety of restriction endonucleases, was blotted using the Southern procedure and hybridized with the 2016-bp pSP1-i cDNA insert as probe. The result is shown in Fig. 4a. The complex hybridization pattern observed is similar to that obtained for an NCA gene, a member of the CEA gene family (Thompson et al., 1987). The result suggests that the SP1 mRNA is transcribed from an extremely complex genetic locus and/or that its sequences are part of a multigene family. It is not known whether SP1 nucleotide sequences cross-hybridize with those of CEA or NCA, but the high level (72% with CEA; see RESULTS AND DISCUSSION, section b) of homology suggests that this is likely. Cloning of these genomic DNA fragments will be necessary to determine the structural organization of, and relationship between, this family of genes.
Northern-blot analysis of total human placental RNA was performed using the pSP1-i cDNA insert. The result is shown in Fig. 4b. Watanabe and Chou (1988) detected two mRNAs of approx. 1.7 and 2.2 kb, whereas we detect three distinct size-classes of RNA with approximate sizes of 1.9, 2.1 and 2.3 kb. These species are close to the size of the cDNA inserts of pSP1-i (2016 bp), PSG16 (1900 bp) and PSG93 (2200 bp). Further work is required to establish the precise relationship between these RNA species and the cDNAs.
Fig. 4. Southern- and Northern-blot analyses of placental DNA and RNA using an SP1-cDNA probe. (Panel a) Southern-blot analysis of human placental DNA (5 µg) digested with: (1) HindIII; (2) EcoRI; (3) BrI; (4) BamHI; (5) XbaI; (6) AccI, and (7) SacI. (Panel b) Northern-blot analysis of human placental RNA (3 µg). Arrows in panel b indicate positions of three major class sizes of RNA. Markers (M) were denatured, radioactively labeled bacteriophage λ DNA fragments, as on the left margin of panel a (sizes in kb: 9.4, 6.6, 5.0/5.2, 4.3/4.4, 3.5, 2.3, 2.0, 1.9, 1.6, 1.4, 1.0, 0.8). Experimental details are described in MATERIALS AND METHODS, section b.
(e) Relationship of SP1 with CEA and cross-reacting tumor antigens
This study draws attention to a clear relationship between SP1 species, human CEA and nonspecific CEA cross-reacting antigens. This is especially intriguing in view of a previous report that SP1 and CEA are immunologically distinct (Kaminska et al., 1979). The complexity of the Southern-blot analysis data indicates that there are probably several SP1- and NCA-related gene sequences in the human genome. Likewise, the Northern-blot analysis of human placental RNA using the SP1 cDNA probe reveals the presence of three species of RNA with sizes similar to those detected by a CEA cDNA probe in human tumor-cell lines (Beauchemin et al., 1987). At least some of these sequences may represent cross-hybridizing species, since the cDNAs for SP1 and CEA are over 70% homologous over extensive regions. This is perhaps not surprising since evidence suggests there are at least seven distinct species of protein antigenically related to CEA (summarized in Thompson et al., 1987), apart from additional proteins such as SP1 which do not interact with anti-CEA antibodies.
The study also emphasizes the structural complexity evident at the C termini of the SP1 species (PSG16, PSG93 and SP1-i) and CEA. Moreover, comparison of the coding sequence of SP1 with the known sequences of CEA and NCA reveals the internal modular nature of these proteins (Fig. 2). Explanations for these structural observations, and for the unusual sequence relationships between the noncoding segments of SP1 and CEA cDNAs referred to above, presumably lie in the structure of the various genes and the way in which their mRNAs are processed.
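One way to look for the modular segments is to slide the immunoglobulin-like consensus quoted in section (b), L----FT--D-G-Y-C, along a sequence and count how many of its specified residues are matched at each offset. The sketch below is a simplified illustration of that idea and not the search method used in this study; the function name `best_window` is hypothetical, and the test sequence is a 29-residue stretch of the deduced SP1 sequence surrounding one of the conserved Cys residues.

```python
# Consensus of specifically placed residues; '-' means any residue.
CONSENSUS = "L----FT--D-G-Y-C"


def best_window(protein, consensus=CONSENSUS):
    """Return (matches, specified, window) for the best-scoring offset.

    `matches` counts specified consensus residues found at the expected
    spacing, `specified` is the number of non-dash consensus positions,
    and `window` is the matching stretch of `protein`.
    """
    fixed = [(i, aa) for i, aa in enumerate(consensus) if aa != "-"]
    best_score, best_start = -1, 0
    for start in range(len(protein) - len(consensus) + 1):
        score = sum(protein[start + i] == aa for i, aa in fixed)
        if score > best_score:
            best_score, best_start = score, start
    window = protein[best_start:best_start + len(consensus)]
    return best_score, len(fixed), window


# Stretch of the deduced SP1 sequence around a conserved Cys residue.
SP1_SEGMENT = "KNKRTLFLFGVTKYTAGPYECEIRNPVSA"

print(best_window(SP1_SEGMENT))  # (5, 7, 'LFLFGVTKYTAGPYEC')
```

Five of the seven specified positions are matched in this segment, which illustrates why the consensus is better treated as a pattern of preferred residues than as a strict motif.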
(f) Possible functions of SP1
The function of SP1 is not known, although several possibilities have been suggested, including a role in carbohydrate metabolism (Tatra et al., 1976), binding of steroid hormones and iron (Lin et al., 1974) and immune suppression, which may help to protect the fetus against immune rejection by the mother (Horne et al., 1976b). It was anticipated that structural comparisons with other known proteins might shed some light on the nature of SP1 and its possible functional role, both as a natural placental protein essential for fetal development and as a tumor marker associated with certain forms of malignancy.
It was therefore encouraging, although at the same time surprising, to find a close structural relationship between SP1 and CEA antigens, hitherto considered to be a distinct group of oncodevelopmental glycoproteins whose aberrant expression is likewise associated with the malignant phenotype. Different domains of SP1 show weak, but possibly interesting, amino acid sequence alignments with N-CAM and two additional well-characterized cell adhesion molecules, human fibronectin and vitronectin (Kornblihtt et al., 1985; Suzuki et al., 1985). By analogy with the function of N-CAM this may point to a role for SP1/CEA-like proteins in mediating/coordinating cell-cell interactions during embryogenesis. On this basis it could be speculated that the same property may in some way explain why their aberrant expression is associated with a malignant cell phenotype. The further characterization of these structurally complex proteins and their genes will provide a firm basis for future studies specifically addressing the question of the function of this important group of tumor-associated antigens.
ACKNOWLEDGEMENTS
We are extremely grateful to Hans Bohn, Behringwerke AG, Marburg (F.R.G.) for a generous gift of purified placental SP1, and to Bryan Dunbar and John Fothergill, University of Aberdeen, for their help in sequencing SP1 peptides. We are also indebted to Bob Cox and Mervyn Monteiro (NIMR, Mill Hill, London), to Roger Sutcliffe and Debra Nickson (Genetics Institute, University of Glasgow) and to Andrew Coulson and colleagues (Department of Molecular Biology, University of Edinburgh) for valuable information on cDNA methodology, design of probe oligodeoxynucleotides and initial sequence homology searches. We thank our colleagues Lee Gill, Anne Glover, Mairi Gordon, Domenico Ammaturo, Felix Businger and Ron de Winter for various contributions. B.C.R. was in receipt of a studentship from the University of Aberdeen Medical Endowments Fund.
REFERENCES
Beauchemin, N., Benchimol, S., Cournoyer, D., Fuks, A. and Stanners, C.P.: Isolation and characterization of full-length functional cDNA clones for human carcinoembryonic antigen. Mol. Cell. Biol. 7 (1987) 3221-3230.
Benton, W.D. and Davis, R.W.: Screening of λgt recombinant clones by hybridisation to single plaques in situ. Science 196 (1977) 180-182.
Blattner, F.R., Blechl, A.E., Denniston-Thompson, K., Faber, H.E., Richards, J.E., Slightom, J.L., Tucker, P.W. and Smithies, O.: Cloning human fetal gamma globin and mouse alpha-type globin cDNA: preparation and screening of shotgun collections. Science 202 (1978) 1279-1284.
Bohn, H.: Detection and characterisation of pregnancy proteins in the human placenta and their quantitative immunological determination in sera from pregnant women. Arch. Gynäkol. 210 (1971) 440-457.
Dati, F., Grenner, G., Luben, G., Kapmeyer, W., Sieber, A., Bohn, H. and Bellman, O.: Comparison of enzyme immunoassays for SP1 and AFP with other immunochemical methods. Ric. Clin. Lab. 12 (1982) 265-287.
Devereux, J. and Haeberli, P.: Introduction to the Sequence Analysis Software Package of the University of Wisconsin Genetics Computer Group, version 4. University of Wisconsin, Madison, WI (U.S.A.) 1986.
Engvall, E.: Pregnancy-specific β1 glycoprotein: purification and partial characterisation. Oncodevelop. Biol. Med. 1 (1980) 113-122.
Grudzinskas, J.G., Gordon, Y.B., Jeffrey, D. and Chard, T.: Specific and sensitive determination of pregnancy-specific β1 glycoprotein by radioimmunoassay. Lancet 1 (1977) 333-334.
Hemperly, J.J., Murray, B.A., Edelman, G.M. and Cunningham, B.A.: Sequence of a cDNA clone encoding the polysialic acid-rich and cytoplasmic domains of the neural cell adhesion molecule N-CAM. Proc. Natl. Acad. Sci. USA 83 (1986) 3037-3041.
Horne, C.H.W., Reid, I.N. and Milne, G.D.: Prognostic significance of inappropriate production of pregnancy proteins by breast cancers. Lancet II (1976a) 279-282.
Horne, C.H.W., Towler, C.M., Pugh-Humphreys, R.G.P., Thomson, A.W. and Bohn, H.: Pregnancy-specific β1 glycoprotein: a product of the syncytiotrophoblast. Experientia 32 (1976b) 1179-1199.
Hunkapiller, T. and Hood, L.: The growing immunoglobulin gene superfamily. Nature 323 (1986) 15-16.
Huynh, T.V., Young, R.A. and Davis, R.W.: Constructing and screening cDNA libraries in λgt10 and λgt11. In Glover, D.A. (Ed.), DNA Cloning, Vol. I. A Practical Approach. IRL Press, Oxford, 1985, pp. 49-78.
Inaba, N., Renk, T., Ax, W., Schlottler, S., Weinmann, E. and Bohn, H.: Possible synthesis of pregnancy-specific β1 glycoprotein and pregnancy-specific tissue proteins (PP,, and PP,,) by human and cynomolgus monkey leukocytes. Acta Haematol. 66 (1981) 35-38.
Jandial, V., Towler, C.M., Horne, C.H.W. and Abramovich, D.R.: Plasma pregnancy-specific β1 glycoprotein in complications of early pregnancy. Br. J. Obstet. Gynaecol. 85 (1978) 832-838.
Kaminska, J., Calvert, I. and Rosen, S.W.: Radioimmunoassay of pregnancy-specific β1 glycoprotein (SP1). Clin. Chem. 25 (1979) 577-580.
Kornblihtt, A.R., Umezawa, K., Vibe-Pedersen, K. and Baralle, F.E.: Primary structure of human fibronectin: differential splicing may generate at least 10 polypeptides from a single gene. EMBO J. 4 (1985) 1755-1759.
Kuhajda, F.P., Bohn, H. and Mendelsohn, G.: Pregnancy-specific β1 glycoprotein (SP1) in breast carcinoma: pathologic and clinical considerations. Cancer 54 (1984) 1392-1396.
Kozak, M.: Compilation and analysis of sequences upstream from the translational start site in eukaryotic mRNAs. Nucleic Acids Res. 12 (1984) 857-872.
Lee, J.N., Salem, H.T., Chard, T., Huang, S.C. and Ouyang, P.C.: Circulating placental proteins (hCG, SP1 and PP5) in trophoblastic disease. Br. J. Obstet. Gynaecol. 89 (1982) 69-72.
Lin, T.M., Halbert, S.P., Kiefer, D., Spellacy, W.N. and Gall, S.: Characterisation of four pregnancy-associated plasma proteins. Am. J. Obstet. Gynaecol. 118 (1974) 223-236.
Maniatis, T., Fritsch, E.F. and Sambrook, J.: Molecular Cloning. A Laboratory Manual, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 1982, pp. 202-203.
Oikawa, S., Imajo, S., Noguchi, T., Kosaki, G. and Nakazato, H.: The carcinoembryonic antigen contains multiple immunoglobulin-like domains. Biochem. Biophys. Res. Commun. 144 (1987) 634-642.
Paxton, R.J., Mooser, G., Pande, H., Lee, T.D. and Shively, J.E.: Sequence analysis of carcinoembryonic antigen: identification of glycosylation sites and homology with the immunoglobulin supergene family. Proc. Natl. Acad. Sci. USA 84 (1987) 920-924.
Rigby, P.W.J., Dieckmann, M., Rhodes, C. and Berg, P.: Labelling DNA to high specific activity in vitro by nick translation with DNA polymerase I. J. Mol. Biol. 113 (1977) 237-239.
Rink, H., Liersch, M., Sieber, P. and Meyer, F.: A large fragment approach to DNA synthesis: total synthesis of a gene for the protease inhibitor Eglin C from the leech Hirudo medicinalis and its expression in E. coli. Nucleic Acids Res. 12 (1984) 6369-6387.
Rooney, B.C.: Molecular Studies of Schwangerschaftsprotein 1 (SP1). Ph.D. Thesis, University of Aberdeen, 1987.
Russell, G.A., Dunbar, B. and Fothergill-Gilmore, L.A.: The complete amino acid sequence of chicken skeletal-muscle enolase. Biochem. J. 236 (1986) 115-126.
Sanger, F., Nicklen, S. and Coulson, A.R.: DNA sequencing with chain-terminating inhibitors. Proc. Natl. Acad. Sci. USA 74 (1977) 5463-5467.
Seppala, M., Rutanen, E.M., Heikinheimo, M., Jalanko, H. and Engvall, E.: Detection of trophoblast tumor activity by pregnancy-specific β1 glycoprotein. Int. J. Cancer 21 (1978) 265-267.
Smith, R., Klopper, A., Hughes, G. and Wilson, G.: The compartmental distribution of oestrogens and pregnancy-specific β1 glycoprotein. Br. J. Obstet. Gynaecol. 86 (1979) 119-124.
Sorensen, S.: Pregnancy-specific β1 glycoprotein (SP1): purification, characterisation, quantification and clinical application in malignancies (a review). Tumour Biol. 5 (1984) 275-302.
Southern, E.M.: Detection of specific sequences among DNA fragments separated by gel electrophoresis. J. Mol. Biol. 98 (1975) 503-517.
Suzuki, S., Oldberg, A., Hayman, E.G., Pierschbacher, M.D. and Ruoslahti, E.: Complete amino acid sequence of human vitronectin deduced from cDNA. Similarity of cell attachment sites in vitronectin and fibronectin. EMBO J. 4 (1985) 2519-2524.
Tatarinov, Y.S. and Masyukavich, V.N.: Immunological detection of a new β1-globulin in the blood sera of pregnant women. Byull. Eksp. Biol. Med. 69 (1970) 66-68.
Tatarinov, Y.S., Masyukavich, V.N., Nikulina, D.M., Novikova, L.A., Toloknov, B.O. and Falaleeva, D.M.: Immunological detection of a new β1-globulin in the pregnancy-zone in serum of patients with trophoblastic tumors. Int. J. Cancer 14 (1974) 548-554.
Tatra, G., Tempfer, B. and Placheta, P.: Influence of blood glucose levels on serum concentrations of pregnancy-specific protein SP1 and HPL during the last trimester of pregnancy. Eur. J. Obstet. Gynaecol. Reprod. Biol. 6 (1976) 53-67.
Teisner, B., Westergaard, J.G., Folkersen, J., Husby, S. and Svehag, S.E.: Two pregnancy-associated serum proteins with pregnancy-specific glycoprotein determinants. Am. J. Obstet. Gynaecol. 131 (1978) 262-266.
Than, G.N., Csaba, L.F., Bohn, H., Szabo, D.G., Szalmasy, M.S. and Menczer, G.: Monitoring therapy in trophoblastic diseases by radioimmunoassay of pregnancy-specific β1-glycoprotein and the β-subunit of human chorionic gonadotrophin. Oncodevelop. Biol. Med. 3 (1982) 315-323.
Thompson, J.A., Pande, H., Paxton, R.J., Shively, L., Padina, A., Simmer, R.L., Todd, C.W., Riggs, A.D. and Shively, J.E.: Molecular cloning of a gene belonging to the carcinoembryonic antigen gene family and discussion of a domain model. Proc. Natl. Acad. Sci. USA 84 (1987) 2965-2969.
Ullrich, A., Shine, J., Chirgwin, J., Pictet, R., Tischer, E., Rutter, W.J. and Goodman, H.M.: Rat insulin genes: construction of plasmids containing the coding sequences. Science 196 (1977) 1313-1319.
Wahl, G.M., Stern, M. and Stark, G.R.: Efficient transfer of large DNA fragments. Proc. Natl. Acad. Sci. USA 76 (1979) 3683-3687.
Watanabe, S. and Chou, J.Y.: Isolation and characterization of cDNAs encoding human pregnancy-specific β1-glycoprotein. J. Biol. Chem. 263 (1988) 2049-2054.
Zimmerman, W., Ortlieb, B., Friedrich, R. and Kleist, S.V.: Isolation and characterization of cDNA clones encoding the human carcinoembryonic antigen reveal a highly-conserved repeating structure. Proc. Natl. Acad. Sci. USA 84 (1987) 2960-2964.
Communicated by T.A. Bickle.