Molecular cloning of a cDNA for human pregnancy-specific β1-glycoprotein: homology with human...

Gene, 71 (1988) 439-449

Elsevier

GEN 02699

Molecular cloning of a cDNA for human pregnancy-specific β1-glycoprotein: homology with human carcinoembryonic antigen and related proteins

(Placental proteins; carcinogenesis; trophoblastic tumors; λ phage vectors; immunoglobulin domains; nucleotide sequence)

Bernadette C. Rooney, C.H. Wilson Horne and Norman Hardman

a Department of Biochemistry, University of Aberdeen, Marischal College, Aberdeen AB9 1AS (U.K.) Tel. 0224 742848; b University Department of Pathology, Royal Victoria Infirmary, Newcastle-upon-Tyne NE1 4LP (U.K.) Tel. 0912328511; and c Department of Molecular Biology, Biotechnology Section, CIBA-GEIGY AG, CH-4002 Basel (Switzerland)

Received 8 June 1988. Accepted 12 July 1988. Received by publisher 15 August 1988.

SUMMARY

Human pregnancy-specific β1-glycoprotein (SP1) plays an essential role in normal pregnancy. It is also a well-characterized oncodevelopmental antigen, expressed aberrantly by all trophoblastic tumors and some other malignant cell types. Here we report the identification of a human placental cDNA encoding the SP1 polypeptide sequence. The coding sequence shows 95% identity at the nucleotide level with a distinct, recently published SP1 cDNA sequence (PSG16). Unexpectedly, the sequence is also highly homologous to the published sequence of human carcinoembryonic antigen (CEA). SP1, CEA and CEA-related nonspecific cross-reacting species thus belong to a group of closely related though antigenically diverse tumor-associated glycoproteins. Comparison of the deduced amino acid sequence of the SP1 cDNA with that of CEA provides insight into the modular nature of these related proteins. This may have implications for the genomic organization and evolution of the CEA gene family.

INTRODUCTION

Human pregnancy-specific glycoprotein, referred to by different groups as SP1, PSβG or PAPP-C (Bohn, 1971; Tatarinov and Masyukevich, 1970; Lin et al., 1974), is detectable in maternal serum as early as seven days after conception, and its level increases in an almost linear manner thereafter until the 36th week of pregnancy, when its serum concentration ranges between 95 and 315 μg/ml (Grudzinskas et al., 1977). The serum half-life of SP1 is approx. 30 h, based on the rate of its disappearance from maternal blood following delivery (Towler et al., 1976; Dati et al., 1982). While the function of SP1 in normal pregnancy is not known, it is thought to be essential since low serum levels are associated with threatened abortion (Jandial et al., 1978). The protein is generally supposed to be a product of the placenta (Sorensen, 1984; Horne et al., 1976b), from which it can be extracted and purified (Inaba et al., 1981). Placental SP1 was previously shown to be an 80-90-kDa glycoprotein, about 30% of which is accounted for by carbohydrate (Bohn, 1971) N-linked to Asn residues (Engvall, 1980). This material is now referred to as SP1-β following the discovery of a high-Mr variant form (SP1-α; 430 kDa) in late-pregnancy serum (Teisner et al., 1978). Additional SP1-related antigens of various Mrs have also been reported, such as a urinary form and the species ~~-1, ~~-2, and γ (reviewed by Sorensen, 1984).

Correspondence to: Dr. N. Hardman, Biotechnology K-681.4.43, CIBA-GEIGY AG, CH-4002 Basel (Switzerland) Tel. 0616967061.

Abbreviations: aa, amino acid(s); bp, base pair(s); CEA, carcinoembryonic antigen; kb, kilobase or 1000 bp; NCA, nonspecific CEA cross-reacting antigen; N-CAM, neural cell adhesion molecule; nt, nucleotide(s); oligo, oligodeoxynucleotide; ORF, open reading frame; PAPP-C, PSG and SP1, pregnancy-specific β1-glycoprotein; PAGE, polyacrylamide gel electrophoresis; SDS, sodium dodecyl sulfate; SP1, see INTRODUCTION; SSC, 150 mM NaCl, Na3·citrate, pH 6.8.

0378-1119/88/$03.50 © 1988 Elsevier Science Publishers B.V. (Biomedical Division)

SP1 is also a product of trophoblastic tumors and has been used extensively as a tumor marker in this form of malignant disease (Tatarinov et al., 1974; Seppala et al., 1978; Lee et al., 1982; Than et al., 1982). SP1 is expressed by some non-trophoblastic tumors, although its diagnostic value as a marker for these other forms of malignancy remains uncertain. Conflicting results on the expression of SP1 may have arisen because of different assay methods, since in some cases SP1-related antigens may be expressed but not secreted into the blood (Sorensen, 1984). SP1 expression, determined by immunohistochemical staining of tumor biopsy material in patients with breast cancer, correlates with poor prognosis (Kuhajda et al., 1984; Horne et al., 1976a).

The present study reports the molecular cloning of the cDNA for SP1 from a library of sequences derived from human placental mRNA. The cloned SP1-cDNA nucleotide sequence includes a single, long ORF coding for a polypeptide of 428 aa (Mr 54543), with several potential sites for N-linked glycosylation. Comparison with available protein sequences shows that the predicted SP1 polypeptide is unique, though it is closely related to a second recently published SP1 sequence (Watanabe and Chou, 1988). In addition, the sequence shows considerable homology (50% amino acid sequence identity) with that of human CEA and nonspecific cross-reacting antigens, NCA (Paxton et al., 1987; Zimmerman et al., 1987; Beauchemin et al., 1987; Oikawa et al., 1987). Implications of these observations for the structure of the genes for SP1, CEA and related proteins are discussed.

MATERIALS AND METHODS

(a) Amino acid sequence analysis of tryptic fragments, synthesis of probe oligodeoxynucleotides and sequence analysis of SP1 cDNA

Purified placental SP1-β was kindly provided by Dr. Bohn, Behringwerke AG, Marburg (F.R.G.). The purity of the protein was confirmed by SDS-PAGE, Western blotting using rabbit anti-SP1 (Dakko) and by amino acid composition analysis (Rooney, 1987).

SP1-β (2 mg) was subjected to reduction and carboxymethylation, followed by trypsinization using methods described by Russell et al. (1986). Tryptic peptides were separated on a Waters Associates μBondapak column (30 cm × 3.4 mm) using 0.1% trifluoroacetic acid as aqueous phase and 100% propan-2-ol as solvent phase. High-pressure liquid chromatography was carried out over a 90-min period using a 0-40% propan-2-ol gradient and a flow rate of 1.5 ml/min. Amino acid sequencing of purified SP1 tryptic peptides was carried out by automated Edman degradation with an Applied Biosystems Model 470A gas-phase sequencer (SERC Protein Sequencing Facility, University of Aberdeen) using methods described elsewhere (Russell et al., 1986).
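As a rough illustration of what trypsinization is expected to yield, the sketch below applies the textbook trypsin rule (cleavage on the C-terminal side of Lys or Arg, except when the next residue is Pro) to a short stretch of the deduced SP1 sequence. The function name `tryptic_peptides` and the choice of fragment are illustrative assumptions; a real digest is rarely this clean, and this is not the procedure used in the study.

```python
import re

def tryptic_peptides(protein):
    """Idealized trypsin digest: cut after K or R unless the next residue is P."""
    # Lazily consume residues up to each cleavable K/R; the second
    # alternative collects a C-terminal remainder that has no cleavage site.
    return re.findall(r".*?[KR](?!P)|.+$", protein)

# Residues read from the N-terminal domain of the deduced SP1 sequence.
fragment = "EPTKVSKGKDVLLLVHNLPQNLAGYIWYKGQMK"
for peptide in tryptic_peptides(fragment):
    print(peptide)
```

Each peptide produced this way ends in Lys or Arg, so in the intact protein it is preceded by Lys or Arg as well.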

Based on amino acid sequence data of tryptic peptides, two mixed oligo probes were synthesized using the procedure described by Rink et al. (1984):

Probe 1: 5'-(C/T)TT(G/A)TACCA(T/A/G)AT(G/A)TA(T/A/G/C)CC-3'

Probe 2: 5'-CCA(T/A/G)AT(G/A)TA(T/A/G/C)GT(A/G)TA(G/A)TT-3'

The predicted melting temperature of probe 1 was 45.5°C and of probe 2, 43°C (assuming the lowest possible G + C contents). The probes were used for hybridization screening (Benton and Davis, 1977) of a human placental cDNA library in λgt11 (HL1008, Clontech Laboratories Inc., Palo Alto, CA) after plating phage and transferring plaques as described by Huynh et al. (1985). Hybridization and washing were performed at 37°C in 6 × SSC. Positive plaques were plaque-purified and re-screened as described above. Phage DNA was isolated as described by Blattner et al. (1978).
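The properties of the two mixed probes can be checked with a small sketch. It parses the bracketed notation, counts how many distinct oligos each mixture contains, estimates a melting temperature for the member with the lowest G + C content, and translates the complementary strands to show which peptide each probe corresponds to. Several things here are assumptions rather than statements from the study: the Wallace rule (2°C per A/T, 4°C per G/C) is used because the text does not say how the melting temperatures were predicted, the probe 1 string is partly reconstructed from a damaged line, and all function names are hypothetical.

```python
import re
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
# Standard genetic code, built in TCAG order.
CODONS = {a + b + c: AMINO[16 * i + 4 * j + k]
          for i, a in enumerate(BASES)
          for j, b in enumerate(BASES)
          for k, c in enumerate(BASES)}

# Probe strings as read from the text (5'->3'); probe 1 is partly
# reconstructed from a damaged line, so treat it as an assumption.
PROBE_1 = "(C/T)TT(G/A)TACCA(T/A/G)AT(G/A)TA(T/A/G/C)CC"
PROBE_2 = "CCA(T/A/G)AT(G/A)TA(T/A/G/C)GT(A/G)TA(G/A)TT"

def parse_probe(text):
    """Turn '(C/T)TT...' into a list of allowed-base sets, one per position."""
    return [frozenset(group.split("/")) if group else frozenset(base)
            for group, base in re.findall(r"\(([ACGT/]+)\)|([ACGT])", text)]

def degeneracy(positions):
    """Number of distinct oligos in the mixture."""
    count = 1
    for allowed in positions:
        count *= len(allowed)
    return count

def min_gc_wallace_tm(positions):
    """Wallace-rule Tm in degrees C for the lowest-G+C member (an assumed formula)."""
    # A position counts as G/C only if no A or T is allowed there.
    gc = sum(1 for allowed in positions if allowed <= {"G", "C"})
    return 2 * (len(positions) - gc) + 4 * gc

def encoded_peptides(positions):
    """Peptides encoded by the strands complementary to the probe mixture."""
    peptides = set()
    for oligo in product(*positions):
        sense = "".join(COMPLEMENT[b] for b in reversed(oligo))
        peptides.add("".join(CODONS[sense[i:i + 3]]
                             for i in range(0, len(sense), 3)))
    return peptides

for name, text in (("probe 1", PROBE_1), ("probe 2", PROBE_2)):
    pos = parse_probe(text)
    print(name, len(pos), degeneracy(pos),
          min_gc_wallace_tm(pos), sorted(encoded_peptides(pos)))
```

Under these assumptions both probes are 18-mers complementary to every codon combination for a single hexapeptide. The Wallace-rule estimates are close to, but not identical with, the melting temperatures quoted in the text, which suggests a slightly different formula was used.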

Nucleotide sequence analysis of the pSP1-i cDNA insert was carried out by the M13 cloning/sequencing procedure (Sanger et al., 1977) using sequencing kits (Amersham International, U.K., Cat. No. 4502).

Computer manipulations and comparisons of nucleotide sequences were carried out using the University of Wisconsin Genetics Computer Group program package (Devereux and Haeberli, 1986).

(b) Southern- and Northern-blot analysis of human placental DNA and RNA

Samples of human placental DNA (5 μg) were digested to completion and electrophoresed on 0.5% agarose gels in EDTA/Tris·acetate buffer using standard procedures. DNA blotting was performed using an adaptation of the Southern (1975) procedure as described by Wahl et al. (1979). The pSP1-i DNA insert was radioactively labeled by nick-translation (Rigby et al., 1977). Final washes of the filters were performed using 1 × SSC at 68°C.

For Northern-blot analysis, placental RNA was prepared using the procedure described by Ullrich et al. (1977). RNA (3 μg) was fractionated in 1% formaldehyde-1% agarose gels, transferred to nitrocellulose membrane and probed using nick-translated pSP1-i insert DNA prepared as described above. Methods used are those described by Maniatis et al. (1982).

RESULTS AND DISCUSSION

(a) Identification of a human placental SP1 cDNA

The polypeptide sequences of tryptic fragments of purified SP1 protein were used to predict the nucleotide sequences of two potentially useful oligo probes for hybridization selection of candidate SP1-cDNA clones from a λgt11 human placental cDNA library (oligo-1 and oligo-2, described in MATERIALS AND METHODS, section a). From 1 × 10⁴ independent plaques screened by hybridization with a mixture of oligo-1 and oligo-2, 135 positively hybridizing phage were identified. Ten were picked at random and re-screened with oligo-1 and oligo-2. One phage failed to rehybridize, and two other phage hybridized only to oligo-1 and had small cDNA inserts (250 and 500 bp). The remaining recombinants hybridized with both probes and contained single cDNA inserts ranging in size from 1000 to 2000 bp. DNA from one recombinant phage containing a cDNA of 2 kb was recovered from the vector by cleavage with EcoRI, and cloned into the plasmid vector pUC12. The recombinant plasmid was designated pSP1-i. A restriction map of the insert DNA fragment of pSP1-i was constructed and the information used to generate appropriate overlapping DNA segments by restriction enzyme cleavage, which were subcloned into M13 vectors for nucleotide sequence analysis. The nucleotide sequence of the 2-kb EcoRI cDNA insert of pSP1-i is shown in Fig. 1.
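The idea behind a restriction map can be sketched in a few lines: locate each recognition site in a sequence and report the fragment lengths a complete digest would give. The example below is a minimal sketch on a made-up sequence, not the map of the pSP1-i insert. The enzyme table holds only three well-known palindromic sites, and the function name `fragment_lengths` is hypothetical.

```python
# Recognition sequences and top-strand cut offsets for three common enzymes.
SITES = {
    "EcoRI": ("GAATTC", 1),    # G^AATTC
    "HindIII": ("AAGCTT", 1),  # A^AGCTT
    "BamHI": ("GGATCC", 1),    # G^GATCC
}

def fragment_lengths(seq, enzyme):
    """Fragment lengths from a complete digest of a linear sequence."""
    site, offset = SITES[enzyme]
    seq = seq.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + offset)          # position of the top-strand cut
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]

toy = "AAAA" + "GAATTC" + "CCCCC" + "GAATTC" + "TT"   # made-up sequence
print(fragment_lengths(toy, "EcoRI"))
```

Because the listed sites are palindromic, scanning one strand is enough. Enzymes with degenerate or asymmetric sites would need a more general search.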

The cDNA insert of pSP1-i contains a single, long ORF. Three in-frame methionine residues are located in the N-terminal region of the predicted amino acid sequence. Of these, the most 3' ATG (located at nt 238) was chosen as being the most likely start codon based on Kozak (1984) rules. Comparison of the cDNA sequence with that of CEA similarly predicts the use of this translational start site (see RESULTS AND DISCUSSION, section b). The sequence between nt 238-1520 encodes a 428-aa polypeptide with Mr of 54543 and an aa composition similar to placental SP1 (Rooney, 1987).
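A scan for open reading frames of the kind described above can be sketched as follows. This is a minimal illustration on a made-up sequence, covering only the three forward frames. It reports each ORF from the first ATG after a stop codon, which differs from the choice made in the text, where the most 3' of three in-frame ATGs was selected using Kozak's rules. The function name `forward_orfs` is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def forward_orfs(seq):
    """List (start_nt, end_nt, n_codons) for ATG...stop ORFs in the three
    forward frames; coordinates are 1-based and exclude the stop codon."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first ATG since the last stop
            elif start is not None and codon in STOP_CODONS:
                orfs.append((start + 1, i, (i - start) // 3))
                start = None
    return orfs

toy = "CCATGAAAAAAAAATAAG"   # made-up: ATG + 3 Lys codons + TAA
print(max(forward_orfs(toy), key=lambda orf: orf[2]))
```

Selecting the longest entry recovers the "single, long ORF" of a cDNA. Choosing among several in-frame ATGs still requires the kind of sequence-context argument given in the text.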

(b) Predicted amino acid sequence of SP1 and comparison with related proteins

A search through the protein sequence data banks showed that the predicted sequence of SP1 is unique. However, the cDNA sequence shows extensive homology with a distinct, recently published 1.9-kb SP1 cDNA sequence (PSG16; Watanabe and Chou, 1988). PSG16, and a second cDNA clone, PSG93 (2.1 kb), encode virtually identical SP1 polypeptides of 46.9 and 47.2 kDa, which appear to differ only in their C-terminal amino acids because of an 86-bp insertion in the 3'-end of the coding sequence of PSG93 (Watanabe and Chou, 1988). Additional 5'-noncoding sequence in PSG93 accounts for the remaining size difference between these two cDNA species. The coding sequence of pSP1-i shows 94.7% identity with that of PSG16 at the nucleotide level (Fig. 1). Sequence differences are distributed along the entire length of the coding regions, with marked divergence at their C-termini. Amino acid changes in the SP1-i-coded protein lead to loss of three of the seven potential sites of N-linked glycosylation compared with the PSG16 polypeptide (Watanabe and Chou, 1988), but potentially critical Cys residues are conserved. PSG peptides sequenced by Watanabe and Chou (1988) contain some residues that are at variance with the predicted cDNA sequence determined here (Val-393; Lys-401; Glu-412). Peptides sequenced in the present study correspond with the cDNA sequence shown, but differ from the polypeptide sequence predicted from PSG16 at three locations: Ala-29 reads Thr-29; Val-67 reads Ala-67; and Asp-127 reads Thr-127, all for PSG16. Re-examination of our data raises the possibility of an Asp/Thr mixed sequence at aa position 127. These conflicts may reflect differences in abundance of protein isoforms in the separate preparations, or differential recovery of peptides, or both. In contrast to the coding regions, the 5' and 3' noncoding segments of the two cDNAs show little homology: 42% and 40%, respectively, using the alignment program 'GAP'. These results show that the pSP1-i and PSG16 mRNAs are almost certainly derived from distinct SP1 genes.

Fig. 1. Nucleotide sequence of the cDNA insert of pSP1-i. The sequence is terminated by EcoRI cloning sites which link the cDNA to the λgt11 cloning vector. Two in-frame ATGs, located 5' to the supposed translational start codon, and a potential polyadenylation site (AATAAA) are boxed. Dots and brackets, respectively, show the location of the seven Cys residues and four consensus sites for N-linked glycosylation. The sequences of SP1 tryptic peptides are underlined and the positions of oligo-1 and oligo-2 homologous sequences indicated by dashed-line boxes. As expected for tryptic peptides, each sequence is preceded either by Lys or Arg. The downward arrowhead indicates the possible cleavage site for removal of a leader peptide based on structural analogy with CEA and NCA (see Fig. 2 and RESULTS AND DISCUSSION, section b). Only the coding segment is compared with that of PSG16 (Watanabe and Chou, 1988) and for clarity identical nucleotides/amino acids are not indicated. Homologies in the 5'- and 3'-noncoding regions are much less significant (around 40%) and are not shown here.
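Identity figures such as the 94.7% quoted for the two coding sequences depend on how aligned positions are counted. The sketch below shows one common convention: identical positions divided by the number of alignment columns in which neither sequence has a gap. That denominator is an assumption. The GAP and BESTFIT programs may report identity on a different basis, and the function name `percent_identity` is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identical positions over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)

# Made-up alignment: 8 comparable columns, 7 of them identical.
print(percent_identity("ACGT-ACGT", "ACGTTACGA"))
```

Counting gapped columns as mismatches instead would lower the figure, which is one reason identity values from different programs are not always directly comparable.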

The predicted SP1-i-coded polypeptide sequence shows a hitherto undetected significant homology with the published sequence of human carcinoembryonic antigen and related tumor antigens (Paxton et al., 1987; Zimmerman et al., 1987; Beauchemin et al., 1987; Oikawa et al., 1987). The sequence relationship between SP1, CEA and NCA is illustrated in Fig. 2. The methionine at nt position 238 marks the beginning of a 202-aa region of amino acid sequence homology with the N-terminal segment of CEA. The N-terminal sequence of NCA is also shown, extending up to the limit of the published data (residue 109; Paxton et al., 1987). The partial nucleotide sequence of a genomic NCA gene (Thompson et al., 1987) extends the region of homology with CEA and SP1 to include an additional N-terminal 12 aa starting with serine, after which the genomic NCA sequence shows no obvious homology with SP1 or CEA. This is thought to be the position where an intron interrupts the coding region of the NCA gene (Thompson et al., 1987). Taken together, this information suggests that the N-terminal 34 aa of SP1 constitute a leader peptide similar to that of CEA and NCA, which is probably encoded on at least two exons. The protein is predicted to be cleaved to produce a mature polypeptide with an N-terminal glutamine (amino acid residue 1, Fig. 2). We were unable to obtain the N-terminal sequence of the mature SP1 protein to confirm this possibility. From these data SP1, CEA and NCA are approx. 50% homologous at the amino acid sequence level.

SP1 and CEA maintain a similar level of homology in their C-terminal domains (domain C, Fig. 2) extending from aa 203 in SP1 to the end of the coding sequence. A gap of 14 aa residues occurs in the homology between SP1 and CEA immediately before the C-terminus of the two proteins, and homology resumes for the remaining 9 C-terminal aa. This is included within the hydrophobic terminal 27-aa segment of CEA that is believed to anchor the protein to the plasma membrane (Zimmerman et al., 1987; Beauchemin et al., 1987). The BESTFIT nucleotide sequence homology between the SP1 and CEA cDNAs shows that the coding sequences terminate at equivalent nt positions, and that a similar level of homology (approx. 80% using the GAP program) is maintained in both the 5'- and the 3'-noncoding segments of the two cDNAs, up to the point at which the one for CEA contains an AluI-family repeated sequence (Zimmerman et al., 1987). The repeated element was presumably inserted into the 3'-untranslated region of the CEA gene after it diverged from that of SP1. It is intriguing that, although the predicted coding sequence of pSP1-i is closely related to that of PSG16, the noncoding segments of pSP1-i are much more homologous to those of the CEA cDNA. This underlines the close evolutionary relationship between the SP1 and CEA family of genes.

An additional interesting feature of the SP1 protein sequence is the presence of internal repeats in the C-terminal domain (Figs. 2 and 3; segments 1a, 2a and 2b). Similar repeats have been described previously in the CEA polypeptide sequence (Zimmerman et al., 1987; Beauchemin et al., 1987). These regions are highlighted in Fig. 2 and shown schematically in Fig. 3. The C-terminal portion of CEA contains three related repeats of 178 aa residues (1-3; Fig. 2 and Beauchemin et al., 1987). Based on a comparison with SP1, each of these domains can be subdivided further into two segments (a and b; Fig. 2). Segments a and b each contain two conserved Cys residues which flank regions of additional internal sequence homology and contain a number of specifically placed amino acid residues observed in the immunoglobulin-like protein superfamily (L----FT--D-G-Y-C; Hunkapiller and Hood, 1976; Beauchemin et al., 1987). Interestingly, the segment 1b-2a-2b of CEA has no counterpart in the SP1 polypeptide sequence.

Fig. 2. Comparison of the deduced SP1 amino acid sequence with those of human CEA and NCA. The predicted SP1 amino acid sequence from Fig. 1 is aligned with those of CEA and NCA, and aa positions are numbered according to the N termini of the mature CEA and NCA polypeptides (Paxton et al., 1987). The CEA-coding sequence was derived from the cDNA sequence reported by Beauchemin et al. (1987). Met residues (aa position -34) are those located at the start of the SP1 and CEA ORFs. The SP1 amino acid sequence is divided into an N-terminal domain (N) and C-terminal domain (C), the latter containing three segments (1a, 2a and 2b; large numerals), two of which (1a and 2a) are highly homologous tandemly repeated sequences of 93 aa residues. The CEA sequence includes six C-terminal segments forming three highly homologous tandem repeats (1a and 1b, 2a and 2b, 3a and 3b; small numerals). The region corresponding to segments 1b, 2a and 2b of CEA is missing from the SP1 sequence, as is a small 14-aa region near to the predicted C-terminus of the SP1 polypeptide. Arrowheads indicate the positions of conserved Cys residues in CEA and SP1. Identical aa are indicated with asterisks, and gaps by dashes.

Fig. 3. Structure of SP1 and CEA. (a) Diagrammatic representation of the homologies between SP1 and CEA polypeptide sequences shown in Fig. 2. M represents the supposed C-terminal segment of CEA not present in the SP1 sequence, and L represents the N-terminal 34-aa leader peptide segment of SP1 and CEA which is absent from the mature CEA and NCA polypeptides. The segments 1a, 1b etc. are referred to in Fig. 2. (b) Schematic drawing showing the possible arrangement of the SP1 structural domains, according to the scheme devised for CEA by Beauchemin et al. (1987). Each includes two Cys residues (closed circles). The remaining Cys in SP1 is predicted to be lost from the mature protein on removal (arrowhead) of the leader peptide. Open circles denote potential sites of N-linked glycosylation.

(c) Comparison of SP1 with other known protein sequences

Only weak, but possibly interesting, homologies were detected using computer-assisted searches taking into account the presence of amino acids that are chemically similar or identical. Regions surrounding the conserved Cys residues in the repeated segments show some similarity with the repeating units of the neural cell adhesion molecule N-CAM (Hemperly et al., 1986). A 35-aa segment of the N-domain of SP1 (aa residues 34-69) is weakly related to the human vitronectin amino acid sequence (aa residues 239-254; Suzuki et al., 1985), and two segments of the C-terminal domain encompassing virtually the entire SP1-1a and 1b regions (aa residues 120-203 and 213-296) can both be aligned with one of sixteen 90-aa-long Type-III cell/DNA/heparin binding domains of human fibronectin (nt residues 1215-1298; Kornblihtt et al., 1985). The possible significance of these homologies has not been investigated further.

(d) Southern- and Northern-blot analysis of human placental DNA and RNA

Human placental DNA, digested to completion using a variety of restriction endonucleases, was blotted using the Southern procedure and hybridized with the 2016-bp pSP1-i cDNA insert as probe. The result is shown in Fig. 4a. The complex hybridization pattern observed is similar to that obtained for an NCA gene, a member of the CEA gene family (Thompson et al., 1987). The result suggests that the SP1 mRNA is transcribed from an extremely complex genetic locus and/or that its sequences are part of a multigene family. It is not known whether SP1 nucleotide sequences cross-hybridize with those of CEA or NCA, but the high level (72% with CEA; see RESULTS AND DISCUSSION, section b) of homology suggests that this is likely. Cloning of these genomic DNA fragments will be necessary to determine the structural organization of, and relationship between, this family of genes.

Northern-blot analysis of total human placental RNA was performed using the pSP1-i cDNA insert. The result is shown in Fig. 4b. Watanabe and Chou (1988) detected two mRNAs of approx. 1.7 and 2.2 kb, whereas we detect three distinct size-classes of RNA with approximate sizes of 1.9, 2.1 and 2.3 kb. These species are close to the size of the cDNA inserts of pSP1-i (2016 bp), PSG16 (1900 bp) and PSG93 (2200 bp). Further work is required to establish the precise relationship between these RNA species and the cDNAs.

Fig. 4. Southern- and Northern-blot analyses of placental DNA and RNA using an SP1-cDNA probe. (Panel a) Southern-blot analysis of human placental DNA (5 μg) digested with: (1) HindIII; (2) EcoRI; (3) BrI; (4) BamHI; (5) XbaI; (6) AccI, and (7) SacI. (Panel b) R, Northern-blot analysis of human placental RNA (3 μg). Arrows in panel b indicate positions of three major class sizes of RNA. Markers (M) were denatured, radioactively labeled bacteriophage λ DNA fragments, as on the left margin of panel a. Experimental details are described in MATERIALS AND METHODS, section b.

(e) Relationship of SP1 with CEA and cross-reacting tumor antigens

This study draws attention to a clear relationship between SP1 species, human CEA and nonspecific CEA cross-reacting antigens. This is especially intriguing in view of a previous report that SP1 and CEA are immunologically distinct (Kaminska et al., 1979). The complexity of the Southern-blot analysis data indicates that there are probably several SP1- and NCA-related gene sequences in the human genome. Likewise, the Northern-blot analysis of human placental RNA using the SP1 cDNA probe reveals the presence of three species of RNA with sizes similar to those detected by a CEA cDNA probe in human tumor-cell lines (Beauchemin et al., 1987). At least some of these sequences may represent cross-hybridizing species, since the cDNAs for SP1 and CEA are over 70% homologous over extensive regions. This is perhaps not surprising since evidence suggests there are at least seven distinct species of protein antigenically related to CEA (summarized in Thompson et al., 1987), apart from additional proteins such as SP1 which do not interact with anti-CEA antibodies.

The study also emphasizes the structural complexity evident at the C termini of the SP1 species (PSG16, PSG93 and SP1-i) and CEA. Moreover, comparison of the coding sequence of SP1 with the known sequences of CEA and NCA reveals the internal modular nature of these proteins (Fig. 2). Explanations for these structural observations, and for the unusual sequence relationships between the noncoding segments of SP1 and CEA cDNAs referred to above, presumably lie in the structure of the various genes and the way in which their mRNAs are processed.

(f) Possible functions of SP1

The function of SP1 is not known, although several possibilities have been suggested, including a role in carbohydrate metabolism (Tatra et al., 1976), binding of steroid hormones and iron (Lin et al., 1974) and immune suppression, which may help to protect the fetus against immune rejection by the mother (Horne et al., 1976b). It was anticipated that structural comparisons with other known proteins might shed some light on the nature of SP1 and its possible functional role, both as a natural placental protein essential for fetal development and as a tumor marker associated with certain forms of malignancy.

It was therefore encouraging, although at the same time surprising, to find a close structural relationship between SP1 and CEA antigens, hitherto considered to be a distinct group of oncodevelopmental glycoproteins whose aberrant expression is likewise associated with the malignant phenotype. Different domains of SP1 show weak, but possibly interesting, amino acid sequence alignments with N-CAM and two additional well-characterized cell adhesion molecules, human fibronectin and vitronectin (Kornblihtt et al., 1985; Suzuki et al., 1985). By analogy with the function of N-CAM this may point to a role for SP1/CEA-like proteins in mediating/coordinating cell-cell interactions during embryogenesis. On this basis it could be speculated that the same property may in some way explain why their aberrant expression is associated with a malignant cell phenotype. The further characterization of these structurally complex proteins and their genes will provide a firm basis for future studies specifically addressing the question of the function of this important group of tumor-associated antigens.

ACKNOWLEDGEMENTS

We are extremely grateful to Hans Bohn, Behringwerke AG, Marburg (F.R.G.) for a generous gift of purified placental SP1, and to Bryan Dunbar and John Fothergill, University of Aberdeen, for their help in sequencing SP1 peptides. We are also indebted to Bob Cox and Mervyn Monteiro (NIMR, Mill Hill, London), to Roger Sutcliffe and Debra Nickson (Genetics Institute, University of Glasgow) and to Andrew Coulson and colleagues (Department of Molecular Biology, University of Edinburgh) for valuable information on cDNA methodology, design of probe oligodeoxynucleotides and initial sequence homology searches. We thank our colleagues Lee Gill, Anne Glover, Mairi Gordon, Domenico Ammaturo, Felix Businger and Ron de Winter for various contributions. B.C.R. was in receipt of a studentship from the University of Aberdeen Medical Endowments Fund.

REFERENCES

Beauchemin, N., Benchimol, S., Cournoyer, D., Fuks, A. and Stanners, C.P.: Isolation and characterization of full-length functional cDNA clones for human carcinoembryonic antigen. Mol. Cell. Biol. 7 (1987) 3221-3230.

Benton, W.D. and Davis, R.W.: Screening of λgt recombinant clones by hybridisation to single plaques in situ. Science 196 (1977) 180-182.

Blattner, F.R., Blechl, A.E., Denniston-Thompson, K., Faber, H.E., Richards, J.E., Slightom, J.L., Tucker, P.W. and Smithies, O.: Cloning human fetal gamma globin and mouse alpha-type globin cDNA: preparation and screening of shotgun collections. Science 202 (1978) 1279-1284.

Bohn, H.: Detection and characterisation of pregnancy proteins in the human placenta and their quantitative immunological determination in sera from pregnant women. Arch. Gynäkol. 210 (1971) 440-457.

Dati, F., Grenner, G., Luben, G., Kapmeyer, W., Sieber, A., Bohn, H. and Bellman, O.: Comparison of enzyme immunoassays for SP1 and AFP with other immunochemical methods. Ric. Clin. Lab. 12 (1982) 265-287.

Devereux, J. and Haeberli, P.: Introduction to the Sequence Analysis Software Package of the University of Wisconsin Genetics Computer Group, version 4. University of Wisconsin, Madison, WI (U.S.A.) 1986.

Engvall, E.: Pregnancy-specific β1 glycoprotein: purification and partial characterisation. Oncodevelop. Biol. Med. 1 (1980) 113-122.

Grudzinskas, J.G., Gordon, Y.B., Jeffrey, D. and Chard, T.: Specific and sensitive determination of pregnancy-specific β1 glycoprotein by radioimmunoassay. Lancet 1 (1977) 333-334.

Hemperly, J.J., Murray, B.A., Edelman, G.M. and Cunningham, B.A.: Sequence of a cDNA clone encoding the polysialic acid-rich and cytoplasmic domains of the neural cell adhesion molecule N-CAM. Proc. Natl. Acad. Sci. USA 83 (1986) 3037-3041.

Horne, C.H.W., Reid, I.N. and Milne, G.D.: Prognostic significance of inappropriate production of pregnancy proteins by breast cancers. Lancet II (1976a) 279-282.

Horne, C.H.W., Towler, C.M., Pugh-Humphreys, R.G.P., Thomson, A.W. and Bohn, H.: Pregnancy-specific β1 glycoprotein: a product of the syncytiotrophoblast. Experientia 32 (1976b) 1179-1199.

Hunkapiller, T. and Hood, L.: The growing immunoglobulin gene superfamily. Nature 323 (1986) 15-16.

Huynh, T.V., Young, R.A. and Davis, R.W.: Constructing and screening cDNA libraries in λgt10 and λgt11. In Glover, D.A. (Ed.), DNA Cloning, Vol. I. A Practical Approach. IRL Press, Oxford, 1985, pp. 49-78.

Inaba, N., Renk, T., Ax, W., Schlottler, S., Weinmann, E. and Bohn, H.: Possible synthesis of pregnancy-specific β1 glycoprotein and pregnancy-specific tissue proteins (PP,, and PP,,) by human and cynomolgus monkey leukocytes. Acta Haematol. 66 (1981) 35-38.

Jandial, V., Towler, C.M., Horne, C.H.W. and Abramovich, D.R.: Plasma pregnancy-specific β1 glycoprotein in complications of early pregnancy. Br. J. Obstet. Gynaecol. 85 (1978) 832-838.

Kaminska, J., Calvert, I. and Rosen, S.W.: Radioimmunoassay of pregnancy-specific β1 glycoprotein (SP1). Clin. Chem. 25 (1979) 577-580.

Kornblihtt, A.R., Umezawa, K., Vibe-Pedersen, K. and Baralle, F.E.: Primary structure of human fibronectin: differential splicing may generate at least 10 polypeptides from a single gene. EMBO J. 4 (1985) 1755-1759.

Kuhajda, F.P., Bohn, H. and Mendelsohn, G.: Pregnancy-specific β1 glycoprotein (SP1) in breast carcinoma: pathologic and clinical considerations. Cancer 54 (1984) 1392-1396.

Kozak, M.: Compilation and analysis of sequences upstream from the translational start site in eukaryotic mRNAs. Nucleic Acids Res. 12 (1984) 857-872.

Lee, J.N., Salem, H.T., Chard, T., Huang, S.C. and Ouyang, P.C.: Circulating placental proteins (hCG, SP1 and PP5) in trophoblastic disease. Br. J. Obstet. Gynaecol. 89 (1982) 69-72.

Lin, T.M., Halbert, S.P., Kiefer, D., Spellacy, W.N. and Gall, S.: Characterisation of four pregnancy-associated plasma proteins. Am. J. Obstet. Gynaecol. 118 (1974) 223-236.

Maniatis, T., Fritsch, E.F. and Sambrook, J.: Molecular Cloning. A Laboratory Manual. Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 1982, pp. 202-203.

Oikawa, S., Imajo, S., Noguchi, T., Kosaki, G. and Nakazato, H.: The carcinoembryonic antigen contains multiple immunoglobulin-like domains. Biochem. Biophys. Res. Commun. 144 (1987) 634-642.

Paxton, R.J., Mooser, G., Pande, H., Lee, T.D. and Shively, J.E.: Sequence analysis of carcinoembryonic antigen: identification of glycosylation sites and homology with the immunoglobulin supergene family. Proc. Natl. Acad. Sci. USA 84 (1987) 920-924.

Rigby, P.W.J., Dieckmann, M., Rhodes, C. and Berg, P.: Labelling DNA to high specific activity in vitro by nick translation with DNA polymerase I. J. Mol. Biol. 113 (1977) 237-239.

Rink, H., Liersch, M., Sieber, P. and Meyer, F.: A large fragment approach to DNA synthesis: total synthesis of a gene for the protease inhibitor Eglin C from the leech Hirudo medicinalis and its expression in E. coli. Nucleic Acids Res. 12 (1984) 6369-6387.

Rooney, B.C.: Molecular Studies of Schwangerschaftsprotein 1 (SP1). Ph.D. Thesis, University of Aberdeen, 1987.

Russell, G.A., Dunbar, B. and Fothergill-Gilmore, L.A.: The complete amino acid sequence of chicken skeletal-muscle enolase. Biochem. J. 236 (1986) 115-126.

Sanger, F., Nicklen, S. and Coulson, A.R.: DNA sequencing with chain-terminating inhibitors. Proc. Natl. Acad. Sci. USA 74 (1977) 5463-5467.

Seppala, M., Rutanen, E.M., Heikinheimo, M., Jalanko, H. and Engvall, E.: Detection of trophoblast tumor activity by pregnancy-specific β1 glycoprotein. Int. J. Cancer 21 (1978) 265-267.

Smith, R., Klopper, A., Hughes, G. and Wilson, G.: The compartmental distribution of oestrogens and pregnancy-specific β1 glycoprotein. Br. J. Obstet. Gynaecol. 86 (1979) 119-124.

Sorensen, S.: Pregnancy-specific β1 glycoprotein (SP1): purification, characterisation, quantification and clinical application in malignancies (a review). Tumour Biol. 5 (1984) 275-302.

Southern, E.M.: Detection of specific sequences among DNA fragments separated by gel electrophoresis. J. Mol. Biol. 98 (1975) 503-517.

Suzuki, S., Oldberg, A., Hayman, E.G., Pierschbacher, M.D. and Ruoslahti, E.: Complete amino acid sequence of human vitronectin deduced from cDNA. Similarity of cell attachment sites in vitronectin and fibronectin. EMBO J. 4 (1985) 2519-2524.

Tatarinov, Y.S. and Masyukavich, V.N.: Immunological detection of a new β1-globulin in the blood sera of pregnant women. Byull. Eksp. Biol. Med. 69 (1970) 66-68.

Tatarinov, Y.S., Masyukavich, V.N., Nikulina, D.M., Novikova, L.A., Toloknov, B.O. and Falaleeva, D.M.: Immunological detection of a new β1-globulin in the pregnancy-zone in serum of patients with trophoblastic tumors. Int. J. Cancer 14 (1974) 548-554.

Tatra, G., Tempfer, B. and Placheta, P.: Influence of blood glucose levels on serum concentrations of pregnancy-specific protein SP1 and HPL during the last trimester of pregnancy. Eur. J. Obstet. Gynaecol. Reprod. Biol. 6 (1976) 53-67.

Teisner, B., Westergaard, J.G., Folkersen, J., Husby, S. and Svehag, S.E.: Two pregnancy-associated serum proteins with pregnancy-specific glycoprotein determinants. Am. J. Obstet. Gynaecol. 131 (1978) 262-266.

Than, G.N., Csaba, L.F., Bohn, H., Szabo, D.G., Szalmasy, M.S. and Menczer, G.: Monitoring therapy in trophoblastic diseases by radioimmunoassay of pregnancy-specific β1-glycoprotein and the β-subunit of human chorionic gonadotrophin. Oncodevelop. Biol. Med. 3 (1982) 315-323.

Thompson, J.A., Pande, H., Paxton, R.J., Shively, L., Padma, A., Simmer, R.L., Todd, C.W., Riggs, A.D. and Shively, J.E.: Molecular cloning of a gene belonging to the carcinoembryonic antigen gene family and discussion of a domain model. Proc. Natl. Acad. Sci. USA 84 (1987) 2965-2969.

Ullrich, A., Shine, J., Chirgwin, J., Pictet, R., Tischer, E., Rutter, W.J. and Goodman, H.M.: Rat insulin genes: construction of plasmids containing the coding sequences. Science 196 (1977) 1313-1319.

Wahl, G.M., Stern, M. and Stark, G.R.: Efficient transfer of large DNA fragments. Proc. Natl. Acad. Sci. USA 76 (1979) 3683-3687.

Watanabe, S. and Chou, J.Y.: Isolation and characterization of cDNAs encoding human pregnancy-specific β1-glycoprotein. J. Biol. Chem. 263 (1988) 2049-2054.

Zimmerman, W., Ortlieb, B., Friedrich, R. and Kleist, S.V.: Isolation and characterization of cDNA clones encoding the human carcinoembryonic antigen reveal a highly-conserved repeating structure. Proc. Natl. Acad. Sci. USA 84 (1987) 2960-2964.

Communicated by T.A. Bickle.
