Primary structure of an EcoRI fragment of λimm434 DNA containing regions cI-cro of phage 434 and...


Gene, 6 (1979) 235--249
© Elsevier/North-Holland Biomedical Press, Amsterdam -- Printed in The Netherlands

PRIMARY STRUCTURE OF AN EcoRI FRAGMENT OF λimm434 DNA CONTAINING REGIONS cI-cro OF PHAGE 434 AND cII-O OF PHAGE LAMBDA

(Restriction endonucleases; bacteriophage genes; nucleotide sequence; promoters; operator; E. coli)

Yu.A. OVCHINNIKOV, S.O. GURYEV, A.S. KRAYEV, G.S. MONASTYRSKAYA, K.G. SKRYABIN, E.D. SVERDLOV, V.M. ZAKHARYEV and A.A. BAYEV

M.M. Shemyakin Institute of Bioorganic Chemistry, and Institute of Molecular Biology, USSR Academy of Sciences, 117312, Moscow (U.S.S.R.) (Received November 8th, 1978) (Revisions received April 2nd, 1979) (Accepted April 6th, 1979)

SUMMARY

Digestion of phage λimm434 DNA with restriction endonuclease EcoRI yields 7 fragments. The shortest among them (1287 bp) contains the right part of the phage 434 immunity region and the phage DNA portion proximal to it. The complete primary structure of this fragment has been determined using the chemical method of DNA sequencing. Hypothetical amino-acid sequences of proteins coded by the cro gene of phage 434 and the cII gene of phage λ, as well as NH2-terminal amino-acid sequences of the cI protein of phage 434 and the O protein of phage λ, have been deduced solely on the basis of the DNA sequence. The fragment studied also contains the PR and probably Prm promoters and the OR operator of phage 434. The sequence coding for them differs from the respective DNA sequence of phage λ.

INTRODUCTION

A comparatively short segment of the λ phage genome contains genes which code for a number of regulatory proteins, as well as the sites where these proteins and the RNA polymerase bind to DNA (Fig. 1; for review see Herskowitz, 1973). This region contains an early promoter, PR, together with gene cro which it controls, and the OR operator regulated by the cI and cro repressors (Johnson et al., 1978). Next to PR is another promoter, Prm, which governs the transcription of gene cI. The tR1 terminator, between genes cro and cII, is antagonized by the N gene product, thus allowing the transcription

Abbreviations: bp, base pairs; PAAG, polyacrylamide gel.

[Figure 1: EcoRI cleavage map of λimm434 DNA with a fractional-length scale from 0.0 to 1.0, the imm434 region marked, and EcoRI fragments labelled A, E, B, C, G, D, F.]

Fig. 1. The EcoRI restriction endonuclease cleavage map of λimm434 DNA. The boundary of the immunity region from phage 434 is indicated with dashed lines. Numbers designate the fractional length of the λimm434 DNA. The sites of cleavage with EcoRI are shown with arrows (Sverdlov et al., 1978). The fragment which was sequenced in this work is marked. The approximate position of genes is given below the map.

of the genes cII, O, P and Q. Still another promoter, Po, which initiates the synthesis of a short oop-RNA, is located within this region (Hayes and Szybalski, 1973).

The primary structure of PR, Prm and OR of phage λ is known (Maniatis et al., 1975a, b; Pirrotta, 1973; Walz and Pirrotta, 1975; Ptashne et al., 1975; Smith et al., 1976; Humayun et al., 1977). Considerable progress has been achieved in establishing the sequence of the cI (Ptashne et al., 1975; Sauer and Anderegg, 1978) and cro proteins (Hsiang et al., 1977) of phage λ. Data are also available on the complex interaction between the cI and cro repressors, the RNA polymerase and the promoter-operator regions of phage λ (Humayun et al., 1977; Johnson et al., 1978). In order to infer how essential these structural features of λ are, we have directed our efforts towards studying the right part of the immunity region in phage 434, which is closely related to phage λ. This paper presents the structure of a DNA fragment which most probably contains the PR and Prm promoters, the OR operator, the NH2-terminal part of the cI gene, and the complete cro gene of phage 434, as well as the structure of the y region, the cII gene, and the NH2-terminal part of the O gene of phage λ. A portion of the sequence encompassing about 300 nucleotides from the immunity region of phage 434 was published in our previous work (Bayev et al., 1978). The part of the sequence derived from phage λ agrees with the data published at the time when our work had been completed (Schwarz et al., 1978; Rosenberg et al., 1978; Scherer, 1978). The phage λ DNA region from 1187 to 1287 bp was also sequenced by Denniston-Thompson et al. (1977).

MATERIALS AND METHODS

DNA. Phage λimm434 DNA was prepared as described previously (Sverdlov et al., 1977).

Enzymes. Polynucleotide kinase was isolated according to Richardson (1971). T4 DNA polymerase was prepared according to Lehman (1974) and


contained no contaminating phosphatase or nuclease activities. Restriction endonucleases HindII (Bickle et al., 1977), BglII (Bickle et al., 1977), TaqI (Sato et al., 1977) were isolated as described in the original papers. Restriction endonuclease HpaII was purchased from Miles Lab., Inc. The enzymes did not possess any nuclease activities which could have interfered with the sequencing. Sau3A restriction endonuclease was kindly provided by Prof. H. Zadian.

The EcoRI-G fragment of λimm434 DNA was isolated as described previously (Sverdlov et al., 1978).

Digestion of the EcoRI-G fragment by various restriction endonucleases. Digestion with HindII and BglII was performed in 0.01 M Tris-HCl, 0.01 M MgCl2, 0.01 M mercaptoethanol, pH 7.5, at 37°C for 2 h, using 2 units of the enzyme activity per 1 μg of DNA. Cleavage with TaqI was carried out likewise but the incubation temperature was 70°C. Cleavage with HpaII was performed in 0.01 M Tris-HCl, 0.01 M MgCl2, 0.006 M KCl, and 0.001 M dithiothreitol, pH 7.4. The completeness of digestion was checked using electrophoresis on 8% polyacrylamide gel.

Sequencing procedures (see Fig.2). All the sequence data were obtained using the method of partial chemical degradation (Maxam and Gilbert, 1977), slightly altered (Skryabin et al., 1978). Partial depurination in 60% formic acid was used to establish the positions of purine bases (Sverdlov et al., 1973; Korobko and Grachev, 1977). If necessary, the complementary strands of labeled fragments were separated according to Szalay (Szalay et al., 1977).

RESULTS AND DISCUSSION

The EcoRI-G fragment of phage λimm434 DNA contains the right part of the immunity region from phage 434 and the adjacent part from phage λ. The results of restriction endonuclease cleavage are summarized in Fig. 3. The fragment does not contain sites for restriction endonucleases HaeIII, BamHI and XbaI.

In all cases, the molecular weights of fragments were estimated from their mobility on 8% polyacrylamide gel relative to marker fragments produced upon digestion of the pY1rA3 plasmid DNA (Petes et al., 1978) with EcoRI. When the EcoRI-G fragment was 32P-labelled at the 5'-ends and then cleaved with BglII, the label was found only in the fragments BglII-868 and BglII-355 but not in BglII-60 (Fig. 3), indicating that the former two were terminal. Likewise it was shown that the terminal fragments in the TaqI digest were TaqI-512 and TaqI-414. Direct sequence analysis revealed that BglII-868 and TaqI-512 shared the same end of the original EcoRI-G, as did BglII-355 and TaqI-414. A total of 320 nucleotides were sequenced starting from one of the EcoRI labelled ends of the original polymer (Figs. 2a and 3b), and 270 nucleotides from the other end (Figs. 2b and 3c).

Fig. 2. Autoradiograms of electrophoretically separated products obtained by incomplete chemical degradation of Bgl-870 (a) and Bgl-360 (b); see Fig. 3c. The whole fragment EcoRI-G had been 5'-terminally labelled with polynucleotide kinase and [γ-32P]ATP: 60 μg of the fragment were incubated with 40 units of T4 polynucleotide kinase and 0.66 mCi [γ-32P]ATP (3000 Ci/mmol, Amersham) in 100 μl of 0.05 M Tris-HCl (pH 9.5), 0.01 M MgCl2 and 5 mM DTT for 60 min at 37°C. The labelled fragment was ethanol-precipitated and cleaved with restriction endonuclease BglII. The resulting incubation mixture (200 μl) was directly loaded onto 8% polyacrylamide gel (PAAG) and the labelled separated fragments were eluted, ethanol-precipitated and subjected to chemical cleavage reactions as described in MATERIALS AND METHODS. The degradation products were loaded on a 40 × 20 × 0.15 cm slab gel of 10% PAAG/7 M urea and electrophoresed for 12 h at 800 V.

[Figure 3: restriction map of the EcoRI-G fragment between its two EcoRI ends, with the imm434 boundary, PR, OR, cro, cII and Po marked, and arrows showing the sequencing strategy.]

Fig. 3. (a) The map for digestion of the EcoRI-G fragment with restriction endonucleases, and the scheme explaining the strategy used to determine the entire sequence of EcoRI-G. The total length of the fragment is 1287 bp. Symbols with vertical arrows indicate the sites of cleavage with various restriction endonucleases (HpaII; TaqI; HindII; BglII + Sau3A; Sau3A). Broad open horizontal arrows designate the origin and direction of transcription from the Prm, PR and Po promoters. OR is an operator; cro, cII and O are genes. Horizontal arrows b--r indicate fragments and their regions whose sequences were read off from individual gels (repeats are not shown). Rightward arrows correspond to the l strand and leftward arrows to the r strand. Solid lines designate regions of the respective fragments whose sequences have been read off unambiguously in a particular type of experiment. Solid circles at the ends of the fragments mark the 5'-32P-labelled termini. Solid squares mark the 3'-termini, labelled with T4 DNA polymerase.

The next step in analysing the structure was sequencing of the two internal fragments, TaqI-196 and TaqI-165. The original fragment was digested with TaqI, and the digest was terminally labelled with [γ-32P]ATP, using polynucleotide kinase. The mixture was separated on 8% polyacrylamide gel, and complementary strands of fragments TaqI-196 and TaqI-165 were sequenced (Fig. 3d and e). Fragment TaqI-196 was found to contain the BglII restriction site close to one of its ends. This finding (a) indicates that this fragment comes next to TaqI-414 in the EcoRI-G and (b) makes it possible to choose between two alternative orientations of TaqI-196 with respect to BglII, so that the end with the proximal BglII site would be directed towards BglII-410. Thus, the TaqI and BglII cleavage sites are arranged as shown in Fig. 3a. The information about orientation of the TaqI-165 fragment was obtained from studying the respective position of the HindII sites. The EcoRI-G fragment contains only one HindII cleavage site. Sequence data located this site in the TaqI-165 fragment at a distance of 17 bp from one of its ends; moreover, the BglII-868 fragment contains two HpaII sites at a distance of 216 and 311 bp from its EcoRI end, while the TaqI-196 fragment also contains two HpaII sites. Therefore, the


HpaII sites are arranged with respect to the BglII sites as shown in Fig. 3a. The data on the double digestion with HpaII and HindII showed that the HindII site is located in the HpaII-412 fragment so that it is divided into two parts ca. 350 and 60 bp long. This is possible only if the end of TaqI-165 proximal to the HindII site is directed towards TaqI-196 as shown in Fig. 3a. In this

AATTCTTTTGCTTTTTACCCTGGAAGAAATACTCATAAGCCACCTCTGTTATTTACCCCCAATCTTCACA TTAACAAAACGAAA~ATGCCACCTTCTTTATGAGTATTCGCTGGAGACAATAAATGGCGCTT~GTGT I I eArgLy sSerLy sValArgSer S er I1 eSer fmet ee~.ee

Cf O

AGAAAAACTGT ATTT GACAK~C AAGATACATTGTATGAAAATAC AAGAA~TI~T TGTTGAT~I~C GATA TCTTTTTGACATAAACTGTTTCTTCTATGT AACATACTTTTATGTTCTTTCAAA AACTACCTCCGCTAT

£metGlnThrLeuS erGluArgLeuLy sly sat gAr gI leAlaLeuLy sMetThrGlnTh~-GluLeuAlaThr T~aCAAACTCTTTCTGAACGCCTCAAGAAGAGGCGAATTGCGTTAAAAATGACGCAAACCGAACTGGCAAC ACGTTTGAGAAAGACTTCCGGAGTTCTTCTCCGCTTAACGt~AATTTTTACTGCGTTTGGCTTGACCGTTG

Ly sAlaGlyValLysGlnGlnSer I 1 eGlnLeuI 1 eGluAlaGlyValThrLy sAr gPr oArgPheLeu CAAAG C~TGTTAAACAGCAATCAATTCAACTGATTGAAGCTGGAGTAACCAAGCGACCGCGCTTCTTG GTTTCCCTACAATTTCTCGTTAGTTAAGTTGACTAACTTCGACCTC&TTCGTTCCCTGGCGCGAAGAAC

PheGlulleklaMetAlaLeuAsnCysAspProValTrpLeuGlnTyrGlyThrLysArgGlyLysAla TTTGAGATTGCTATGGCGCTTAAC. TG~AT~TTTCGTTACAGTACGCAACTAAACGCGCTAAACCCG AAACTCTAACGATACCGCCAATTGACACT~AAACCAATGTCATGCCTTGATTTGCGCCATTTCGGC

imm 434 AlaTer

CTTAAGACATTCCCGCTCTTACACATTCCACCCCTGAAAAAGGGCAT~TTAAACCACACCTATGGTG GAATTCTGTAAGGCCGAGAATGTGTAAGGTCCGGACTTTTTCCCGTAGTTTAATTTGCTGTGCATACCAC

. ***** fmetValArgAlaAsn TATGCATTTATTTGCATACATTCAATCAATTGTTATCTAAGGAAATACTTACATATGGTTCGTGCAAACA ATACGTAAATAAACGTATGTAAGTTAGTTAACAATAGATTCCTTTATGAATGTATACCAAGCACGTTTGT

Ly sat gAsnGluAlaLeuArgI 1 eG luSerAlaLeuLeuAsnLysI1 eAlaMetLeuGlyThrGluLy sThr AACGCAACGACGCTCTACGAA1~.AGACTGCGTTGCTTAACAA/~TCGCAATGCTTGGAACTGAGAAGAC TTGCGTTGCTCCGAGATGCTTAG~CTCACCCAACGAATTGTTTTAGCGTTACCAACCTTGACTCTTCTG

AlaGluAlaValGlyValAspLysSerGlnlleSerAcgTrpLy sArgAspTrplleProLy sPheSer AG CGGAAGCTGTCGGCCTTGATAAGTCGC~GCAGGTGGAAGAGCGACTGGATTCCAAAGTTCTCA TCGCCTTCGACACCCGCAACTATTCAGCGTCTA~CGTCCACCTTCTCCCTGACCTAAGGTTTCAAGAGT

MetLeuLeuAlaValLeuGluTrpGlyValValAspAspAspMetAlaAr gLeuA1 aAr 8GlnValA1 a


way unambiguous orientation of all of the fragments has been obtained. The TaqI-196 fragment was found to contain a sequence coding for oop-RNA. Since oop-RNA is known to be transcribed leftward, towards the immunity region (Honigman et al., 1976), this finding definitely indicates that the end

AlalleLeuThrAsnLysLysArgProAlaAlaThrGguArgSerGluGlnlleGlnMetGluPheTer CGATTCTCACCAATAAAAAACGCC~GCGGCAACCGAGCGTTCTGAACAAATCCAGATGGAGTTCTGAGG GCTAAGAGTGGTTATTTTTTGCGGG{~GCCGTTGGCTCGCAAGACTTGTTTAGGTCTACCTCAAGACTCC

. . . . . fmeCThrAsnThrAlaLysIleLeuAsnPheGlyArgGly TCATTACTA~ATCAACAGGAGTCATTATGACAAATACAGCAAAAATACTCAACTTCGGCAGAGGTA AGTAATGACCTA~TAGTTGTCCTCAGTAATACTGTTTATGTCGTTTTTATGAGTTGAAGCCGTCTCCAT

AsnPheAlaGlyGlnGluArgAsnValAleAspLeuAspAspGlyTyrAlaArgLeuSerAsnMeCLeuLeu ACTTTGC~GACAGGAGCGTAATGTGGC~ATCI~ATGATGGTTACGCCAGACTATCAAATATGCTGCT TGAAACG~TGTCCTCGCATTACACCGTCTA~G~ACTACCAATGCGGTCTGATAGTTTATACGACGA

GluAlaTyrSerGlyAlaAspLeuThrLysArgGlnPheLysValLeuLeuAlalleLeuArgLysThr TGAGGCTTATTCGGGCGCA~ATCTGACCAAGCGACAGTTTAAAGTGCTGCTTGCCATTCTGCGTAAAACC ACTCCGAATAAGCCCGCGACTA~CTGGTTCGCTGTCAAATTTCACGACGAACGGTAAGACGCATTTTGG

TyrGlyTrpAsnLysProMeCAspArglleThrAspSerGlnLeuSerGlulleThrLysLeuProVal TATGGGTGGAATAAACCAATGGACAGAATCACCGATTCTCAACTTAGCGAGATTACAAAGTTACCTGTCA ATACCCACCTTATTTGGTTACCTGACTTAGTGGCTAAGAGTTGAATCGCTCTAATGTTTCAATGGACAGT

LysArSCysAsnGluAlaLysLeuGluLeuValArgMetAsnllelleLysGlnGlnGlyGlyMeCPheGly AACGGTGCAATGAAGCCAAGTTAGAACTCGTCAGAATGAATATTATCAAGCAGCAAGGCGGCATGTTTGG TTGCCACGTTACTTCGGTTCAATCTTGAGCAGTCTTACTTATAATAGTTCGTCGTTCCGCCGTACAAACC

ProAsnLysAsnlleSerGluTrpCyslleProGlnAsnGluGlyLysSerProLysThrArgAspLys ACCAAATAAAAACATCTCAGAATGGTGCATCCCTCAAAACGAGGGAAAATCCCCTAAAACGAGGGATAAA TGGTTTATTTTTGTAGAGTCTTACCACGTAGGGAGTTTTGCTCCCTTTTAGGGGATTTTGCTCCCTATTT

ThrSerLeuLy sLeuGlyAspCysTyrProS erLy sGlnGlyAspThrLy sAspThr II eThrLy sGlu ACATCCCTCAAATTGGGGGATTGCTATCCCTCAAAACAGGGGGACACAAAAGACACTATTACAAAAGAAA TGTAGGGAGTTTAACCCCCTAACGATAGGGAGTTTTGTCCCCCTGTGTTTTCGTTGATAATGTTTTCTTT

Ly skrgLy sAspTyrS erS erG luAsn AAAGAAAAGATTATTCGTCAGAGAATT TTTCTTTTCTAATAAGCAGTCTCTTAA

Fig. 4. The complete sequence of the EcoRI-G fragment. The upper chain corresponds to the l strand. The boundaries of the immunity region in phage 434 are shown (Schwarz et al., 1978) as well as the sites of endonuclease cleavages (for designations see Fig. 3). Asterisks indicate nucleotides complementary to the 3'-terminus of 16S RNA. The initiation codons are underlined. The translation termination codons are marked Ter. Peptides corresponding to these nucleotide sequences are shown above the non-coding strand.


of BglII-870 contains a portion of the immunity region from phage 434, and orients the restriction map of the fragment with respect to the genetic map of phage λimm434.

Determination of the complete sequence of the EcoRI-G fragment was based on the map thus obtained and consisted of the steps described in the legend to Fig.3.
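The mapping argument above can be cross-checked with simple arithmetic: the fragments of each complete digest should add up to the 1287 bp length of EcoRI-G. The sketch below is illustrative only. It assumes that the number in each fragment name is its size in bp; the BglII names come from gel mobility and are estimates, so a small gap is expected there.

```python
# Fragment sizes (bp) taken from the fragment names used in the text.
# Assumption: the numeric suffix of each name is the fragment length.
TOTAL_BP = 1287
DIGESTS = {
    "TaqI": [512, 414, 196, 165],
    "BglII": [868, 355, 60],  # gel-based estimates
}


def digest_gap(sizes, total=TOTAL_BP):
    """Return total length minus the summed fragment sizes (0 = exact match)."""
    return total - sum(sizes)


gaps = {enzyme: digest_gap(sizes) for enzyme, sizes in DIGESTS.items()}

# HindII splits the HpaII-412 fragment into parts of ca. 350 and 60 bp.
hindii_gap = 412 - (350 + 60)

for enzyme, gap in gaps.items():
    print(f"{enzyme}: unaccounted bp = {gap}")
print(f"HpaII-412 after HindII: unaccounted bp = {hindii_gap}")
```

Gaps of a few base pairs are within what one would expect from sizes read off a polyacrylamide gel.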

The complete sequence of the fragment from the EcoRI digest of phage λimm434 DNA is shown in Fig. 4. The upper strand of the fragment is the l strand.

The data obtained were analysed by computer for cleavage sites of other restriction endonucleases. Among the computer-determined sites, two were specific for AluI, at positions 250 and 567 bp from the left terminus, and five were specific for Sau3A, at positions 308, 591, 779, 869 and 929.

The experimental findings on the digestion of the fragment by these endonucleases are in accord with the computer analysis data. Some Sau3A digestion fragments were sequenced for verification of the obtained data (Fig. 3).
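A computer search of this kind amounts to scanning the sequence for each recognition site. The following is a minimal sketch, not the program used in this work. It assumes the standard recognition sequences AGCT (AluI) and GATC (Sau3A), and it runs on a short made-up sequence (`TOY`), because the sequence in Fig. 4 is not reproduced reliably here.

```python
# Recognition sequences (standard values, stated here as an assumption).
SITES = {"AluI": "AGCT", "Sau3A": "GATC"}


def find_sites(seq, motif):
    """Return 1-based start positions of every occurrence of motif in seq."""
    positions = []
    i = seq.find(motif)
    while i != -1:
        positions.append(i + 1)        # convert to 1-based numbering
        i = seq.find(motif, i + 1)     # step by one to allow overlaps
    return positions


# Hypothetical 20-nt sequence used only to exercise the function.
TOY = "AATTCGATCAAGCTTGATCG"

site_map = {enzyme: find_sites(TOY, motif) for enzyme, motif in SITES.items()}
print(site_map)
```

Both recognition sequences are palindromic, so scanning one strand is sufficient for these two enzymes.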

(1) The structure of the region corresponding to the immunity region of phage 434

As follows from the comparison of our sequence data containing the right part of the immunity region from phage 434 with that of the phage λ DNA fragment (Schwarz et al., 1978), the borderline of the phage 434 immunity region in the G fragment is located at 361 bp from its left end. If the structural-functional organization of phage 434 DNA is similar to that of phage λ, one might expect the portion of the fragment which contains the segment of the immunity region to comprise the PR and Prm promoters, the OR operator, the NH2-terminal fragment of the cI gene, and the complete cro gene of phage 434. Arguments in favour of this conclusion are presented below.

(a) When the G fragment is used as a template for RNA synthesis directed by E. coli RNA polymerase in the absence of the ρ factor, oop-RNA and heterogeneous RNA of a much higher molecular weight are produced. Nevertheless, initiation, i.e. formation of the first phosphodiester bond, requires two triphosphates, GTP and UTP (Sverdlov et al., 1978). Synthesis conducted in the presence of these two triphosphates yielded, together with products corresponding to the 5'-terminal region of oop-RNA, a longer oligonucleotide with the structure pppGUUUGUUG (data to be published). The sequence complementary to this oligonucleotide is contained in the l strand of the phage 434 immunity region. The oligonucleotide is transcribed, therefore, from the r strand and the orientation of the transcription coincides with that of mRNA transcribed from the PR promoter (Herskowitz, 1973). Apparently, this oligonucleotide is the 5'-terminal portion of the respective phage 434 mRNA. The data of electron microscopy (to be published) suggest that this portion of the fragment includes a site which associates tightly with RNA polymerase. The positions of the Prm promoter and of the cI gene are hypothetical.
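Assigning a short transcript such as pppGUUUGUUG to a strand is a string search: one looks for the DNA copy of the oligonucleotide on a given strand, and for its reverse complement, which is what that strand would carry if it served as the template. The sketch below illustrates the search only. The function names and the sequence `TOY` are hypothetical, and the code does not decide which physical strand of the fragment is called l or r.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return dna.translate(COMPLEMENT)[::-1]


def find_all(seq, motif):
    """Return 1-based start positions of every occurrence of motif in seq."""
    positions = []
    i = seq.find(motif)
    while i != -1:
        positions.append(i + 1)
        i = seq.find(motif, i + 1)
    return positions


def sense_positions(rna, strand):
    """Positions where the strand reads the same as the RNA (U replaced by T)."""
    return find_all(strand, rna.replace("U", "T"))


def template_positions(rna, strand):
    """Positions where the strand is complementary to the RNA (template copy)."""
    return find_all(strand, reverse_complement(rna.replace("U", "T")))


# Hypothetical 18-nt strand, written 5' to 3', containing GTTTGTTG once.
TOY = "TTACAAGTTTGTTGATAC"
OLIGO = "GUUUGUUG"

print("sense copy at:", sense_positions(OLIGO, TOY))
print("template copy at:", template_positions(OLIGO, TOY))
print("template copy on the other strand at:",
      template_positions(OLIGO, reverse_complement(TOY)))
```

A strand that carries the sense copy is not the template; the template is the opposite strand, on which the reverse complement of the oligonucleotide is found.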


(b) 10 nucleotides downstream from the 5'-terminus of this PR transcript is the sequence GGAGG complementary to the 3'-terminal region of 16S RNA which is believed to be essential for the firm binding of a ribosome to form the initiation complex for translation (Shine and Dalgarno, 1974; Steitz and Jakes, 1975). This sequence is indeed followed, at a distance of 4 nucleotides, by the initiation codon ATG. Both sequences must be present in the respective mRNA transcribed from the complementary r strand. An in-phase termination codon TAA is located farther downstream, after 71 codons. Therefore, this region comprises all elements which are now believed to be necessary for the synthesis of a short protein of 71 amino acids. The synthesis of such a protein is indeed observed when the G fragment is used as a template in a coupled cell-free transcription-translation system (data to be published).

Screening of other possible coding frames reveals a great number of nonsense codons in the l strand. Additional evidence that the cro gene is present in this fragment is the immunity behaviour of E. coli which carries plasmids with the G fragment (Sverdlov et al., to be published).
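The reasoning in (b) can be followed step by step on a sequence: locate the ribosome binding motif, look for ATG a few nucleotides downstream, then count codons up to the first in-phase termination codon. The sketch below is an illustration under stated assumptions. The 12-nt limit on the spacer (`max_gap`) is an arbitrary choice, and `TOY` is a made-up sequence with a 4-nt spacer and a 5-codon frame, not the cro region itself.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def candidate_starts(seq, sd="GGAGG", max_gap=12):
    """Return (sd_index, gap, atg_index) for each ATG found within max_gap
    nucleotides downstream of an occurrence of the sd motif (0-based)."""
    hits = []
    pos = seq.find(sd)
    while pos != -1:
        for gap in range(max_gap + 1):
            atg = pos + len(sd) + gap
            if seq[atg:atg + 3] == "ATG":
                hits.append((pos, gap, atg))
        pos = seq.find(sd, pos + 1)
    return hits


def codons_until_stop(seq, start):
    """Count sense codons from start (0-based index of ATG) up to, but not
    including, the first in-phase stop codon; None if no stop is reached."""
    count = 0
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return count
        count += 1
    return None


# Hypothetical sequence: GGAGG, 4-nt spacer, ATG, four more codons, TAA.
TOY = "TTGGAGGCGATATGCAAACTCTTTCTTAAGG"

starts = candidate_starts(TOY)
lengths = [codons_until_stop(TOY, atg) for _, _, atg in starts]
print(starts, lengths)
```

Applied to the other two reading frames, the same counting function shows how quickly a frame is closed by nonsense codons, which is the screening mentioned in the text.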

(c) The cro protein of phage λ is a peptide 66 amino acid residues long (Hsiang et al., 1977; Roberts et al., 1977; Schwarz et al., 1978). It is terminated close to the right border of the immunity region like the peptide discussed here. Consequently, by analogy with the λ cro gene, one may deduce that the DNA fragment from the immunity region of phage 434 coding for a peptide made of 71 amino acids is the cro gene of phage 434. The sequence of this hypothetical cro protein based on DNA sequence data is given in Fig. 4. The cro protein of phage 434 has been recently isolated (Aono and Horiuchi, 1977). Its molecular weight as determined by electrophoresis in polyacrylamide gels exceeds that of the cro protein from phage λ (Takeda et al., 1977), and this fact is consistent with our data. The amino acid sequences of the NH2-terminal part of cI and of the cro protein of phage 434 differ considerably from those of phage λ.

(d) The left end of the G fragment is separated by 40 nucleotides from the sequence GAGGTG in the r strand complementary to the 3'-terminus of 16S RNA and adjacent to the initiation codon ATG. The mRNA containing them must be transcribed from the l strand like the mRNA of the cI repressor gene. That is why the sequence following them may code for the phage 434 cI repressor, whose hypothetical NH2-terminal sequence is also given in Fig. 4.

(e) The relative positions of the hypothetical cI and cro repressor genes in phage 434 are very similar to those in phage λ, as can be seen in Fig. 5. The distance between the origins of the hypothetical cro and cI genes in phage 434 is only three nucleotides longer than in phage λ. The initiation codons of both repressors in the DNAs of phage λ and phage 434 are preceded by comparatively long sequences complementary to the 3'-terminus of 16S RNA.

(f) Regions between the two genes in λ DNA and 434 DNA differ significantly in their sequences (Fig. 5). Studies of the structure of the promoter-operator region located between the cro and cI genes of phage λ (Ptashne et al., 1976; Johnson et al., 1978) revealed the existence of three symmetrical sequences with a rather high GC content which are separated by segments rich in AT. Apparently, these sequences correspond to the three regions of the OR operator.

[Fig. 5. Sequence comparison of the region between the cro and cI genes in phage λ and phage 434 DNA.]

We have not found a similar organization in the corresponding region of phage 434, which is characterized by a high overall content of AT pairs, i.e. 40% in the left part and 88% in the right one (Fig. 5). Some symmetrical sequences are in fact observed, but they differ from those of phage λ.
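AT content figures such as those quoted above are obtained by counting A and T residues over a stretch of sequence. A minimal sketch follows; the window size and the sequences in the example are arbitrary choices for illustration and are not taken from Fig. 5.

```python
def at_fraction(seq):
    """Return the fraction of A and T residues in seq (0.0 to 1.0)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "AT" for base in seq) / len(seq)


def windowed_at(seq, size):
    """Return the AT fraction of consecutive, non-overlapping windows."""
    return [at_fraction(seq[i:i + size])
            for i in range(0, len(seq) - size + 1, size)]


# Hypothetical 12-nt sequence split into three 4-nt windows.
profile = windowed_at("AAAATTTTGGCC", 4)
print(profile)
```

Computing the fraction separately for the left and right parts of a region gives the kind of two-number summary used in the text.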

While the PR promoter of phage λ contains the so-called Pribnow box (Pribnow, 1975), that of phage 434 does not.

We do not yet know the laws governing the specificity of interaction between proteins and nucleic acids. One might only expect proteins and nucleic acids performing similar functions in different organisms to possess certain common characteristics. If this is true, such characteristics do not seem to be directly determined by the primary structure of these molecules.

(2) The portion of the G fragment corresponding to the phage λ genome

The EcoRI-G fragment contains 925 base pairs to the right of the 434 immunity region corresponding to a portion of phage λ DNA. Comparison with the genetic and restriction cleavage map of the section adjacent to the immunity region (Blattner et al., 1974) suggests that this sequence may comprise the cII gene, the y region, the ori region, sequences corresponding to oop-RNA, and, possibly, the NH2-terminal part of the O gene. While the work was in progress, other data appeared on the sequence of the region 1187--1287 bp (Denniston-Thompson et al., 1977). When our work was almost completed, further data were published (Schwarz et al., 1978; Rosenberg et al., 1978; Scherer, 1978) describing the whole λ DNA region of the fragment. Our sequence of the region 1187--1287 bp differs somewhat from the results of Denniston-Thompson et al. but is in accord with the data of Scherer (1978).

It is noteworthy that in the region 656--662 bp (Fig.4, underlined) we ob- served a highly interesting example of "compression". During the analysis of 5'-labelled r-strand of TaqI-165 fragment (Fig.3d) on the gel autoradiogram five bands were observed, corresponding to cytosine residues.

The observed band pattern was regular, though one of the bands was more intense compared to the others (Fig. 6a). During the analysis of the 3'-end labelled l strand of the same fragment (Fig. 3t) six guanosine residues were observed in the same region (Fig. 6b). We realize that this region comprises six GC pairs, as in the published data (Schwarz et al., 1978). The observed "compression" was probably due to the strong secondary structure existing in the r strand area.*

*We are grateful to Prof. H. Kössel, who provided us with autoradiograms of his gels in this region. These prompted us to verify our preliminary findings, which might have led us to the erroneous conclusion that this region included only five G·C pairs.


Fig. 6. (a) Part of a sequencing gel, spanning the region within 652-657 bp from the left end of the EcoRI-G fragment (see Fig. 3d, r strand). 20 μg of TaqI digest of the G fragment were 5'-terminally labelled by a procedure similar to that described in the legend to Fig. 2. The separated strands of the TaqI-165 fragment were subjected to chemical cleavage reactions. The obtained mixture was electrophoresed on a 40 × 20 × 0.15 cm slab of 10% PAAG/7 M urea for 6 h at 800 V. (b) Part of a sequencing gel, spanning the same region, but for the l strand, labelled at the 3'-terminus with T4 DNA polymerase (see Fig. 3, fragment t). 10 μg of TaqI digest of the G fragment were 3'-terminally labelled with T4 DNA polymerase and [α-32P]dCTP: the products of digestion were incubated with 250 pM [α-32P]dCTP (400 Ci/mmole, Amersham) and 5 units of T4 DNA polymerase in 20 μl of buffer described previously (Lehman, 1974) for 30 min at 37°C. The following procedures were similar to those described in (a), except that electrophoresis on a 40 × 20 × 0.015 cm slab of 20% PAAG was carried out for 4 h at 2000 V.

(a) The hypothetical sequence of the cII gene. Sequences corresponding to the cII and O genes are transcribed in vivo in the presence of the gene N product which controls antitermination. The resultant mRNA molecule is the message to be expressed into the cro, cII, O, P and Q proteins. That is why, in order to locate the beginning of the cII gene, we attempted to find a sequence which contains both a ribosome binding site and the initiation codon ATG or GTG in the same strand as does the cro gene.

The computer analysis gave a series of possible sequences for the cII gene. We believe that the most probable beginning of the cII gene lies within the interval of 459--474 bp which includes the longest sequence (TAAGGA-) complementary to the 3'-terminus of 16S RNA and the most frequent initiation codon ATG. The predicted amino acid sequence is given in Fig. 4.
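Ranking candidate starts by "the longest sequence complementary to the 3'-terminus of 16S RNA" can be sketched as a longest-common-substring search against a reference motif. The reference string `SD_FULL` below is an assumption: it is the DNA-sense complement of the commonly cited E. coli 16S RNA 3'-terminal sequence GAUCACCUCCUUA and is not taken from this paper. The window in the example is made up.

```python
# Assumed reference: reverse complement (as DNA) of the 16S RNA 3' end
# 5'-GAUCACCUCCUUA-3'. Treat this value as an illustration parameter.
SD_FULL = "TAAGGAGGTGATC"


def longest_sd_match(window, reference=SD_FULL, min_len=3):
    """Return the longest substring of window that also occurs in reference,
    or an empty string if nothing of at least min_len nucleotides matches."""
    window = window.upper()
    longest = min(len(window), len(reference))
    for length in range(longest, min_len - 1, -1):
        for start in range(len(window) - length + 1):
            piece = window[start:start + length]
            if piece in reference:
                return piece
    return ""


# Motifs quoted in the text as ribosome binding sequences.
QUOTED = ["TAAGGA", "GGAGG", "GAGGTG", "AGGAG"]
all_quoted_fit = all(motif in SD_FULL for motif in QUOTED)

# Hypothetical 10-nt window upstream of a candidate ATG.
best = longest_sd_match("CTAAGGAAAT")
print(all_quoted_fit, best)
```

Under this assumption all four motifs quoted in the text are substrings of the same reference, which is why the length of the match can be used to compare them.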


Our location of the cII gene start coincides with the location chosen by Schwarz et al. (1978). The cII protein sequence deduced from the nucleotide sequence comprises 97 amino acids. The molecular weight of the cII protein thus equals 11 046, which is well in accord with the experimentally determined value of 13 000 (Yates et al., 1977). The in-phase terminator codon in this case is UGA. Schwarz et al. (1978) give some additional evidence in favour of this cII gene location.

(b) The hypothetical sequence of the O gene. The product of the O gene, one of the proteins involved in phage λ DNA replication, has a molecular weight of 34 500 (Yates et al., 1977) and therefore contains about 300 amino acids. It is located to the right of the cII gene in the genetic map. Two origins of translation are possible for this protein (Schwarz et al., 1978). One of them contains the ribosome binding site AGGAG and the initiation codon ATG within 791--802 bp. Another combination of AGGAG with the initiation codon GTG lies within 856--868 bp. The former was chosen as the origin of translation by Schwarz et al. We chose this start for the O protein on the basis of the same arguments. The obtained NH2-terminal sequence of 149 amino acid residues makes up about half the O-protein sequence. When this paper was returned from the referees for our corrections, a publication (Scherer, 1978) appeared in which the whole sequence was determined. Our results agree with these data.
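The size estimates in (a) and (b) follow from dividing a molecular weight by a mean residue mass. The sketch below reproduces that arithmetic. The value of 115 per residue is an assumption inferred from the statement that 34 500 corresponds to about 300 amino acids; the text does not state the figure that was actually used.

```python
def mean_residue_mass(molecular_weight, residues):
    """Return the average mass per amino acid residue."""
    return molecular_weight / residues


def estimated_residues(molecular_weight, residue_mass=115):
    """Estimate the residue count; 115 per residue is an assumed value."""
    return round(molecular_weight / residue_mass)


# cII: 97 residues and a calculated molecular weight of 11 046.
cii_mean = round(mean_residue_mass(11046, 97), 1)

# O protein: molecular weight 34 500; 149 residues were deduced here.
o_residues = estimated_residues(34500)
o_fraction = round(149 / o_residues, 2)

print(cii_mean, o_residues, o_fraction)
```

The mean residue mass of the deduced cII sequence is close to the assumed value, and 149 residues come out as roughly half of the estimated O protein length, as stated in the text.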

(¢) The oop-RNA gene. The fragment studied contains the oop-RNA gene whose sequence agrees with that reported by other authors (Scherer et al., 1977; Schwarz et al., 1978).

(d) The region between the cro and clI genes. The sequence between the cro and cH genes must contain the tRl terminator which, in the absence of the N gene product, stops the rightward early transcription (Herskowitz, 1973). Moreover, this sequence represents the y region which plays an important, though not altogether clear, role in the maintenance of the lysogenic state. A more detailed analysis of this region had been published ~,,~osenberg et al., 1978) when our work was nearly completed.

The sequence data reported here will form a basis for further studies on the mode of interaction between regulatory proteins and the DNA region sequenced. Such studies, which should become possible once the respective regulatory proteins have been isolated, are expected to contribute to our knowledge of the mechanism of protein--nucleic acid interaction.

REFERENCES

Aono, J. and Horiuchi, T., Purification and some properties of presumptive tof gene product of coliphage 434, Mol. Gen. Genet., 156 (1977) 195--201.

Bayev, A.A., Zakharyev, V.M., Krayev, A.S., Skryabin, K.G., Monastyrskaya, G.S., Sverdlov, E.D. and Ovchinnikov, Yu.A., Nucleotide sequence of bacteriophage 434 DNA fragment containing promoters PR and Prm, operator oR and N-terminal parts of repressor cI and cro genes, Bioorg. Chem. (USSR), 4 (1978) 1563--1565.

Bickle, T.A., Pirrotta, V. and Imber, R., A simple general procedure for purifying restriction endonucleases, Nucl. Acid Res., 4 (1977) 2561--2572.

Blattner, F.R., Fiandt, M., Hass, K.K., Twose, P.A. and Szybalski, W., Deletions and insertions in the immunity region of coliphage lambda: Revised measurement of the promoter-startpoint distance, Virology, 62 (1974) 458--471.

Denniston-Thompson, K., Moore, D.D., Kruger, K.E., Furth, M.E. and Blattner, F.R., Physical structure of the replication origin of bacteriophage lambda, Science, 198 (1977) 1051--1056.

Echols, H. and Green, L., Establishment and maintenance of repression by bacteriophage lambda: The role of the cI, cII and cIII proteins, Proc. Natl. Acad. Sci. USA, 68 (1971) 2190--2194.

Hayes, S. and Szybalski, W., Control of short leftward transcripts from the immunity and ori regions in induced coliphage lambda, Mol. Gen. Genet., 126 (1973) 275--290.

Herskowitz, I., Control of gene expression in bacteriophage lambda, Annu. Rev. Genet., 7 (1973) 289--324.

Honigman, A., Hu, S.L., Chase, R. and Szybalski, W., 4S oop RNA is a leader sequence for the immunity-establishment transcription in coliphage λ, Nature, 262 (1976) 112--116.

Hsiang, M.W., Cole, R.D., Takeda, Y. and Echols, H., Amino acid sequence of cro regulatory protein of bacteriophage lambda, Nature, 270 (1977) 275--277.

Humayun, Z., Kleid, D. and Ptashne, M., Sites of contact between operators and repressor, Nucl. Acid Res., 4 (1977a) 1595--1607.

Humayun, Z., Jeffrey, A. and Ptashne, M., Completed DNA sequences and organization of repressor-binding sites in the operators of phage λ, J. Mol. Biol., 112 (1977) 265--267.

Johnson, A., Meyer, B.J. and Ptashne, M., Mechanism of action of the cro protein of bacteriophage λ, Proc. Natl. Acad. Sci. USA, 75 (1978) 1783--1787.

Korobko, V.G. and Grachev, S.A., Sequence determination in DNA by a modified chemical method, Bioorg. Chem. (USSR), 3 (1977) 1419--1422.

Lehman, I.R., T4 DNA polymerase, in Grossman, L. and Moldave, K. (Eds.), Methods in Enzymology, Vol. 29, Academic Press, New York, 1974, pp. 46--52.

Maniatis, T., Jeffrey, A. and Kleid, D.G., Nucleotide sequence of the rightward operator of phage λ, Proc. Natl. Acad. Sci. USA, 72 (1975a) 1184--1188.

Maniatis, T., Ptashne, M., Backman, K., Kleid, D., Flashman, S., Jeffrey, A. and Maurer, R., Recognition sequences of repressor and polymerase in the operators of bacteriophage λ, Cell, 5 (1975b) 109--113.

Maxam, A. and Gilbert, W., A new method for sequencing DNA, Proc. Natl. Acad. Sci. USA, 74 (1977) 560--564.

Petes, D.T., Hereford, L.M. and Skryabin, K.G., Characterization of two types of yeast ribosomal DNA genes, J. Bacteriol., 134 (1978) 295--305.

Pirrotta, V., Sequence of the oR operator of phage λ, Nature, 254 (1975) 114--117.

Pribnow, D., Nucleotide sequence of an RNA polymerase binding site at an early T7 promoter, Proc. Natl. Acad. Sci. USA, 72 (1975) 784--788.

Ptashne, M., Backman, K., Humayun, M.Z., Jeffrey, A., Maurer, R., Meyer, B. and Sauer, R.T., Autoregulation and function of a repressor in bacteriophage lambda, Science, 194 (1976) 156--161.

Richardson, C.C., Polynucleotide kinase from Escherichia coli infected with bacteriophage T4, in Cantoni, G.L. and Davies, D.R. (Eds.), Procedures in Nucleic Acid Research, Vol. 2, Harper and Row, New York, 1971, pp. 815--828.

Roberts, T.M., Shimatake, H., Brady, C. and Rosenberg, M., Sequence of cro gene of bacteriophage λ, Nature, 270 (1977) 274--276.

Rosenberg, M., Court, D., Shimatake, H., Brady, C. and Wulff, D.L., The relationship between function and DNA sequence in an intercistronic regulatory region in phage λ, Nature, 272 (1978) 414--423.

Sato, S., Hutchison, C.A. and Harris, J.I., A thermostable sequence-specific endonuclease from Thermus aquaticus, Proc. Natl. Acad. Sci. USA, 74 (1977) 542--546.

Sauer, R.T. and Anderegg, R., Primary structure of the λ repressor, Biochemistry, 17 (1978) 1092--1100.

Scherer, G., Nucleotide sequence of the O gene and of the origin of replication in bacteriophage lambda DNA, Nucl. Acid Res., 5 (1978) 3141--3156.

Scherer, G., Hobom, G. and Kössel, H., DNA base sequence of the Po promoter region of phage λ, Nature, 265 (1977) 117--121.

Schwarz, E., Scherer, G., Hobom, G. and Kössel, H., Nucleotide sequence of cro, cII and part of the O gene in phage λ DNA, Nature, 272 (1978) 410--414.

Shine, J. and Dalgarno, L., The 3'-terminal sequence of Escherichia coli 16S ribosomal RNA: Complementarity to nonsense triplets and ribosome binding sites, Proc. Natl. Acad. Sci. USA, 71 (1974) 1342--1346.

Skryabin, K.G., Zakharyev, V.M. and Bayev, A.A., The structure of external transcribed spacer of yeast ribosomal DNA, Dokl. Akad. Nauk USSR, 241 (1978) 488--490.

Smith, G.R., Eisen, H., Reichard, L. and Hedgpeth, J., Deletion locating a Prm mutation within the rightward operator, Proc. Natl. Acad. Sci. USA, 73 (1976) 712--716.

Steitz, J.A. and Jakes, K., How ribosomes select initiation region in mRNA: base pair formation between 3'-terminus of 16S RNA and mRNA during initiation of protein synthesis in E. coli, Proc. Natl. Acad. Sci. USA, 72 (1975) 4734--4738.

Sverdlov, E.D., Monastyrskaya, G.S., Chestukhin, A.V. and Budowsky, E.I., The primary structure of oligonucleotides. Partial apurinization as a method to determine the position of purine and pyrimidine residues, FEBS Lett., 33 (1975) 15--17.

Sverdlov, E.D., Monastyrskaya, G.S. and Rostapshov, V.M., Isolation and characterization of a fragment of bacteriophage λimm434 DNA containing an operon 4S RNA (oop-RNA), Bioorg. Khim. (USSR), 4 (1978) 894--900.

Szalay, A.A., Grohmann, K. and Sinsheimer, R.L., Separation of the complementary strands of DNA fragments on polyacrylamide gels, Nucl. Acid Res., 4 (1977) 1569--1578.

Takeda, Y., Folkmanis, A. and Echols, H., Cro regulatory protein specified by bacteriophage λ. Structure, DNA-binding and repression of RNA synthesis, J. Biol. Chem., 252 (1977) 6177--6188.

Walz, A. and Pirrotta, V., Sequence of the PR promoter of phage λ, Nature, 254 (1975) 118--121.

Walz, A., Pirrotta, V. and Ineichen, K., λ repressor regulates the switch between PR and Prm promoters, Nature, 262 (1976) 665--669.

Yates, J.L., Gette, W.R., Furth, M.E. and Nomura, M., Effects of ribosomal mutations on the read-through of chain termination signal: Studies on the synthesis of bacteriophage λ O gene protein in vitro, Proc. Natl. Acad. Sci. USA, 74 (1977) 689--693.

Note added in proofs: The sequence of the cro-cII-oop region of the phage 434 DNA was published in March, 1979 [Grosschedl, R. and Schwarz, E., Nucleotide sequence of the cro-cII-oop region of bacteriophage 434 DNA, Nucl. Acids Res., 6 (1979) 867--881].

Communicated by W. Szybalski.