Sequences and homology analysis of two genes encoding β-glucosidases from Bacillus polymyxa


Gene, 95 (1990) 31-38 Elsevier

GENE 03721



(Cellobiose; cellulose hydrolysis; codon usage; β-galactosidase; recombinant DNA; signal sequences)

Luis González-Candelas, Daniel Ramón and Julio Polaina

Instituto de Agroquímica y Tecnología de Alimentos, C.S.I.C., Valencia (Spain)

Received by M. Salas: 28 February 1990; Revised: 11 May 1990; Accepted: 14 May and 20 June 1990

SUMMARY

The nucleotide sequences of the bglA and bglB genes encoding β-glucosidases from Bacillus polymyxa have been determined. Both genes contain coding regions of 1344 bp, corresponding to polypeptides with Mr values of 51643 and 51547, respectively. Patterns of codon usage indicate that both genes are expressed at a low level. Previous data suggested that the proteins encoded by bglA and bglB were intra- and extracellular enzymes, respectively; however, neither of the two deduced amino acid sequences has an N terminus with the typical features of a leader peptide. The proteins encoded by bglA and bglB show remarkable homology to each other and to other β-glucosidases (Bgl) and β-galactosidases (βGal). On the basis of the observed homologies, we define two groups of microbial Bgl: type I, which includes most bacterial Bgl, and type II, which includes enzymes from different yeast species and one from Clostridium thermocellum. Likewise, at least two groups of βGal can be distinguished: type I, comprising enzymes homologous to type-I Bgl, and type II, showing no homology to any of the previous groups.

INTRODUCTION

β-Glucosidases (Bgl) catalyse the hydrolysis of cellobiose and related glycosides. These enzymes, which are widely distributed among microorganisms, are the subject of increasing interest because of their involvement in the biological saccharification of cellulose, the most abundant renewable carbon source on earth. Reports on the nt sequences of several cloned Bgl-encoding genes, mostly from bacteria and yeasts, have appeared in the literature during the last few years (Grabnitz et al., 1989; Kohchi and Toh-e, 1985; Love et al., 1988; Machida et al., 1988; Raynal et al., 1987; Schnetz et al., 1987; Wakarchuk et al., 1988). Comparative analysis of sequence data yields important information on the evolutionary relatedness of the genes, the location of the active site along the primary structure of the enzyme, and the relationship between structure and function. On the applied side, this information is of crucial importance for the design of tailor-made enzymes.

Correspondence to: Dr. J. Polaina, Instituto de Agroquímica y Tecnología de Alimentos, Jaime Roig 11, 46010 Valencia (Spain) Tel. 6-369-08-00, ext. 35; Fax 6-393-00-01.

Abbreviations: aa, amino acid(s); Bgl, β-glucosidase(s); bgl, gene encoding Bgl; bp, base pair(s); βGal, β-galactosidase(s); kb, kilobase(s) or 1000 bp; ORF, open reading frame; nt, nucleotide(s); P-βGal, phospho-β-galactosidase; PNPG, p-nitrophenyl-β-D-glucopyranoside.

Bacillus polymyxa synthesizes two different Bgl. The genes coding for these two enzymes have been cloned in Escherichia coli, where they are expressed from their own promoters. Available data indicate that the enzyme encoded by one of these genes, designated bglA, is intracellular and probably cleaves cellobiose through inorganic phosphate-mediated hydrolysis. The enzyme encoded by the other gene, bglB, is extracellular, cleaves cellobiose yielding glucose as the sole product, and is particularly active on aryl-glucosides such as p-nitrophenyl-β-D-glucopyranoside (PNPG) (González-Candelas et al., 1989).

0378-1119/90/$03.50 © 1990 Elsevier Science Publishers B.V. (Biomedical Division)


The aim of the present study was to determine the nt sequences of bglA and bglB, and to investigate the structural similarities and evolutionary relationships of these Bgl to other enzymes.

RESULTS AND DISCUSSION

(a) Nucleotide sequences of bglA and bglB: general features

Series of plasmids with deletions, generated by the procedure described in Fig. 1, were used as templates in the sequencing experiments. Fig. 2A and B show the nt sequences determined for the DNA fragments containing bglA and bglB, respectively. Both genes were sequenced entirely on both strands. The fragment containing bglA showed a major 1344-bp ORF (448 aa; 51643 Da). The first ATG of this frame was taken as the initiation Met codon because it gives a very good alignment with the sequences of other bacterial Bgl (see section e). Sequencing of the fragment containing bglB revealed a 1344-bp ORF (448 aa; 51547 Da). As in the previous case, the first ATG was taken as the start of the protein because of the resulting alignment with other Bgl. The G + C contents of bglA and bglB were 46.7 and 47.1%, respectively, close to the 44.3-45.6% values reported for Bacillus polymyxa DNA (Fahray et al., 1985).
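The G + C percentages and the conversion of an ORF length into codons can be reproduced with a few lines of code. The sketch below is illustrative only: it assumes a plain DNA string, uses a made-up toy sequence (not bglA or bglB), and `find_first_orf` is a hypothetical helper that simply takes the first ATG and reads to the first in-frame stop codon, mirroring the "first ATG" choice described above.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def gc_percent(seq: str) -> float:
    """G + C content of a DNA sequence, as a percentage of its length."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def find_first_orf(seq: str):
    """ORF opened by the first ATG and read to the first in-frame stop codon.

    Returns (start, end) with end exclusive and the stop codon included,
    or None if there is no ATG or no in-frame stop codon.
    """
    seq = seq.upper()
    start = seq.find("ATG")
    if start == -1:
        return None
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return start, i + 3
    return None


toy = "CCATGGCTGAAAAATAAGG"            # hypothetical toy sequence, not bglA/bglB
start, end = find_first_orf(toy)
print(start, end, (end - start) // 3)  # 2 17 5  (4 sense codons + 1 stop codon)
print(round(gc_percent(toy), 1))       # 42.1
```

On a real fragment, the same two calls give the ORF boundaries and the base composition that is compared with the genomic G + C range.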

Sequences similar to the consensus signal sequences for Bacillus (McConnell et al., 1986) are underlined in Fig. 2A and B. Probable ribosome-binding sites (Shine and Dalgarno, 1974) appear at nt -8 to -16 for bglA (Fig. 2A) and -10 to -19 for bglB (Fig. 2B). The upstream region of bglA (Fig. 2A) shows sequences similar to those recognized by the σA factor (Moran et al., 1982; Murray and Rabinowitz, 1982). A canonical -10 sequence appears at positions -22 to -27. Separated from it by 17 nt, where a -35 sequence would be expected, there is a GAAACA that shares 3 of 6 nt with the consensus -35 sequence TTGACA. Further upstream, at positions -72 to -77, there is a TTGTCA, but this is 44 nt away from the putative -10 sequence. The upstream region of bglB (Fig. 2B) shows possible -10 sequences at positions -78 to -83 and -85 to -90, and possible -35 sequences at positions -104 to -109 and -108 to -113. No dyad-symmetry elements that could act as transcription termination signals were detected.
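The "3 out of 6 nt" comparison above is a position-by-position match count against a consensus hexamer. A minimal sketch of that count, plus a hypothetical sliding-window scan helper (`scan`) that lists hexamers meeting a chosen match threshold; the threshold and the toy sequence are assumptions for illustration, not values used by the authors.

```python
CONSENSUS_MINUS35 = "TTGACA"  # -35 consensus quoted in the text


def matches(candidate: str, consensus: str) -> int:
    """Number of positions at which a candidate agrees with the consensus."""
    if len(candidate) != len(consensus):
        raise ValueError("sequences must have equal length")
    return sum(a == b for a, b in zip(candidate, consensus))


def scan(seq: str, consensus: str, min_matches: int):
    """Return (position, window, score) for every window scoring >= min_matches."""
    k = len(consensus)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        score = matches(window, consensus)
        if score >= min_matches:
            hits.append((i, window, score))
    return hits


print(matches("GAAACA", CONSENSUS_MINUS35))  # 3  (the candidate 17 nt upstream of -10)
print(matches("TTGTCA", CONSENSUS_MINUS35))  # 5  (the candidate at -72 to -77)
```

A raw match count ignores spacing, which is why the text weighs the better-matching TTGTCA against its 44-nt distance from the putative -10 box.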

[Fig. 1 deletion maps, panels A and B, with the PNPG activity of the deletion derivatives; diagram not reproduced.]

Fig. 1. Bidirectional, sequential deletions in plasmids pBG2, carrying bglA (A), and pBG4/SC3, carrying bglB (B), constructed by the exonuclease III/S1 nuclease digestion protocol (Henikoff, 1984) using a commercial kit from Pharmacia. E. coli transformants harboring plasmids with deletions were tested for Bgl activity using PNPG as the substrate (see González-Candelas et al., 1989). The blackened bars along the inserts of pBG2 and pBG4/SC3 indicate the positions of the genes, delimited by the deletion analysis.

[Fig. 2 sequence panels A and B not reproduced.]

Fig. 2. The nt sequences of bglA (A) and bglB (B), determined by the dideoxy chain-termination procedure, and the deduced aa sequences. Possible regulatory sequences are underlined. GenBank accession Nos. M34009 (bglA); M34010 (bglB).


(b) Codon usage

Table I shows the frequencies of codon usage for bglA and bglB. A comparison of these data with the patterns observed in different species (Sharp et al., 1988) provides strong evidence, despite the absence of compiled data for B. polymyxa, that neither bglA nor bglB is a highly expressed gene in its natural host. The results are particularly revealing if only those aa are considered for which the same strong codon usage bias is observed in Escherichia coli, Bacillus subtilis, Saccharomyces cerevisiae and Schizosaccharomyces pombe. GCU, the most frequently used codon for Ala in highly expressed genes of these species, is used at a considerably lower frequency in both bglA and bglB. GAU coding for Asp, UGU coding for Cys, and GGG coding for Gly are used at high frequencies in both bglA and bglB, which is characteristic of genes expressed at a low level in all the species considered.
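The comparison above is qualitative. A common way to condense the same idea into one number is the codon adaptation index (CAI): the geometric mean, over the codons of a gene, of weights w = RSCU / (largest RSCU among the synonyms) taken from highly expressed reference genes, so that values near 1 indicate codon usage typical of highly expressed genes. This index is not used in the paper; the sketch is an illustration only, and every weight in `W_REF` is a hypothetical placeholder rather than a value from Sharp et al. (1988).

```python
import math

# Hypothetical relative-adaptiveness weights (w) for two amino acids only.
# In a real CAI, w = RSCU(codon) / max RSCU among its synonyms, computed from
# a reference set of highly expressed genes of the organism of interest.
W_REF = {
    "GCU": 1.00, "GCC": 0.30, "GCA": 0.60, "GCG": 0.40,  # Ala (illustrative)
    "GAU": 0.50, "GAC": 1.00,                             # Asp (illustrative)
}


def cai(codons, weights):
    """Geometric mean of the weights of all codons that have a weight."""
    logs = [math.log(weights[c]) for c in codons if c in weights]
    if not logs:
        raise ValueError("no scorable codons")
    return math.exp(sum(logs) / len(logs))


print(round(cai(["GCU", "GAC"], W_REF), 3))  # 1.0   (only preferred codons)
print(round(cai(["GCA", "GAU"], W_REF), 3))  # 0.548 (only non-preferred codons)
```

Codons without a weight (here, anything other than Ala and Asp codons) are simply skipped, as Met and Trp codons are in a full CAI.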

(c) Analysis of the N termini of the bglA- and bglB-encoded proteins

Previous data (González-Candelas et al., 1989) suggested that, in B. polymyxa, the protein encoded by bglA is intracellular, while the one encoded by bglB is extracellular. Evidence supporting this view comes from the localization of the enzymatic activities encoded by these genes when they are expressed in E. coli, where the bglA and bglB activities are found intracellularly and in the periplasm, respectively.

TABLE I

Relative synonymous codon usage

aa    Codon   Frequency*
              bglA    bglB
Ala   GCA     0.96    0.31
      GCC     1.28    1.69
      GCG     0.96    0.92
      GCU     0.80    1.08
Arg   AGA     0.72    0.95
      AGG     0.00    0.63
      CGA     0.96    1.26
      CGC     1.68    0.32
      CGG     0.24    0.95
      CGU     2.40    1.89
Asn   AAC     0.56    1.00
      AAU     1.44    1.00
Asp   GAC     0.64    0.52
      GAU     1.36    1.48
Cys   UGC     0.00    0.33
      UGU     2.00    1.67
Gln   CAA     0.67    0.94
      CAG     1.33    1.06
Glu   GAA     0.89    0.68
      GAG     1.11    1.32
Gly   GGA     1.33    1.21
      GGC     1.03    1.21
      GGG     0.51    0.93
      GGU     1.13    0.65
His   CAC     0.47    0.60
      CAU     1.53    1.40
Ile   AUA     0.30    0.17
      AUC     1.30    0.69
      AUU     1.40    2.14
Leu   CUA     0.77    0.36
      CUC     0.58    0.36
      CUG     1.94    2.36
      CUU     0.77    0.36
      UUA     0.19    0.73
      UUG     1.74    1.82
Lys   AAA     1.25    0.59
      AAG     0.75    1.41
Met   AUG     1.00    1.00
Phe   UUC     0.52    0.38
      UUU     1.48    1.62
Pro   CCA     0.67    0.67
      CCC     0.22    1.33
      CCG     1.33    0.67
      CCU     1.78    1.33
Ser   AGC     0.57    1.67
      AGU     1.43    0.33
      UCA     0.57    0.67
      UCC     1.43    0.67
      UCG     1.14    0.33
      UCU     0.86    2.33
Thr   ACA     1.45    0.64
      ACC     0.91    0.64
      ACG     0.91    1.92
      ACU     0.73    0.80
Trp   UGG     1.00    1.00
Tyr   UAC     0.70    0.47
      UAU     1.30    1.53
Val   GUA     1.52    2.13
      GUC     0.83    0.27
      GUG     0.83    1.33
      GUU     0.83    0.27

* Listed values represent the observed frequency of each codon (number of times that a given aa is specified by that codon) divided by its expected frequency (number of times that the aa is present in the protein, divided by the number of codons specifying that aa).
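The footnote definition of Table I translates directly into code. A minimal sketch, assuming an in-frame coding sequence given as DNA or RNA letters and the standard genetic code; the toy sequence is hypothetical. By construction, the values for one amino acid sum to the number of its synonymous codons, which gives a quick consistency check on Table I (for Ala in bglA, 0.96 + 1.28 + 0.96 + 0.80 = 4.00).

```python
from collections import Counter, defaultdict

# Standard genetic code; codons enumerated with bases in the order U, C, A, G.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODE = dict(zip(CODONS, AMINO_ACIDS))

# Synonymous codon families, stop codons ("*") excluded.
SYNONYMS = defaultdict(list)
for codon, aa in CODE.items():
    if aa != "*":
        SYNONYMS[aa].append(codon)


def rscu(coding_seq: str) -> dict:
    """Observed codon count divided by the count expected if all synonymous
    codons of that aa were used equally (aa count / family size)."""
    seq = coding_seq.upper().replace("T", "U")
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    counts = Counter(c for c in codons if CODE.get(c, "*") != "*")
    result = {}
    for aa, family in SYNONYMS.items():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # aa absent from the protein: RSCU undefined
        expected = total / len(family)
        for c in family:
            result[c] = counts[c] / expected
    return result


r = rscu("ATG" + "GCT" * 3 + "GCC" + "TAA")  # hypothetical toy ORF
print({c: r[c] for c in ("GCA", "GCC", "GCG", "GCU")})
```

Applied to the full bglA and bglB ORFs, the same function would produce a table in the layout of Table I.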

[Fig. 3 hydropathy plots, panels BglA and BglB, not reproduced.]

Fig. 3. Hydropathy plots of the N-terminal regions (first 25 aa) of the bglA- and bglB-encoded proteins. Ordinate values are hydropathy indices determined according to Hopp and Woods (1981). Positive values correspond to hydrophobic residues.
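A hydropathy profile of this kind is a moving average of per-residue scale values. The sketch below is an illustration under a stated substitution: it uses the Kyte-Doolittle scale (positive = hydrophobic, the sign convention stated in the caption) because the Hopp and Woods values used for the figure are not reproduced here; the window size of 7 is likewise an assumption.

```python
# Kyte-Doolittle hydropathy scale (positive = hydrophobic). Swapped in for
# illustration; Fig. 3 itself was computed with the Hopp and Woods (1981) values.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 7, scale: dict = KD):
    """Mean scale value over each full window, reported at the window centre
    (0-based position). Raises KeyError for letters missing from the scale."""
    protein = protein.upper()
    if window < 1 or window > len(protein):
        raise ValueError("window must be between 1 and the sequence length")
    half = window // 2
    profile = []
    for i in range(len(protein) - window + 1):
        segment = protein[i:i + window]
        profile.append((i + half, sum(scale[aa] for aa in segment) / window))
    return profile


# Hypothetical N-terminal peptide, not BglA or BglB.
for position, value in hydropathy_profile("MKKLLIVAAGLSQ", window=5):
    print(position, round(value, 2))
```

A shorter window resolves individual residues better, while a longer one smooths the profile toward the broad hydrophobic stretches that signal peptides typically show.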

Additionally, the products of both genes show similarities to other enzymes located intracellularly and extracellularly, respectively, in their natural hosts. To obtain further information on this subject, we studied the N-terminal regions of the polypeptides encoded by bglA and bglB. Fig. 3 shows the hydropathy profiles of these regions. As expected, BglA does not show a leader peptide. The N terminus of BglB shows a stretch of hydrophobic sequence resembling a secretion signal; however, it lacks other features expected of a leader peptide, such as basic aa preceding the hydrophobic core, or Ala residues indicating a potential cleavage site (McConnell et al., 1986). Therefore, the localization of BglB in B. polymyxa remains unresolved.
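The three leader-peptide features named above (a hydrophobic core, basic residues before it, and an Ala-containing cleavage site after it) can be checked mechanically. The sketch below is a crude heuristic, not the authors' procedure: the hydrophobic residue set, the core length of 8, the 40-residue search window, and the use of an Ala-X-Ala motif as a proxy for the cleavage site are all assumptions chosen for illustration, and the test peptides are hypothetical.

```python
HYDROPHOBIC = set("AILMFVW")  # assumed hydrophobic residue set


def leader_features(nterm: str, core_len: int = 8, search_len: int = 40) -> dict:
    """Rough presence/absence checks for typical leader-peptide features."""
    seq = nterm.upper()[:search_len]

    # 1. First run of core_len consecutive hydrophobic residues.
    core_start = None
    for i in range(len(seq) - core_len + 1):
        if all(aa in HYDROPHOBIC for aa in seq[i:i + core_len]):
            core_start = i
            break

    # 2. At least one Lys or Arg before the hydrophobic core.
    has_basic = core_start is not None and any(aa in "KR" for aa in seq[:core_start])

    # 3. An Ala-X-Ala motif after the core (candidate signal peptidase site).
    has_axa = False
    if core_start is not None:
        for i in range(core_start + core_len, len(seq) - 2):
            if seq[i] == "A" and seq[i + 2] == "A":
                has_axa = True
                break

    return {"hydrophobic_core": core_start is not None,
            "basic_n_region": has_basic,
            "ala_x_ala_site": has_axa}


print(leader_features("MKKLLIALAVLFASAQAHA"))  # hypothetical signal-like peptide
print(leader_features("MSTLLIALAVLFGSGQ"))     # core only, as described for BglB
```

Under this heuristic, a sequence like the second example would flag a hydrophobic core without the flanking features, which is the situation the text describes for BglB.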

(d) Comparison of bglA and bglB sequences

We compared the nt sequences of bglA and bglB, and the aa sequences they encode, using the Clustal I program (Higgins and Sharp, 1988). This program assigns homology scores, defined as the number of exactly matching nt, or aa, in the alignment of two sequences, minus a fixed penalty (equivalent to 3 matches) for every gap. Thus, the maximum attainable score is the length of a sequence compared with itself. With this criterion, the comparison of bglA and bglB gives scores of 562 for the nt sequences and 189 for the aa sequences. Relative to the 1344-nt and 448-aa lengths, both correspond to about 42% (562/1344 ≈ 41.8%; 189/448 ≈ 42.2%), i.e. very similar degrees of homology at both levels.
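The scoring rule just described (exact matches minus a fixed penalty per gap) can be written out for a pair of already aligned sequences. A minimal sketch, assuming '-' marks gaps and that "every gap" means each maximal run of '-' (one gap opening), which is an interpretation rather than something the text specifies; the alignment step itself is not reproduced.

```python
GAP_PENALTY = 3  # each gap costs the equivalent of three matches


def count_gaps(aligned: str) -> int:
    """Number of gaps, counting each maximal run of '-' once."""
    runs, in_gap = 0, False
    for ch in aligned:
        if ch == "-" and not in_gap:
            runs += 1
        in_gap = ch == "-"
    return runs


def homology_score(aligned_a: str, aligned_b: str, gap_penalty: int = GAP_PENALTY) -> int:
    """Exact matches in a pairwise alignment minus gap_penalty per gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-")
    gaps = count_gaps(aligned_a) + count_gaps(aligned_b)
    return matches - gap_penalty * gaps


print(homology_score("ACG-T", "ACGAT"))  # 1  (4 matches - 3 for one gap)
print(homology_score("ACGT", "ACGT"))    # 4  (self-comparison = sequence length)
print(round(100 * 562 / 1344, 1), round(100 * 189 / 448, 1))  # 41.8 42.2
```

Dividing a score by the sequence length, as in the last line, is what makes the nt and aa scores for bglA and bglB directly comparable.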

[Fig. 4 alignment panels not reproduced.]

Fig. 4. Alignment of the aa sequences of five bacterial β-glucosidases: (1) B. polymyxa (bglA product); (2) B. polymyxa (bglB product); (3) C. saccharolyticum (bglA product); (4) Agrobacterium sp. (abg product); (5) E. coli (bglB product). Asterisks indicate identity of a given aa in all the sequences compared; colons indicate conservative changes.

TABLE II

Homology of β-glucosidases (Bgl) and β-galactosidases (βGal and P-βGal) from different microbial speciesᵃ

1. B. polymyxa Bgl, 448 aa (bglA)
2. B. polymyxa Bgl, 448 aa (bglB)
3. C. saccharolyticum Bgl, 453 aa (bglA)
4. Agrobacterium sp. Bgl, 459 aa (abg)
5. E. coli Bgl, 471 aa (bglB)
6. C. thermocellum Bgl, 754 aa (bglB)
7. K. fragilis Bgl, 845 aa
8. C. pelliculosa Bgl, 825 aa
9. S. fibuligera Bgl, 876 aa (bgl1)
10. S. fibuligera Bgl, 880 aa (bgl2)
11. S. aureus P-βGal, 470 aa
12. L. casei P-βGal, 474 aa (pbg)
13. L. lactis P-βGal, 468 aa (lacG)
14. L. bulgaricus βGal, 971 aa
15. E. coli βGal, 1023 aa (lacZ)
16. E. coli βGal, 964 aa (ebgA)
17. K. pneumoniae βGal, 1034 aa (lacZ)
18. B. stearothermophilus βGal, 672 aa (bgaB)

[Matrix of pairwise homology scores not reproduced.]

ᵃ Scores of homology were determined with the Clustal I program (Higgins and Sharp, 1988) with a gap penalty of three points. Numbers in bold type indicate the different groups of homology. Symbols in parentheses are the genes.


(e) Homology analysis of microbial Bgl

The nt sequences of a number of Bgl-encoding genes from different microbial species have been published. We used the Clustal 4 program (Higgins and Sharp, 1989) to compare the deduced aa sequences of the Bgl from E. coli (Schnetz et al., 1987), Clostridium thermocellum (Grabnitz et al., 1989), Agrobacterium sp. (Wakarchuk et al., 1988), Caldocellum saccharolyticum (Love et al., 1988), Candida pelliculosa (Kohchi and Toh-e, 1985), Kluyveromyces fragilis (Raynal et al., 1987) and Saccharomycopsis fibuligera (Machida et al., 1988) with those encoded by the bglA and bglB genes of B. polymyxa. Fig. 4 presents the alignment of the bglA and bglB products with the enzymes of E. coli, Agrobacterium sp. and Caldocellum saccharolyticum, which showed a high degree of homology. The remaining sequences showed little or undetectable homology to this first group, but they were homologous to each other. Particularly remarkable is the homology of the Clostridium thermocellum Bgl to Bgl from different yeasts. According to Grabnitz et al. (1989), this suggests a genetic exchange between bacteria and yeasts.

(f) Homology of the bglA- and bglB-encoded proteins to other enzymes

We have screened the National Biomedical Research Foundation Protein Identification Resource (NBRF-PIR) data bank (September 1989 update) for aa sequences, other than Bgl, homologous to the bglA and bglB products. The screening, carried out with the Proscan program from the commercial package DNASTAR (Madison, WI), revealed a sequence corresponding to a phospho-β-galactosidase (P-βGal) from Staphylococcus aureus (Breidt and Stewart, 1987) which showed extensive homology to the bglA- and bglB-encoded enzymes. Further comparison with other published βGal sequences not included in the data bank revealed homology to two other P-βGal, from Lactobacillus casei (Porter and Chassy, 1988) and Lactococcus lactis (formerly Streptococcus lactis) (Boizet et al., 1988; de Vos and Gasson, 1989). Table II presents the scores of homology, determined by the Clustal 1 program (see section d), of different Bgl and βGal. From these data, different groups of enzymes can be classified. We define type I Bgl as those homologous to the Bacillus polymyxa bglA product, which we consider the type enzyme of the group because its sequence is the closest to the consensus. Type I includes the enzymes encoded by the bglA and bglB genes of B. polymyxa, and three other Bgl encoded by the genes abg from Agrobacterium sp., bglA from C. saccharolyticum and bglB from E. coli. Type II includes Bgl homologous to the Saccharomycopsis fibuligera bgl1 and bgl2 products, as are the enzymes of two other yeasts, Candida pelliculosa and Kluyveromyces fragilis, and the product of the bglB gene of Clostridium thermocellum.
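The classification above can be read as grouping enzymes whose homology scores link them, and then picking as the type enzyme the member closest to the group consensus. The following sketch expresses that reading under stated assumptions: single-linkage grouping with a score threshold is an illustrative choice (the text gives no cutoff), and the function names, threshold, toy scores and toy sequences are hypothetical, not values from Table II.

```python
from itertools import combinations


def homology_groups(names, score, threshold):
    """Single-linkage grouping: two enzymes fall in the same group when a chain
    of pairwise scores >= threshold connects them (union-find)."""
    parent = {name: name for name in names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in combinations(names, 2):
        if score(a, b) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(groups.values())


def closest_to_consensus(aligned, consensus):
    """Name of the aligned sequence with most identities to the consensus
    ('.' marks positions without a consensus residue and is never counted)."""
    def identities(seq):
        return sum(1 for x, c in zip(seq, consensus) if c != "." and x == c)
    return max(aligned, key=lambda name: identities(aligned[name]))


if __name__ == "__main__":
    # Hypothetical scores between hypothetical enzymes A-F (not Table II data)
    toy = {frozenset("AB"): 60, frozenset("BC"): 55, frozenset("AC"): 40,
           frozenset("DE"): 70}

    def toy_score(x, y):
        return toy.get(frozenset((x, y)), 0)

    print(homology_groups(list("ABCDEF"), toy_score, threshold=50))
    # [['A', 'B', 'C'], ['D', 'E'], ['F']]
    print(closest_to_consensus({"s1": "MKLV", "s2": "MKAV", "s3": "MRAV"}, "MKAV"))
    # s2
```

Single linkage lets A and C share a group through B even though their own toy score falls below the cutoff, which mirrors how a group can be held together by a shared homologue.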


Fig. 5. Alignment of the consensus sequences of type-I Bgl (five enzymes) and type-I βGal (three enzymes). The consensus sequences reflect identity in 3 out of 5 aa at a given position for type-I Bgl, and 2 out of 3 aa for type-I βGal. Positions where a consensus aa could not be assigned are indicated by dots. Gaps introduced by the alignment are represented by dashes. In the alignment between the two consensus sequences, coincidence of a given aa is represented by asterisks, and conservative changes by colons.
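The consensus rule in the Fig. 5 legend (a residue is kept where at least 3 of 5 type-I Bgl, or 2 of 3 type-I βGal, share it, otherwise a dot) and the asterisk/colon comparison line can be sketched as below. The input sequences are toy, pre-aligned peptides, and the set of conservative substitution groups is an assumption, since the legend does not state which changes were scored as conservative.

```python
from collections import Counter

# Assumed groups of "conservative" substitutions (illustrative; the legend
# does not specify which replacements the authors counted as conservative).
CONSERVATIVE_GROUPS = [set("ILMV"), set("FWY"), set("KRH"), set("DE"),
                       set("NQ"), set("ST"), set("AG")]


def consensus(aligned, min_count):
    """Column-wise consensus of equal-length aligned sequences: keep the most
    frequent symbol if it occurs at least min_count times, else '.'."""
    out = []
    for column in zip(*aligned):
        symbol, count = Counter(column).most_common(1)[0]
        out.append(symbol if count >= min_count else ".")
    return "".join(out)


def match_line(a, b):
    """'*' for identical residues, ':' for a conservative change, ' ' otherwise.
    Dots (no consensus) and dashes (gaps) never count as matches."""
    marks = []
    for x, y in zip(a, b):
        if x == y and x not in ".-":
            marks.append("*")
        elif any(x in g and y in g for g in CONSERVATIVE_GROUPS):
            marks.append(":")
        else:
            marks.append(" ")
    return "".join(marks)


if __name__ == "__main__":
    # Toy pre-aligned peptides (hypothetical, not the real Bgl/βGal sequences)
    bgl = ["MKLVE", "MRIVD", "MKIAE", "LRLVD", "MKFSE"]  # 3 of 5 rule
    gal = ["MRLIE", "MKLVD", "IRLIQ"]                    # 2 of 3 rule
    c1, c2 = consensus(bgl, 3), consensus(gal, 2)
    print(c1)                  # MK.VE
    print(c2)                  # MRLI.
    print(match_line(c1, c2))  # "*: : "
```

Because both thresholds exceed half the number of sequences, at most one residue can reach them in a column, so the consensus never depends on how ties are broken.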

Analogously, we can define a type-I βGal, including enzymes homologous to type-I Bgl. To this group belong three P-βGal, from Staphylococcus aureus, Lactobacillus casei and Lactococcus lactis. Taking into consideration the data analyzed by Schmidt et al. (1989), a second group of βGal, type II, can be defined; it is formed by enzymes related to each other but showing no homology to any of the previous types. This group includes the βGal from Klebsiella pneumoniae and Lactobacillus bulgaricus, and two others encoded by the lacZ and ebgA genes of E. coli. Finally, a thermostable βGal encoded by the bgaB gene of Bacillus stearothermophilus represents a separate enzyme, not homologous to any of the previous groups.

Fig. 5 shows the alignment of the consensus sequences established for type-I Bgl and type-I βGal, which supports the common evolutionary origin suggested by de Vos and Gasson (1989).

ACKNOWLEDGEMENTS

This research was funded by grant CICYT ALI88-168 from the Spanish Comisión Interministerial de Ciencia y Tecnología. L.G.C. was supported by an FPI fellowship from the Ministerio de Educación y Ciencia.

We thank Agustí Flors and José Vicente Carbonell for encouragement and support throughout this work. Fernando González-Candelas and the Servicio de Bioinformática de la Universidad de Valencia are thanked for assistance with computer work, and Alfonso Navarre and Enrique Herrero for facilitating the use of equipment.

REFERENCES

Boizet, B., Villeval, D., Slos, P., Novel, M., Novel, G. and Mercenier, A.: Cloning and expression of the phospho-β-galactosidase gene from Streptococcus lactis into Escherichia coli. Gene 62 (1988) 249-261.

Breidt Jr., F. and Stewart, G.C.: Nucleotide and deduced amino acid sequences of the Staphylococcus aureus phospho-β-galactosidase gene. Appl. Environ. Microbiol. 53 (1987) 969-973.

Fahmy, F., Flossdorf, J. and Claus, D.: The DNA base composition of the type strains of the genus Bacillus. Syst. Appl. Microbiol. 6 (1985) 60-65.

González-Candelas, L., Aristoy, M.C., Polaina, J. and Flors, A.: Cloning and characterization of two genes from Bacillus polymyxa expressing β-glucosidase activity in Escherichia coli. Appl. Environ. Microbiol. 55 (1989) 3173-3177.

Gräbnitz, F., Rücknagel, K.P., Seiss, M. and Staudenbauer, W.L.: Nucleotide sequence of the Clostridium thermocellum bglB gene encoding thermostable β-glucosidase B: homology to fungal β-glucosidases. Mol. Gen. Genet. 217 (1989) 70-76.

Henikoff, S.: Unidirectional digestion with exonuclease III creates targeted breakpoints for DNA sequencing. Gene 28 (1984) 351-359.

Higgins, D.G. and Sharp, P.M.: CLUSTAL: a package for performing multiple sequence alignments on a microcomputer. Gene 73 (1988) 237-244.

Higgins, D.G. and Sharp, P.M.: Fast and sensitive multiple sequence alignments on a microcomputer. CABIOS 5 (1989) 151-153.

Hopp, T.P. and Woods, K.R.: Prediction of protein antigenic determinants from amino acid sequences. Proc. Natl. Acad. Sci. USA 78 (1981) 3824-3828.

Kohchi, C. and Toh-e, A.: Nucleotide sequence of Candida pelliculosa β-glucosidase gene. Nucleic Acids Res. 13 (1985) 6273-6282.

Love, D.R., Fisher, R. and Bergquist, P.L.: Sequence, structure and expression of a cloned β-glucosidase gene from an extreme thermophile. Mol. Gen. Genet. 213 (1988) 84-92.

Machida, M., Ohtsuki, I., Fukui, S. and Yamashita, I.: Nucleotide sequences of Saccharomycopsis fibuligera genes for extracellular β-glucosidases as expressed in Saccharomyces cerevisiae. Appl. Environ. Microbiol. 54 (1988) 3147-3155.

McConnell, D.J., Cantwell, B.A., Devine, K.M., Forage, A.J., Laoide, B.M., O'Kane, C., Ollington, J.F. and Sharp, P.M.: Genetic engineering of extracellular enzyme systems of Bacilli. Ann. N.Y. Acad. Sci. 469 (1986) 1-17.

Moran Jr., C.P., Lang, N., LeGrice, S.F.J., Lee, G., Stephens, M., Sonenshein, A.L., Pero, J. and Losick, R.: Nucleotide sequences that signal the initiation of transcription and translation in Bacillus subtilis. Mol. Gen. Genet. 186 (1982) 339-346.

Murray, C.L. and Rabinowitz, J.C.: Nucleotide sequences of transcription and translation initiation regions in Bacillus phage φ29 early genes. J. Biol. Chem. 257 (1982) 1053-1062.

Porter, V. and Chassy, B.M.: Nucleotide sequence of the β-D-phosphogalactoside galactohydrolase gene of Lactobacillus casei: comparison to analogous pbg genes of other Gram-positive organisms. Gene 62 (1988) 263-276.

Raynal, A., Gerbaud, C., Francingues, M.C. and Guérineau, M.: Sequence and transcription of the β-glucosidase gene of Kluyveromyces fragilis cloned in Saccharomyces cerevisiae. Curr. Genet. 12 (1987) 175-184.

Schmidt, B.F., Adams, R.M., Requadt, C., Power, S. and Mainzer, E.: Expression and nucleotide sequence of the Lactobacillus bulgaricus β-galactosidase gene in Escherichia coli. J. Bacteriol. 171 (1989) 625-635.

Schnetz, K., Toloczyki, C. and Rak, B.: β-Glucosidase (bgl) operon of Escherichia coli K-12: nucleotide sequence, genetic organization, and possible evolutionary relationship to regulatory components of two Bacillus subtilis genes. J. Bacteriol. 169 (1987) 2579-2590.

Sharp, P.M., Cowe, E., Higgins, D.G., Shields, D.C., Wolfe, K.H. and Wright, F.: Codon usage patterns in Escherichia coli, Bacillus subtilis, Saccharomyces cerevisiae, Schizosaccharomyces pombe, Drosophila melanogaster and Homo sapiens: a review of the considerable within-species diversity. Nucleic Acids Res. 16 (1988) 8207-8211.

Shine, J. and Dalgarno, L.: The 3'-terminal sequence of E. coli 16S ribosomal RNA: complementarity to nonsense triplets and ribosome binding sites. Proc. Natl. Acad. Sci. USA 71 (1974) 1342-1346.

de Vos, W.M. and Gasson, M.J.: Structure and expression of the Lactococcus lactis gene for phospho-β-galactosidase (lacG) in Escherichia coli and L. lactis. J. Gen. Microbiol. 135 (1989) 1833-1846.

Wakarchuk, W.W., Greenberg, N.M., Kilburn, D.G., Miller Jr., R.C. and Warren, R.A.J.: Structure and transcription analysis of the gene encoding a cellobiase from Agrobacterium sp. strain ATCC 21400. J. Bacteriol. 170 (1988) 301-307.