Sequences and homology analysis of two genes encoding β-glucosidases from Bacillus polymyxa
Gene, 95 (1990) 31-38 Elsevier
GENE 0372 !
(Cellobiose; cellulose hydrolysis; codon usage; β-galactosidase; recombinant DNA; signal sequences)
Luis González-Candelas, Daniel Ramón and Julio Polaina
Instituto de Agroquímica y Tecnología de Alimentos, C.S.I.C., Valencia (Spain)
Received by M. Salas: 28 February 1990 Revised: 11 May 1990 Accepted: 14 May and 20 June 1990
SUMMARY
The nucleotide sequences of the bglA and bglB genes encoding β-glucosidases from Bacillus polymyxa have been determined. Both genes contain coding regions of 1344 bp, corresponding to polypeptides with Mr values of 51643 and 51547, respectively. Patterns of codon usage indicate that both genes are expressed at a low level. Previous data suggested that the proteins encoded by bglA and bglB were intra- and extracellular enzymes, respectively; however, neither of the two deduced amino acid sequences has an N terminus with the typical features of a leader peptide. The proteins encoded by bglA and bglB show remarkable homology to each other and to other β-glucosidases (Bgl) and β-galactosidases (βGal). On the basis of the observed homologies, we can define two groups of microbial Bgl: type I, including most bacterial Bgl, and type II, including enzymes from different yeast species and one from Clostridium thermocellum. Likewise, at least two groups of βGal can be distinguished: type I, including enzymes homologous to type-I Bgl, and type II, showing no homology to any of the previous groups.
INTRODUCTION
β-Glucosidases (Bgl) catalyse the hydrolysis of cellobiose and related glycosides. These enzymes, which are widely distributed among microorganisms, are the subject of increasing interest because of their involvement in the biological saccharification of cellulose, the most abundant renewable carbon source on earth. Reports on the nt sequences of several cloned Bgl-encoding genes, mostly from bacteria and yeasts, have appeared in the literature during the last few years (Gräbnitz et al., 1989; Kohchi and Toh-e, 1985; Love et al., 1988; Machida et al., 1988; Raynal et al., 1987; Schnetz et al., 1987; Wakarchuk et al., 1988). Comparative analysis of sequence data yields important information about the evolutionary relatedness of the genes, the location of the active site along the primary structure of the enzyme, and the relationship between structure and function. On the applied side, this information is of crucial importance for the design of tailor-made enzymes.
Correspondence to: Dr. J. Polaina, Instituto de Agroquímica y Tecnología de Alimentos, Jaime Roig 11, 46010 Valencia (Spain) Tel. 6-369-08-00, ext. 35; Fax 6-393-00-01.
Abbreviations: aa, amino acid(s); Bgl, β-glucosidase(s); bgl, gene encoding Bgl; bp, base pair(s); βGal, β-galactosidase(s); kb, kilobase(s) or 1000 bp; ORF, open reading frame; nt, nucleotide(s); P-βGal, phospho-β-galactosidase; PNPG, p-nitrophenyl-β-D-glucopyranoside.
Bacillus polymyxa synthesizes two different Bgl. The genes coding for these two enzymes have been cloned in Escherichia coli, where they are expressed from their own promoters. Available data indicate that the enzyme encoded by one of these genes, designated bglA, is intracellular and cleaves cellobiose, probably through inorganic phosphate-mediated hydrolysis. The enzyme encoded by the other gene, bglB, is extracellular, cleaves cellobiose yielding glucose as the sole product, and is particularly active on aryl-glucosides such as p-nitrophenyl-β-D-glucopyranoside (PNPG) (González-Candelas et al., 1989).
0378-1119/90/$03.50 © 1990 Elsevier Science Publishers B.V. (Biomedical Division)
The aim of the present study was to determine the nt sequences of bglA and bglB, and to investigate the structural similarities and evolutionary relationships of Bgl to other enzymes.
RESULTS AND DISCUSSION
(a) Nucleotide sequences of bglA and bglB: general features
Series of plasmids with deletions, generated by the procedure described in Fig. 1, were used as templates in the sequencing experiments. Fig. 2A and B show the nt sequences determined for DNA fragments containing bglA and bglB, respectively. Both genes were sequenced entirely on both strands. The fragment containing bglA showed a major 1344-bp ORF (448 aa; 51643 Da). The first ATG of this frame was considered to be the initial Met because of the very good alignment with the sequences of other bacterial Bgl (see section e). Sequencing of the fragment containing bglB revealed a 1344-bp ORF (448 aa; 51547 Da). As in the previous case, the first ATG was considered to be the start of the protein because of the resulting alignment with other Bgl. The G + C contents of bglA and bglB were 46.7 and 47.1%, respectively, not far from the 44.3-45.6% values given for Bacillus polymyxa DNA (Fahmy et al., 1985).
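The G + C content quoted above is a simple base-composition percentage. As a minimal sketch of how such a value might be computed, assuming a plain nucleotide string as input (the function name and the toy fragment are illustrative and are not taken from bglA or bglB):

```python
def gc_content(seq: str) -> float:
    """Return the G + C content of a nucleotide sequence as a percentage.

    Only unambiguous bases (A, C, G, T/U) are counted; anything else
    (N, gaps, whitespace) is ignored.
    """
    counted = [base for base in seq.upper() if base in "ACGTU"]
    if not counted:
        raise ValueError("sequence contains no unambiguous bases")
    gc = sum(1 for base in counted if base in "GC")
    return 100.0 * gc / len(counted)


# Hypothetical 15-nt fragment used only to exercise the function.
example = "ATGACCATTTTCCAG"
print(f"G + C = {gc_content(example):.1f}%")  # 6 of 15 bases -> 40.0%
```

Applied to the full coding regions, a calculation of this kind would be expected to give values close to the 46.7 and 47.1% reported here.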
Sequences similar to consensus signal sequences for Bacillus (McConnell et al., 1986) are underlined in Fig. 2A and B. Probable ribosome-binding sites (Shine and Dalgarno, 1974) appear at nt -8 to -16 for bglA (Fig. 2A) and -10 to -19 for bglB (Fig. 2B). The upstream region of bglA (Fig. 2A) shows sequences similar to those recognized by the σA factor (Moran et al., 1982; Murray and Rabinowitz, 1982). A canonical -10 sequence appears at position -22 to -27. Separated from it by 17 nt, where a -35 sequence would be expected, there is a GAAACA hexamer sharing 3 out of 6 nt with the consensus -35 sequence TTGACA. Further upstream, at position -72 to -77, there is a TTGTCA hexamer, but it lies 44 nt away from the putative -10 sequence. The upstream region of bglB (Fig. 2B) shows possible -10 sequences at positions -78 to -83 and -85 to -90, and possible -35 sequences at positions -104 to -109 and -108 to -113. No elements of dyad symmetry that could act as transcription termination signals have been detected.
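The hexamer comparisons in this paragraph can be checked with a short sketch. The function names are illustrative, and the spacing tolerance of one nt around the expected 17-nt spacer is an assumption made for the example, not a criterion stated in the text:

```python
MINUS_35_CONSENSUS = "TTGACA"  # consensus -35 hexamer quoted in the text


def shared_positions(hexamer: str, consensus: str = MINUS_35_CONSENSUS) -> int:
    """Count positions at which a candidate hexamer matches the consensus."""
    if len(hexamer) != len(consensus):
        raise ValueError("hexamer and consensus must have the same length")
    return sum(a == b for a, b in zip(hexamer.upper(), consensus.upper()))


def plausible_spacing(spacer: int, expected: int = 17, tolerance: int = 1) -> bool:
    """Check whether a -35/-10 spacer is close to the expected length.

    The tolerance is a hypothetical parameter chosen for illustration.
    """
    return abs(spacer - expected) <= tolerance


# The two candidate -35 hexamers upstream of bglA, with their spacers.
for hexamer, spacer in (("GAAACA", 17), ("TTGTCA", 44)):
    print(hexamer, shared_positions(hexamer), plausible_spacing(spacer))
```

The sketch makes the trade-off explicit: GAAACA is correctly spaced but matches only 3 of 6 positions, whereas TTGTCA matches 5 of 6 positions but sits far from the putative -10 sequence.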
Fig. 1. Bidirectional, sequential deletions in plasmids pBG2, carrying bglA (A), and pBG4/SC3, carrying bglB (B), constructed by the exonuclease III - S1 nuclease digestion protocol (Henikoff, 1984), using a commercial kit from Pharmacia. E. coli transformants harboring plasmids with deletions were tested for Bgl activity using PNPG as the substrate (see González-Candelas et al., 1989). The blackened bars along the inserts of pBG2 and pBG4/SC3 indicate the position of the genes, delimited by the deletion analysis.
Fig. 2. The nt sequences of bglA (A) and bglB (B), determined by the dideoxy chain termination procedure, and deduced aa sequences. Possible regulatory sequences are underlined. GenBank accession Nos. M34009 (bglA); M34010 (bglB).
(b) Codon usage
Table I shows the frequencies of codon usage for bglA and bglB. Comparing these data with the patterns observed in different species (Sharp et al., 1988), and despite the absence of compiled data for B. polymyxa, there is strong evidence that neither bglA nor bglB is highly expressed in its natural host. The results are particularly revealing when only those aa are considered for which the same strong codon usage bias is observed in Escherichia coli, Bacillus subtilis, Saccharomyces cerevisiae and Schizosaccharomyces pombe. GCU, the most frequently used codon for Ala in highly expressed genes of these species, is used at a considerably lower frequency in both bglA and bglB. GAU coding for Asp, UGU coding for Cys, and GGG coding for Gly are used at high frequencies in both bglA and bglB, which is characteristic of genes expressed at a low level in all the species considered.
(c) Analysis of the N termini of bglA- and bglB-encoded proteins
Previous data (González-Candelas et al., 1989) suggested that, in B. polymyxa, the protein encoded by bglA is intracellular, while the one encoded by bglB is extracellular. Evidence supporting this view comes from the localization of the enzymatic activities encoded by these genes when they are expressed in E. coli, where the bglA and bglB activities are found intracellularly and in the periplasm, respectively.
TABLE I
Relative synonymous codon usage

aa    Codon   Frequency*
              bglA    bglB
Ala   GCA     0.96    0.31
      GCC     1.28    1.69
      GCG     0.96    0.92
      GCU     0.80    1.08
Arg   AGA     0.72    0.95
      AGG     0.00    0.63
      CGA     0.96    1.26
      CGC     1.68    0.32
      CGG     0.24    0.95
      CGU     2.40    1.89
Asn   AAC     0.56    1.00
      AAU     1.44    1.00
Asp   GAC     0.64    0.52
      GAU     1.36    1.48
Cys   UGC     0.00    0.33
      UGU     2.00    1.67
Gln   CAA     0.67    0.94
      CAG     1.33    1.06
Glu   GAA     0.89    0.68
      GAG     1.11    1.32
Gly   GGA     1.33    1.21
      GGC     1.03    1.21
      GGG     0.51    0.93
      GGU     1.13    0.65
His   CAC     0.47    0.60
      CAU     1.53    1.40
Ile   AUA     0.30    0.17
      AUC     1.30    0.69
      AUU     1.40    2.14
Leu   CUA     0.77    0.36
      CUC     0.58    0.36
      CUG     1.94    2.36
      CUU     0.77    0.36
      UUA     0.19    0.73
      UUG     1.74    1.82
Lys   AAA     1.25    0.59
      AAG     0.75    1.41
Met   AUG     1.00    1.00
Phe   UUC     0.52    0.38
      UUU     1.48    1.62
Pro   CCA     0.67    0.67
      CCC     0.22    1.33
      CCG     1.33    0.67
      CCU     1.78    1.33
Ser   AGC     0.57    1.67
      AGU     1.43    0.33
      UCA     0.57    0.67
      UCC     1.43    0.67
      UCG     1.14    0.33
      UCU     0.86    2.33
Thr   ACA     1.45    0.64
      ACC     0.91    0.64
      ACG     0.91    1.92
      ACU     0.73    0.80
Trp   UGG     1.00    1.00
Tyr   UAC     0.70    0.47
      UAU     1.30    1.53
Val   GUA     1.52    2.13
      GUC     0.83    0.27
      GUG     0.83    1.33
      GUU     0.83    0.27

* Listed values represent the observed frequency for each codon (number of times that a given aa is specified by a particular codon) divided by the expected frequency (number of times that a given aa is present in the protein, divided by the number of codons specifying this particular aa).
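The definition in the footnote to Table I translates directly into code. The following is a minimal sketch, assuming the standard genetic code and an in-frame coding sequence as input; the function name and the toy sequence are illustrative, and this is not the procedure the authors used to build the table:

```python
from collections import Counter, defaultdict

# Standard genetic code, written in the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def rscu(cds: str) -> dict[str, float]:
    """Relative synonymous codon usage: observed count / expected count.

    The expected count of a codon is the number of times its aa occurs,
    divided by the number of codons that specify that aa.
    """
    cds = cds.upper().replace("U", "T")
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds), 3))

    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":  # stop codons are left out
            families[aa].append(codon)

    values = {}
    for aa, codons in families.items():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # aa absent from the sequence
        expected = total / len(codons)
        for codon in codons:
            values[codon] = counts[codon] / expected
    return values


# Hypothetical four-codon sequence encoding Ala-Ala-Ala-Ala.
toy = rscu("GCTGCTGCCGCA")
print({codon: round(value, 2) for codon, value in sorted(toy.items())})
```

With this definition the values for each aa sum to the number of its synonymous codons, which gives a quick consistency check on any row of Table I (for example, the four Ala values of bglA add up to 4.00).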
Fig. 3. Hydropathy plots of the N-terminal regions (first 25 aa) of the bglA- and bglB-encoded proteins. Sequences plotted: BglA, MTIFQFPQDFMWGTATAAYQIEGAY; BglB, MSENTFIFPATFMWGTSTSSYQIEG. Values on the ordinate represent the indices of hydropathy determined according to Hopp and Woods (1981). Positive values correspond to hydrophobic residues.
Additionally, the products of both genes show similarities to other enzymes located intra- and extracellularly, respectively, in their natural hosts. To obtain further information on this subject, we have studied the N-terminal regions of the polypeptides encoded by bglA and bglB. Fig. 3 shows the hydropathy profiles of these regions. As expected, BglA does not show a leader peptide. The N terminus of BglB shows a stretch of hydrophobic sequence resembling a secretion signal; however, it does not show other characteristics expected of a leader peptide, such as basic aa preceding the hydrophobic core, or Ala residues indicating a potential cleavage site (McConnell et al., 1986). Therefore, the localization of BglB in B. polymyxa remains unresolved.
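A hydropathy profile of the kind shown in Fig. 3 can be sketched as a sliding-window average over per-residue values. Several points here are assumptions: the scale values are the Hopp-Woods hydrophilicity values as commonly tabulated and should be checked against the original publication, the sign is inverted so that positive values correspond to hydrophobic residues as stated in the legend to Fig. 3, the window size of 6 is not given in the text, and the two N-terminal sequences are approximate readings of the labels of Fig. 3.

```python
# Hopp-Woods hydrophilicity values as commonly tabulated (assumption:
# verify against Hopp and Woods, 1981, before relying on them).
HOPP_WOODS = {
    "R": 3.0, "D": 3.0, "E": 3.0, "K": 3.0, "S": 0.3, "N": 0.2, "Q": 0.2,
    "G": 0.0, "P": 0.0, "T": -0.4, "A": -0.5, "H": -0.5, "C": -1.0,
    "M": -1.3, "V": -1.5, "I": -1.8, "L": -1.8, "Y": -2.3, "F": -2.5,
    "W": -3.4,
}


def hydropathy_profile(protein: str, window: int = 6) -> list[float]:
    """Sliding-window mean of sign-inverted hydrophilicity values.

    Positive output means a hydrophobic stretch. Unknown residue letters
    raise KeyError. The window size is a hypothetical default.
    """
    protein = protein.upper()
    if window < 1 or window > len(protein):
        raise ValueError("window must be between 1 and the sequence length")
    values = [-HOPP_WOODS[aa] for aa in protein]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# N-terminal 25 aa as read from Fig. 3 (approximate readings).
n_termini = {
    "BglA": "MTIFQFPQDFMWGTATAAYQIEGAY",
    "BglB": "MSENTFIFPATFMWGTSTSSYQIEG",
}
for name, sequence in n_termini.items():
    profile = hydropathy_profile(sequence)
    print(name, [round(value, 2) for value in profile])
```

A profile alone does not settle whether a leader peptide is present; as noted above, the basic residues before the hydrophobic core and the Ala residues at the cleavage site also have to be examined.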
(d) Comparison of bglA and bglB sequences
We have compared the nt sequences of bglA and bglB, and the aa sequences encoded by them, by using the Clustal 1 program (Higgins and Sharp, 1988). This program assigns scores of homology, defined as the number of exactly matching nt, or aa, in the alignment between two sequences, minus a fixed penalty (equivalent to three matches) for every gap. Thus, the maximum score attainable is the length of a given sequence when compared to itself. With this criterion, the comparison of bglA and bglB gives scores of 562 for the nt sequences and 189 for the aa sequences, which represent very similar degrees of homology at both levels.
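The scoring rule described above can be illustrated on an already aligned pair of sequences. This is a sketch of the rule only and not of the Clustal alignment algorithm itself; the function name is illustrative, the toy alignment is hypothetical, and a gap is assumed to mean a contiguous run of gap characters in either sequence.

```python
import re


def homology_score(aligned_a: str, aligned_b: str,
                   gap_penalty: int = 3, gap_char: str = "-") -> int:
    """Exact matches minus a fixed penalty for every gap.

    Assumption: each contiguous run of gap characters counts as one gap,
    whatever its length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap_char
    )
    gap_run = re.compile(re.escape(gap_char) + "+")
    gaps = len(gap_run.findall(aligned_a)) + len(gap_run.findall(aligned_b))
    return matches - gap_penalty * gaps


# Hypothetical toy alignment: 4 matches and 1 gap give a score of 1.
print(homology_score("MT-IFQ", "MSEIFQ"))

# Reported scores expressed as a fraction of the maximum (sequence length).
for label, score, length in (("nt", 562, 1344), ("aa", 189, 448)):
    print(f"{label}: {score / length:.1%} of the maximum score")
```

Expressed this way, the nt score (562 of 1344) and the aa score (189 of 448) both come to about 42% of the maximum, which is the sense in which the two levels show very similar degrees of homology.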
Fig. 4. Alignment of the aa sequences of five bacterial β-glucosidases. (1) B. polymyxa (bglA product); (2) B. polymyxa (bglB product); (3) C. saccharolyticum (bglA product); (4) Agrobacterium sp. (abg product); (5) E. coli (bglB product). Asterisks indicate identity of a given aa in all the sequences compared and colons conservative changes.
TABLE II
Homology of β-glucosidases (Bgl) and β-galactosidases (βGal and P-βGal) from different microbial species a

 1. B. polymyxa Bgl, 448 aa (bglA)
 2. B. polymyxa Bgl, 448 aa (bglB)
 3. C. saccharolyticum Bgl, 453 aa (bglA)
 4. Agrobacterium sp. Bgl, 459 aa (abg)
 5. E. coli Bgl, 471 aa (bglB)
 6. C. thermocellum Bgl, 754 aa (bglB)
 7. K. fragilis Bgl, 845 aa
 8. C. pelliculosa Bgl, 825 aa
 9. S. fibuligera Bgl, 876 aa (bgl1)
10. S. fibuligera Bgl, 880 aa (bgl2)
11. S. aureus P-βGal, 470 aa
12. L. casei P-βGal, 474 aa (pbg)
13. L. lactis P-βGal, 468 aa (lacG)
14. L. bulgaricus βGal, 971 aa
15. E. coli βGal, 1023 aa (lacZ)
16. E. coli βGal, 964 aa (ebgA)
17. K. pneumoniae βGal, 1034 aa (lacZ)
18. B. stearothermophilus βGal, 672 aa (bgaB)

      1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18
 1  448  189  178  180  119   48   44   53   53   45  136  128  132    0   44    0   47   44
 2       448  162  145   96   46   44   48   45    0  117  118  125    0   51   42   51   44
 3            453  172  122   52   48   50    0   55  140  129  138    0    0   52    0    0
 4                 459  104   54    0   52   44   47  122  117  123   45    0   49    0   52
 5                      471   49   45   52    0    0  108  108  116   52    0   55    0   46
 6                           754  237    ?  160  112    0   53   51   77   74   74   72   76
 7                                845  125  127  125   51   61   47   99   87   79    0   68
 8                                     825  319  316    0   47    0   79   81   75   79   63
 9                                          876  714   54   48   53   87   95   83   89   70
10                                               880   53    0    0   93   84   79   77   65
11                                                    470  241  379    0    0    0   50   51
12                                                         474  241   56   51   44   54   45
13                                                              468    0   44   47    0   48
14                                                                   971    ?    ?  225   78
15                                                                       1023  270    ?   68
16                                                                             964  267   68
17                                                                                 1034   65
18                                                                                       672

a Scores of homology were determined with the Clustal 1 program (Higgins and Sharp, 1988) with a gap penalty of three points. Numbers in bold type indicate the different groups of homology (bold type is not reproduced here). Symbols in parentheses are the genes. ?, value illegible.
(e) Homology analysis of microbial Bgl
The nt sequences of a number of Bgl-encoding genes from different microbial species have been published. We have used the Clustal 4 program (Higgins and Sharp, 1989) to compare the deduced aa sequences of Bgl from E. coli (Schnetz et al., 1987), Clostridium thermocellum (Gräbnitz et al., 1989), Agrobacterium sp. (Wakarchuk et al., 1988), Caldocellum saccharolyticum (Love et al., 1988), Candida pelliculosa (Kohchi and Toh-e, 1985), Kluyveromyces fragilis (Raynal et al., 1987), and Saccharomycopsis fibuligera (Machida et al., 1988), and those encoded by the bglA and bglB genes from B. polymyxa. Fig. 4 presents the alignment of the bglA and bglB products with the enzymes of E. coli, Agrobacterium sp., and Caldocellum saccharolyticum, which showed a high degree of homology. The rest of the sequences showed little or no detectable homology to this first group, but they were homologous to each other. Particularly remarkable is the homology of Clostridium thermocellum Bgl to Bgl from different yeasts. According to Gräbnitz et al. (1989), this suggests a genetic exchange between bacteria and yeasts.
(f) Homology of bglA- and bglB-encoded proteins to other enzymes
We have screened the National Biomedical Research Foundation-Protein Identification Resource (NBRF-PIR) data bank (September 1989 update), searching for aa sequences, other than Bgl, homologous to the bglA and bglB products. The screening, carried out with the Proscan program from the commercial package DNASTAR (Madison, WI), revealed a sequence corresponding to a phospho-β-galactosidase (P-βGal) from Staphylococcus aureus (Breidt and Stewart, 1987) which showed extensive homology to the bglA- and bglB-encoded enzymes. Further comparison with other published βGal sequences, not included in the data bank, revealed homology to two other P-βGal, from Lactobacillus casei (Porter and Chassy, 1988) and Lactococcus lactis (formerly Streptococcus lactis) (Boizet et al., 1988; de Vos and Gasson, 1989). Table II presents the scores of homology, determined by the Clustal 1 program (see section d), of different Bgl and βGal. From these data, the enzymes can be classified into different groups. We define type I Bgl as those homologous to the Bacillus polymyxa bglA product (considered the type enzyme of the group because its sequence is the closest to the consensus). Type I includes the enzymes encoded by the bglA and bglB genes from B. polymyxa, and three other Bgl encoded by the genes abg from Agrobacterium sp., bglA from C. saccharolyticum, and bglB from E. coli. Type II includes Bgl homologous to the Saccharomycopsis fibuligera bgl1 and bgl2 products, namely the enzymes of two other yeasts, Candida pelliculosa and Kluyveromyces fragilis, and the product of the bglB gene of Clostridium thermocellum.
Fig. 5. Alignment of the consensus sequences of type-I Bgl (five enzymes) and type-I βGal (three enzymes). The consensus sequences reflect identity in 3 out of 5 aa at a given position for type-I Bgl, and 2 out of 3 aa for type-I βGal. Positions where a consensus aa could not be assigned are indicated by dots. Gaps introduced by the alignment are represented by dashes. In the alignment between the two consensus sequences, coincidence of a given aa is represented by asterisks, and conservative changes by colons.
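The consensus rule stated in the legend to Fig. 5 can be sketched as a column-wise majority count over an existing multiple alignment. The function name and the toy alignment below are illustrative assumptions; the real consensus sequences were derived from the full-length alignments.

```python
from collections import Counter


def consensus(aligned: list[str], min_identical: int, unknown: str = ".") -> str:
    """Column-wise consensus of equal-length aligned sequences.

    A residue is reported when it occurs at least `min_identical` times in
    a column; otherwise the position is marked with `unknown`. A column
    dominated by gap characters yields the gap character itself.
    """
    if not aligned:
        raise ValueError("at least one sequence is required")
    if len({len(sequence) for sequence in aligned}) != 1:
        raise ValueError("aligned sequences must have equal length")
    result = []
    for column in zip(*aligned):
        residue, count = Counter(column).most_common(1)[0]
        result.append(residue if count >= min_identical else unknown)
    return "".join(result)


# Hypothetical five-sequence toy alignment, 3-out-of-5 rule as for type-I Bgl.
toy_alignment = ["MTIFQ", "MSEFQ", "MSIFP", "MDPFL", "MKAFN"]
print(consensus(toy_alignment, min_identical=3))  # M..F.
```

For the type-I βGal consensus the same function would be called with three sequences and `min_identical=2`. Thresholds above half the number of sequences guarantee that at most one residue can qualify at each position.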
Analogously, we can define a type I βGal, including enzymes homologous to type-I Bgl. To this group belong the three P-βGal from Staphylococcus aureus, Lactobacillus casei and Lactococcus lactis. Taking into consideration the data analyzed by Schmidt et al. (1989), a second group of βGal, type II, can be defined, formed by enzymes related to each other but showing no homology to any of the previous types. This group includes βGal from Klebsiella pneumoniae and Lactobacillus bulgaricus, and two others encoded by the lacZ and ebgA genes of E. coli. Finally, a thermostable βGal encoded by the bgaB gene of Bacillus stearothermophilus represents a separate enzyme not homologous to any of the previous groups.
Fig. 5 shows the alignment of the consensus sequences established for type-I Bgl and type-I βGal, which supports the common evolutionary origin suggested by de Vos and Gasson (1989).
ACKNOWLEDGEMENTS
This research was funded by grant CICYT ALI88-168 from the Spanish Comisión Interministerial de Ciencia y Tecnología. L.G.C. was supported by an FPI fellowship from the Ministerio de Educación y Ciencia.
We thank Agustí Flors and José Vicente Carbonell for encouragement and support throughout this work. Fernando González-Candelas and the Servicio de Bioinformática de la Universidad de Valencia are thanked for assistance with computer work, and Alfonso Navarre and Enrique Herrero for facilitating the use of equipment.
REFERENCES
Boizet, B., Villeval, D., Slos, P., Novel, M., Novel, G. and Mercenier, A.: Cloning and expression of the phospho-β-galactosidase gene from Streptococcus lactis into Escherichia coli. Gene 62 (1988) 249-261.
Breidt Jr., F. and Stewart, G.C.: Nucleotide and deduced amino acid sequences of the Staphylococcus aureus phospho-β-galactosidase gene. Appl. Environ. Microbiol. 53 (1987) 969-973.
Fahmy, F., Flossdorf, J. and Claus, D.: The DNA base composition of the type strains of the genus Bacillus. Syst. Appl. Microbiol. 6 (1985) 60-65.
González-Candelas, L., Aristoy, M.C., Polaina, J. and Flors, A.: Cloning and characterization of two genes from Bacillus polymyxa expressing β-glucosidase activity in Escherichia coli. Appl. Environ. Microbiol. 55 (1989) 3173-3177.
Gräbnitz, F., Rücknagel, K.P., Seiss, M. and Staudenbauer, W.L.: Nucleotide sequence of the Clostridium thermocellum bglB gene encoding thermostable β-glucosidase B: homology to fungal β-glucosidases. Mol. Gen. Genet. 217 (1989) 70-76.
Henikoff, S.: Unidirectional digestion with exonuclease III creates targeted breakpoints for DNA sequencing. Gene 28 (1984) 351-359.
Higgins, D.G. and Sharp, P.M.: CLUSTAL, a package for performing multiple sequence alignments on a microcomputer. Gene 73 (1988) 237-244.
Higgins, D.G. and Sharp, P.M.: Fast and sensitive multiple sequence alignments on a microcomputer. CABIOS 5 (1989) 151-153.
Hopp, T.P. and Woods, K.R.: Prediction of protein antigenic determinants from amino acid sequences. Proc. Natl. Acad. Sci. USA 78 (1981) 3824-3828.
Kohchi, C. and Toh-e, A.: Nucleotide sequence of Candida pelliculosa β-glucosidase gene. Nucleic Acids Res. 13 (1985) 6273-6282.
Love, D.R., Fisher, R. and Bergquist, P.L.: Sequence, structure and expression of a cloned β-glucosidase gene from an extreme thermophile. Mol. Gen. Genet. 213 (1988) 84-92.
Machida, M., Ohtsuki, I., Fukui, S. and Yamashita, I.: Nucleotide sequences of Saccharomycopsis fibuligera genes for extracellular β-glucosidases as expressed in Saccharomyces cerevisiae. Appl. Environ. Microbiol. 54 (1988) 3147-3155.
McConnell, D.J., Cantwell, B.A., Devine, K.M., Forage, A.J., Laoide, B.M., O'Kane, C., Ollington, J.F. and Sharp, P.M.: Genetic engineering of extracellular enzyme systems of Bacilli. Ann. N.Y. Acad. Sci. 469 (1986) 1-17.
Moran Jr., C.P., Lang, N., LeGrice, S.F.J., Lee, G., Stephens, M., Sonenshein, A.L., Pero, J. and Losick, R.: Nucleotide sequences that signal the initiation of transcription and translation in Bacillus subtilis. Mol. Gen. Genet. 186 (1982) 339-346.
Murray, C.L. and Rabinowitz, J.C.: Nucleotide sequences of transcription and translation initiation regions in Bacillus phage φ29 early genes. J. Biol. Chem. 257 (1982) 1053-1062.
Porter, V. and Chassy, B.M.: Nucleotide sequence of the β-D-phosphogalactoside galactohydrolase gene of Lactobacillus casei: comparison to analogous pbg genes of other Gram-positive organisms. Gene 62 (1988) 263-276.
Raynal, A., Gerbaud, C., Francingues, M.C. and Guerineau, M.: Sequence and transcription of the β-glucosidase gene of Kluyveromyces fragilis cloned in Saccharomyces cerevisiae. Curr. Genet. 12 (1987) 175-184.
Schmidt, B.F., Adams, R.M., Requadt, C., Power, S. and Mainzer, E.: Expression and nucleotide sequence of the Lactobacillus bulgaricus β-galactosidase gene in Escherichia coli. J. Bacteriol. 171 (1989) 625-635.
Schnetz, K., Toloczyki, C. and Rak, B.: β-Glucosidase (bgl) operon of Escherichia coli K-12: nucleotide sequence, genetic organization, and possible evolutionary relationship to regulatory components of two Bacillus subtilis genes. J. Bacteriol. 169 (1987) 2579-2590.
Sharp, P.M., Cowe, E., Higgins, D.G., Shields, D.C., Wolfe, K.H. and Wright, F.: Codon usage patterns in Escherichia coli, Bacillus subtilis, Saccharomyces cerevisiae, Schizosaccharomyces pombe, Drosophila melanogaster, and Homo sapiens; a review of the considerable within-species diversity. Nucleic Acids Res. 16 (1988) 8207-8211.
Shine, J. and Dalgarno, L.: The 3'-terminal sequence of E. coli 16S ribosomal RNA: complementarity to nonsense triplets and ribosome binding sites. Proc. Natl. Acad. Sci. USA 71 (1974) 1342-1346.
de Vos, W.M. and Gasson, M.J.: Structure and expression of the Lactococcus lactis gene for phospho-β-galactosidase (lacG) in Escherichia coli and L. lactis. J. Gen. Microbiol. 135 (1989) 1833-1846.
Wakarchuk, W.W., Greenberg, N.M., Kilburn, D.G., Miller Jr., R.C. and Warren, R.A.J.: Structure and transcription analysis of the gene encoding a cellobiase from Agrobacterium sp. strain ATCC 21400. J. Bacteriol. 170 (1988) 301-307.