
DRAFT

Supporting Information

Dezhen Xue, et al.

This supporting information was compiled on September 14, 2016

Fig. S1. Definitions of property. These included (a) a linear fitting of the phase boundary via the coefficient $A$ in $\tau(x) = Ax + C$ and (b) a quadratic fitting, $\tau(x) = A_1 x^2 + A_2 x + C$, together with a linear composition interval $dx$ from the phase boundary at room temperature (RT) to a point 100 °C below. The two measures of the property, $A$ and $dx$, are highly correlated with each other, as shown in the inset. [Annotations in the figure: $\tau(x) = Ax + C$; $\tau(x) = A_1 x^2 + A_2 x + C$; 100 °C; $dx = 0.3895$; inset axis $A$.]

Note1: Property

Figure S1 shows a typical experimental phase diagram with tetragonal and rhombohedral ends. The MPB gives the transition temperature $\tau$ as a function of the concentration $x$. We chose two ways to characterize the "verticality" of the MPB: (a) a linear fitting of the phase boundary via the coefficient $A$ in $\tau(x) = Ax + C$, and (b) a quadratic fitting, $\tau(x) = A_1 x^2 + A_2 x + C$, together with a linear composition interval $dx$ from the phase boundary at room temperature to a point 100 K below. The two measures of the property, $A$ and $dx$, are highly correlated with each other, as shown in the inset of Figure S1.
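The interval $dx$ defined above can be sketched numerically as follows. This is a minimal illustration, not the authors' fitting code: the function names, the coefficient values, and the choice of 25 °C for room temperature are assumptions made for the example, and the composition window is assumed to be $0 \le x \le 1$.

```python
import math

def boundary_composition(a1, a2, c, temperature, lo=0.0, hi=1.0):
    """Solve a1*x**2 + a2*x + c == temperature for the single root x in [lo, hi]."""
    if a1 == 0.0:
        roots = [(temperature - c) / a2]  # purely linear boundary
    else:
        disc = a2 * a2 - 4.0 * a1 * (c - temperature)
        if disc < 0.0:
            raise ValueError("the fitted boundary never reaches this temperature")
        root = math.sqrt(disc)
        roots = [(-a2 + root) / (2.0 * a1), (-a2 - root) / (2.0 * a1)]
    inside = [x for x in roots if lo <= x <= hi]
    if len(inside) != 1:
        raise ValueError("expected exactly one root inside the composition window")
    return inside[0]

def composition_interval(a1, a2, c, t_room=25.0, drop=100.0):
    """dx: composition change along the boundary between t_room and t_room - drop."""
    x_room = boundary_composition(a1, a2, c, t_room)
    x_low = boundary_composition(a1, a2, c, t_room - drop)
    return abs(x_room - x_low)

# Hypothetical concave boundary (temperatures in deg C); not a fit from the paper.
A1, A2, C = -300.0, 700.0, -250.0
dx = composition_interval(A1, A2, C)
```

A steeper (more vertical) boundary yields a smaller `dx`, which is why the two measures move in opposite directions.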

Note2: Chemical Design Space and Training Data

Design Space. We constrain our problem to Ba(Zr$_n$Ti$_{1-n}$)O$_3$ − x(Ba$_{1-m}$Ca$_m$)TiO$_3$ and Ba(Sn$_n$Ti$_{1-n}$)O$_3$ − x(Ba$_{1-m}$Ca$_m$)TiO$_3$, where m and n are the concentrations of Ca$^{2+}$ and of Zr$^{4+}$ or Sn$^{4+}$, respectively. For Ca$^{2+}$, the lower limit was set to 18% to maintain the pure cubic to tetragonal transition and the upper limit was 50%, above which secondary phases could appear, i.e., 18% ≤ m ≤ 50% [1]. Similarly, for Sn$^{4+}$ the range for n was 10% ≤ n ≤ 30%, and for Zr$^{4+}$ the range was 15% ≤ n ≤ 30%. These ranges were selected to ensure a pure cubic to rhombohedral transition [2, 3]. Our experimental arrangement allowed us to vary the composition in steps of 1%; the different combinations of m and n thus gave rise to 693 phase diagrams for the Ba(Sn$_n$Ti$_{1-n}$)O$_3$ − x(Ba$_{1-m}$Ca$_m$)TiO$_3$ system and 529 phase diagrams for Ba(Zr$_n$Ti$_{1-n}$)O$_3$ − x(Ba$_{1-m}$Ca$_m$)TiO$_3$. The total chemical space thus comprised 1222 phase diagrams. Of these, 19 were already known or reported and served as our training data. The remaining phase diagrams formed a virtual space of 1203 possible phase diagrams that we explored.

Thus, summarizing,

• search space size, N = 1222

• training set size, n = 19 out of N = 1222

• virtual set size, N - n = 1203

Note that to avoid complexity related to processing conditions, raw materials and microstructure, all samples, both in the training set and among our newly made samples, were synthesized and measured in our group under the same, controlled conditions.

Training Data. Figure S2 shows 19 phase diagrams for the BaTiO$_3$-based systems that were synthesized and measured. All the phase diagrams are characterized by a morphotropic phase boundary (MPB) separating the tetragonal (T) and rhombohedral (R) phases and starting from a Cubic-T-R triple point. Among them, phase diagrams 1, 15, 16 and 18 have been reported in the literature [4–7]. Phase diagram 20 in Figure S2 is our new phase diagram, the outcome of our prediction. The property of interest is related to a measure of the curvature of the MPB, as well as the slope. The former, denoted by $dx$, is a linear composition interval from the phase boundary at room temperature to a point 100 K below, while the latter is obtained by a linear fitting to the MPB.

Table S1 lists all the training data as well as the new phase diagram. Each data point represents a phase diagram and is identified by the combination of a tetragonal end and a rhombohedral end. The six features used in the present study that affect the slope of the MPB are also listed.

Note3: Constrained Bayesian Linear Regression for Predicting Phase Transition Curves

We propose a linear regression model to predict the phase transition boundaries of piezoelectric materials with two dopants based on the six dopant-dependent features $p = (\Delta Vol, t_f, \delta D_T, \delta D_R, r_{enc}, r_{en})$. We consider a data set consisting of phase transition temperatures $\{\tau_{s,i,j}\}$ measured at different dopant compositions $\{x_{s,i,j}\}$, where $s \in \{RT, TC\}$ denotes the MPB or the paraelectric-ferroelectric (PF) phase boundary (labeled MPB and PF in the equations below), and the indices $i$ and $j$ denote the $j$-th dopant composition of the $i$-th compound. Inspired by the domain knowledge that the difference between the two phase boundaries is a quadratic function of relative dopant compositions, we consider a quadratic model for both boundaries of each compound:

$$\tau_{s,i} = a_{s,i} x_{s,i}^2 + b_{s,i} x_{s,i} + c_{s,i}, \tag{1}$$

where the compound-specific coefficients $g_{s,i}$, $g \in \{a, b, c\}$, are assumed to be linearly dependent on the compound features,

$$g_{s,i} = \mathbf{f}_i^T \boldsymbol{\beta}_{g,s}. \tag{2}$$

We construct the feature vector $\mathbf{f}_i$ of each compound by concatenating the six physical properties $\{p_m; m = 1, \dots, 6\}$


Fig. S2. The 20 phase diagrams used in the present study. Diagrams 1–19 serve as the training data, whereas diagram 20 is the new system predicted by our Bayesian approach and adaptive design. All phase diagrams are characterized by a morphotropic phase boundary, shown by the red symbols and curves.

2 | www.pnas.org/cgi/doi/10.1073/pnas.XXXXXXXXXX Xue


Table S1. The data used in the present study. Each data point represents a phase diagram and is identified by a specific combination of a tetragonal end (T-end) and a rhombohedral end (R-end). The properties of interest are the slope of the MPB, obtained by a linear fitting to the MPB, and dx (the curvature, or the composition change on varying the temperature by 100 K from the triple point along the MPB), obtained by a quadratic fitting to the MPB. The six features used in the present work to predict the slope of the MPB are also listed.

No.  T-end               R-end               slope  dx      ∆Vol     tf      renc    ren     ∆DT     ∆DR
1    (Ba0.80Ca0.2)TiO3   Ba(Ti0.88Sn0.12)O3  153.0  0.5277  -1.7949  1.8011  0.8405  0.7193  0.3059  0.1618
2    (Ba0.75Ca0.25)TiO3  Ba(Ti0.88Sn0.12)O3  226.5  0.4043  -2.1459  1.7854  0.8212  0.7219  0.3411  0.1618
3    (Ba0.60Ca0.40)TiO3  Ba(Ti0.88Sn0.12)O3  259.2  0.3098  -2.8442  1.7386  0.7635  0.7295  0.4468  0.1618
4    (Ba0.70Ca0.30)TiO3  Ba(Ti0.78Sn0.22)O3  270.1  0.3094  -3.0299  1.7441  0.7928  0.7038  0.3764  0.1434
5    (Ba0.70Ca0.30)TiO3  Ba(Ti0.88Sn0.12)O3  270.2  0.3362  -2.4957  1.7698  0.8020  0.7244  0.3764  0.1618
6    (Ba0.82Ca0.18)TiO3  Ba(Ti0.78Sn0.22)O3  274.1  0.3541  -2.1882  1.7811  0.8384  0.6979  0.2918  0.1434
7    (Ba0.70Ca0.30)TiO3  Ba(Ti0.86Sn0.14)O3  287.2  0.3306  -2.6023  1.7646  0.8001  0.7202  0.3764  0.1581
8    (Ba0.70Ca0.30)TiO3  Ba(Ti0.82Sn0.18)O3  297.3  0.3318  -2.8159  1.7543  0.7964  0.7119  0.3764  0.1508
9    (Ba0.60Ca0.40)TiO3  Ba(Ti0.84Sn0.16)O3  320.0  0.2918  -3.0575  1.7284  0.7600  0.7211  0.4468  0.1545
10   (Ba0.55Ca0.45)TiO3  Ba(Ti0.83Sn0.17)O3  370.0  0.2853  -3.1109  1.7103  0.7400  0.7215  0.4820  0.1526
11   (Ba0.50Ca0.50)TiO3  Ba(Ti0.88Sn0.12)O3  373.0  0.2593  -2.8442  1.7073  0.7250  0.7346  0.5173  0.1618
12   (Ba0.60Ca0.40)TiO3  Ba(Ti0.78Sn0.22)O3  408.0  0.1679  -3.3783  1.7133  0.7547  0.7088  0.4468  0.1434
13   (Ba0.82Ca0.18)TiO3  Ba(Ti0.70Sn0.30)O3  442.3  0.2327  -2.6177  1.7606  0.8308  0.6824  0.2918  0.1287
14   (Ba0.60Ca0.40)TiO3  Ba(Ti0.82Sn0.18)O3  452.1  0.2342  -3.1643  1.7233  0.7582  0.7170  0.4468  0.1508
15   (Ba0.80Ca0.20)TiO3  Ba(Ti0.85Zr0.15)O3  312.0  0.2646  -2.5196  1.7801  0.7814  0.7540  0.3059  0.1563
16   (Ba0.70Ca0.30)TiO3  Ba(Ti0.80Zr0.20)O3  464.8  0.1974  -3.6787  1.7324  0.7255  0.7623  0.3764  0.1471
17   (Ba0.60Ca0.40)TiO3  Ba(Ti0.75Zr0.25)O3  491.6  0.1895  -4.4875  1.6856  0.6725  0.7707  0.4468  0.1380
18   (Ba0.60Ca0.40)TiO3  Ba(Ti0.80Hf0.20)O3  404.6  0.2066  -3.6258  1.7382  0.6732  0.7611  0.3764  0.1472
19   (Ba0.82Ca0.18)TiO3  Ba(Ti0.70Zr0.30)O3  418.8  0.2053  -3.7598  1.7357  0.7279  0.7617  0.2918  0.1288
20   (Ba0.50Ca0.50)TiO3  Ba(Ti0.70Zr0.30)O3  659.0  0.1310  -4.9499  1.6397  0.6223  0.7791  0.5173  0.1288

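and their products $\{p_m p_n; m, n = 1, \dots, 6\}$. As such, the phase transition curves in Equation 1 can be expressed as

$$\begin{aligned}
\tau_{s,i} &= a_{s,i} x_{s,i}^2 + b_{s,i} x_{s,i} + c_{s,i} \\
&= \mathbf{f}_i^T \boldsymbol{\beta}_{a,s} x_{s,i}^2 + \mathbf{f}_i^T \boldsymbol{\beta}_{b,s} x_{s,i} + \mathbf{f}_i^T \boldsymbol{\beta}_{c,s} \\
&= \left[\mathbf{f}_i^T x_{s,i}^2,\; \mathbf{f}_i^T x_{s,i},\; \mathbf{f}_i^T\right] \left[\boldsymbol{\beta}_{a,s}; \boldsymbol{\beta}_{b,s}; \boldsymbol{\beta}_{c,s}\right].
\end{aligned} \tag{3}$$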
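The construction of the feature vector and of the row vector multiplying the stacked coefficients in Equation 3 can be sketched as below. The text does not state whether the products $p_m p_n$ include both orderings; this sketch assumes the 21 unique pairs with $m \le n$, giving 27 features in total, and the function names are hypothetical.

```python
from itertools import combinations_with_replacement

def feature_vector(p):
    """Concatenate the raw properties with their pairwise products p_m * p_n (m <= n)."""
    pairs = combinations_with_replacement(range(len(p)), 2)
    return list(p) + [p[m] * p[n] for m, n in pairs]

def design_row(f, x):
    """Row [f^T x^2, f^T x, f^T] for one measured composition x."""
    return [v * x * x for v in f] + [v * x for v in f] + list(f)

def predict_temperature(f, x, beta):
    """Evaluate tau = row . beta, with beta stacked as [beta_a; beta_b; beta_c]."""
    return sum(d * b for d, b in zip(design_row(f, x), beta))

# Properties of entry 1 in Table S1, ordered as (dVol, tf, dDT, dDR, renc, ren).
p = [-1.7949, 1.8011, 0.3059, 0.1618, 0.8405, 0.7193]
f = feature_vector(p)
row = design_row(f, 0.4)
```

Stacking one such row per measured composition gives the design matrix for a boundary, so the quadratic curve of every compound is fitted with a single shared coefficient vector.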

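For each experimentally examined compound at composition $x_{s,i,j}$, we define $\mathbf{D}_{s,i,j} \doteq \left[\mathbf{f}_i^T x_{s,i,j}^2,\; \mathbf{f}_i^T x_{s,i,j},\; \mathbf{f}_i^T\right]$. Then the corresponding phase transition temperature $\tau_{s,i,j}$ can be expressed as

$$\tau_{s,i,j} = \mathbf{D}_{s,i,j} \boldsymbol{\beta}_s + \varepsilon_s, \tag{4}$$

where $\boldsymbol{\beta}_s \doteq \left[\boldsymbol{\beta}_{a,s}; \boldsymbol{\beta}_{b,s}; \boldsymbol{\beta}_{c,s}\right]$ is the compound-independent regression coefficient vector and $\varepsilon_s \sim N(0, e_s^{-1})$ denotes the i.i.d. Gaussian noise. This allows for the following probability model of $\tau_{s,i,j}$,

$$\pi\left(\tau_{s,i,j} \mid \mathbf{D}_{s,i,j}, \boldsymbol{\beta}_s, e_s\right) = N\left(\mathbf{D}_{s,i,j} \boldsymbol{\beta}_s, e_s^{-1}\right), \tag{5}$$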

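and the corresponding joint distribution

$$\pi\left(\boldsymbol{\tau}_{MPB}, \boldsymbol{\tau}_{PF} \mid \mathbf{D}_{MPB}, \mathbf{D}_{PF}, \boldsymbol{\beta}_{MPB}, \boldsymbol{\beta}_{PF}, e_{MPB}, e_{PF}\right) = N\left(\mathbf{D}_{MPB} \boldsymbol{\beta}_{MPB}, e_{MPB}^{-1} \mathbf{I}\right) N\left(\mathbf{D}_{PF} \boldsymbol{\beta}_{PF}, e_{PF}^{-1} \mathbf{I}\right), \tag{6}$$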

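where $\boldsymbol{\tau}_s$ and $\mathbf{D}_s$ are formed by concatenating $\tau_{s,i,j}$ and $\mathbf{D}_{s,i,j}$, respectively, from all data points. We assume a Gaussian prior distribution of $\boldsymbol{\beta}_s$, which is further constrained by the domain knowledge: for each compound, the two phase boundaries intersect at $0 \le l_i < x < u_i \le 1$, and the MPB is concave with $a_{MPB} < 0$. By mapping such domain knowledge to the following linear constraints:

$$\forall i: \quad \begin{cases}
\left[\mathbf{f}_i^T u_i^2,\; \mathbf{f}_i^T u_i,\; \mathbf{f}_i^T\right] \left(\boldsymbol{\beta}_{PF} - \boldsymbol{\beta}_{MPB}\right) > 0, \\
\left[\mathbf{f}_i^T l_i^2,\; \mathbf{f}_i^T l_i,\; \mathbf{f}_i^T\right] \left(\boldsymbol{\beta}_{PF} - \boldsymbol{\beta}_{MPB}\right) < 0, \\
\mathbf{f}_i^T \boldsymbol{\beta}_{a,MPB} < 0,
\end{cases} \tag{7}$$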

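we obtain a truncated Gaussian distribution of $\boldsymbol{\beta}_s$,

$$\pi\left(\boldsymbol{\beta}_{MPB}, \boldsymbol{\beta}_{PF} \mid \alpha_{MPB}, \alpha_{PF}\right) = \begin{cases}
N\left(\mathbf{0}, \alpha_{MPB}^{-1} \mathbf{I}\right) N\left(\mathbf{0}, \alpha_{PF}^{-1} \mathbf{I}\right), & \text{if } \{\boldsymbol{\beta}_{MPB}, \boldsymbol{\beta}_{PF}\} \in Q, \\
0, & \text{otherwise},
\end{cases} \tag{8}$$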

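where $Q$ is the set of $\{\boldsymbol{\beta}_{MPB}, \boldsymbol{\beta}_{PF}\}$ satisfying the linear constraints in Equation 7. From Equations 6 and 8, we obtain the posterior distribution of $\boldsymbol{\beta}_s$:

$$\pi\left(\boldsymbol{\beta}_{MPB}, \boldsymbol{\beta}_{PF} \mid \boldsymbol{\tau}_{MPB}, \boldsymbol{\tau}_{PF}\right) \begin{cases}
\propto N\left(\boldsymbol{\mu}_{MPB}, \boldsymbol{\Sigma}_{MPB}\right) N\left(\boldsymbol{\mu}_{PF}, \boldsymbol{\Sigma}_{PF}\right), & \text{if } \{\boldsymbol{\beta}_{MPB}, \boldsymbol{\beta}_{PF}\} \in Q, \\
= 0, & \text{otherwise},
\end{cases} \tag{9}$$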

Fig. S3. Measured (x-axis) vs. predicted (y-axis) dx from Bayesian linear regression. The red data point corresponds to the predicted value for BZT-m50-n30. The x-axis error bar corresponds to the fitting error of the quadratic fit, $\tau(x) = A_1 x^2 + A_2 x + C$. The y-axis error bar is from the Bayesian regression model.

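where $\boldsymbol{\Sigma}_s = \left(\alpha_s \mathbf{I} + e_s \mathbf{D}_s^T \mathbf{D}_s\right)^{-1}$ and $\boldsymbol{\mu}_s = e_s \boldsymbol{\Sigma}_s \mathbf{D}_s^T \boldsymbol{\tau}_s$.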
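The expressions for $\boldsymbol{\Sigma}_s$ and $\boldsymbol{\mu}_s$ can be evaluated directly for a small problem. The sketch below computes the unconstrained Gaussian moments for one boundary and adds a helper that tests whether a pair of coefficient vectors lies in the constraint set $Q$; it does not sample from the truncated posterior. All names and the toy data are assumptions for illustration, and the single feature $f = [1]$ stands in for the full feature vector.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if r == c else 0.0 for c in range(n)]
           for r, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if aug[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def posterior_moments(design, tau, alpha, e):
    """Sigma = (alpha*I + e*D^T D)^-1 and mu = e*Sigma*D^T*tau (no truncation)."""
    k = len(design[0])
    precision = [[(alpha if r == c else 0.0) + e * sum(row[r] * row[c] for row in design)
                  for c in range(k)] for r in range(k)]
    sigma = invert(precision)
    dt_tau = [sum(row[r] * t for row, t in zip(design, tau)) for r in range(k)]
    mu = [e * dot(sigma[r], dt_tau) for r in range(k)]
    return mu, sigma

def satisfies_constraints(f, lower, upper, beta_mpb, beta_pf):
    """Check the three linear constraints for one compound with crossing window (lower, upper)."""
    k = len(f)
    diff = [pf - mpb for pf, mpb in zip(beta_pf, beta_mpb)]
    row = lambda x: [v * x * x for v in f] + [v * x for v in f] + list(f)
    return (dot(row(upper), diff) > 0.0
            and dot(row(lower), diff) < 0.0
            and dot(f, beta_mpb[:k]) < 0.0)  # concavity: a_MPB < 0

# Toy data: noise-free quadratic boundary with hypothetical coefficients.
xs = [0.0, 0.25, 0.5, 0.75, 1.0]
true_beta = [-300.0, 700.0, -250.0]
design = [[x * x, x, 1.0] for x in xs]
tau = [dot(row, true_beta) for row in design]
mu, sigma = posterior_moments(design, tau, alpha=1e-9, e=1.0)
```

With a weak prior (small `alpha`) the posterior mean approaches the least-squares fit; a rejection sampler could draw from the Gaussian and keep only draws for which `satisfies_constraints` holds for every compound.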

The optimal values of the hyperparameters $\{e_{MPB}, e_{PF}, \alpha_{MPB}, \alpha_{PF}\}$ in our Bayesian model are determined by a grid search based on leave-one-out cross-validation. Each time, we

• draw random samples from the posterior distribution (Equation 9) learned using 18 phase diagrams from the training data set

• use the resulting samples of $\boldsymbol{\beta}_s$ to predict the ensemble of phase transition curves of the hold-out compound

• repeat this process for each compound from the training set

• pick $\{e_{MPB}, e_{PF}, \alpha_{MPB}, \alpha_{PF}\}$ with the smallest expected prediction error.
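The steps above amount to a grid search wrapped around a leave-one-out loop, which can be sketched as follows. To stay short, the stand-in model is a one-feature regression through the origin that predicts with the unconstrained posterior mean rather than with posterior samples; the grid values, data and function names are assumptions for illustration only.

```python
from itertools import product

def loo_grid_search(data, grid, fit_predict):
    """Return the hyperparameter setting with the smallest leave-one-out squared error."""
    names = sorted(grid)
    best_hyper, best_error = None, float("inf")
    for values in product(*(grid[name] for name in names)):
        hyper = dict(zip(names, values))
        squared_errors = []
        for i, (x_out, y_out) in enumerate(data):
            training = data[:i] + data[i + 1:]  # hold out one data point
            prediction = fit_predict(training, x_out, hyper)
            squared_errors.append((prediction - y_out) ** 2)
        error = sum(squared_errors) / len(squared_errors)
        if error < best_error:
            best_hyper, best_error = hyper, error
    return best_hyper, best_error

def posterior_mean_predict(training, x_new, hyper):
    """1-D analogue of mu = e * Sigma * D^T tau with Sigma = (alpha + e * D^T D)^-1."""
    sxy = sum(x * y for x, y in training)
    sxx = sum(x * x for x, _ in training)
    slope = hyper["e"] * sxy / (hyper["alpha"] + hyper["e"] * sxx)
    return slope * x_new

# Hypothetical noise-free data and a small hypothetical grid.
data = [(0.1, 0.2), (0.2, 0.4), (0.3, 0.6), (0.4, 0.8)]
grid = {"alpha": [1e-6, 1.0, 100.0], "e": [0.1, 1.0]}
best_hyper, best_error = loo_grid_search(data, grid, posterior_mean_predict)
```

Because the toy data are noise-free, the search favors the weakest prior relative to the noise precision; with real measurements the optimum would balance the two.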

After determining the hyperparameters, we train the model using the full data set with 19 phase diagrams. The Bayesian model performs well on the training data (see Figure S3). We then use the model to predict the phase transition curves for all candidates in the virtual set for which the six physical properties are known. From the mean phase transition curves of each compound, we compute the change of relative composition dx along the vertical phase transition curve when the temperature decreases by 100 K from room temperature. Compounds with the triple point below room temperature are excluded from the analysis. We pick the compound with the smallest dx as our best candidate for the next experiment, in which its phase transition curves are measured. The obtained data are added to the training data set for the next round of model updates and best-candidate prediction. We repeat this process until we obtain the desired materials validated in experiments.
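The selection rule in this paragraph is a filtered minimum, sketched below. The candidate records, their field names and the room-temperature value of 25 °C are hypothetical; in practice `dx` and the triple-point temperature would come from the predicted mean curves.

```python
def pick_next_experiment(candidates, t_room=25.0):
    """Drop candidates whose triple point lies below t_room, then take the smallest dx."""
    eligible = [c for c in candidates if c["triple_point"] >= t_room]
    if not eligible:
        raise ValueError("no candidate has its triple point at or above room temperature")
    return min(eligible, key=lambda c: c["dx"])

# Hypothetical predictions for three virtual compounds (temperatures in deg C).
candidates = [
    {"name": "hypothetical-A", "dx": 0.21, "triple_point": 60.0},
    {"name": "hypothetical-B", "dx": 0.12, "triple_point": 10.0},  # excluded by the filter
    {"name": "hypothetical-C", "dx": 0.15, "triple_point": 45.0},
]
chosen = pick_next_experiment(candidates)
```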

Note4: Data-driven machine learning

A. Regression. Our machine learning strategy is given in Figure S4. We utilized several machine learning regression methods, including a simple linear regression (LIN) model and Support Vector Regression with a radial basis function kernel (SVRrbf) and a linear kernel (SVRlin) [8–11]. We constructed 1000 "bootstrap" samples (sampling with replacement) [12] of 19 data points each and trained 1000 regression models (an ensemble of regressors). Each regression model gives a prediction for A, so we have 1000 predictions of A for each chemical composition. From these 1000 predictions, we estimate the mean (µ) and standard deviation (σ) of A for each composition. Our results show that the ensemble of regressors performs fairly well on the training data without overfitting: the R² values for LIN, SVRlin and SVRrbf are 0.7629, 0.7625 and 0.8789, respectively (see Figure S5). In our next step, we applied the ensemble of regressors to predict the A coefficients for all candidates in the virtual set and estimated the mean (µ) and standard deviation (σ) from those predictions.
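The bootstrap ensemble can be sketched with a deliberately simplified regressor. The example below fits an ordinary least-squares line on a single feature (∆Vol against the MPB slope, using the 19 training entries of Table S1) instead of the six-feature LIN and SVR models of the study, so the numbers it produces are illustrative only; the function names and the random seed are assumptions.

```python
import random
import statistics

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0.0:  # degenerate resample: fall back to a constant model
        return 0.0, mean_y
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x

def bootstrap_predict(xs, ys, x_new, n_models=1000, seed=0):
    """Train one model per bootstrap resample; return mean and std of the predictions."""
    rng = random.Random(seed)
    n = len(xs)
    predictions = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sampling with replacement
        slope, intercept = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
        predictions.append(slope * x_new + intercept)
    return statistics.fmean(predictions), statistics.stdev(predictions)

# dVol and MPB slope of training entries 1-19 in Table S1.
d_vol = [-1.7949, -2.1459, -2.8442, -3.0299, -2.4957, -2.1882, -2.6023, -2.8159,
         -3.0575, -3.1109, -2.8442, -3.3783, -2.6177, -3.1643, -2.5196, -3.6787,
         -4.4875, -3.6258, -3.7598]
slope_a = [153.0, 226.5, 259.2, 270.1, 270.2, 274.1, 287.2, 297.3, 320.0, 370.0,
           373.0, 408.0, 442.3, 452.1, 312.0, 464.8, 491.6, 404.6, 418.8]

# Predict A for the dVol value of entry 20 (the newly found system).
mu, sigma = bootstrap_predict(d_vol, slope_a, x_new=-4.9499)
```

The pair (`mu`, `sigma`) is what the selectors consume: the mean ranks candidates, and the spread quantifies how uncertain the ensemble is about each one.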

B. Design using Selectors. The selectors choose the next experiment to perform by making optimal choices of materials to test. As our design approach we choose Efficient Global Optimization (EGO) [13], which selects potential candidates by maximizing the "expected improvement" (EI), $f(\mu, \sigma)$, over the search space. The improvement, $I$, is defined by $\max(Y - \mu^*, 0)$, where $Y$ is a random variable drawn from a normal distribution with mean $\mu$ and standard deviation $\sigma$, and $\mu^*$ is the "best-so-far" value of the property, assuming it to be a maximum. The expected improvement, defined as $f(\mu, \sigma) = \int I \phi(z)\, dz$, where $\phi(z)$ is the standard normal density, gives the improvement on the current best estimate of the target property by sampling from compounds in the search space. The integral is easily evaluated, and $f$ assumes the following forms for the different selectors:

• Max: $f(\mu, \sigma) = \mu - \mu^*$. Greedily chooses the material with the maximum predicted value.

• EGO (efficient global optimization) [13]: Maximizes the "expected improvement" $f(\mu, \sigma) = \sigma[\phi(z) + z\Phi(z)]$, where $z = (\mu - \mu^*)/\sigma$ and $\mu^*$ is the maximum value observed so far in the training set. $\phi(z)$ and $\Phi(z)$ are the standard normal density and distribution functions, respectively.

• KG (knowledge gradient) [14]: $f(\mu, \sigma) = \sigma[\phi(z) + z\Phi(z)]$, where $z = (\mu - \mu^{**})/\sigma$ and $\mu^{**}$ is the maximum of $\mu^*$ and $\mu'$; $\mu'$ is the maximum predicted value in the virtual set.
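The three selector formulas can be written out in a few lines. The sketch below follows the closed forms listed above; the candidate names and their (µ, σ) values are hypothetical, while the reference value 491.6 is the largest slope among training entries 1–19 in Table S1.

```python
import math

def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_improvement(mu, sigma, reference):
    """sigma * [phi(z) + z * Phi(z)] with z = (mu - reference) / sigma."""
    if sigma <= 0.0:
        return max(mu - reference, 0.0)  # no uncertainty: improvement is deterministic
    z = (mu - reference) / sigma
    return sigma * (normal_pdf(z) + z * normal_cdf(z))

def select(candidates, mu_star, rule):
    """Return the candidate name maximizing the chosen selector; candidates map name -> (mu, sigma)."""
    if rule == "max":
        score = lambda mu, sigma: mu - mu_star
    elif rule == "ego":
        score = lambda mu, sigma: expected_improvement(mu, sigma, mu_star)
    elif rule == "kg":
        mu_prime = max(mu for mu, _ in candidates.values())  # best prediction in the virtual set
        reference = max(mu_star, mu_prime)
        score = lambda mu, sigma: expected_improvement(mu, sigma, reference)
    else:
        raise ValueError("rule must be 'max', 'ego' or 'kg'")
    return max(candidates, key=lambda name: score(*candidates[name]))

# Hypothetical predictions (mu, sigma) for three virtual compositions.
candidates = {
    "hypothetical-A": (600.0, 50.0),
    "hypothetical-B": (450.0, 200.0),
    "hypothetical-C": (300.0, 10.0),
}
mu_star = 491.6  # largest measured slope in the training set (Table S1)
```

With these made-up numbers, Max and EGO favor the candidate with the highest mean, whereas KG, which measures improvement against the best prediction in the virtual set, favors the more uncertain candidate.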

Note5: Adaptive Design

These regressors make fairly good predictions for the training data without overfitting, as shown in Figure S5 (a1), (b1) and (c1). The predicted mean (µ) from SVRlin as a function of m and n for both the BZT and BTS systems is shown in the contour plot of Figure S6 (a). We use selectors to choose the next experiment to be performed. Figure S7 (a) shows the EI, the quantity we maximize, as a function of m and n for both the BZT and BTS systems for SVRlin. All regressor:selector combinations employing EI using EGO suggested the same composition, BZT-m50-n30, as the next experiment.


[Figure S4: schematic with panels A to D; one box is labeled "Unexplored phase diagrams".]

Fig. S4. Data-driven learning for materials design: (A) An initial experimental data set of 19 phase diagrams with known slope, together with features or materials descriptors, serves as input to the inference model. (B) The model is trained and cross-validated with the initial piezoelectric data. The trained model in (B2) is applied to a data set of unexplored phase diagrams, defined as the total search space, for predicting the MPB slope. (C) The design chooses the "best" candidate for synthesis and characterization. (D) The chosen phase diagram with its measured slope augments the initial data set to further improve the inference and design.

Table S2. Results of different regressor:selector combinations.

                     1st iteration                                 2nd iteration
regressor  selector  candidate    EI from EGO  EI from KG  µ       candidate    EI from EGO  EI from KG  µ
LIN        MAX       BZT-m50-n30  685.59       564.48      719.21  BZT-m50-n30  261.26       261.26      670.05
LIN        EGO       BZT-m50-n30  685.59       564.48      719.21  BZT-m50-n30  261.26       261.26      670.05
LIN        KG        BZT-m50-n30  685.59       564.48      719.21  BZT-m50-n30  261.26       261.26      670.05
SVRlin     MAX       BZT-m50-n30  131.88       57.44       605.98  BZT-m50-n30  10.65        10.65       631.33
SVRlin     EGO       BZT-m50-n30  131.88       57.44       605.98  BZT-m50-n30  10.65        10.65       631.33
SVRlin     KG        BZT-m50-n30  131.88       57.44       605.98  BZT-m50-n30  10.65        10.65       631.33
SVRrbf     MAX       BZT-m50-n30  28.93        28.93       464.28  BZT-m50-n30  13.63        13.63       578.78
SVRrbf     EGO       BZT-m50-n30  28.93        28.93       464.28  BZT-m50-n30  13.63        13.63       578.78
SVRrbf     KG        BZT-m50-n30  28.93        28.93       464.28  BZT-m50-n30  13.63        13.63       578.78


[Figure S5: nine scatter panels of Predicted Value (y-axis) versus Measured Value (x-axis), both axes spanning 0 to 800. Panel titles: (a1) linear regression with training data; (a2) linear regression prediction; (a3) linear regression with feedback; (b1) SVR.lin with training data; (b2) SVR.lin prediction; (b3) SVR.lin with feedback; (c1) SVR.rbf with training data; (c2) SVR.rbf prediction; (c3) SVR.rbf with feedback.]

Fig. S5. The performance of the regressors. (a1), (b1), (c1): performance on the initial training data set. Uncertainties were obtained by bootstrap resampling the initial 19 data points 1000 times. (a2), (b2), (c2): the red points are the out-of-sample predictions from the regressors trained with the initial training data set (in blue). (a3), (b3), (c3): performance on the updated training set augmented with the new data point.

Fig. S6. (a) Predictions for all compositions in the virtual dataset using the SVRlin regressor trained on the initial data, plotted as a function of m and n in a contour plot for both BZT-BCT and BTS-BCT systems. After augmenting the training data with the new data point, the SVRlin regressor is retrained. (b) Predictions after retraining the SVRlin regressor, plotted as a function of m and n for both BZT-BCT and BTS-BCT systems.

Table S2 summarizes the results obtained from the various regressor:selector combinations we utilized.

After establishing the new phase diagram, we added the newly measured data to the training dataset. Figure S5 (a2), (b2) and (c2) show the predicted versus measured values for BZT-m50-n30. Figure S5 (a3), (b3) and (c3) show the performance of our regressors on the updated training dataset, and Figure S6 (b) shows the predictions from the updated SVRlin regressor. Figure S7 (a) and (b) show the EI as a function of m and n for both the BZT and BTS systems before and after retraining the SVRlin regressor, respectively.

Fig. S7. (a) EI (expected improvement) from Efficient Global Optimization (EGO) for all compositions in the virtual dataset from the SVRlin regressor trained using the initial data, as a function of m and n for both BZT-BCT and BTS-BCT systems. After augmenting the training set with the new data point, the SVRlin regressor is retrained. (b) EI from EGO for all compositions in the virtual set from the retrained SVRlin regressor, as a function of m and n for both BZT-BCT and BTS-BCT systems.


Note6: Methods and Materials

A. Experimental. The ceramic samples used in this study were Ba(Zr$_n$Ti$_{1-n}$)O$_3$ − x(Ba$_{1-m}$Ca$_m$)TiO$_3$ (BZT-xBCT) and Ba(Sn$_n$Ti$_{1-n}$)O$_3$ − x(Ba$_{1-m}$Ca$_m$)TiO$_3$ (BTS-xBCT). The samples were fabricated by a conventional solid-state reaction method with starting chemicals of BaZrO$_3$ (98%), CaCO$_3$ (99.9%), BaCO$_3$ (99.95%), SnO$_2$ (99.9%) and TiO$_2$ (99.9%). Calcining was performed at 1350 °C, and sintering was done at 1450 °C in air. The phase diagrams were determined from the dielectric permittivity (ε) versus temperature (T) curves. The piezoelectric constant d$_{33}$ was measured using a Berlincourt-type d$_{33}$ meter for poled samples with a cylindrical shape, and its temperature dependence was measured in an oil bath with the same meter.

B. Density-functional theory. Density-functional theory (DFT) calculations were performed within the generalized gradient approximation (GGA) using Perdew-Burke-Ernzerhof exchange-correlation functionals as implemented in the Quantum ESPRESSO plane-wave pseudopotential package [15, 16]. The core and valence electrons were treated with norm-conserving pseudopotentials [17], which were generated using the OPIUM code [18]. Solid solutions were modeled using the virtual crystal approximation (VCA) [19]. We used a 60 Ry plane-wave cutoff for wavefunctions and a 240 Ry kinetic energy cutoff for charge density and potential. The atomic positions and the cell volume were allowed to change until an energy convergence threshold of 10$^{-8}$ eV and Hellmann-Feynman forces less than 2 meV/Å, respectively, were achieved. The Brillouin zone integration was performed using an 8×8×8 Monkhorst-Pack k-point mesh [20] centered at the Γ-point. Polarization was calculated using the Berry phase method [21].

1. Durst G, Grotenhuis M, Barkow AG (1950) Solid Solubility Study of Barium, Strontium, and Calcium Titanates. Journal of the American Ceramic Society 33(4):133–139.

2. McQuarrie M, Behnke FW (1954) Structural and dielectric studies in the system (Ba, Ca)(Ti, Zr)O3. Journal of the American Ceramic Society 37(11):539–543.

3. Wei X, Yao X (2007) Preparation, structure and dielectric property of barium stannate titanate ceramics. Materials Science and Engineering: B 137(1):184–188.

4. Liu W, Ren X (2009) Large piezoelectric effect in Pb-free ceramics. Physical Review Letters 103(25):257602.

5. Xue D et al. (2011) Large piezoelectric effect in Pb-free Ba(Ti,Sn)O3-x(Ba,Ca)TiO3 ceramics. Applied Physics Letters 99(12):122901.

6. Zhou C et al. (2012) Triple-point-type morphotropic phase boundary based large piezoelectric Pb-free material Ba(Ti0.8Hf0.2)O3-(Ba0.7Ca0.3)TiO3. Applied Physics Letters 100(22):222910.

7. Bao H, Zhou C, Xue D, Gao J, Ren X (2010) A modified lead-free piezoelectric BZT–xBCT system with higher TC. Journal of Physics D: Applied Physics 43(46):465401.

8. Smola AJ, Schölkopf B (2004) A tutorial on support vector regression. Statistics and Computing 14:199–222.

9. Karatzoglou A, Meyer D, Hornik K (2006) Support Vector Machines in R. Journal of Statistical Software 15(9):1–28.

10. Vapnik V (2000) The Nature of Statistical Learning Theory. (Springer-Verlag New York).

11. R Core Team (2012) R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, Vienna, Austria). ISBN 3-900051-07-0.

12. MacKinnon DP, Lockwood CM, Williams J (2004) Confidence Limits for the Indirect Effect: Distribution of the Product and Resampling Methods. Multivariate Behavioral Research 39(1):99–128.

13. Jones DR, Schonlau M, Welch WJ (1998) Efficient global optimization of expensive black-box functions. Journal of Global Optimization 13(4):455–492.

14. Frazier P, Powell W, Dayanik S (2009) The Knowledge-Gradient Policy for Correlated Normal Beliefs. INFORMS Journal on Computing 21(4):599–613.

15. Giannozzi P et al. (2009) QUANTUM ESPRESSO: a modular and open-source software project for quantum simulations of materials. Journal of Physics: Condensed Matter 21(39):395502 (19pp).

16. Perdew JP, Burke K, Ernzerhof M (1996) Generalized Gradient Approximation Made Simple. Phys. Rev. Lett. 77:3865–3868.

17. Hamann DR, Schlüter M, Chiang C (1979) Norm-Conserving Pseudopotentials. Phys. Rev. Lett. 43:1494–1497.

18. OPIUM (2014) http://opium.sourceforge.net.

19. de Gironcoli S (1992) Phonons in Si-Ge systems: An ab initio interatomic-force-constant approach. Phys. Rev. B 46:2412–2419.

20. Monkhorst HJ, Pack JD (1976) Special points for Brillouin-zone integrations. Phys. Rev. B 13:5188–5192.

21. Bernardini F, Fiorentini V, Vanderbilt D (1997) Spontaneous polarization and piezoelectric constants of III-V nitrides. Phys. Rev. B 56:R10024–R10027.
