SUPPLEMENTARY MATERIAL

Selection Rules for Estimating the Solubility of C4-Hydrocarbons in Imidazolium Ionic Liquids Determined by Machine-Learning Tools

Ahsan Jalal,†,‡, ξ Elif Can,⊥,ξ Seda Keskin,†,‡,* Ramazan Yildirim,⊥,* Alper Uzun†,‡,§,*

†Department of Chemical and Biological Engineering, Koç University, Rumelifeneri Yolu, Sariyer 34450, Istanbul, Turkey

‡Koç University TÜPRAŞ Energy Center (KUTEM), Koç University, Rumelifeneri Yolu, Sariyer 34450, Istanbul, Turkey

⊥Department of Chemical Engineering, Bogazici University, Bebek 34342, Istanbul, Turkey

*Corresponding Authors:

E-mail Addresses: skeskin@ku.edu.tr (S. Keskin), yildirra@boun.edu.tr (R. Yıldırım), auzun@ku.edu.tr (A. Uzun)

ξ These authors contributed equally.

Contents

0. Databases

1. Additional Information for Computational Section

2. Additional Results for Decision Tree Analysis

3. Additional Results for Correlation Analysis

4. Additional Results for Association Rule Mining Algorithm

0. Database

The attached Microsoft Office Excel file lists the cations and anions used in this study, along with their labels, 2-D structures, and structural descriptors. The file also contains the capacities of 13BD, 1B, C2B, T2B, i-But, and i-B in 3267 ILs at different temperatures.

1. Additional Information for Computational Section

Table SM 1.1. Distribution of instances among classes and upper/lower limit of solubility values for 13BD dataset

Total 1745 data points

min       max     # of instances   name
0.7*      0.88    597              C
0.89      1.15    566              B
1.16      70      582              A

* 1522 instances were excluded from the dataset because their solubility values are lower than those of the remaining instances.
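The class limits in Table SM 1.1 can be read as a simple binning rule. The Python sketch below is illustrative only and is not the authors' code. The function name assign_class is hypothetical, and the sketch assumes that the lower limit of class C (0.7) is inclusive, that each tabulated maximum is an inclusive upper bound, and that a value falling between two rounded limits (for example, 0.885) goes to the higher class.

```python
from typing import Optional

# Limits and counts copied from Table SM 1.1 (13BD dataset).
EXCLUSION_LIMIT = 0.7   # instances below this value were excluded
CLASS_C_MAX = 0.88      # assumed inclusive upper limit of class C
CLASS_B_MAX = 1.15      # assumed inclusive upper limit of class B

CLASS_COUNTS = {"C": 597, "B": 566, "A": 582}
RETAINED = 1745         # "Total 1745 data points"
EXCLUDED = 1522         # from the table footnote


def assign_class(solubility: float) -> Optional[str]:
    """Return 'A', 'B', or 'C', or None for an excluded instance."""
    if solubility < EXCLUSION_LIMIT:
        return None      # excluded: below the lower limit of class C
    if solubility <= CLASS_C_MAX:
        return "C"
    if solubility <= CLASS_B_MAX:
        return "B"
    return "A"


# Bookkeeping: class counts add up to the retained total, and
# retained plus excluded instances give the 3267 ILs in the database.
print(sum(CLASS_COUNTS.values()) == RETAINED)
print(RETAINED + EXCLUDED)
```

The same rule applies to the other datasets in this section once the limits are replaced with those of the corresponding table.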

Table SM 1.2. Distribution of instances among classes and upper/lower limit of solubility values for 1B dataset

Total 1328 data points

min       max     # of instances   name
0.5*      0.65    456              C
0.66      0.90    421              B
0.91      80      451              A

* 1939 instances were excluded from the dataset because their solubility values are lower than those of the remaining instances.

Table SM 1.3. Distribution of instances among classes and upper/lower limit of solubility values for B dataset

Total 1852 data points

min       max     # of instances   name
0.0995*   0.195   629              C
0.196     0.39    604              B
0.40      7.2     619              A

* 1415 instances were excluded from the dataset because their solubility values are lower than those of the remaining instances.

Table SM 1.4. Distribution of instances among classes and upper/lower limit of solubility values for C2B dataset

Total 1245 data points

min       max     # of instances   name
0.5*      0.64    418              C
0.65      0.89    410              B
0.90      78      417              A

* 2022 instances were excluded from the dataset because their solubility values are lower than those of the remaining instances.

Table SM 1.5. Distribution of instances among classes and upper/lower limit of solubility values for T2B dataset

Total 1583 data points

min       max     # of instances   name
0.4*      0.55    559              C
0.56      0.78    498              B
0.79      83      526              A

* 1684 instances were excluded from the dataset because their solubility values are lower than those of the remaining instances.

Table SM 1.6. Distribution of instances among classes and upper/lower limit of selectivity values for 13BD/1B dataset

Total 936 data points

min       max     # of instances   name
1.996*    2.22    329              C
2.23      2.65    288              B
2.66      12.5    319              A

* 2331 instances were excluded from the dataset because their selectivity values are lower than those of the remaining instances.

Table SM 1.7. Distribution of instances among classes and upper/lower limit of solubility values for i-butane dataset

Total 1872 data points

min       max     # of instances   name
0.5*      0.70    655              C
0.71      1.0     585              B
1.1       80.6    632              A

* 1395 instances were excluded from the dataset because their solubility values are lower than those of the remaining instances.

Table SM 1.8. Distribution of instances among classes and upper/lower limit of solubility values for i-butene dataset

Total 1814 data points

min       max     # of instances   name
0.4*      0.57    611              C
0.58      0.82    579              B
0.83      85      624              A

* 1453 instances were excluded from the dataset because their solubility values are lower than those of the remaining instances.

Table SM 1.9. Distribution of instances among classes and upper/lower limit of selectivity values for 13BD/B dataset

Total 1507 data points

min       max     # of instances   name
4.95*     6.94    524              C
6.95      11.97   480              B
11.98     120     503              A

* 1760 instances were excluded from the dataset because their selectivity values are lower than those of the remaining instances.

Table SM 1.10. Distribution of instances among classes and upper/lower limit of selectivity values for C2B dataset

Total 1066 data points

min       max     # of instances   name
1.989*    2.23    369              C
2.24      2.66    331              B
2.67      65      372              A

* 2201 instances were excluded from the dataset because their selectivity values are lower than those of the remaining instances.

Table SM 1.11. Distribution of instances among classes and upper/lower limit of selectivity values for 13BD/T2B dataset

Total 1301 data points

min       max     # of instances   name
1.99*     2.26    445              C
2.27      2.70    413              B
2.71      9.5     443              A

* 1966 instances were excluded from the dataset because their selectivity values are lower than those of the remaining instances.

Table SM 1.12. Distribution of instances among classes and upper/lower limit of selectivity values for 13BD/i-butane dataset

Total 1467 data points

min       max     # of instances   name
1.3*      1.40    529              C
1.41      1.52    443              B
1.53      6.71    495              A

* 1800 instances were excluded from the dataset because their selectivity values are lower than those of the remaining instances.

Table SM 1.13. Distribution of instances among classes and upper/lower limit of selectivity values for 13BD/i-butene dataset

Total 1660 data points

min       max     # of instances   name
1.6*      1.80    564              C
1.81      2.10    480              B
2.11      6.7     503              A

* 1607 instances were excluded from the dataset because their selectivity values are lower than those of the remaining instances.


2. Additional Information for Decision Tree Analysis

13BD Solubility

Table SM 2.1. Testing error and accuracy values for each test set (1/4 of the dataset) and the per-class accuracy values in those sets (models for the 13BD dataset, built with the 8 final input variables)

# of Set   CV Error   CV Accuracy   Accuracy of class A   Accuracy of class B   Accuracy of class C
1st        0.36       0.64          0.80                  0.45                  0.64
2nd        0.37       0.63          0.81                  0.36                  0.74
3rd        0.34       0.66          0.72                  0.61                  0.63
4th        0.36       0.64          0.82                  0.37                  0.74
AVG        0.36       0.64          0.79                  0.45                  0.69
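The AVG row of Table SM 2.1 is consistent with the arithmetic mean of the four test sets. The Python sketch below reproduces it from the tabulated values. It is a check written for this note, not the authors' procedure, and it assumes that means are rounded half up to two decimals; the variable names are hypothetical. Decimal arithmetic is used so that ties such as 0.6875 round the same way they would by hand.

```python
from decimal import Decimal, ROUND_HALF_UP

# Per-set values copied from Table SM 2.1 (sets 1st to 4th).
fold_values = {
    "cv_error":    ["0.36", "0.37", "0.34", "0.36"],
    "cv_accuracy": ["0.64", "0.63", "0.66", "0.64"],
    "class_A":     ["0.80", "0.81", "0.72", "0.82"],
    "class_B":     ["0.45", "0.36", "0.61", "0.37"],
    "class_C":     ["0.64", "0.74", "0.63", "0.74"],
}


def rounded_mean(values):
    """Mean of decimal strings, rounded half up to two decimals (assumed)."""
    numbers = [Decimal(v) for v in values]
    mean = sum(numbers) / len(numbers)
    return mean.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


averages = {name: rounded_mean(vals) for name, vals in fold_values.items()}

for name, value in averages.items():
    print(f"{name}: {value}")
```

Each CV accuracy in the table also equals one minus the corresponding CV error, so the two columns carry the same information.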

Figure SM 2.1. Decision tree generated with 8 input variables (Polarizability, HBA Count, HBD Count, CPK Area for both anion and cation) for the solubility of 13BD

Table SM 2.2. Confusion matrix generated with 8 input variables (Polarizability, HBA Count, HBD Count, CPK Area for both anion and cation) for the solubility of 13BD

           A      B      C
pred A     372    95     64
pred B     65     219    57
pred C     14     107    335
sum        451    421    456
accuracy   0.82   0.52   0.73
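The sum and accuracy rows of Table SM 2.2 follow from the matrix entries. The Python sketch below recomputes them under the assumption, inferred from the "pred" label and the column sums, that rows are predicted classes and columns are actual classes, so that the accuracy of a class is its diagonal entry divided by its column sum. The variable names are hypothetical.

```python
# Entries copied from Table SM 2.2.
# Assumed layout: rows = predicted class, columns = actual class.
labels = ["A", "B", "C"]
confusion = [
    [372, 95, 64],    # predicted A
    [65, 219, 57],    # predicted B
    [14, 107, 335],   # predicted C
]

n = len(labels)

# Number of actual instances per class (the "sum" row).
column_sums = [sum(row[j] for row in confusion) for j in range(n)]

# Fraction of each actual class that was predicted correctly
# (the "accuracy" row), rounded to two decimals.
class_accuracy = [round(confusion[j][j] / column_sums[j], 2) for j in range(n)]

for label, total, acc in zip(labels, column_sums, class_accuracy):
    print(f"class {label}: sum = {total}, accuracy = {acc:.2f}")
```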

1B Solubility

Table SM 2.3. Testing error and accuracy values for each test set (1/4 of the dataset) and the per-class accuracy values in those sets (models for the 1B dataset, built with the 8 final input variables)

# of Set

CV

Error

CV Accuracy

Accuracy of class A

Accuracy of class B

Accuracy of class C

1

0.37

0.63