QSAR: Brief and Incomplete Introduction · 2020. 2. 4. · Dawn of QSAR •Crum-Brown and Fraser:...

QSAR:Brief and Incomplete Introduction

Dr Olena Mokshyna

Shift of Paradigm

A. Agrawal and A. Choudhary APL Mater. 4 053208 (2016); https://www.bigmax.mpg.de2

Why do we need modeling?

• Cut costs

• Increase speed

• Decrease human efforts

• Reduce animal testing

• Find something new!

Dawn of QSAR

• Crum-Brown and Fraser: Φ = f(C)• Meyer and Overton: suggested that the narcotic

(depressant) action of a group of organic compounds paralleled their olive oil/water partition coefficients

• Hammett: “σ−ρ” culture• Taft : separating polar, steric, and resonance effects• 1962 - Hansch et al. published their study on the

structure-activity relationships of plant growth regulators and their dependency on Hammett constants and octanol-water partition coefficient

• Hansch and Fujita: QSAR paradigm• Free-Wilson approach

Gramatica, Paola. (2011). A SHORT HISTORY OF QSAR EVOLUTIONA. Cherkasov et al. (J Med Chem, 2014) QSAR modeling: where have you been? Where are you going to? doi: 10.1021/jm4004285

QSAR = Quantitative Structure-Activity RelationshipQSPR = Quantitative Structure-Property Relationship

QSAR step-by-step: general workflow

Properties

Structures Descriptors

Machine Learning algorithm

Model ready!

• Step 1: Describe the molecules

Descriptors• Calculated or

Empirical (i.e. lipophilicity, shifts in NMR) • Integral (MW, surface area) or

Fragment (fingerprints, simplexes)• 1D, 2D, 3D or even 4D• Also pharmacophore, physico-chemical,

quantum and many, many more

Requirements Towards Descriptors

• Invariance to molecular numbering or labeling• Invariance of a descriptor value to any translation or rotation of the

molecule (for 3D)• Clear structural interpretation• Not based on experimental properties• Able to discriminate among isomers• Simple• Smooth structure-property landscape

• Step 1: Describe the molecules• Step 2: Choose machine learning algorithm

Exciting World of Machine Learning

No labels With labels

Classification / Regression

https://medium.com/free-code-camp/using-machine-learning-to-predict-the-quality-of-wines-9e2e13d7480d 12

Class 0 / Class 1 y = [0.1, 3.4, 0.11,…]

Training a Regression Model

Simple case: linear regression with one variable

𝑦 = ℎ 𝑥

ℎ 𝑥 = 𝜃& + 𝜃(𝑥

Learning algorithm

Training a Regression ModelCost (or loss) function:

Task: optimize coefficients to minimize cost function

Use gradient descent

Optimization:

The size of each step is determined by the parameter α, which is called the learning rate

Training a Classification Model

y∈{0,1}

Sigmoid functionAlgorithm: logistic regression

𝑧 = 𝜃& + 𝜃(𝑥

ℎ+ 𝑥 = 𝑔(𝑧 𝑥 )

𝑔 𝑧 =1

1 + 𝑒12

Support Vector Machines (SVM)

Random Forest(RF)

Gradient Boosting Machines (GBM) 16

SVM Classification

Optimal separation?

The largest possible margin = A greater chance of new data being classified correctly 17

SVM: Kernel trick

w - weights19

RFFirst let’s grow a tree

Breiman L (2001). "Random Forests". Machine Learning. 45 (1): 5–32. 20

• Step 1 − Select random samples from a dataset.• Step 2 − Construct a

decision tree for every sample. Get the prediction from every decision tree.• Step 3 − Trees vote for every

predicted result.• Step 4 − Select the most

voted prediction

https://vas3k.com/blog/machine_learning/?ref=hn

GBMhttps://sefiks.com/2019/02/24/machine-learning-wars-deep-learning-vs-gbm/

GBM vs Deep Learning:

• Add new models to the ensemble sequentially• At each particular iteration, a new weak, base-learner model is trained with respect to the

error of the whole ensemble learnt so far

https://vas3k.com/blog/machine_learning/?ref=hn

Hyperparameters Tuning

• Model parameters = learned during the model training (e.g. weights in Linear Regression).

• Hyperparameters = are all the parameters set by the user before starting training (e.g. number of estimators)

n_estimators=100max_depth=10

Hyperparameter Tuning: Search

• Manual Search• Grid Search• Random Search• Automated Hyperparameter

Tuning (Bayesian Optimization, Genetic Algorithms)• Artificial Neural Networks (ANNs)

Tuning

Model is trained!Profit?Not so fast

• Step 1: Describe the molecules• Step 2: Choose machine learning algorithm• Step 3: Validate your model

Model Validation

Train / Cross-validation / Test Sets

Data-driven design of metal–organic frameworks for wet flue gas CO2 capture. Boyd et al. Nature, 2019doi:10.1038/s41586-019-1798-7

Training/test ~ 70%/30%K-fold cross-validation

High Bias vs High Variance

Question: Which case is underfitting, which case is overfitting?

High Bias vs High Variance: Cure?

• Get more (diverse!) training examples• Try less features• Play with hyperparameters (i.e., decrease

the depth of the trees in RF)

• Try more new features• Play with feature engineering using old features• Play with hyperparameters (i.e., increase the depth

of the trees in RF)

Model Metrics: Regression

Model Metrics: Classification

𝑎𝑐𝑐𝑢𝑟𝑎𝑐𝑦 =𝑇𝑃 + 𝑇𝑁

𝑆𝑝𝑒𝑐𝑖𝑓𝑖𝑐𝑖𝑡𝑦 =𝑇𝑁

𝑇𝑁 + 𝐹𝑃

𝑆𝑒𝑛𝑠𝑖𝑡𝑖𝑣𝑖𝑡𝑦 =𝑇𝑃

𝑇𝑃 + 𝐹𝑁

𝐵𝑎𝑙𝑎𝑛𝑐𝑒𝑑 𝑎𝑐𝑐𝑢𝑟𝑎𝑐𝑦 =𝑆𝑝𝑒𝑐𝑖𝑓𝑖𝑐𝑖𝑡𝑦 + 𝑆𝑒𝑛𝑠𝑖𝑡𝑖𝑣𝑖𝑡𝑦

• Step 0: Check and curate your data• Step 1: Describe the molecules• Step 2: Choose machine learning algorithm• Step 3: Validate your model

Databases

Data Curation

• Errors in structures

• Errors in values• Errors in units

i.e. mg/kg instead of mol/kg

38Fourches et al. Trust, But Verify: On the Importance of Chemical Structure Curation in Cheminformatics and QSAR Modeling Research. J.Chem.Inf.Model, 2010. doi: 10.1021/ci100176x

Preparation of Features

Lab Course

Scikit-learn• Module for Python to perform

machine learninghttps://www.scikit-learn.org/

RDKit• Module for Python (C, java) to

do variety of cheminformatics tasks https://www.rdkit.org/

Jupyter notebookPython

QSAR: Brief and Incomplete Introduction · 2020. 2. 4. · Dawn of QSAR •Crum-Brown and Fraser:...

Documents

Transcript of QSAR: Brief and Incomplete Introduction · 2020. 2. 4. · Dawn of QSAR •Crum-Brown and Fraser:...

CSC 331 Review Game 1. Teams Katie, Mark, Alex Z, Dan B., Zane Dan C, Alex D, Dawn, Marshall, Zack Colby, Benjamin, Lisa, Riley, Andrew Catharine, Ryan,

Tutorial 2: QSAR modeling of compounds profiling

Η Χαραυγή - Dawn Bible Students Association · άνεση της ανθρωπότητας αφού έζησε περισσότερους από είκοσι αιώνες θλίψεων.

QSAR D 1-4 FIX

International Journal of Molecular Sciences 3D-QSAR, Molecular Docking and Molecular... · silico methods including 3D-QSAR, homology modeling, molecular docking and molecular dynamics

3D- QSAR. QSAR A QSAR is a mathematical relationship between a biological activity of a molecular system and its physicochemical parameters. QSAR attempts.

Synthesis and QSAR study of novel α-methylene-γ ... · 1 Title: Synthesis and QSAR study of novel α-methylene-γ-butyrolactone derivatives as antifungal agents Authorship: Authors:

Design, Synthesis, Docking and 2D QSAR studies of novel 3,5-diaryl Pyrazole Derivatives and their evaluation as Antioxidants and as Immunomodulators, inhibitors.

Molecular Modeling and 3D QSAR Studies · An important approach to cancer therapy is the design of small molecule modulators that interfere with microtubule dynamics through their

QSAR and docking studies of inhibition activity of 5,6 ...journals.tubitak.gov.tr/chem/issues/kim-11-35-3/kim-35-3-12-0901-33.… · QSAR and docking studies of inhibition activity

MTDLs Design on AChE (Acetylcholinesterase) and · 2015. 10. 20. · MTDLs Design on AChE (Acetylcholinesterase) and β-Secretase (BACE-1): 3D-QSAR and Molecular Docking Studies 490

QSAR Analysis of Purine-Type and Propafenone-Type ...cdn.intechopen.com/pdfs/44545/InTech-Qsar_analysis_of_purine_type... · caffeine , verapamil ... of action of all studied analogs;

The dawn of galaxy formation from a · The dawn of galaxy formation from a ... • Cosmic dawn stars in massive galaxies are metal-rich, α-enhanced stars. • Beware of opacity changes

CHAPTER 19 Narcotic Analgesics. I General Consideration 【 action mechanism 】 ligands opioids receptor Gi inhibiting adenylate cyclase increasing potassium.

Η Χαραυ γή - Dawn Bible

Hermetic Order of the Golden Dawn® - Archive

Research Article In silico design of Plasmodium falciparum … · 2018. 6. 7. · a 3D-QSAR pharmacophore protocol was used to prepare a four-feature pharmacophore (PH4) model from

3D-QSAR and docking studies of pentacycloundecylamines at ...

List I Narcotic Drugs Strictly Limited for Circulation ...embassies.gov.il/tbilisi/ConsularServices/Documents/Prohibited... · Narcotic Drugs Strictly Limited for Circulation №

ANNUAL ESTIMATES OF REQS. OF NARCOTIC DRUGS