Download - Department of Pharmaceutical Chemistry - Open PHACTS · PDF filePharmacoinformatics Research Group Department of Pharmaceutical Chemistry Topliss batchwise scheme reviewed in the era

Page 1

Pharmacoinformatics Research Group Department of Pharmaceutical Chemistry

Topliss batchwise scheme reviewed in the era of Open Data

Lars Richter, Gerhard F. Ecker Dept. of Pharmaceutical Chemistry

pharminfo.univie.ac.at

Page 2

Topliss batchwise scheme

Topliss ranking schemes

subst.    π     σ     -σ    π+σ   Es
3,4-Cl2   1     1     5     1     2-5
4-Cl      2     2     4     2     2-5
4-CH3     3     4     2     3     2-5
4-OCH3    4-5   5     1     5     2-5
H         4-5   3     3     4     1

Topliss substituent proposals

scheme   new substituent selection [1]
π        3-CF3, 4-Cl; 3-CF3, 4-NO2; 4-CF3; 2,4-Cl2; 4-c-C5H9; 4-c-C6H11; 4-CH(CH3)2; 4-C(CH3)3; 3,4-(CH3)2; 4-O(CH2)3CH3; 4-OCH2Ph; 4-N(C2H5)2
σ        3-CF3, 4-Cl; 3-CF3, 4-NO2; 4-CF3; 2,4-Cl2; 4-c-C5H9; 4-c-C6H11
-σ       4-N(C2H5)2; 4-N(CH3)2; 4-NH2; 4-NHC4H9; 4-OH; 4-OCH(CH3)2; 3-CH3, 4-OCH3
π+σ      3-CF3, 4-Cl; 3-CF3, 4-NO2; 4-CF3; 2,4-Cl2; 4-c-C5H9; 4-c-C6H11

[1] Topliss et al. J Med Chem 1977

Page 3

Topliss batchwise scheme

Series of five phenyl-substituted propafenone derivatives measured against P-Glycoprotein

substitution   EC50 (µM)   rank
3,4-Cl2        0.150       5
4-Cl           0.132       4
4-CH3          0.063       2
4-OCH3         0.045       1
H              0.079       3

Which compound should be synthesized next?

Page 4

Topliss batchwise scheme

Topliss ranking schemes

subst.    π     σ     -σ    π+σ   Es
3,4-Cl2   1     1     5     1     2-5
4-Cl      2     2     4     2     2-5
4-CH3     3     4     2     3     2-5
4-OCH3    4-5   5     1     5     2-5
H         4-5   3     3     4     1

Topliss substituent proposals

scheme   new substituent selection
π        3-CF3, 4-Cl; 3-CF3, 4-NO2; 4-CF3; 2,4-Cl2; 4-c-C5H9; 4-c-C6H11; 4-CH(CH3)2; 4-C(CH3)3; 3,4-(CH3)2; 4-O(CH2)3CH3; 4-OCH2Ph; 4-N(C2H5)2
σ        3-CF3, 4-Cl; 3-CF3, 4-NO2; 4-CF3; 2,4-Cl2; 4-c-C5H9; 4-c-C6H11
-σ       4-N(C2H5)2; 4-N(CH3)2; 4-NH2; 4-NHC4H9; 4-OH; 4-OCH(CH3)2; 3-CH3, 4-OCH3
π+σ      3-CF3, 4-Cl; 3-CF3, 4-NO2; 4-CF3; 2,4-Cl2; 4-c-C5H9; 4-c-C6H11

propafenone dataset

substitution   EC50 (µM)   rank
3,4-Cl2        0.150       5
4-Cl           0.132       4
4-CH3          0.063       2
4-OCH3         0.045       1
H              0.079       3
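
The comparison of a measured ranking with the scheme columns can be sketched in a few lines of Python. This is a minimal illustration, not the authors' script: the function names are hypothetical, and table entries such as "4-5" or "2-5" are read here as sets of allowed ranks, which is an assumption about how ties in the scheme are meant.

```python
# Allowed ranks per substituent, in the order 3,4-Cl2, 4-Cl, 4-CH3, 4-OCH3, H.
# Range entries ("4-5", "2-5") are interpreted as sets of allowed ranks (assumption).
TOPLISS_SCHEMES = {
    "π":   [{1}, {2}, {3}, {4, 5}, {4, 5}],
    "σ":   [{1}, {2}, {4}, {5}, {3}],
    "-σ":  [{5}, {4}, {2}, {1}, {3}],
    "π+σ": [{1}, {2}, {3}, {5}, {4}],
    "Es":  [{2, 3, 4, 5}] * 4 + [{1}],
}


def rank_by_potency(ec50_values):
    """Return ranks where 1 = lowest EC50 (most potent). Ties are not handled."""
    order = sorted(range(len(ec50_values)), key=lambda i: ec50_values[i])
    ranks = [0] * len(ec50_values)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks


def matching_schemes(ranks):
    """List every scheme whose allowed ranks contain the observed ranking."""
    return [
        name
        for name, allowed in TOPLISS_SCHEMES.items()
        if all(r in a for r, a in zip(ranks, allowed))
    ]


# EC50 values of the propafenone dataset, same substituent order as above.
propafenone_ec50 = [0.150, 0.132, 0.063, 0.045, 0.079]
ranks = rank_by_potency(propafenone_ec50)
print(ranks, matching_schemes(ranks))  # [5, 4, 2, 1, 3] ['-σ']
```

Under this reading, the propafenone ranking 5 4 2 1 3 matches only the -σ column. Note that an ordering such as 1 2 3 5 4 would satisfy both π and π+σ, so a series can match more than one scheme.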

Page 5

Topliss batchwise scheme

propafenone dataset

substitution   EC50 (µM)   rank
3,4-Cl2        0.150       5
4-Cl           0.132       4
4-CH3          0.063       2
4-OCH3         0.045       1
H              0.079       3
4-N(CH3)2

• 4-N(CH3)2 derivative was synthesized and tested
• no affinity increase

How often do Topliss schemes (π, σ, -σ, π+σ, Es) occur in large databases?

How useful do Topliss schemes prove in activity optimization?

Page 6

www.openphacts.org

Page 7

www.openphacts.org

Page 8

How often do Topliss patterns occur?

1. Return 3,4-dichloro substituted compounds in PostgreSQL ChEMBL 20 using the RDKit cartridge

2a. For each 3,4-Cl2 compound, check for availability of 4-Cl, 4-OCH3, 4-CH3 and H substitutions

3. Check each compound series for bioactivity data (pChEMBL) measured
   - on the same target in the same assay
   - with activity type = IC50 or Ki
   - plus, if available, activity for the new substituent selection

Workflow figure labels: SQL query; 9312 cpds; 540 x; 200 series; 1108 bioactivity data for additional substituents (new substituent selection)

Example ranking within a series: 3 nM, 5 nM, 8 nM, 9 nM, 10 nM -> ranks 1, 2, 3, 4, 5
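
Step 3 of the workflow above, grouping activities into series, could look roughly like the following Python sketch. It is an illustration under assumptions rather than the original script: the record fields (`scaffold`, `target_id`, `assay_id`, `activity_type`, `substituent`, `pchembl`) and the example values are invented, and the substructure search via the RDKit cartridge is not reproduced here.

```python
from collections import defaultdict

# The five substituents every series must contain.
REQUIRED = {"3,4-Cl2", "4-Cl", "4-CH3", "4-OCH3", "H"}
ALLOWED_TYPES = {"IC50", "Ki"}


def assemble_series(records):
    """Group records by scaffold, target, assay and activity type.

    Only groups that contain all five required substituents are returned.
    If a substituent was measured more than once, the last value wins.
    """
    groups = defaultdict(dict)
    for rec in records:
        if rec["activity_type"] not in ALLOWED_TYPES:
            continue
        key = (rec["scaffold"], rec["target_id"], rec["assay_id"], rec["activity_type"])
        groups[key][rec["substituent"]] = rec["pchembl"]
    return {key: subs for key, subs in groups.items() if REQUIRED.issubset(subs)}


def record(scaffold, assay_id, substituent, pchembl):
    """Build one made-up activity record (hypothetical field names)."""
    return {"scaffold": scaffold, "target_id": "T1", "assay_id": assay_id,
            "activity_type": "IC50", "substituent": substituent, "pchembl": pchembl}


# Made-up data: scaffold_A is complete, scaffold_B lacks three substituents.
records = [record("scaffold_A", "assay_1", s, v) for s, v in
           [("3,4-Cl2", 6.0), ("4-Cl", 6.5), ("4-CH3", 7.0),
            ("4-OCH3", 7.5), ("H", 8.0), ("4-CF3", 6.2)]]
records += [record("scaffold_B", "assay_2", s, v) for s, v in
            [("3,4-Cl2", 5.1), ("4-Cl", 5.4)]]

series = assemble_series(records)
print(len(series))  # 1
```

Additional substituents such as 4-CF3 stay attached to the series, which mirrors the "plus, if available" part of step 3.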

Page 9

Raw data output after mining ChEMBL

200 series (example ranking: 3 nM, 5 nM, 8 nM, 9 nM, 10 nM -> ranks 1, 2, 3, 4, 5)

1108 bioactivity data for additional substituents (new substituent selection)

Page 10

How often do Topliss patterns occur?

subst.        π     σ     -σ    π+σ   Es
3,4-Cl2       1     1     5     1     2-5
4-Cl          2     2     4     2     2-5
4-CH3         3     4     2     3     2-5
4-OCH3        4-5   5     1     5     2-5
H             4-5   3     3     4     1
# of series   13    7     3     2     34

Distribution of 200 series (chart legend): π, σ, π+σ, Es, others

57 of 200 series (29%) extracted from ChEMBL 20 follow a Topliss pattern

Page 11

How useful do Topliss schemes prove in activity optimization?

Topliss pattern   # of series   substituent selection [1]   more active [2]   percentage
π                 13            29                          9                 31 %
σ                 7             9                           1                 11 %
-σ                3             5                           1                 20 %
π+σ               2             2                           1                 50 %

[1] For each series, bioactivity data for the substituents proposed by the Topliss new substituent selection were collected from ChEMBL 20, if available.

[2] Check whether the proposed substituents lead to more active cpds.

scheme   new substituent selection
π        3-CF3, 4-Cl; 3-CF3, 4-NO2; 4-CF3; 2,4-Cl2; 4-c-C5H9; 4-c-C6H11; 4-CH(CH3)2; 4-C(CH3)3; 3,4-(CH3)2; 4-O(CH2)3CH3; 4-OCH2Ph; 4-N(C2H5)2
σ        3-CF3, 4-Cl; 3-CF3, 4-NO2; 4-CF3; 2,4-Cl2; 4-c-C5H9; 4-c-C6H11
-σ       4-N(C2H5)2; 4-N(CH3)2; 4-NH2; 4-NHC4H9; 4-OH; 4-OCH(CH3)2; 3-CH3, 4-OCH3
π+σ      3-CF3, 4-Cl; 3-CF3, 4-NO2; 4-CF3; 2,4-Cl2; 4-c-C5H9; 4-c-C6H11

The poor performance of -σ is in agreement with the propafenone data.

For the series found in ChEMBL, the Topliss approach seems to have difficulties in activity optimization for series following the σ scheme.
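
The percentage column of the table on this slide is the share of proposed substituents with ChEMBL data that turned out more active. A short Python check of that arithmetic, using only the counts from the slide (variable and function names are illustrative):

```python
# (substituents with data [1], of which more active [2]) per Topliss pattern
counts = {"π": (29, 9), "σ": (9, 1), "-σ": (5, 1), "π+σ": (2, 1)}


def success_rate(tested, more_active):
    """Percentage of tested proposals that were more active."""
    return 100.0 * more_active / tested


rates = {scheme: success_rate(*pair) for scheme, pair in counts.items()}
for scheme, rate in rates.items():
    print(f"{scheme}: {rate:.0f} %")
```

The rounded values reproduce the slide: 31 %, 11 %, 20 % and 50 %.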

Page 12

How useful do Topliss schemes prove in activity optimization?

propafenone dataset

substitution   EC50 (µM)   rank
3,4-Cl2        0.150       5
4-Cl           0.132       4
4-CH3          0.063       2
4-OCH3         0.045       1
H              0.079       3

The Topliss proposal for the propafenone dataset, 4-N(CH3)2, did not show activity gain.

Are there -σ series in ChEMBL with bioactivity data for 4-N(CH3)2 substitution?

target                                  Type   4-OCH3 (nM)   4-N(CH3)2 (nM)
P-Glycoprotein                          EC50   45            82
Alpha-1a adrenergic receptor (ChEMBL)   Ki     0.3           0.8
µ-opioid receptor (ChEMBL)              Ki     0.50          63

In the two ChEMBL cases as well, the -σ proposal 4-N(CH3)2 failed to increase activity.

Page 13

Topliss batchwise scheme

propafenone aryloxy (non-Topliss)

substitution   EC50    rank
3,4-Cl2        0.522   5
4-Cl           0.190   4
4-CH3          0.063   1
4-OCH3         0.180   3
H              0.079   2

Ranking pattern 5 4 1 3 2 in this dataset can't be assigned to an existing Topliss scheme.

How often does the pattern 5 4 1 3 2 occur in ChEMBL?

In general, which other, non-Topliss patterns occur frequently in ChEMBL?

Page 14

Which non-Topliss patterns occur in ChEMBL?

subst.     new1   new2   new3   aryloxy
3,4-Cl2    1      5      5      5
4-Cl       2      2      3      4
4-CH3      4      4      1      1
4-OCH3     3      1      4      3
H          5      3      2      2
# series   6      4      4      0

Distribution of 200 series (chart legend): π, σ, π+σ, Es, new1, new2, new3

The pattern found in the aryloxy dataset does not occur in ChEMBL.

However: high similarity to new3.

Do we find an underlying physicochemical driving force in the new3 pattern?

Can we extrapolate to the aryloxy dataset?
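
One way to put a number on the similarity between the aryloxy ranking and the new patterns is Spearman's rank correlation, the same measure the deck uses later for the ranking globe. The sketch below assumes rankings without ties; whether the similarity statement on this slide was derived this way is not stated, so treat it as an illustration.

```python
def spearman_rho(a, b):
    """Spearman rank correlation for two tie-free rankings of equal length."""
    n = len(a)
    d_squared = sum((x - y) ** 2 for x, y in zip(a, b))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Ranks in the order 3,4-Cl2, 4-Cl, 4-CH3, 4-OCH3, H (from the table above).
patterns = {
    "new1": (1, 2, 4, 3, 5),
    "new2": (5, 2, 4, 1, 3),
    "new3": (5, 3, 1, 4, 2),
}
aryloxy = (5, 4, 1, 3, 2)

similarity = {name: spearman_rho(aryloxy, p) for name, p in patterns.items()}
for name, rho in similarity.items():
    print(f"{name}: rho = {rho:.2f}")
```

With these rankings new3 gives rho = 0.90, against 0.10 for new2 and -0.90 for new1, which is consistent with the "high similarity to new3" remark.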

Page 15

Correlation analysis within new3 series

target name           pattern     # of cpds in series [1]   r (π)   r (σ)   r (vdw_area)
Prostanoid EP 1 rec   5 3 1 4 2   5 + 8                                     -0.81**
Adenosine A3 rec      5 3 1 4 2   5 + 8                                     -0.54*
Adenosine A3 rec      5 3 1 4 2   5 + 8                                     -0.67**
Chymase               5 3 1 4 2   5 + 13                                    -0.49**
P-Glycoprotein        5 4 1 3 2   5

[1] In addition to the 5 data points from 3,4-Cl2, 4-Cl, 4-OCH3, 4-CH3 and H, bioactivity data from other substituents listed in Topliss et al. 1977 were selected for correlation analysis.

Correlation analyses were undertaken to calculate the Pearson correlation coefficient (r) between the physicochemical features π, σ, vdw_area and the respective bioactivity data.

** p < 0.05, * p < 0.10

Statistically significant negative vdw_area correlations indicate that the new3 pattern series and the aryloxy series bind to a tight pocket.

Page 16

Discover the ranking globe

How to look at the ranking space globally?

There are 120 (5!) ranking possibilities (patterns): (1,2,3,4,5), (2,1,3,4,5), (1,3,2,4,5), … (5,4,3,2,1)

Calculation of Spearman's rank correlation distance matrix for the 120 possibilities (R function corDist)

Spherical MDS to represent the distance matrix on the surface of a sphere (R function smacofSphere), Kruskal stress = 0.15

Each point represents a pattern (e.g. 1,2,3,4,5); similar patterns are in the vicinity of each other

Frequency contour map: color coding based on frequency of patterns. Red = high frequency, blue = low frequency.
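
The first two steps described on this slide, enumerating the 120 rankings and building a Spearman distance matrix, can be sketched in Python as follows. The deck used the R function corDist; the definition distance = 1 - rho used here is an assumption about that function, and the spherical MDS step itself is not reproduced.

```python
from itertools import permutations


def spearman_distance(a, b):
    """Distance between two tie-free rankings, taken as 1 - Spearman rho (assumption)."""
    n = len(a)
    d_squared = sum((x - y) ** 2 for x, y in zip(a, b))
    rho = 1 - 6 * d_squared / (n * (n * n - 1))
    return 1 - rho


# All 5! = 120 possible rankings of the five substituents.
patterns = list(permutations(range(1, 6)))

# Symmetric 120 x 120 distance matrix as nested lists.
dist = [[spearman_distance(p, q) for q in patterns] for p in patterns]

print(len(patterns))                  # 120
print(patterns[0], patterns[-1])      # (1, 2, 3, 4, 5) (5, 4, 3, 2, 1)
print(dist[0][-1])                    # 2.0
```

Identical rankings have distance 0 and fully reversed rankings have distance 2, so the matrix spans the whole range that the MDS has to place on the sphere.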

Page 17

Map analysis

Van der Waals contour map: color coding based on vdw_area correlations with bioactivity. Only series with activity data for five additional derivatives (e.g. 4-CF3, 4-OH ...) are used in the correlation analysis (n>=10). Resulting correlations with p > 0.1 were omitted. The remaining coefficients were used for color coding. Red ... positive correlation, blue ... negative correlation.

Frequency contour map: color coding based on frequency of patterns. Red = high frequency, blue = low frequency.

Only Topliss patterns (π, σ, π+σ, Es) and ranking patterns with four or more series (new1, new2, new3) are shown.

Map labels: *Es; π and σ continent; steric island; -σ; aryloxy

trench

• surrounded by Es pattern
• lies in area with negative vdw_area correlation

• Only three -σ patterns in ChEMBL
• In the investigated cases, poor predictability of the -σ scheme

Page 18

Summary & Outlook

• Open medicinal chemistry data such as those in ChEMBL allow analysis of complex SAR patterns

• Connecting these data with data from pathways and diseases, as implemented in the Open PHACTS Discovery Platform, will open up completely new possibilities for linking chemical SAR patterns to biological endpoints

• Quality of data is key for the analysis (assays)

Next steps

• Look for X-ray structures of complexes

• Analyse with respect to target classes

Page 19

Pharmacoinformatics Research Group Department of Pharmaceutical Chemistry

Page 20

ChEMBL 20 PostgreSQL, > 13 000 000 activities

RDKit Chemoinformatics toolkit 2014.03, RDKit cartridge

SQL query: get all 3,4-Cl2 compounds (SMILES)

Data processing in Python

200 series

Page 21

Spherical MDS in R software

-> 120 ranking possibilities are created
-> Spearman ranking distance matrix calculated
-> Spherical MDS is undertaken
-> X,Y,Z coordinates are exported as CSV file

Coordinates.csv

Python data preprocessing

Basemap toolkit

• provides list of globe projections
• create contour maps

Projections: 2D - Equidistant Cylindrical; 3D - Orthographic
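
Map projections work with longitude and latitude, so the x, y, z coordinates from the spherical MDS have to be converted before plotting. The slides do not show how this preprocessing was done; the following is a plain-Python sketch of one standard conversion, with a hypothetical function name and without any Basemap calls.

```python
import math


def xyz_to_lonlat(x, y, z):
    """Convert a point on (or near) a sphere to longitude and latitude in degrees."""
    radius = math.sqrt(x * x + y * y + z * z)
    latitude = math.degrees(math.asin(z / radius))
    longitude = math.degrees(math.atan2(y, x))
    return longitude, latitude


# Three illustrative points on the unit sphere.
for point in [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]:
    lon, lat = xyz_to_lonlat(*point)
    print(point, round(lon, 1), round(lat, 1))
```

Dividing by the radius makes the conversion tolerant of coordinates that are not exactly on the unit sphere.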

Page 22

For each series, bioactivity data for 3,4-Cl2, 4-Cl, 4-CH3, 4-OCH3 and 4-H are available.

• For the majority of the series (91%), bioactivity data for more substituents, e.g. 4-CF3, 4-OH, 4-F, ..., are available. (Substituents taken from "new substituent selection".)
• More than 57% of the series have activity data for five or more additional substituents.

Series_8

substituent   pIC50   vdw_area
3,4-Cl2       6.3     134
4-Cl          7.0     117
4-CH3         7.4     116
4-OCH3        7.6     131
4-H           8       99
4-CF3         6.9     129
4-F           7.7     103
4-OH          6.6     109
3,4-(CH3)2    7       134
4-C(CH3)3     6.1     152

For series with 5 or more additional substituents (n>=10), correlation analyses were run.

In this example: R = -0.70, p = 0.03. Series 8, with pattern 5 4 3 2 1, has R(vdw) = -0.7.
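
The Series 8 example can be recomputed directly from the ten data points on this slide. The sketch below implements the Pearson correlation coefficient in plain Python; the p-value is not computed here, and the function name is illustrative.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Series 8: 3,4-Cl2, 4-Cl, 4-CH3, 4-OCH3, 4-H, 4-CF3, 4-F, 4-OH, 3,4-(CH3)2, 4-C(CH3)3
vdw_area = [134, 117, 116, 131, 99, 129, 103, 109, 134, 152]
pic50 = [6.3, 7.0, 7.4, 7.6, 8.0, 6.9, 7.7, 6.6, 7.0, 6.1]

r = pearson_r(vdw_area, pic50)
print(f"r = {r:.2f}")  # r = -0.70
```

The result agrees with the R = -0.70 quoted on the slide: larger substituents go along with lower pIC50 in this series.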

Page 23

Details to Multidimensional Scaling with

First: 2D MDS, bad, Kruskal stress-1 > 0.2
Second: 3D MDS, good, Kruskal stress-1 = 0.11, but visualization not helpful
Third: spherical MDS, moderate, Kruskal stress-1 = 0.15, good visualization

get120Possibilities() ... creates a vector with 120 rankings [(1,2,3,4,5), (2,1,3,4,5) ...]
corDist() ... calculates Spearman's rank correlation distance
smacofSphere() ... runs spherical MDS, type="ordinal" because we have rankings, algorithm="primal" ... handling of ties
xyz.120 ... x,y,z coordinates of the MDS run

Coordinates (xyz.120) are exported to a CSV file and are the input for Basemap

Page 24

• Potential questions:

• -> How long does such a search take once the workflow is in place? About 1 day (4-processor machine, 8 GB RAM).

• -> How are salts handled? The script is written so that they are not taken into account. This means it would in principle be possible for the dichloro compound to be a sodium salt and the methyl derivative a potassium salt. However, this never occurred in the 200 series and therefore plays no role.

• -> What about chirality? I did not consider chirality in the query. This would have been possible, but since the encoding of chirality in ChEMBL is not comprehensive, I did not take it into account.

• -> How large does the difference between the bioactivities have to be for a set to be accepted as a series? In the Topliss paper one finds rankings with log > 0.1 between the compounds. We did not take this into account and used all data (as did, incidentally, the group that carried out a similar analysis in 2014).

• The data analysis shows that, of the 200 series, 43 have a difference of at least ">0.1 log" between all rankings. 77 series have 1 violation of this rule, i.e. the difference between 2 rankings is smaller than 0.1 once. 80 have 2 or more violations.

• Why did you not consider the other patterns 2π-π², π-σ, etc.? The complexity would have been considerably higher without any appreciable gain in information. For clarification: the new patterns (new1, new2, new3) do not fall into any of the patterns postulated by Topliss, nor into the extended selection (2π-π², π-3σ, etc.).
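
The 0.1 log spacing check mentioned in the answers above can be expressed as a count of neighbouring ranks that lie too close together. This Python sketch assumes the rule is applied to adjacent ranks of the five core compounds, which the notes suggest but do not state explicitly; the function name and the second example are made up.

```python
def spacing_violations(p_activities, min_gap=0.1):
    """Count adjacent ranks whose log-activity difference is below min_gap."""
    ordered = sorted(p_activities, reverse=True)
    # Rounding guards against floating-point noise in differences such as 7.6 - 7.4.
    return sum(1 for high, low in zip(ordered, ordered[1:])
               if round(high - low, 6) < min_gap)


# Five core compounds of Series 8 (pIC50 values from the correlation example).
series_8 = [6.3, 7.0, 7.4, 7.6, 8.0]
# Made-up series with two pairs of nearly identical activities.
made_up = [7.0, 7.05, 7.5, 8.0, 8.02]

print(spacing_violations(series_8))  # 0
print(spacing_violations(made_up))   # 2
```

A series with 0 violations would fall into the group of 43, one with exactly 1 violation into the group of 77, and the rest into the group of 80.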