Department of Pharmaceutical Chemistry - Open PHACTS Pharmacoinformatics Research Group...


  • date post

    31-Jan-2018

  • Pharmacoinformatics Research Group Department of Pharmaceutical Chemistry

    Topliss batchwise scheme reviewed in the era of Open Data

    Lars Richter, Gerhard F. Ecker Dept. of Pharmaceutical Chemistry

    gerhard.f.ecker@univie.ac.at

    pharminfo.univie.ac.at


  • Topliss batchwise scheme

    Topliss ranking schemes

    subst.     π     σ     -σ    π+σ   Es
    3,4-Cl2    1     1     5     1     2-5
    4-Cl       2     2     4     2     2-5
    4-CH3      3     4     2     3     2-5
    4-OCH3     4-5   5     1     5     2-5
    H          4-5   3     3     4     1

    Topliss substituent proposals

    scheme   new substituent selection [1]
    π        3-CF3, 4-Cl; 3-CF3, 4-NO2; 4-CF3; 2,4-Cl2; 4-c-C5H9; 4-c-C6H11;
             4-CH(CH3)2; 4-C(CH3)3; 3,4-(CH3)2; 4-O(CH2)3CH3; 4-OCH2Ph; 4-N(C2H5)2
    σ        3-CF3, 4-Cl; 3-CF3, 4-NO2; 4-CF3; 2,4-Cl2; 4-c-C5H9; 4-c-C6H11
    -σ       4-N(C2H5)2; 4-N(CH3)2; 4-NH2; 4-NHC4H9; 4-OH; 4-OCH(CH3)2; 3-CH3,4-OCH3
    π+σ      3-CF3, 4-Cl; 3-CF3, 4-NO2; 4-CF3; 2,4-Cl2; 4-c-C5H9; 4-c-C6H11

    [1] Topliss et al. J Med Chem 1977

  • Topliss batchwise scheme

    substitution   EC50 (µM)   rank
    3,4-Cl2        0.150       5
    4-Cl           0.132       4
    4-CH3          0.063       2
    4-OCH3         0.045       1
    H              0.079       3

    Series of five phenyl-substituted propafenone

    derivatives measured against P-Glycoprotein

    Which compound should be synthesized next?
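The question above can be approached mechanically: rank the five measured compounds and compare the ranking with the Topliss reference rankings. The following is a minimal sketch, not the authors' implementation. It assumes the five scheme columns are π, σ, -σ, π+σ and Es (the Greek labels are lost in this transcript), and the names `TOPLISS`, `rank_by_potency` and `matching_schemes` are hypothetical.

```python
# Reference rankings for the five starter substituents (1 = most active).
# Column names are an assumption; tied ranks are stored as (low, high) ranges.
ORDER = ["3,4-Cl2", "4-Cl", "4-CH3", "4-OCH3", "H"]
TOPLISS = {
    "pi":       [(1, 1), (2, 2), (3, 3), (4, 5), (4, 5)],
    "sigma":    [(1, 1), (2, 2), (4, 4), (5, 5), (3, 3)],
    "-sigma":   [(5, 5), (4, 4), (2, 2), (1, 1), (3, 3)],
    "pi+sigma": [(1, 1), (2, 2), (3, 3), (5, 5), (4, 4)],
    "Es":       [(2, 5), (2, 5), (2, 5), (2, 5), (1, 1)],
}

def rank_by_potency(ec50):
    """Rank substituents from most active (1) to least active; lower EC50 is better."""
    ordered = sorted(ec50, key=ec50.get)
    return {subst: i + 1 for i, subst in enumerate(ordered)}

def matching_schemes(ranks):
    """Return every scheme whose allowed rank ranges contain the observed ranks."""
    return [
        name
        for name, ranges in TOPLISS.items()
        if all(lo <= ranks[s] <= hi for s, (lo, hi) in zip(ORDER, ranges))
    ]

# EC50 values of the propafenone series from the slide.
propafenone = {"3,4-Cl2": 0.150, "4-Cl": 0.132, "4-CH3": 0.063,
               "4-OCH3": 0.045, "H": 0.079}
ranks = rank_by_potency(propafenone)
print(ranks)
print(matching_schemes(ranks))
```

Under these assumptions the propafenone ranking matches only the -σ scheme, so the next compound would be taken from the -σ substituent proposals.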

  • Topliss batchwise scheme

    propafenone dataset: the observed ranking (5 4 2 1 3) matches the -σ scheme

  • Topliss batchwise scheme

    propafenone dataset (-σ scheme): the 4-N(CH3)2 derivative was
    synthesized and tested; no affinity increase

    How often do Topliss schemes (π, σ, -σ, π+σ, Es) occur in large databases?

    How useful do Topliss schemes prove in activity optimization?

  • www.openphacts.org


  • How often do Topliss patterns occur?

    1. Return 3,4-dichloro-substituted compounds in PostgreSQL ChEMBL 20
       using the RDKit cartridge

    2a. For each 3,4-Cl2 compound, check for availability of
        4-Cl, 4-OCH3, 4-CH3 and H substitutions

    3. Check each compound series for bioactivity data (pChEMBL) measured
       - on the same target in the same assay
       - with activity type = IC50 or Ki
       plus, if available, activity for the new substituent selection

    Counts shown on the slide: 9312 cpds (SQL query); 540 x; 200 series;
    1108 bioactivity data for additional substituents (new substituent selection)

    Example series: 3 nM, 5 nM, 8 nM, 9 nM, 10 nM gives ranks 1 2 3 4 5
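Steps 2a and 3 can be sketched in plain Python, assuming the SQL query has already returned one record per measured compound. This is an illustration only: the record layout `(scaffold, target, assay, substituent, pChEMBL)`, the function names, and all values in `records` are hypothetical and not taken from ChEMBL.

```python
from collections import defaultdict

REQUIRED = ("3,4-Cl2", "4-Cl", "4-CH3", "4-OCH3", "H")

def complete_series(records):
    """Group records by (scaffold, target, assay) and keep only the groups
    that contain all five starter substituents."""
    groups = defaultdict(dict)
    for scaffold, target, assay, subst, pchembl in records:
        groups[(scaffold, target, assay)][subst] = pchembl
    return {key: acts for key, acts in groups.items()
            if set(REQUIRED).issubset(acts)}

def rank_series(acts):
    """Ranks in the order of REQUIRED; a higher pChEMBL is more active (rank 1)."""
    ordered = sorted(REQUIRED, key=lambda s: acts[s], reverse=True)
    return [ordered.index(s) + 1 for s in REQUIRED]

# Hypothetical records; the pChEMBL values correspond to 3, 5, 8, 9 and 10 nM.
records = [
    ("scaffold_A", "target_X", "assay_1", "3,4-Cl2", 8.52),
    ("scaffold_A", "target_X", "assay_1", "4-Cl", 8.30),
    ("scaffold_A", "target_X", "assay_1", "4-CH3", 8.10),
    ("scaffold_A", "target_X", "assay_1", "4-OCH3", 8.05),
    ("scaffold_A", "target_X", "assay_1", "H", 8.00),
    ("scaffold_B", "target_X", "assay_1", "3,4-Cl2", 7.10),  # incomplete series
    ("scaffold_B", "target_X", "assay_1", "4-Cl", 6.90),
]
series = complete_series(records)
for key, acts in series.items():
    print(key, rank_series(acts))
```

Grouping on target and assay together keeps activities comparable within a series, which is the point of the "same target in same assay" condition.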

  • Raw data output after mining ChEMBL

    200 series (e.g. 3 nM, 5 nM, 8 nM, 9 nM, 10 nM gives ranks 1 2 3 4 5)

    1108 bioactivity data for additional substituents
    (new substituent selection)

  • How often do Topliss patterns occur?

    subst.        π     σ     -σ    π+σ   Es
    3,4-Cl2       1     1     5     1     2-5
    4-Cl          2     2     4     2     2-5
    4-CH3         3     4     2     3     2-5
    4-OCH3        4-5   5     1     5     2-5
    H             4-5   3     3     4     1
    # of series   13    7     3     2     34

    [Pie chart: distribution of 200 series over π, σ, -σ, π+σ, Es and others]

    57 of 200 series (29%) extracted from ChEMBL 20
    follow a Topliss pattern

  • How useful do Topliss schemes prove in activity optimization?

    Topliss   # of     substituent     more         percentage
    pattern   series   selection [1]   active [2]
    π         13       29              9            31 %
    σ         7        9               1            11 %
    -σ        3        5               1            20 %
    π+σ       2        2               1            50 %

    [1] For each series, bioactivity data for the substituents proposed
    by the Topliss new substituent selection were collected from
    ChEMBL 20, if available.

    [2] Check whether the proposed substituents lead to more
    active cpds.

    Poor performance of -σ is in agreement with the
    propafenone data.

    For the series found in ChEMBL that follow a scheme, the Topliss
    approach seems to have difficulties in activity optimization.
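The percentage column can be recomputed from the two count columns. The sketch below assumes the scheme labels are π, σ, -σ and π+σ (lost in this transcript) and that the percentage is "more active" divided by "substituent selection". The pooled rate is simple arithmetic on the counts and is not stated on the slide.

```python
# (substituents with data, of which more active), per scheme, from the table.
table = {
    "pi":       (29, 9),
    "sigma":    (9, 1),
    "-sigma":   (5, 1),
    "pi+sigma": (2, 1),
}

# Success rate per scheme, rounded to whole percent as on the slide.
rates = {scheme: round(100 * better / tested)
         for scheme, (tested, better) in table.items()}

# Pooled rate over all four schemes (derived here, not given in the source).
total_tested = sum(tested for tested, _ in table.values())
total_better = sum(better for _, better in table.values())
pooled = round(100 * total_better / total_tested)

print(rates)
print(total_better, "of", total_tested, "=", pooled, "%")
```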

  • How useful do Topliss schemes prove in activity optimization?

    Topliss proposal for the propafenone dataset (-σ),
    4-N(CH3)2, did not show activity gain.

    Are there -σ series in ChEMBL with bioactivity data for the 4-N(CH3)2
    substitution?

    target                                  type   4-OCH3 (nM)   4-N(CH3)2 (nM)
    P-Glycoprotein                          EC50   45            82
    Alpha-1a adrenergic receptor (ChEMBL)   Ki     0.3           0.8
    -opioid receptor (ChEMBL)               Ki     0.50          63

    Also in the two ChEMBL cases, the -σ proposal 4-N(CH3)2
    failed to increase activity.

  • Topliss batchwise scheme

    propafenone aryloxy dataset (non-Topliss)

    substitution   EC50    rank
    3,4-Cl2        0.522   5
    4-Cl           0.190   4
    4-CH3          0.063   1
    4-OCH3         0.180   3
    H              0.079   2

    Ranking pattern 5 4 1 3 2 in this dataset can't be assigned
    to an existing Topliss scheme.

    How often does the pattern 5 4 1 3 2 occur in ChEMBL?

    In general, which other non-Topliss patterns
    occur frequently in ChEMBL?

  • Which non-Topliss patterns occur in ChEMBL?

    subst.     new1   new2   new3   aryloxy
    3,4-Cl2    1      5      5      5
    4-Cl       2      2      3      4
    4-CH3      4      4      1      1
    4-OCH3     3      1      4      3
    H          5      3      2      2
    # series   6      4      4      0

    [Pie chart: distribution of 200 series over π, σ, -σ, π+σ, Es, new1, new2, new3]

    The pattern found in the aryloxy dataset does not occur in ChEMBL.
    However: high similarity to new3.

    Do we find an underlying physicochemical
    driving force in the new3 pattern?

    Can we extrapolate to the aryloxy dataset?

  • Correlation analysis within new3 series

    target name          pattern     # of cpds in series [1]   r (π)   r (σ)   r (vdw_area)
    Prostanoid EP1 rec   5 3 1 4 2   5 + 8                                     -0.81**
    Adenosine A3 rec     5 3 1 4 2   5 + 8                                     -0.54*
    Adenosine A3 rec     5 3 1 4 2   5 + 8                                     -0.67**
    Chymase              5 3 1 4 2   5 + 13                                    -0.49**
    P-Glycoprotein       5 4 1 3 2   5

    ** p < 0.05, * p < 0.10

    [1] In addition to the 5 data points from 3,4-Cl2, 4-Cl, 4-OCH3, 4-CH3 and
    H, bioactivity data from other substituents listed in Topliss et al.
    1977 were selected for correlation analysis.

    Correlation analyses were undertaken to calculate the Pearson
    correlation coefficient (r) between the physicochemical features π,
    σ, vdw_area and the respective bioactivity data.

    Statistically significant negative vdw_area correlations
    indicate that new3 pattern & aryloxy bind to a tight pocket
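The Pearson coefficient used in this analysis can be sketched without external libraries. The descriptor and activity values below are hypothetical placeholders, not data from ChEMBL or from the slides, and `pearson_r` is an illustrative name. Activities are assumed to be on a p-scale (higher = more active), so a negative r means that larger substituents are less active. A significance test such as the one behind the `*` and `**` marks is not reproduced here.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical values: van der Waals surface area of each substituent and
# the activity (p-scale) of the corresponding compound in one series.
vdw_area = [10.0, 20.0, 30.0, 40.0, 50.0]
p_activity = [8.0, 7.6, 7.1, 6.9, 6.2]

r = pearson_r(vdw_area, p_activity)
print(f"r(vdw_area) = {r:.2f}")
```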

  • Discover the ranking globe

    How to look at the ranking space globally?

    There are 120 (5!) ranking possibilities (patterns):
    (1,2,3,4,5), (2,1,3,4,5), (1,3,2,4,5), ... (5,4,3,2,1)

    Calculation of Spearman's rank correlation
    distance matrix for the 120 possibilities
    (R function corDist)

    Spherical MDS to represent the distance matrix
    on the surface of a sphere (R function
    smacofSphere), Kruskal stress = 0.15

    Each point represents a pattern (e.g. 1,2,3,4,5);
    similar patterns are in the vicinity of each other

    Frequency contour map: color coding based on
    frequency of patterns.
    Red = high frequency
    Blue = low frequency
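The first step of the ranking globe, the distance matrix over all 120 patterns, can be sketched in Python. The slides use the R function corDist; its exact definition is not given there, so the distance 1 - ρ (ρ = Spearman's rank correlation) used below is an assumption. The spherical MDS step is not reproduced.

```python
from itertools import permutations

def spearman_rho(a, b):
    """Spearman's rank correlation for two rankings without ties."""
    n = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return 1 - 6 * d2 / (n * (n * n - 1))

def spearman_distance(a, b):
    """Assumed distance: 0 for identical rankings, 2 for reversed rankings."""
    return 1 - spearman_rho(a, b)

# All 5! = 120 possible rankings of the five starter substituents.
patterns = list(permutations(range(1, 6)))
dist = [[spearman_distance(p, q) for q in patterns] for p in patterns]

# Patterns from the slides, in the order 3,4-Cl2, 4-Cl, 4-CH3, 4-OCH3, H.
aryloxy = (5, 4, 1, 3, 2)
new3 = (5, 3, 1, 4, 2)
print(len(patterns))
print(round(spearman_distance(aryloxy, new3), 3))
```

With this definition the aryloxy and new3 patterns differ only by a swap of two adjacent ranks, which gives a small distance and is consistent with the "high similarity to new3" noted earlier.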

  • Map analysis

    *Es

    Van der Waals contour map: color coding based on
    vdw_area correlations with bioactivity.

    Only series with activity data for five additional derivatives
    (e.g. 4-CF3, 4-OH ...) are used in correlation analysis (n>=10).
    Resulting correlations with p > 0.1 were omitted.
    The remaining coefficients were used for color coding.

    Red ... positive correlation
    Blue ... negative correlation

    and continent steric island

    steric island and continent

    -

    aryloxy

    Frequency contour map

    Color coding based on

    frequency of patterns.

    Red = high frequency

    Blue = low frequency

    Only Topliss patterns (, , +, Es ) and rankings patterns with

    four or more series (new1,