Final Project Summary


Duquesne University

URP 2014

The Project

Pharmacophore Discovery Aimed at Inhibiting the Spi-1•C/EBPβ Interaction

Author: Emily Cribas, Penn State University

Supervisors: Dr. Jeffry D. Madura

Dr. Philip E. Auron

July 24, 2014

I. Introduction

A. Significance

The interleukin gene encodes the IL-1β protein, which is important for cell signaling. In this case, the protein helps modulate immune responses, including fever and inflammation. The gene "turns on" with the help of an enhancer and a promoter.[1] Notably, the complex that forms between one of the enhancers and one of the proteins causes overstimulation of the gene, which, in turn, can lead to inflammatory diseases such as gout, rheumatoid arthritis, and inflammatory bowel disease.[2]

B. Project Summary

Spi-1 is a member of the ETS family of transcription factors, which are distinguished by their DNA-binding domain (DBD), which binds to a piece of DNA containing a specific recognition sequence. In the case of Spi-1, the DBD is found in α helix 3 of its winged helix-turn-helix (wHTH) structure.[3] For clarification, only the DBD of Spi-1 (specifically, Arg232 and Arg235) binds to the recognition sequence (in this case, a string of 4 purine bases) in the interleukin gene. To reiterate, this project focuses solely on the DBD portion of the Spi-1 transcription factor.

The interleukin gene is activated through direct interactions between Spi-1 and C/EBPβ at the promoter region. These two factors interact via hydrogen bonding between a glutamic acid and an arginine.[4] Once this interaction occurs, transcription is activated and RNA polymerase proceeds to transcribe DNA into RNA.

C/EBPβ, on the other hand, belongs to a class of enhancer transcription factors that contain a bZIP, or basic leucine zipper, which includes a basic domain that binds to the major groove of DNA. Only five pairs of leucines are required to stabilize its structure. The remaining leucines and α helices contribute to its binding to Spi-1 through electrostatic interactions.[5]

The pertinent literature section describes in further detail how Arg232 on Spi-1 shows the clearest and strongest form of interaction between the complex and the enhancer transcription factor.[4] Therefore, the area surrounding this residue seems like the most reasonable place to explore in terms of inhibitory binding capabilities.

C. Pertinent Literature

Kodandapani’s paper[3] reported the X-ray crystal structure of the Spi-1/DNA complex and thereby established the winged helix-turn-helix motif of Spi-1, which characterizes its mode of binding to DNA. Of particular interest to us, the authors identified the DNA recognition sequence to which Spi-1 binds, as well as the binding of R232 to the DNA.

Listman’s paper[4] showed how C/EBPβ binds to the Spi-1 protein to cooperatively support transcription of IL1β. Its experiments showed that a single substitution at the arginine 232 residue results in an 80% reduction in C/EBPβ bZIP binding, again demonstrating the importance of the R232 residue in their binding. It also reported that residues 330–345 at the COOH-terminal end (leucine bZIP domain) of C/EBPβ mediate binding to the Spi-1 ETS domain to support Spi-1 function on the promoter.

As for the pertinent computational literature, Cheatham’s paper[6] on molecular modeling of nucleic acids provides clear guidelines as to which force fields are best for different forms of DNA (in this case, B-form), as well as how to set up and analyze nucleic acids in MD. It also touches on the important factors that can characterize DNA movement, such as the effects a "salty" or polar environment could have on conformational changes.

II. Hypothesis and Specific Aims

A. Hypothesis

The region near Arg232 (R232) of the Spi-1/DNA complex can accommodate a small molecule through hydrogen bonding. The purpose of this project is to inhibit binding between Spi-1 and C/EBPβ because, together, they act as a cooperative unit to begin transcription of the interleukin gene, which encodes a protein that can lead to inflammatory disease. Importantly, the binding between these two transcription factors is most prominent around the R232 pocket. Therefore, finding a molecule that binds well to this pocket is expected to inhibit binding of the enhancer transcription factor, C/EBPβ.[4]

B. Specific Aims

i. Characterize the region near R232 of the Spi-1/DNA complex. This will be done through molecular modeling techniques and MD calculations using NAMD for three separate simulations: the nucleic acid, the protein, and the complex.

ii. Develop a pharmacophore model. After finding the optimal structure of the complex, we can use MOE to develop a model with desirable molecular electronic and steric features that will be critical in characterizing the inhibitory molecule.

iii. Virtually screen a library of small molecules. By testing each molecule, we can pick the one that exhibits the strongest binding, so that future continuations of this project can compare binding strengths to those of C/EBPβ to see whether these predicted molecules can serve as effective pharmacophores by inhibiting Spi-1•C/EBPβ interactions.

III. Methods

A. Setup

To prepare the complex, the structure (PDB ID: 1PUE) was downloaded from the Protein Data Bank and then prepared and optimized using the MMTSB tools[7], which provide commands to add missing atoms (in this case, hydrogens), solvate the complex, add ions to neutralize the system, and finally generate a PSF file, which is used for equilibration.
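The neutralization step in the setup above comes down to simple bookkeeping. The following is a minimal sketch of that bookkeeping only, not of the MMTSB tools themselves; it assumes monovalent Na+ and Cl- counterions, which the text does not specify, and the function name is hypothetical.

```python
def neutralizing_ions(net_charge):
    """Return (n_sodium, n_chloride) needed to bring an integer net charge to zero.

    Assumption: only monovalent Na+ and Cl- ions are added.
    """
    if net_charge < 0:
        # A negatively charged solute (e.g. DNA-dominated) needs cations.
        return (-net_charge, 0)
    # A positively charged (or neutral) solute needs anions (or nothing).
    return (0, net_charge)


# Hypothetical net charges, for illustration only.
print(neutralizing_ions(-12))
print(neutralizing_ions(3))
```

In practice the net charge would be read from the prepared structure rather than typed in by hand.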

B. Equilibration

The equilibration phase consists of minimization followed by molecular dynamics (MD) simulation. NAMD 2.9, a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems[8], is required for both steps because we are dealing with a 40,000+ atom system with numerous forces that must be minimized efficiently. Ideally, no restraints would be applied, to best mimic the physiological environment of the complex, but because of the size of the system, restraints are required at the beginning to control the forces and reduce the large energies.

C. Analysis and Pharmacophore Development

R[9] and VMD 1.9.1[10] were used to monitor the progress of equilibration and the stability of the complex. By plotting values such as potential energy and volume in R, proximity to convergence can be assessed. Through the movie-making option in VMD, the overall movement and conformational changes of the complex can be viewed on a wider scale to catch highly detrimental instabilities and possible denaturation.
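One simple way to put a number on "proximity to convergence" for a monitored quantity such as potential energy or volume is to compare block averages. The sketch below is an illustration in Python, not the R plotting actually used in this project; the number of blocks, the tolerance, and the sample values are all hypothetical choices.

```python
from statistics import mean


def block_means(series, n_blocks):
    """Split a time series into n_blocks equal blocks and return each block's mean."""
    size = len(series) // n_blocks
    if size == 0:
        raise ValueError("series is shorter than the number of blocks")
    return [mean(series[i * size:(i + 1) * size]) for i in range(n_blocks)]


def drift_between_last_blocks(series, n_blocks=4):
    """Absolute difference between the means of the final two blocks."""
    means = block_means(series, n_blocks)
    return abs(means[-1] - means[-2])


def looks_converged(series, tolerance, n_blocks=4):
    """Heuristic: the last two block means agree within a user-chosen tolerance."""
    return drift_between_last_blocks(series, n_blocks) <= tolerance


# Hypothetical energy-like series (arbitrary units), for illustration only.
relaxing = [100, 80, 60, 40, 30, 30, 30, 30]
drifting = [100, 90, 80, 70, 60, 50, 40, 30]
print(looks_converged(relaxing, tolerance=1.0))
print(looks_converged(drifting, tolerance=1.0))
```

A flat tail in the block means is a necessary sign of equilibration, not a sufficient one, so a check like this complements visual inspection of the plots and the trajectory.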

In this case, harmonic restraints with force constants of 50 and 100 kJ/mol were placed on the system in separate runs to expedite convergence.

MOE[11] was used to create a validated pharmacophore query with five features based on optimal binding locations within the R232 binding pocket. A library was chosen arbitrarily and screened for molecules that have any of the desired features of the inhibitory molecule. Hits were ranked by rmsd, measured relative to the locations of the features’ annotation points, and by rscore, a sum over all pharmacophore feature points.
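To make the rmsd ranking criterion concrete, the sketch below computes a root-mean-square distance between pharmacophore feature centers and the ligand atoms matched to them. It is an illustration under stated assumptions, not MOE's implementation: the function name and the coordinates are hypothetical, and the atom-to-feature matching is assumed to be given.

```python
import math


def match_rmsd(feature_centers, matched_atoms):
    """Root-mean-square distance between feature centers and matched ligand atoms.

    feature_centers, matched_atoms: equal-length lists of (x, y, z) tuples,
    where matched_atoms[i] is the atom assigned to feature_centers[i].
    """
    if len(feature_centers) != len(matched_atoms) or not feature_centers:
        raise ValueError("need exactly one matched atom per feature")
    squared = [
        sum((a - f) ** 2 for a, f in zip(atom, center))
        for center, atom in zip(feature_centers, matched_atoms)
    ]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical coordinates (angstroms), for illustration only.
centers = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
atoms = [(0.0, 0.0, 0.6), (1.0, 0.8, 0.0)]
print(match_rmsd(centers, atoms))
```

A smaller value means the matched atoms sit closer to the feature annotation points.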

IV. Limitations and Accuracy

Since this is an explicit-solvent system with over 42,000 atoms, the complexity and computational cost are much greater than for a typical MD simulation.

Additionally, these MD simulations can get trapped in metastable conformational states that may not be representative of reality.[6] This is because it is not currently possible to fully sample all thermal conformations of a complex, since the potential energy is never going to be completely constant.[6]

Even more limiting, ion and protein parameters tend to underestimate interactions with DNA. Ion parameters such as the AMBER-adapted Åqvist parameters, for example, may underestimate the free energy of solvation.[12]

Finally, since the simulated portion of the interleukin gene represents only a short strand of the entire DNA sequence, characteristics of the DNA may be misrepresented. For example, the flexibility and structure of DNA depend on a multitude of factors, including base pairing. Adding base pairs to a sequence of DNA, especially GC base pairs (which have three hydrogen bonds and better π-stacking interactions), can increase the stability of the structure and make it less prone to adopting other conformations.
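The base-pairing argument above can be made concrete by counting Watson-Crick hydrogen bonds, two per AT pair and three per GC pair. The sketch below is a rough illustration only: it ignores stacking and sequence context, and the sequences are hypothetical examples, not the sequence in the 1PUE structure.

```python
# Watson-Crick hydrogen bonds formed by each base with its partner.
HBONDS_PER_PAIR = {"A": 2, "T": 2, "G": 3, "C": 3}


def watson_crick_hbonds(strand):
    """Count Watson-Crick hydrogen bonds in a duplex, given one strand."""
    return sum(HBONDS_PER_PAIR[base] for base in strand.upper())


def gc_fraction(strand):
    """Fraction of positions in the strand that are G or C."""
    strand = strand.upper()
    return sum(base in "GC" for base in strand) / len(strand)


# Hypothetical four-base sequences, for illustration only.
for seq in ("ATAT", "GGAA", "GCGC"):
    print(seq, watson_crick_hbonds(seq), gc_fraction(seq))
```

For equal length, the GC-rich duplex carries more hydrogen bonds, which is one reason a short or AT-rich fragment may appear more flexible than the full sequence would.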

V. Data and Interpretations

A. Simulation Analysis

For each of the three simulations, RMSD and RMSF values were compared to ensure there were no discrepancies in the behavior of the two macromolecules.

To clarify, an RMSD value measures the displacement of a particular set of atoms at a given time frame with respect to the positions of those same atoms in a reference frame (usually at time = 0), averaged over the chosen atoms, giving one average displacement for each time frame. For the protein, the atoms were Cα, C1, N, and O, chosen for the stability and regularity of the peptide bond; for the nucleic acid, they were N1/N9, C4, and C1, chosen for the regularity of the glycosidic bond. These values are important in determining whether a protein or nucleic acid has undergone a conformational change or has reached equilibration.
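The RMSD definition above can be sketched in a few lines of Python. This is an illustration rather than the analysis script used for the figures; it assumes the frames have already been superimposed on the reference, and the toy coordinates are hypothetical.

```python
import math


def rmsd_per_frame(trajectory, reference_index=0):
    """Return one RMSD value per frame, measured against a reference frame.

    trajectory: list of frames; each frame is a list of (x, y, z) tuples for
    the same selected atoms, in the same order, already aligned.
    """
    reference = trajectory[reference_index]
    values = []
    for frame in trajectory:
        if len(frame) != len(reference):
            raise ValueError("every frame must contain the same atoms")
        # Sum of squared displacements of each atom from its reference position.
        total = sum(
            (x - rx) ** 2 + (y - ry) ** 2 + (z - rz) ** 2
            for (x, y, z), (rx, ry, rz) in zip(frame, reference)
        )
        # Average over atoms, then take the square root.
        values.append(math.sqrt(total / len(reference)))
    return values


# Hypothetical two-atom, three-frame trajectory (angstroms).
toy_trajectory = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)],
    [(0.0, 3.0, 4.0), (1.0, 0.0, 0.0)],
]
print(rmsd_per_frame(toy_trajectory))
```

The result is one number per frame, which is what is plotted against simulation time in an RMSD figure.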

(a) Bound Spi-1, Mean RMSD: 0.774 Å (b) Unbound Spi-1, Mean RMSD: 1.377 Å

Figure 1: RMSD of Backbone Atoms in Spi-1

Panel (a) illustrates the near-convergence of complexed Spi-1 20 ns into the simulation in terms of movement, as the RMSD value is relatively stable. Apart from a peak at about 28 ns that could signify a conformational change, there are no unexplained discrepancies in the graph. However, (b) free Spi-1 does not display a converged RMSD value, at least not as quickly as bound Spi-1 did. The protein in our complex has reached a stable and lower RMSD value, and so its final structure can be examined for binding potential near R232.

(a) Bound DNA, Mean RMSD: 0.948 Å (b) Unbound DNA, Mean RMSD: 1.446 Å

Figure 2: RMSD of Glycosidic Atoms in DNA

Panel (a) displays a large peak at around 82 ns into the simulation, possibly meaning that the DNA in the complex has undergone a conformational change, so running the simulation for longer, until the RMSD value has further stabilized, is desirable. Interestingly, (b) the free DNA seems to have reached a stable, but higher, RMSD value during the same time frame. Programs such as Curves+[13] and Canal[14] should be used to further analyze the behavior of the nucleic acid in its bound and unbound states throughout the simulation.

An RMSF value, however, measures the displacement of a particle at a given time frame with respect to the position of that particle in a reference frame, averaged over time, giving an average displacement for each atom per residue or base. These values are important for understanding how "floppy" a certain residue may be compared to the rest of the protein, or how unstable a base may be compared to the entire strand of DNA.
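The RMSF definition above differs from RMSD only in what is averaged: over time for each atom, rather than over atoms for each frame. The sketch below illustrates this and is not the script used for the figures. By default it measures fluctuations about each atom's time-averaged position, which is a common convention and an assumption here; passing a reference frame explicitly follows the definition in the text. The toy coordinates are hypothetical.

```python
import math


def rmsf_per_atom(trajectory, reference=None):
    """Return one RMSF value per atom, averaged over all frames.

    trajectory: list of aligned frames; each frame is a list of (x, y, z) tuples.
    reference: per-atom reference positions. If None, each atom's
    time-averaged position is used (an assumption, see the lead-in).
    """
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    if reference is None:
        reference = [
            tuple(sum(frame[i][k] for frame in trajectory) / n_frames for k in range(3))
            for i in range(n_atoms)
        ]
    values = []
    for i in range(n_atoms):
        # Squared displacement of atom i from its reference, summed over frames.
        total = sum(
            sum((frame[i][k] - reference[i][k]) ** 2 for k in range(3))
            for frame in trajectory
        )
        values.append(math.sqrt(total / n_frames))
    return values


# Hypothetical trajectory: atom 0 moves along x, atom 1 stays fixed (angstroms).
toy_frames = [
    [(0.0, 0.0, 0.0), (5.0, 5.0, 5.0)],
    [(2.0, 0.0, 0.0), (5.0, 5.0, 5.0)],
]
print(rmsf_per_atom(toy_frames))
print(rmsf_per_atom(toy_frames, reference=toy_frames[0]))
```

Per-residue or per-base values would then be obtained by averaging the per-atom values within each residue or base.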

Figure 3: RMSF per Residue of Spi-1

RMSF values for the unbound and bound states of Spi-1 are very similar for the majority of residues in the DBD, but differ dramatically at the beginning and ending residues. This difference can be attributed to the fact that unbound Spi-1 has no DNA complexed to it, so its ends have nothing to bind to and nothing limits their movement. Notably, the region near residue 232 is similar in both states, meaning that its movement has remained undisturbed and there are no irregularities throughout the entire simulation.

Figure 4: RMSF per Base of DNA

Again, both sets of RMSF values are comparable to each other, with no major discrepancies, and the lower RMSF values for the end bases can be accounted for by the harmonic restraints placed on the ends in both simulations. It would be interesting to measure RMSF values for these states without restraints, but the restraints were originally placed to avoid unraveling or disruption of the gene.

B. Pharmacophore Modeling

(a) Binding Pocket with Features (b) Pharmacophore Features

Figure 5: Pharmacophore Features in the R232 Binding Pocket

Panel (a) highlights the R232 binding pocket in pink and green (hydrophilic and hydrophobic areas), with the pharmacophore features from (b) depicted as spheres. Many of the features in our model involve hydrogen bonding, mostly to water molecules found in the pocket, highlighting the importance of waters in this region. The features were chosen based on areas with high binding accessibility to the DNA, waters, or the protein, excluding any type of binding to Arg232 to avoid any perturbation of the bonding found in the complex.

C. Library Screening

(a) Molecule 1 (b) Molecule 2 (c) Molecule 3

Figure 6: Stereo View of Screened Molecules

(a) Molecule 1 (b) Molecule 2 (c) Molecule 3

Figure 7: Ligand Interaction Maps of Screened Molecules

Figures 6 and 7 display the nature of the top 3 hits obtained from the arbitrarily chosen database after cross-referencing with our pharmacophore features. Each molecule makes solvent contacts, and some form important hydrogen bonds to residues in Spi-1 and/or nucleic acid bases. Table 1 further describes the binding features of each molecule.

Molecule    rmsd     rscore
1           0.328    3.690
2           0.448    3.693
3           1.091    11.936

Table 1: Library of Small Molecules

This table displays only 3 of the thousands of hits obtained from the screening. The rmsd is the calculated distance from the center of the feature (annotation point) to the atom of the molecule containing that particular feature. The rscore is the sum of the individual feature rscores, that is, the acceptor or donor strength of the matching atoms for each pharmacophore feature. Importantly, a low rmsd and a high rscore are desirable characteristics of our molecule, so the correct balance between the two must be found.
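The trade-off between low rmsd and high rscore can be illustrated with the three hits in Table 1. The sketch below sorts them by each criterion and then keeps the hits that no other hit beats on both criteria at once (a Pareto filter). This filter is an illustrative choice, not a ranking method used in the project, and the function names are hypothetical.

```python
# (molecule number, rmsd, rscore), copied from Table 1.
hits = [(1, 0.328, 3.690), (2, 0.448, 3.693), (3, 1.091, 11.936)]


def dominates(a, b):
    """True if hit a is no worse than b on both criteria and better on at least one."""
    _, rmsd_a, rscore_a = a
    _, rmsd_b, rscore_b = b
    no_worse = rmsd_a <= rmsd_b and rscore_a >= rscore_b
    strictly_better = rmsd_a < rmsd_b or rscore_a > rscore_b
    return no_worse and strictly_better


def pareto_front(candidates):
    """Keep the candidates that are not dominated by any other candidate."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


by_rmsd = sorted(hits, key=lambda h: h[1])                 # lowest rmsd first
by_rscore = sorted(hits, key=lambda h: h[2], reverse=True)  # highest rscore first

print([h[0] for h in by_rmsd])
print([h[0] for h in by_rscore])
print([h[0] for h in pareto_front(hits)])
```

The two orderings are exact opposites and all three hits survive the filter, which shows why neither criterion alone is enough to pick a single best molecule from this table.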

VI. Conclusions

The analysis of the three simulations supports our final Spi-1/DNA structure: it shows that, for the most part, our complex has converged and is behaving as it should in terms of movement. Further analysis of the nucleic acid should be conducted to investigate the unconverged nature of the bound DNA.

After defining our R232 pocket, we determined that it provides a viable pharmacophore model, because we were able to use it to screen a library of small molecules and obtain a positive outcome.

Each of our five pharmacophore features found in the binding pocket is either a hydrogen-bond acceptor or a hydrogen-bond donor feature, and our screened molecules contain at least one of these features, supporting our hypothesis that our inhibitory molecule will act through hydrogen bonding.

VII. Future Work

In the future, we hope to refine our screening results by using different libraries and possibly refining our pharmacophore query to yield a more accurate and limited set of small molecules.

After we have narrowed down our results, we can use molecular docking techniques to place each molecule in its corresponding binding region (separately) and conduct MD simulations for each molecule in the complex.

Finally, we can compare the binding strengths of each of these molecules to those of C/EBPβ to determine whether the molecule could, in fact, inhibit the binding of this protein.

Additionally, obtaining a crystal structure of the two proteins on DNA would be extremely helpful in carrying out more accurate computational simulations, modeling, and screening procedures.

VIII. Acknowledgements

• National Science Foundation, Major Research Instrumentation (MRI) Grant Number: CHE-1126465

• National Institutes of Health R25, National Institute on Drug Abuse (NIDA) Grant Number: 1 R25DA032519-01

• Duquesne University Undergraduate Research Program (URP)

• Madura Research Group

• Auron Research Group

• Scott Boesch

• Emilio Esposito

References

1. Adamik, J.; Wang, K. Z. Q.; Unlu, S.; Su, A.-J. A.; Tannahill, G. M.; Galson, D. L.; O’Neill, L. A.; Auron, P. E. PLoS ONE Jan. 2013, 8, e70622.

2. Hazuda, J.; Simon, L. 1990.

3. Kodandapani, R.; Pio, F.; Ni, C.-Z.; Piccialli, G.; et al. Nature Apr. 1996, 380, 456–.

4. Listman, J. A.; Wara-aswapati, N.; Race, J. E.; Blystone, L. W.; Walker-Kopp, N.; Yang, Z.; Auron, P. E. The Journal of Biological Chemistry Dec. 2005, 280, 41421–41428.

5. Tahirov, T. H.; Sato, K.; Ichikawa-Iwata, E.; Sasaki, M.; Inoue-Bungo, T.; Shiina, M.; Kimura, K.; Takata, S.; Fujikawa, A.; Morii, H.; Kumasaka, T.; Yamamoto, M.; Ishii, S.; Ogata, K. Cell Jan. 2002, 108, 57–70.

6. Cheatham, T. E.; Young, M. A. Biopolymers 2000, 56, 232–56.

7. Feig, M.; Karanicolas, J.; Brooks, C. L. Journal of Molecular Graphics and Modelling May 2004, 22, 377–95.

8. Phillips, J. C.; Braun, R.; Wang, W.; Gumbart, J.; Tajkhorshid, E.; Villa, E.; Chipot, C.; Skeel, R. D.; Kale, L.; Schulten, K. Journal of Computational Chemistry Dec. 2005, 26, 1781–802.

9. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing, Vienna, Austria, 2014.

10. Humphrey, W.; Dalke, A.; Schulten, K. VMD: Visual Molecular Dynamics, 1996.

11. Chemical Computing Group Inc. Molecular Operating Environment (MOE), 2011.10.

12. Cheatham, T. E.; Young, M. A. Biopolymers Jan. 2000, 56, 232–256.

13. Blanchet, C.; Pasi, M.; Zakrzewska, K.; Lavery, R. Nucleic Acids Research July 2011, 39, W68–73.

14. Lavery, R.; Moakher, M.; Maddocks, J. H.; Petkeviciute, D.; Zakrzewska, K. Nucleic Acids Research Sept. 2009, 37, 5917–29.
