Tuesday, June 16th, 2009 Introduction to SHELX C/D/E and HKL2MAP EMBO / MAX-INF2 Practical Course Tim Grüne http://shelx.uni-ac.gwdg.de


Tuesday, June 16th, 2009

Introduction to SHELX C/D/E and HKL2MAP

EMBO / MAX-INF2 Practical Course

Tim Grüne

http://shelx.uni-ac.gwdg.de

SHELX C/D/E and HKL2MAP Tim Grüne

Overview

Starring:

SHELXC

SHELXD

SHELXE

HKL2MAP

Context of SHELX C/D/E within Structure Solution

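[Figure: flowchart of the structure-solution pipeline. Stages: Data Integration (XDS, Mosflm, HKL2000); substructure solution + density modification (shelxc, shelxd, shelxe); Model Building; Refinement. Programs shown along the paths: xdsconv, xds2sad + sadabs, mtz2various, mtz2sca, f2mtz, sharp, coot, arp/warp.]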

Prelude: Data Conversion as Input for SHELXC

SHELXC can read hkl-format (a simple text file in which each line contains one entry each for H, K, L, F, and σ(F)) or sca-format, i.e. the output format of denzo and scalepack (HKL2000).

To convert from mtz-format after MOSFLM/SCALA, one can use mtz2various (CCP4) or mtz2sca (SHELX homepage).

XDS_ASCII.HKL can be converted to sca-format or SHELX hkl-format using either XDSCONV (distributed with XDS) or the duo xds2sad and sadabs (unfortunately only available with Bruker hardware).

The Main Characters

SHELXC, SHELXD, and SHELXE are programs written by G. M. Sheldrick suitable for solving macromolecular structures with nearly all available techniques other than Molecular Replacement, i.e.

MAD, SAD, SIR, MIR, SIRAS, RIP, . . .

They are command line programs driven by input scripts or options provided on the command line.

HKL2MAP is a GUI by T. Pape and T. Schneider designed to facilitate the usage of the SHELX programs both by setting up the required scripts and by providing graphical output to help decision making between steps.

SHELXC

SHELXC - design goal

SHELXC was originally designed as an alternative to xprep to quickly and non-interactively set up an input script for SHELXD and SHELXE, so that it can be used in high-throughput pipelines.

The job of SHELXC is to “mimic” the theoretical crystal of the substructure (see yesterday’s lecture) so that SHELXD can calculate the substructure coordinates.

SHELXC - file setup

SHELXC calculates and sets up the three input files required by SHELXD/SHELXE:

1. name_fa.hkl contains the non-anomalous data for the substructure

H K L FA σ(FA) α

which is used by SHELXD (see yesterday’s lecture)

2. name_fa.ins input script with instructions for SHELXD, including symmetry operators, cell, number of heavy atoms to look for, . . .

3. name.hkl the experimental data in HKLF4 format, which means: each line contains the entries

H K L F²obs σ(F²obs)

It is read by SHELXE and can later also be used for refinement.
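
As an illustration of the HKLF4 record described above, here is a minimal Python sketch that writes and reads one reflection line. The fixed-width layout used (three 4-character integers followed by two 8-character reals with two decimals) is the conventional SHELX layout and is an assumption here, since the slide only names the columns. The function names are made up for this example.

```python
def format_hklf4(h, k, l, intensity, sigma):
    """Format one reflection as a fixed-width record: H K L F^2 sigma(F^2).

    Assumed layout: 3 x 4-character integers, 2 x 8-character reals.
    """
    return f"{h:4d}{k:4d}{l:4d}{intensity:8.2f}{sigma:8.2f}"


def parse_hklf4(line):
    """Split a fixed-width record back into (h, k, l, F^2, sigma(F^2))."""
    h, k, l = (int(line[i:i + 4]) for i in (0, 4, 8))
    intensity = float(line[12:20])
    sigma = float(line[20:28])
    return h, k, l, intensity, sigma


record = format_hklf4(1, -2, 3, 1234.5, 12.34)
print(repr(record))
print(parse_hklf4(record))
```

Reading by column position rather than by splitting on whitespace matters for such files, because large values can fill their field completely and touch the neighbouring column.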

SHELXC - sample session

This is an example of how to run SHELX C/D/E for SAD-phasing from the command line

#> shelxc tln << eof

SAD tln.sca

CELL 0.93400 92.598 92.598 128.906 90.000 90.000 120.000

SPAG P6122

FIND 6

MIND -2 -0.1

SHEL 999 2.0

SFAC Zn

NTRY 400

eof

#> shelxd tln_fa

#> shelxe tln tln_fa -s0.46 -h5 -a2 -m30 -l3 -b

#> shelxe tln tln_fa -s0.46 -h5 -a2 -m30 -l3 -b -i
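
The same SHELXC step can be scripted, e.g. for a pipeline. The following Python sketch is not part of SHELX: it assumes that shelxc is installed and on the PATH, and the helper names build_shelxc_input and run_shelxc are invented for illustration. It assembles the keyword block from the session above and passes it to shelxc on standard input, which is what the here-document (<< eof) does in the shell.

```python
import subprocess


def build_shelxc_input(data_file, cell, spacegroup, element, n_sites,
                       mind="-2 -0.1", shel="999 2.0", ntry=400, mode="SAD"):
    """Return the keyword block that the sample session feeds to shelxc."""
    lines = [
        f"{mode} {data_file}",
        f"CELL {cell}",          # wavelength followed by the six cell parameters
        f"SPAG {spacegroup}",
        f"FIND {n_sites}",
        f"MIND {mind}",
        f"SHEL {shel}",
        f"SFAC {element}",
        f"NTRY {ntry}",
    ]
    return "\n".join(lines) + "\n"


def run_shelxc(name, keyword_block):
    """Pipe the keyword block to 'shelxc name' (hypothetical helper)."""
    subprocess.run(["shelxc", name], input=keyword_block, text=True, check=True)


block = build_shelxc_input(
    "tln.sca", "0.93400 92.598 92.598 128.906 90.000 90.000 120.000",
    "P6122", "Zn", 6)
print(block)
# run_shelxc("tln", block)  # uncomment where shelxc is installed
```

Keeping the text generation separate from the program call makes it easy to inspect the keyword block before anything is run.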

SHELXC - Control keywords etc.

• “tln” replaces the placeholder “name”, i.e. SHELXC will create the three files tln_fa.hkl, tln_fa.ins, and tln.hkl.

• the keyword SAD in the first row of the input tells SHELXC that it should prepare the data for SHELXD from a SAD-experiment. Other keywords are nearly self-explanatory and include:

– “SIR” for Single Isomorphous Replacement

– “SIRA” for the combination of SIR with SAD phasing (SIRAS)

– for MAD, the keywords are “PEAK”, “HREM”, “LREM”, and “INFL” for the peak, high-energy remote, low-energy remote, and inflection point; at least two of these must be provided

– for RIP, the keywords are “BEFORE” and “AFTER” respectively.

Each can be complemented by “NAT” for a native data set, which will be used by SHELXE for phase extension.
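
The keyword list above can be kept in a small lookup table, for instance to check an input before running SHELXC. This Python sketch only encodes what is stated on this slide (in particular the rule that MAD needs at least two wavelengths); the names DATASET_KEYWORDS and mad_selection_ok are made up, and the list is not claimed to be complete.

```python
MAD_KEYWORDS = {"PEAK", "HREM", "LREM", "INFL"}

DATASET_KEYWORDS = {
    "SAD": "SAD experiment",
    "SIR": "single isomorphous replacement",
    "SIRA": "SIR combined with SAD phasing (SIRAS)",
    "PEAK": "MAD, peak",
    "HREM": "MAD, high-energy remote",
    "LREM": "MAD, low-energy remote",
    "INFL": "MAD, inflection point",
    "BEFORE": "RIP, before",
    "AFTER": "RIP, after",
    "NAT": "native data set, used by SHELXE for phase extension",
}


def mad_selection_ok(keywords):
    """True unless exactly one MAD wavelength keyword is given.

    The slide asks for at least two of PEAK, HREM, LREM, INFL for MAD;
    inputs without any MAD keyword are not judged by this check.
    """
    n_mad = len({k.upper() for k in keywords} & MAD_KEYWORDS)
    return n_mad != 1


print(mad_selection_ok(["PEAK", "INFL", "NAT"]))
print(mad_selection_ok(["PEAK"]))
```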

SHELXC - Control keywords etc.

• “SHEL 999 2.0” is the resolution range to be included in the substructure search, not necessarily the resolution of the data set.

• “SFAC” gives the atom type to look for. SHELXD is designed to search for a single atom type only. The program does not search for mixtures: for a SeMet protein crystal grown in the presence of Co, for example, it treats all peaks as Se atoms.

SHELXC - Control keywords etc.

• “FIND” tells SHELXD how many positions to look for. This number should be within about 20% of the correct number of sites. Where the number is difficult to guess, e.g. for a halide soak, it is worth varying it if no solution is found at the first attempt.

• “MIND” takes one or two values. The first number is the minimum distance between two peaks. This prevents one large peak from being split into several close peaks where there should really be only one.

If the number is negative, the meaning remains the same, but the PATFOM (figure of merit of the Patterson map) is printed as well, together with the “crossword” table listing the quality of and the distances between selected peaks.

The table helps in borderline cases to check whether a solution is useful or not. Although printing it is still the default, there is very often no need to check the crossword table for sensible (biological) distances.

A second number sets the minimum allowed distance between symmetry-related peaks. If this number is negative, peaks are NOT allowed to sit on a special position. This should be switched on in most cases, since macromolecules do not possess any symmetry and are therefore not allowed on special positions.

SHELXC - Resolution Cut-off

SHELXC prints a lot of tables that analyse the statistics of the data set. The most interesting parts are the figures that are useful for estimating the actual resolution of the anomalous signal.

Resl. Inf - 8.0 - 6.0 - 5.0 - 4.0 - 3.5 - 3.0 - 2.5 - 2.1 - 1.9 - 1.7 - 1.51

N(data) 398 540 651 1439 1429 2502 4870 7833 6687 10136 15425

<I/sig> 98.3 93.1 94.7 103.1 99.6 85.1 68.4 51.2 35.5 20.1 9.8

%Complete 88.2 99.3 99.5 99.6 99.8 99.8 99.8 99.9 99.9 99.7 98.4

<d"/sig> 2.07 1.82 1.58 1.38 1.25 1.31 1.21 1.09 0.97 0.88 0.80

The last line, <d"/sig> = ||F+| − |F−|| / esd, shows the strength of the anomalous signal in terms of the difference between the Bijvoet pairs. Unless you have more information, as would be the case for MAD instead of SAD, cut the data where this figure reaches 1.3.
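
The rule of thumb can be applied to the table automatically. The Python sketch below parses the two relevant rows as they are printed above and returns the high-resolution limit of the last shell in which <d"/sig> is still at or above the threshold. Picking the last such shell (rather than the first shell that dips below) is a choice made for this example, and the function names are invented.

```python
SHELXC_TABLE = '''\
Resl.     Inf - 8.0 - 6.0 - 5.0 - 4.0 - 3.5 - 3.0 - 2.5 - 2.1 - 1.9 - 1.7 - 1.51
<d"/sig>  2.07 1.82 1.58 1.38 1.25 1.31 1.21 1.09 0.97 0.88 0.80
'''


def parse_signal_by_shell(text):
    """Return [(high_resolution_limit, d_over_sig), ...], one pair per shell."""
    edges, signal = [], []
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0] == "Resl.":
            # drop the "-" separators; "Inf" converts to float infinity
            edges = [float(t) for t in tokens[1:] if t != "-"]
        elif tokens[0] == '<d"/sig>':
            signal = [float(t) for t in tokens[1:]]
    # shell i runs from edges[i] to edges[i + 1]
    return list(zip(edges[1:], signal))


def suggest_cutoff(shells, threshold=1.3):
    """High-resolution limit of the last shell with signal >= threshold."""
    strong = [limit for limit, value in shells if value >= threshold]
    return strong[-1] if strong else None


shells = parse_signal_by_shell(SHELXC_TABLE)
print(suggest_cutoff(shells))
```

For the table above this suggests 3.0 Å: the 3.5 - 3.0 Å shell (1.31) is the last one at or above 1.3, although the 4.0 - 3.5 Å shell (1.25) already dips below it. Such noise is one reason to treat the number as a starting point only.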

SHELXD

SHELXD

#> shelxd tln_fa

This command tells SHELXD to use the (anomalous) data found in the file tln_fa.hkl and to read its instructions from the file tln_fa.ins, both of which were set up by SHELXC.

SHELXD creates a file tln_fa.res which contains the heavy atom positions (in fractional coordinates), which can be read into coot for inspection, and a file tln_fa.lst which is the log-file of what SHELXD did.

SHELXD - CC and CCweak

For each trial SHELXD prints a short summary.

PSUM 5.69 PSMF Peaks: 269 257 245 219 199 198 195 166 145 141 138 125 124

Try 48:18 Peaks 99 89 86 73 67 64 61 60

R = 0.475, Min.fun. = 0.655, <cos> = 0.165, Ra = 0.570

Try 48, CC All/Weak 17.66 / 6.75, best 36.33 / 21.66, best PATFOM 11.17

CCweak is calculated from 30% of reflections not used for refinement (like the Rfree for refinement).

• SAD: CC > 30% indicates a solution has been found

• MAD: CC > 40% indicates a solution has been found

Caveat: the lower the resolution at which the data are cut, the higher the CC.
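
When many trials are run, the summary lines can be scanned with a script. The following Python sketch extracts the numbers from a line of the form shown above and applies the two thresholds from this slide. The regular expression is written against the sample line only and may need adjusting for other SHELXD versions; the names parse_try_line and looks_solved are made up.

```python
import re

NUM = r"-?[\d.]+"
TRY_LINE = re.compile(
    rf"Try\s+(?P<trial>\d+),\s+CC All/Weak\s+(?P<cc>{NUM})\s*/\s*(?P<cc_weak>{NUM}),"
    rf"\s+best\s+(?P<best_cc>{NUM})\s*/\s*(?P<best_cc_weak>{NUM})"
    rf"(?:,\s+best PATFOM\s+(?P<best_patfom>{NUM}))?"
)

# thresholds in percent, as given on the slide
CC_THRESHOLD = {"SAD": 30.0, "MAD": 40.0}


def parse_try_line(line):
    """Return the numbers of one 'Try ..., CC All/Weak ...' line, or None."""
    match = TRY_LINE.search(line)
    if match is None:
        return None
    values = {k: float(v) for k, v in match.groupdict().items() if v is not None}
    values["trial"] = int(match.group("trial"))
    return values


def looks_solved(summary, experiment="SAD"):
    """Apply the rule of thumb to the best CC seen so far."""
    return summary["best_cc"] > CC_THRESHOLD[experiment]


line = "Try 48, CC All/Weak 17.66 / 6.75, best 36.33 / 21.66, best PATFOM 11.17"
summary = parse_try_line(line)
print(summary)
print(looks_solved(summary, "SAD"), looks_solved(summary, "MAD"))
```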

SHELXD - Solution Criteria (1/4)

The file name_fa.ins contains one line

SHEL dmax dmin

In the above example for tln:

SHEL 999 2.0

It restricts SHELXD to using only data between 999 Å and 2.0 Å.

SHELXD - Solution Criteria (2/4)

SHELXD writes the current best solution (i.e. with the highest CC) to the file name_fa.res

REM TRY 72 CC 36.33 CC(weak) 21.66 TIME 171 SECS

TITL tln15_fa.ins SAD in P6122

CELL 0.93400 92.60 92.60 128.91 90.00 90.00 120.00

[...]

ZN01 1 0.882660 0.551689 0.054363 1.0000 0.2

ZN02 1 0.558273 0.434631 0.122867 0.2870 0.2

ZN03 1 0.867584 0.625603 -0.065709 0.2290 0.2

ZN04 1 0.859467 0.616768 -0.036969 0.2284 0.2

ZN05 1 0.782043 0.488747 -0.081860 0.2252 0.2

ZN06 1 0.867943 0.514267 0.058010 0.0587 0.2

ZN07 1 0.694687 0.392235 0.050586 0.0540 0.2

ZN08 1 0.844887 0.527946 0.055911 0.0260 0.2

x y z rel. occ.
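
The site list can be read with a few lines of Python, for example to see where the relative occupancy falls off. The sketch below assumes the seven-column atom lines shown above (name, scattering-factor number, x, y, z, relative occupancy, and a last column that is read but not used). Counting the sites before the steepest relative drop is only a heuristic chosen for this example, not a SHELXD feature, and it assumes the sites are sorted by decreasing occupancy.

```python
RES_TEXT = """\
REM TRY 72 CC 36.33 CC(weak) 21.66 TIME 171 SECS
TITL tln15_fa.ins SAD in P6122
CELL 0.93400 92.60 92.60 128.91 90.00 90.00 120.00
ZN01 1 0.882660 0.551689 0.054363 1.0000 0.2
ZN02 1 0.558273 0.434631 0.122867 0.2870 0.2
ZN03 1 0.867584 0.625603 -0.065709 0.2290 0.2
ZN04 1 0.859467 0.616768 -0.036969 0.2284 0.2
ZN05 1 0.782043 0.488747 -0.081860 0.2252 0.2
ZN06 1 0.867943 0.514267 0.058010 0.0587 0.2
ZN07 1 0.694687 0.392235 0.050586 0.0540 0.2
ZN08 1 0.844887 0.527946 0.055911 0.0260 0.2
"""


def parse_sites(res_text):
    """Return [(name, x, y, z, occupancy), ...] for all atom lines."""
    sites = []
    for line in res_text.splitlines():
        tokens = line.split()
        if len(tokens) != 7 or not tokens[1].isdigit():
            continue  # instruction lines such as REM, TITL, CELL
        try:
            x, y, z, occupancy, _last = (float(t) for t in tokens[2:])
        except ValueError:
            continue
        sites.append((tokens[0], x, y, z, occupancy))
    return sites


def sites_before_steepest_drop(occupancies):
    """Number of sites listed before the largest relative occupancy drop."""
    ratios = [nxt / cur for cur, nxt in zip(occupancies, occupancies[1:])]
    return ratios.index(min(ratios)) + 1


sites = parse_sites(RES_TEXT)
print(sites_before_steepest_drop([s[4] for s in sites]))
```

For the listing above the steepest drop is between ZN05 (0.2252) and ZN06 (0.0587), so the heuristic keeps five sites.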

SHELXD - Solution Criteria (3/4)

If you cannot find a solution with SHELXD straight away, it is worth running SHELXD over a range of resolutions with a large number of trials each:

1. #> cp tln_fa.hkl tln16_fa.hkl

#> cp tln_fa.ins tln16_fa.ins

2. change SHEL 999 2.0 to SHEL 999 1.6 in tln16_fa.ins

3. change NTRY 400 to NTRY 10000 in tln16_fa.ins

4. run this script

#> shelxd tln16_fa

Repeat this for resolutions 1.6, 1.7, 1.8, . . . 2.5. This may take a few minutes.
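
The four steps are easy to automate. This Python sketch copies the two files and rewrites the SHEL and NTRY lines for each resolution; it is an illustration with invented function names, it assumes that shelxd is on the PATH, and it follows the naming used above (tln16_fa for 1.6 Å).

```python
import shutil
import subprocess
from pathlib import Path


def retarget_ins(ins_text, resolution, ntry=10000):
    """Return the .ins text with new SHEL and NTRY lines; the rest is kept."""
    new_lines = []
    for line in ins_text.splitlines():
        tokens = line.split()
        keyword = tokens[0].upper() if tokens else ""
        if keyword == "SHEL":
            line = f"SHEL 999 {resolution:.1f}"
        elif keyword == "NTRY":
            line = f"NTRY {ntry}"
        new_lines.append(line)
    return "\n".join(new_lines) + "\n"


def prepare_run(workdir, name, resolution, ntry=10000):
    """Steps 1-3: copy name_fa.hkl/.ins to e.g. name16_fa.* and edit the .ins."""
    workdir = Path(workdir)
    tag = f"{name}{round(resolution * 10)}"
    shutil.copy(workdir / f"{name}_fa.hkl", workdir / f"{tag}_fa.hkl")
    ins_text = (workdir / f"{name}_fa.ins").read_text()
    (workdir / f"{tag}_fa.ins").write_text(retarget_ins(ins_text, resolution, ntry))
    return tag


def run_scan(workdir, name, resolutions):
    """Step 4 for every resolution; needs a working shelxd installation."""
    for resolution in resolutions:
        tag = prepare_run(workdir, name, resolution)
        subprocess.run(["shelxd", f"{tag}_fa"], cwd=workdir, check=True)


RESOLUTIONS = [round(1.6 + 0.1 * i, 1) for i in range(10)]  # 1.6 ... 2.5
print(RESOLUTIONS)
# run_scan(".", "tln", RESOLUTIONS)  # uncomment where shelxd is installed
```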

SHELXD - Solution Criteria (4/4)

SHELXE

SHELXE - command line options

#> shelxe tln tln_fa -s0.46 -h5 -a2 -m30 -l3 -b (-i)

1. read data from tln.hkl and the substructure coordinates from tln_fa.res (order is important!)

Options (no space between option letter and value!)

2. -s0.46: assume a solvent content of 46% (rule of thumb: 140 Å³ / residue, 380 Å³ / base)

3. -h5: the heavy atoms are present in the data from tln.hkl (i.e. no native data set); use only the first 5 atoms found in tln_fa.res.

SHELXE - command line options

#> shelxe tln tln_fa -s0.46 -h5 -a2 -m30 -l3 -b (-i)

4. -a2: carry out 2 cycles of backbone autobuilding (current β-version only!). 2-5 cycles are usually sufficient. The coordinates can be found in tln.pdb and are updated after each cycle.

5. -m30: run 30 cycles of density modification.

6. -l3: allocate memory for 3,000,000 reflections (default: 2,000,000)

7. -b: refine the substructure positions based on the improved phases. The improved coordinates are written to tln.hat. This file has the same format as tln_fa.res, and improved phases can be obtained by renaming tln.hat to tln_fa.res and re-running the same SHELXE command.

SHELXE - command line options

#> shelxe tln tln_fa -s0.46 -h5 -a2 -m30 -l3 -b (-i)

8. -i: invert the hand of the substructure. In most space groups, the substructure and the substructure with the inverted hand (mirrored about a point) produce exactly the same data in SHELXD, i.e. SHELXD cannot distinguish between the coordinates in tln_fa.res and their inverted image. It does, however, make a difference which hand you use for calculating the electron density map. SHELXE therefore usually has to be run twice, once with the coordinates as found in tln_fa.res and once with these coordinates inverted about some point. The point is space-group specific and in most cases simply the origin. SHELXE takes care of all the maths behind this.

For the run where -i was given on the command line, all output files have _i appended to their base names (e.g. tln_i.pdb).
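
To make the inversion concrete: for the common case where the inversion point is the origin, each fractional coordinate (x, y, z) becomes (-x, -y, -z). The Python sketch below shows only this case and wraps the result back into the unit cell; it deliberately ignores the space-group specific cases, which SHELXE handles itself, and the function name is made up.

```python
def invert_through_origin(site):
    """Map fractional (x, y, z) to (-x, -y, -z), wrapped into [0, 1)."""
    return tuple((-coordinate) % 1.0 for coordinate in site)


zn01 = (0.882660, 0.551689, 0.054363)
print(invert_through_origin(zn01))
```

Applying the function twice returns the original site (up to rounding), as expected for an inversion.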

SHELXE - did it work? (1/2)

The ultimate and simplest answer to this question is to look at the output from SHELXE with your favourite model building program (mine is coot).

The files to load are:

• name.pdb, name_i.pdb: the result from model building - does it look like a protein?

• name.phs, name_i.phs: used to calculate the corresponding electron density map. The pdb-file must be loaded beforehand, because the phs-file contains neither the symmetry information nor the cell required to create the map.

• name.hat, name_i.hat: improved coordinates of the heavy atoms - do their positions make chemical sense?

SHELXE - did it work? (2/2)

TLN: backbone has clear secondary structure elements, density shows side chains (tyrosine)

TLN inverted hand: no secondary structure, non-continuous chain, no clear peaks for heavy atoms . . .

HKL2MAP - SHELXC main window

Main Window:

• Select among SAD, SIR, SIRAS, and MAD

• load data files

• run shelxc

• check graphics

HKL2MAP - SHELXC: resolution cut-off?

<d"/sig> > 1.3 is not a very strong indicator for the resolution cut-off

SIRAS, MAD: compare the anomalous signal between datasets: CC > 30-40% is a good marker for the cut-off

HKL2MAP - SHELXD main window

• Select number of expected heavy atoms

• Select type of atom

• Set the low resolution cut-off

• Go!

HKL2MAP - SHELXD graphics

The graphics are updated automatically while SHELXD runs. Solutions become obvious when they are found.

If the occupancy of the heavy atoms is close to one (SeMet prep, but not e.g. a halide soak), a clear drop in occupancy shows that a solution has been found.

For good data there is usually a gap between correct and false solutions. The better the data the more often a correct solution is found.

HKL2MAP - SHELXE main window

• HKL2MAP calculates the solvent content from the number of residues

• it simply assumes an average volume of 140 Å³/residue (protein)

• for nucleic acids this is approx. 380 Å³/base

• a strong difference in contrast already shows which hand is correct
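
The estimate can be reproduced by hand. The Python sketch below is a guess at the arithmetic, not HKL2MAP's actual code: it assumes that the solvent fraction is one minus the macromolecular volume (residues × volume per residue × number of copies in the unit cell) divided by the unit cell volume. The parameter n_copies_in_cell (number of symmetry operators times molecules per asymmetric unit) is introduced for this example.

```python
import math


def cell_volume(a, b, c, alpha, beta, gamma):
    """Unit cell volume in Å³ from cell lengths (Å) and angles (degrees)."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1 - ca * ca - cb * cb - cg * cg + 2 * ca * cb * cg)


def solvent_content(cell, n_residues, n_copies_in_cell, volume_per_residue=140.0):
    """Solvent fraction, using 140 Å³/residue (use 380.0 for nucleic acid bases)."""
    macromolecule = n_residues * volume_per_residue * n_copies_in_cell
    return 1.0 - macromolecule / cell_volume(*cell)


# made-up example: 1000 residues, 4 copies in a 100 Å cubic cell
print(round(solvent_content((100, 100, 100, 90, 90, 90), 1000, 4), 2))
```

The resulting fraction is what is passed to SHELXE with the -s option.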

HKL2MAP - Summary

• Good for most cases

• Writes input scripts in the working directory

• fine tuning only by editing scripts

• Very helpful graphics
