SASBDB Small Angle Scattering Biological Data Bank

Erica Valentini, Dmitri Svergun group

Solution Scattering from biological macromolecules EMBO course 2014

Index

1. Introduction:

– What is SAS?

– Do we need a SAS database?

2. SASBDB:

– Features

– Usage

– Quality check

– Missing

3. Conclusions

SAS EMBO Course 2014, 11/2/2014

What is SAS? SAS experiment

An X-ray/neutron beam is scattered by the sample through the scattering angle 2θ.

|s| = 4π sin θ / λ

s: scattering vector

2θ: scattering angle

λ: wavelength

I(s): intensity

[Figure: scattering intensity, log I(s), as a function of s; ATSAS is used to obtain a low-resolution model.]
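The definition |s| = 4π sin θ / λ can be checked numerically with a minimal sketch. The function name `scattering_vector` and the example values (a scattering angle of 1 degree and a wavelength of 0.15 nm) are illustrative assumptions, not values from the slides.

```python
import math


def scattering_vector(two_theta_deg: float, wavelength: float) -> float:
    """Return |s| = 4*pi*sin(theta)/lambda.

    `two_theta_deg` is the full scattering angle 2-theta in degrees.
    The result is in inverse units of `wavelength` (nm gives nm^-1).
    """
    theta = math.radians(two_theta_deg) / 2.0  # the formula uses half the scattering angle
    return 4.0 * math.pi * math.sin(theta) / wavelength


# Illustrative values only (assumed, not taken from the presentation).
s = scattering_vector(two_theta_deg=1.0, wavelength=0.15)
print(f"|s| = {s:.4f} nm^-1")
```

Because sin θ is close to θ at small angles, |s| grows almost linearly with the scattering angle in the range covered by a typical SAS experiment.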

What is SAS? ATSAS package

Rg

MM

Dmax

Volume

Shape

Rigid body modelling

Missing fragments

Oligomeric mixtures

Flexible systems

Do we need a SAS DB? SA(X)S advantages

Increasing popularity of SAXS:

• Solution: monitor alterations in environmental conditions.

• Broad size range: from a few kDa to GDa.

• New developments in software and hardware: fast experiments (micro- or milliseconds) and small amounts of sample (5-30 μl).

Do we need a SAS DB? SAS database motivations

• Increasing number of publications about SAS and the ATSAS package.

• Increasing amount of data collected with a single experiment.

• Importance of making the data underlying scientific publications available for the community.

Graewert, M.A. and Svergun, D.I. (2013) Impact and progress in small and wide angle X-ray scattering (SAXS and WAXS). Curr. Opin. Struct. Biol., 23, 748–54.

Franke, D., Kikhney, A.G. and Svergun, D.I. (2012) Automated acquisition and analysis of small angle X-ray scattering data. Nucl. Instruments Methods Phys. Res. Sect. A Accel. Spectrometers, Detect. Assoc. Equip., 689, 52–59.

Collins, F.S. and Tabak, L.A. (2014) Policy: NIH plans to enhance reproducibility. Nature, 505, 612–3.

[Figure: number of publications referring to biological SAS per year, 2002 to 2012; series: ATSAS and bioSAS.]

Do we need a SAS DB? wwPDB SAS task force

Trewhella, J., Hendrickson, W.A., Kleywegt, G.J., Sali, A., Sato, M., Schwede, T., Svergun, D.I., Tainer, J.A., Westbrook, J. and Berman, H.M. (2013) Report of the wwPDB Small-Angle Scattering Task Force: Data Requirements for Biomolecular Modeling and the PDB. Structure, 21, 875–881.

“…a global repository is needed that holds standard format X-ray and neutron SAS data that is searchable and freely accessible for download”

Database and small angle scattering experts

SASBDB

Do we need a SAS DB? Existing DB including SAS data

Database | SAS data included | Missing

PDB | 47 models where SAS was used for refinement | Primary data used to calculate the models

DARA | Scattering curves from 20,000 PDB structures | Models and possibility to deposit SAS data

BIOISIS | SAXS data and models | Complete search, cross-references to other databases, quality check on data

pE-DB | Scattering curves and ensemble models from disordered proteins | SAS data and models from “not disordered proteins”

Berman, H., Henrick, K., Nakamura, H. and Markley, J.L. (2007) The worldwide Protein Data Bank (wwPDB): ensuring a single, uniform archive of PDB data. Nucleic Acids Res., 35, D301–3.

dara.embl-hamburg.de

Sokolova, A.V., Volkov, V.V. and Svergun, D.I. (2003) Prototype of a database for rapid protein classification based on solution scattering data. J. Appl. Crystallogr., 36, 865–868.

Hura, G.L., Menon, A.L., Hammel, M., Rambo, R.P., Poole, F.L., Tsutakawa, S.E., Jenney, F.E., Classen, S., Frankel, K.A., Hopkins, R.C., et al. (2009) Robust, high-throughput solution structural analyses by small angle X-ray scattering (SAXS). Nat. Methods, 6, 606–12.

Varadi, M., Kosol, S., Lebrun, P., Valentini, E., Blackledge, M., Dunker, A.K., Felli, I.C., Forman-Kay, J.D., Kriwacki, R.W., Pierattelli, R., et al. (2014) pE-DB: a database of structural ensembles of intrinsically disordered and of unfolded proteins. Nucleic Acids Res., 42, D326–35.

SASBDB features:

1. Entries

2. Cross links

3. Searching

4. Browsing

5. Benchmark

6. Plots

7. Interactivity

8. Availability

1. Entries

www.sasbdb.org

2. Cross links

3. Searching

1. Simple search

2. Advanced search

Browsing unit

4. Browsing

Scattering curve

Model

Kratky plot

Experiment information

Publication

Structural parameters

Unique code format: SASXXXN

Chronological order

Browse according to the selected field

5. Benchmark

• 17 Entries from a set of 14 “standard proteins”

• SAXS and WAXS data

• Extra purification steps

• Benchmark for algorithm testing purposes

• Dissemination

6. Plots

Scattering plot

Guinier region

Kratky plot

P(r) distribution

Radius of gyration

Maximum distance

MWs & Porod volume
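The Guinier region and the Kratky plot listed above are simple transformations of the scattering curve, and a minimal sketch may make them concrete. It relies on the standard Guinier approximation, ln I(s) = ln I(0) - (Rg²/3) s², and on the Kratky representation s² I(s) versus s. The function names and the synthetic data are assumptions for illustration; this is not the code SASBDB or ATSAS uses to produce its plots.

```python
import math


def guinier_fit(s, intensity):
    """Fit a straight line to ln I(s) versus s^2 and return (Rg, I0).

    Only meaningful for points at small s (commonly s*Rg below about 1.3).
    """
    x = [si * si for si in s]
    y = [math.log(ii) for ii in intensity]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx                   # equals -Rg^2 / 3
    intercept = mean_y - slope * mean_x  # equals ln I(0)
    return math.sqrt(-3.0 * slope), math.exp(intercept)


def kratky(s, intensity):
    """Return the Kratky representation as (s, s^2 * I(s)) pairs."""
    return [(si, si * si * ii) for si, ii in zip(s, intensity)]


# Synthetic, noise-free Guinier curve (assumed values: Rg = 2.0 nm, I(0) = 100).
rg_true, i0_true = 2.0, 100.0
s = [0.05 + 0.01 * k for k in range(40)]  # s*Rg stays below 1.3
intensity = [i0_true * math.exp(-((rg_true * si) ** 2) / 3.0) for si in s]

rg, i0 = guinier_fit(s, intensity)
print(f"Rg = {rg:.3f} nm, I(0) = {i0:.1f}")
kratky_points = kratky(s, intensity)
```

On real data the fit range has to be chosen with care, because the Guinier approximation breaks down at larger s and for aggregated samples.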

7. Interactivity

Fitting 1, Model 1

Fitting 2, Model 2

Fitting 3, Model 1, Model 2, Model 3, Model 4

Experimental details

Molecule details

8. Availability

• Possibility to log in using an ATSAS account

• Submission form

• Users can choose between:

– “on hold”

– “public”

SASBDB Usage

More than 500 users since August 2014.

We are also currently monitoring search terms and the number of downloads.

SASBDB Usage: use cases

SAS user, SAS novice, article referee

SASBDB Quality check: Difference between Rg (Guinier) and Rg (p(r))

SASBDB Quality check: Difference between MW (expected) and MW (experimental)

SASBDB Quality check: Quality of the p(r) distribution

SASBDB Quality check: Quality of the Guinier region

SASBDB Quality check: Quality of the fit

SASBDB Quality check: Quality of the data

SASBDB Quality check:

• Difference between structural parameters

• Quality of the Guinier region

• Quality of the p(r) distribution

• Discrepancy between expected and experimental MW

• Overall quality of the data

• Goodness of fit of the model

Quality score based on the comparison between the selected entry and all the other entries.
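Two of the checks above, the difference between Rg estimates and the MW discrepancy, together with the idea of scoring an entry against all the others, can be sketched as follows. The slides do not give the actual SASBDB formulas, so the relative-difference measure, the ranking rule, the function names and every number below are hypothetical.

```python
def relative_difference(a, b):
    """Absolute difference between two estimates, relative to their mean."""
    return abs(a - b) / ((a + b) / 2.0)


def fraction_not_better(value, others):
    """Share of the other entries whose discrepancy is at least as large as `value`.

    A result close to 1.0 means the selected entry compares well with the rest.
    This ranking rule is an assumption, not the documented SASBDB score.
    """
    if not others:
        return 1.0
    return sum(1 for other in others if other >= value) / len(others)


# Hypothetical entry: Rg from the Guinier fit and from p(r), both in nm.
rg_diff = relative_difference(2.10, 2.00)
# Hypothetical expected and experimental molecular weights, both in kDa.
mw_diff = relative_difference(66.0, 72.0)

# Hypothetical Rg discrepancies of all the other entries in the database.
other_rg_diffs = [0.01, 0.03, 0.08, 0.15, 0.30]
score = fraction_not_better(rg_diff, other_rg_diffs)
print(f"Rg difference: {rg_diff:.3f}, MW difference: {mw_diff:.3f}, score: {score:.2f}")
```

Ranking against the existing entries, rather than against a fixed threshold, means the score adapts as the database grows, which matches the comparison-based description on the slide.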

SASBDB: missing

• Network of SAS databases

• Validation/Quality check:

– Pipeline to compare values

– Assessment of the angular range

– Difference between curves

– Validation of models

• Standard format: sasCIF

• Submission interface: automatic

Berman, H., Henrick, K., Nakamura, H. and Markley, J.L. (2007) The worldwide Protein Data Bank (wwPDB): ensuring a single, uniform archive of PDB data. Nucleic Acids Res., 35, D301–3.

Read, R.J., Adams, P.D., Arendall, W.B., III, Brunger, A.T., Emsley, P., Joosten, R.P., Kleywegt, G.J., Krissinel, E.B., Lutteke, T., Otwinowski, Z., Perrakis, A., Richardson, J.S., Sheffler, W.H., Smith, J.L., Tickle, I.J., Vriend, G. and Zwart, P.H. (2011) A new generation of crystallographic validation tools for the Protein Data Bank. Structure, 19, 1395–1412.

Franke, D., Kikhney, A.G. and Svergun, D.I. (2012) Automated acquisition and analysis of small angle X-ray scattering data. Nucl. Instruments Methods Phys. Res. Sect. A Accel. Spectrometers, Detect. Assoc. Equip., 689, 52–59.

Konarev, P. and Svergun, D.I. (2014) Submitted.

Franke, D., Jeffries, C.M. and Svergun, D.I. (2014) Submitted.

Tuukkanen, A. and Svergun, D.I. (2015) In preparation.

Malfois, M. and Svergun, D.I. (2000) sasCIF: an extension of core Crystallographic Information File for SAS. J. Appl. Crystallogr., 33, 812–816.

Yang, H., Guranovic, V., Dutta, S., Feng, Z., Berman, H.M. and Westbrook, J.D. (2004) Automated and accurate deposition of structures solved by X-ray diffraction to the Protein Data Bank. Acta Cryst., D60, 1833–1839.

SASBDB: Conclusions

• With 100 entries and 163 models SASBDB is currently the largest repository of SAS data available.

• Entirely browsable according to different criteria.

• Highly flexible search.

• Embedded JavaScript to display interactive 3D models.

• Set of SAXS and WAXS data from “standard proteins”.

• Cross links to other biological databases.

• Aimed at different types of users.

• Several validation methods under development.

• Development of the standard format: sasCIF.

• Network of interconnected SAS databases.

• Paper about SASBDB in N.A.R. 2015 Database issue.

Thanks for your attention!
