Big Data Analytics in Healthcare: Promise and Potential

Prof. M. Tsiknakis, Dept. of Informatics Engineering, TEI of Crete

& Visiting Professor, CBML, ICS-FORTH

Contents & Structure

Hypothesis vs Data driven Science.

Contemporary Data Explosion.

The Data-Driven Discovery Science.

Promises and Potential.
• Clinical Care
• Understanding disease etiology
• Pharmaceutical industry

Challenges & Pitfalls.

Opportunities for Computer and Data Scientists.

Conclusions.

Saturday, June 11, 2016 1st Data Workshop "Big Data World", Heraklion, Crete

Hypothesis Driven Research

As scientists, we have been raised with the view that the best research is hypothesis-driven.

This belief, part of our scientific culture, is passed from one generation of scientists to the next and has become deeply ingrained.

When training graduate students and fellows, evaluating dissertations and overseeing the writing of scientific papers, we strive to communicate the importance of enunciating a clear hypothesis, defining its scientific antecedents, describing the best path for testing it and reporting on the results in terms of the original hypothesis.

Hypothesis vs Data driven science

The question is not whether hypothesis-driven research should be one way to conduct science but whether it should be the only way.

Data driven Science*.

Data-driven science is not necessarily new.

A compelling argument can be made that the astronomer Tycho Brahe and his assistant Johannes Kepler were doing data-driven science, at least by the scale of their time.

Kepler published the Rudolphine Tables in 1627, some twenty-six years after Brahe's death.

The tables, a catalog of stars and planets, were largely based on Brahe's observations, which were considered to be the most accurate and detailed of the time.

*Ref.: Peter Murray-Rust, "Data-Driven Science: A Scientist's View," NSF/JISC Repositories Workshop position paper, April 10, 2007, <http://www.sis.pitt.edu/~repwkshop/papers/murray.html>.

Some figures

Every 60 seconds there are:
• 72 hours of footage uploaded to YouTube
• 216,000 Instagram posts
• 204,000,000 emails sent

80% of data growth is videos, images and documents.

90% of data generated is unstructured, including tweets, photos and customer purchase orders.

Data Volume

Every day, we create 2.5 quintillion (2.5 × 10^18) bytes of data.

A full 90% of all the data in the world has been generated over the last two years.

This data comes from everywhere: sensors used to gather climate information, posts to social media, digital pictures and videos, and cell phone GPS signals to name a few.

This data is big data.

Dimensions of Big Data

A collection of large and complex data sets which are difficult to process using common database management tools or traditional data processing applications.

The challenges include capturing, storing, searching, sharing & analyzing.

“Big data refers to the tools, processes and procedures allowing an organization to create, manipulate, and manage very large data sets and storage facilities” – according to zdnet.com

The four dimensions (V’s) of Big Data: Volume, Velocity, Variety, Veracity.

The 5th is VALUE.

Exponential growth of data (size & importance)

Parallels between the growth in size and decay in value of large heterogeneous datasets.

The horizontal axis represents time, whereas the vertical axis shows the value of data.

As we acquire more data at an ever faster rate, its size and value exponentially increase (black curve).

The color curves indicate the exponential decay of the value of data from the point of its fixation (becoming static).

Volume of Biomedical Data

The healthcare industry historically has generated large amounts of data.

Reports say that data from the U.S. healthcare system alone reached, in 2011, 150 exabytes.

At this rate of growth, big data for U.S. healthcare will soon reach the zettabyte scale and, not long after, the yottabyte.

Note: 1 ZB = 1000 exabytes = 1 million petabytes = 1 billion terabytes = 1 trillion gigabytes.
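
The conversions in the note can be checked with a few lines of Python. This is a minimal sketch that assumes decimal (SI) units, where each step is a factor of 1000; the `UNITS` table and the `convert` helper are illustrative names and do not come from any cited report.

```python
# Decimal (SI) storage units expressed in bytes (assumed convention).
UNITS = {
    "GB": 10**9,
    "TB": 10**12,
    "PB": 10**15,
    "EB": 10**18,
    "ZB": 10**21,
    "YB": 10**24,
}


def convert(value, src, dst):
    """Convert a quantity from unit `src` to unit `dst`."""
    return value * UNITS[src] / UNITS[dst]


if __name__ == "__main__":
    # Reproduce the chain of equalities given for one zettabyte.
    for unit in ("EB", "PB", "TB", "GB"):
        print(f"1 ZB = {convert(1, 'ZB', unit):,.0f} {unit}")
    # The 150 exabytes reported for 2011, expressed in zettabytes.
    print(f"150 EB = {convert(150, 'EB', 'ZB')} ZB")
```

On this scale, the 150 exabytes reported for 2011 is 0.15 of a zettabyte.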

Kaiser Permanente, the California-based health network, which has more than 9 million members, is believed to have between 26.5 and 44 petabytes of potentially rich data from EHRs, including images and annotations.

IHTT: Transforming Health Care through Big Data: Strategies for leveraging big data in the health care industry; 2013. http://ihealthtran.com/wordpress/2013/03/iht%C2%B2-releases-big-data-research-reportdownload-today/

Variety of Biomedical Data

Big data in healthcare is overwhelming not only because of its volume but also because of the diversity of data types and the speed at which it must be managed.

Levels of data: Molecular, Imaging, Organ, EHR (person) related data, Population.

Ref.: Frost & Sullivan: Drowning in Big Data? Reducing Information Technology Complexities and Costs for Healthcare Organizations.

EHR Data: Collection, Coding and Analysis

Coding of data.

Effectively integrating and efficiently analyzing various forms of healthcare data over a period of time can help solve many pressing healthcare problems.

Jensen, Peter B., Lars J. Jensen, and Soren Brunak. "Mining electronic health records: towards better research applications and clinical care." Nature Reviews Genetics (2012).

Biomedical (Genomic) Data Volume

Genomics poses some of the same challenges as astronomy, atmospheric science, crop science & particle physics.

Storing and processing genome data will exceed the computing challenges of running YouTube and Twitter, biologists warn.

The report says that the storage demand of genomics outstrips YouTube’s projected annual storage needs of 1,2 exabytes of video by 2025 and Twitter’s projected 1,7 petabytes per year.

It even exceeds the 1 exabyte per year projected for what will be the world’s largest astronomy project, the Square Kilometer Array, to be sited in South Africa and Australia.

SOURCE: EMBL–EBI

Ref.: Frost & Sullivan: Drowning in Big Data? Reducing Information Technology Complexities and Costs for Healthcare Organizations.

Flagship projects: The 100,000 Genome project

Our genome contains all our 20,000 genes.

It is all 3.2 billion letters of our DNA.

One sequenced genome equals 200 billion bytes, i.e. 200 GBytes, of data.

It is estimated that half of all Britons will get some form of cancer at some point in their lives.

A rare disease is one that affects 1 in 2,000 people or fewer.

Source: http://www.genomicsengland.co.uk/the-100000-genomes-project-by-numbers/

Plan: 100,000 genomes project.

There are over 100 rare diseases included in the effort.

70,000 patients and their families.

21 petabytes of data.

1 petabyte of music would take 2,000 years to play on an MP3 player.
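
As a rough consistency check on these figures, the sketch below multiplies a per-genome size by the number of planned genomes and works out the audio bitrate implied by the MP3 comparison. It assumes decimal units, a per-genome size of 200 GBytes, and a 365.25-day year; the constant names are illustrative.

```python
GB = 10**9   # bytes, decimal convention (assumed)
PB = 10**15

GENOME_BYTES = 200 * GB   # assumed size of one sequenced genome
N_GENOMES = 100_000       # size of the planned project

# Total raw storage for the whole project, in petabytes.
total_pb = GENOME_BYTES * N_GENOMES / PB

# Bitrate at which one petabyte of audio would last 2,000 years.
SECONDS_PER_YEAR = 365.25 * 24 * 3600
bitrate_kbps = PB * 8 / (2000 * SECONDS_PER_YEAR) / 1000

if __name__ == "__main__":
    print(f"Project total: {total_pb:.0f} PB")
    print(f"Implied MP3 bitrate: {bitrate_kbps:.0f} kbit/s")
```

Under these assumptions the total comes to about 20 petabytes, in line with the 21 petabytes quoted, and the implied bitrate is close to that of a typical 128 kbit/s MP3 stream.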

Biomedical Text – mining and analysis

Medline currently contains 18 million abstracts (scientific papers).

An average of about 40,000 to 50,000 abstracts are added every month.

Turning these data into medical or biological insight remains a key challenge.

Two IT fields have a lot to offer and play a promising role: Information Retrieval (IR) and Text Mining (TM).

IR is concerned with the automatic identification of relevant documents from large text collections.

TM is the application of techniques from machine learning in conjunction with natural language processing, and statistical/mathematical approaches to extract useful knowledge from text.

Both have been applied successfully* to various problems such as:
• Intelligent Information Retrieval
• Biomedical text sub-classification and clustering
• Biomedical concept identification
• Concept relation extraction, etc.
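
To make the IR idea concrete, here is a minimal sketch of ranking abstracts against a query with TF-IDF weights and cosine similarity. TF-IDF is swapped in here as one common baseline, not the method of the cited framework; the function names are illustrative and any abstracts passed in are the caller's own toy data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Return one TF-IDF vector per document, plus the IDF table."""
    counts = [Counter(tokenize(d)) for d in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for c in counts for term in c)
    # Smoothed inverse document frequency.
    idf = {t: math.log((1 + n_docs) / (1 + n)) + 1.0 for t, n in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in c.items()} for c in counts]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, docs, top_k=3):
    """Return (document index, score) pairs, best match first."""
    vectors, idf = build_index(docs)
    q_counts = Counter(tokenize(query))
    q_vec = {t: tf * idf[t] for t, tf in q_counts.items() if t in idf}
    scored = [(cosine(q_vec, v), i) for i, v in enumerate(vectors)]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [(i, score) for score, i in scored[:top_k] if score > 0]


if __name__ == "__main__":
    # Invented one-line "abstracts", for illustration only.
    abstracts = [
        "Metformin reduces cancer risk in diabetic patients",
        "Deep sequencing of tumour genomes reveals driver mutations",
        "Aspirin use and colorectal cancer prevention",
    ]
    for idx, score in search("cancer prevention aspirin", abstracts):
        print(f"{score:.3f}  {abstracts[idx]}")
```

A text mining step would then work on the retrieved documents, for example to identify concepts or extract relations between them.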

*Ref: P. Sfakianaki, M. Tsiknakis, et al., Semantic biomedical resource discovery: a Natural Language Processing framework, BMC Medical Informatics and Decision Making 2015, 15:77

Natural Language Processing Analytics for Unstructured Data (IBM’s Watson)

Watson is a collection of overlapping reasoning algorithms that address specific portions of the pipeline used for problem understanding and problem-solving.

For example, a specific instance of Watson might have specialized algorithms for:

understanding the question
• Natural Language Processing, query expansion with synonyms, dictionaries, ontologies, language translation, speech translation, spelling correction.

making hypotheses
• Indexing a corpus of data, searching for relevant passages, concept annotations, passage expansion, passage filtering, passage scoring.

answer selection and scoring
• Deep parsing, semantic matching, answer similarity, lexical matching, temporal reasoning, geospatial reasoning, negation, knowledge graphs.

machine learning
• Logistic regression, Bayesian networks, similarity learning.

Pharma - Drug repositioning

Key application area of IR and TM.

Scope of Drug repositioning: identifying and developing new uses for existing drugs.

Despite enormous increases in spending on novel technologies over the last several years, R&D productivity has actually decreased since the mid-1990s, as measured by the number of new drugs approved per year.

Drug Repositioning: Methods

ICT have a key role to play

Exploitation Cases – BIOVISTA S.A.

A very innovative Greek biotechnology company.

Leading the drug repositioning market through exploitation of their literature mining software platform.

http://www.biovista.com/

Exploiting big social network data

Social Determinants of Health

Social Determinants of Health

Need to evaluate the feasibility and potential of exploring Social Determinants of Health (SDH) [1] and Community Vital Signs [2] (community and geographic determinants of health) to build new hypotheses and help predict and improve patient outcomes.

Information extracted from social networks [3] can strengthen the analysis made with the more reliable and coherent cohort data and can provide relevant hypotheses to be further verified with the cohort datasets.

[1] https://www.healthypeople.gov/2020/topics-objectives/topic/social-determinants-of-health

[2] Andrew W Bazemore et al., “Community Vital Signs”: Incorporating geocoded social determinants into electronic records to promote patient and population health. http://dx.doi.org/10.1093/jamia/ocv088. First published online: 13 July 2015.

[3] Bian J, Topaloglu U, Yu F, Yu F: Towards Large-scale Twitter Mining for Drug related Adverse Events. Maui, Hawaii: SHB; 2012.

Multi-scale modelling

Biological systems span many orders of magnitude through the scales in a continuous way, from the smallest microscopic scales up to the largest macroscopic ones.

The sequence from the genome, proteome, metabolome, physiome to health comprises multi-scale, multi-science systems.

In many cases, we can select an appropriate scale at which we wish to study a natural system.

The history of science has shown how fruitful this approach has been.

In recent years the computational biology community has developed extremely powerful methods to model and simulate fundamental processes of a natural system on a multitude of separate scales.

The wealth of experimental data that has become available has made such in silico experimenting a viable methodology, which should allow for testing hypotheses and formulating predictions to be further tested in in vitro or in vivo studies.

Data-driven modelling of biological multi-scale processes

Predictive modeling of biological processes and drugs becomes significantly more sophisticated and widespread.

By leveraging the diversity of available molecular and clinical data, predictive modeling could help identify new potential candidate molecules with a high probability of being successfully developed into drugs that act on biological targets safely and effectively.

Individualize therapy.

ViroLab (http://www.virolab.org/) is a multi-scale modelling, simulation and data mining environment for infectious diseases.

Selection of well-known multi-scale models

Biological scales captured by the model, mathematical modelling approaches, topics and keywords.

Examples: Computational Horizons in Cancer (CHIC)

Computational Horizons In Cancer (CHIC): Developing Meta- and Hyper-Multiscale Models and Repositories for In Silico Oncology.

7th Framework Programme of the European Commission - ICT - Large-scale Integrating Project (IP)

http://chic-vph.eu/project/

Use of semantics for model retrieval, alignment and integration

Demanding computational issues

Marvin Schulz, et al, Retrieval, alignment, and clustering of computational models based on semantic annotations, Molecular Systems Biology 7; Article number 512; doi:10.1038/msb.2011.41

PM-04–2016: Networking and optimising the use of population and patient cohorts at EU level

Cohort autonomy – legal framework.

Efficient cohort exploration.

Easy cohort definition and sharing with visual interfaces.

Cohort refinement and expansion. Clinical domain experts should be able to easily constrain and/or expand cohorts based on discovered findings as part of their exploration.

Flexible visualization.

Flexible and advanced analytics.

Iterative analysis. The above requirements should be supported within an iterative process that allows refinement and exploration during an open-ended investigation.

Advanced computational infrastructure. The above complex requirements and analytical pipelines demand that an appropriate high-performance computational and storage infrastructure is in place.
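
The cohort definition, refinement and expansion requirements above can be pictured as set operations over patient identifiers. The sketch below is only an illustration under stated assumptions: the `Patient` record, its fields, and the three helper functions are hypothetical and do not describe any existing cohort platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patient:
    """Hypothetical, heavily simplified patient record."""
    pid: str
    age: int
    diagnosis: str
    smoker: bool


def define_cohort(patients, predicate):
    """Define a cohort as the set of patient ids matching a criterion."""
    return {p.pid for p in patients if predicate(p)}


def refine(cohort, patients, predicate):
    """Constrain a cohort: keep only members that also match the criterion."""
    return cohort & define_cohort(patients, predicate)


def expand(cohort, patients, predicate):
    """Expand a cohort: add every patient that matches the criterion."""
    return cohort | define_cohort(patients, predicate)


if __name__ == "__main__":
    patients = [
        Patient("p1", 67, "COPD", True),
        Patient("p2", 54, "COPD", False),
        Patient("p3", 71, "asthma", True),
        Patient("p4", 45, "asthma", False),
    ]
    copd = define_cohort(patients, lambda p: p.diagnosis == "COPD")
    older = refine(copd, patients, lambda p: p.age >= 60)
    wider = expand(older, patients, lambda p: p.smoker)
    print(sorted(copd), sorted(older), sorted(wider))
```

Because each step returns a new set, an investigator can iterate freely, which matches the open-ended, iterative analysis described in the list.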

Participating cohorts, population and disease registries

Efficient exploration and visualization

Retrospective use of clinical data by clinical researchers covers a wide variety of research investigations, including:
• assessment of clinical trial feasibility,
• generation and retrospective validation of research hypotheses,
• data exploration, quality assessment, etc.

Demand for advanced visualisation tools

Advanced, secure computational framework

The type and complexity of the analytical and knowledge discovery services pose huge demands on the computing platform.

Need to establish elastically scalable big data clusters that can respond to varying workload demands.

Integrating Hadoop (http://hadoop.apache.org/) and Spark (http://spark.apache.org) data processing on OpenStack (https://www.openstack.org/).
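
Hadoop and Spark both build on the idea of splitting data into partitions that independent workers process before the partial results are combined. The sketch below imitates that map, shuffle and reduce pattern in plain, single-process Python. It does not use the Hadoop or Spark APIs, and the record layout and diagnosis codes are made up for illustration.

```python
from collections import defaultdict


def map_phase(record):
    """Emit a (code, 1) pair for every diagnosis code in one record."""
    return [(code, 1) for code in record["codes"]]


def shuffle(pairs):
    """Group all emitted values by key, as a cluster would between stages."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Combine the values of each key into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def run_job(partitions):
    """Run the three phases over all partitions in one process.

    On a real cluster each partition would be mapped by a separate worker.
    """
    mapped = [
        pair
        for partition in partitions
        for record in partition
        for pair in map_phase(record)
    ]
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    # Two hypothetical partitions of coded patient records.
    partitions = [
        [{"codes": ["E11", "I10"]}, {"codes": ["I10"]}],
        [{"codes": ["E11", "J44"]}],
    ]
    print(run_job(partitions))
```

Elastic scaling then amounts to changing how many workers share the partitions as the workload varies, without changing the job logic.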

Ref.: DePristo MA, et al: A framework for variation discovery and genotyping using NGS data. Nat Genet 2011, 43:491–8.

OHDSI – Open Source Big Data Analytics in Healthcare

Variety of novel computational and analytical tools

International Flagship Initiatives - OHDSI

Healthcare Analytics in the Electronic Era

Old way: Data are expensive and small
• Input data are from clinical trials, which are small and costly
• Modeling effort is small since the data is limited

Big Data era: Data are cheap and large
• Broader patient population
• Noisy data
• Heterogeneous data
• Diverse scale
• Longitudinal records

Healthcare Institutions are moving to Big Data Architectures

Enterprise Big Data Architectures

Big Data Landscape

The big data landscape for computer and information engineers

The science of Big Data

Over the past century, scientific advances in medicine have generally been made using a “frequentist” approach to statistical analysis:

Samples of populations are studied and the results from the samples are extrapolated to estimate the effects of the intervention being studied.

For most types of experiment, sampling data is sufficient to build an effective picture of the entire dataset and, statistically, we can give high levels of accuracy to predictions based on relatively small samples.

Data collected in this way is often of very high quality.

To ensure the sample is representative and accurate, the data is collected and ‘cleaned’ with great care.

This extra care is often very expensive, however, and over the last few decades we have seen the costs of running large randomized controlled trials spiral upwards.

The science of Big Data

Big Data offers a potential solution to this issue.

Although data produced from such sources as social networking communities, EHR systems, and wearable devices are generally of much lower quality than data carefully collected by researchers looking to answer specific questions, the sheer volume of the data may outweigh their messiness.

In addition, there is also a trend towards higher quality ‘big data’ collection, such as the data produced in genomic analysis and structured data that can be generated from standard-compliant EHR systems.

As the percentage of the population being sampled approaches 100%, messy data can have greater predictive power than highly cleaned and carefully collected data that might only be a sample of 1% of the target population for the researcher.

The quantity of data changes the approaches used to relate, utilize, and understand it.
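
The trade-off between a small clean sample and a large messy dataset can be explored with a toy simulation. The sketch below is an illustration under strong assumptions, not a result from the talk: the population is synthetic and normally distributed, the measurement noise is independent, and the parameter names and default values (`noise_sd`, `bias`, `seed`) are chosen arbitrarily.

```python
import random
import statistics


def simulate(pop_size=100_000, sample_frac=0.01, noise_sd=20.0, bias=1.0, seed=42):
    """Return the absolute error of three estimates of the population mean."""
    rng = random.Random(seed)
    population = [rng.gauss(50.0, 10.0) for _ in range(pop_size)]
    true_mean = statistics.fmean(population)

    # Small and clean: exact values for a 1% simple random sample.
    sample = rng.sample(population, int(pop_size * sample_frac))
    clean_error = abs(statistics.fmean(sample) - true_mean)

    # Big and messy: every individual observed, with zero-mean noise.
    noise = [rng.gauss(0.0, noise_sd) for _ in range(pop_size)]
    messy = [x + e for x, e in zip(population, noise)]
    messy_error = abs(statistics.fmean(messy) - true_mean)

    # Big, messy and biased: the same noise plus a systematic offset.
    biased = [m + bias for m in messy]
    biased_error = abs(statistics.fmean(biased) - true_mean)

    return clean_error, messy_error, biased_error


if __name__ == "__main__":
    clean, messy, biased = simulate()
    print(f"clean 1% sample   : error {clean:.3f}")
    print(f"messy full data   : error {messy:.3f}")
    print(f"biased full data  : error {biased:.3f}")
```

Random noise averages out as coverage approaches 100%, which is the effect described above. A systematic bias does not shrink with volume, so the third estimate stays off by roughly the size of the bias however much data is collected.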

Benefits

While researchers are still debating the definitions and boundaries of Big Data in health, benefits of health-related Big Data have been demonstrated in several areas so far, namely:
• design of new therapies and candidate drugs,
• prevention of disease,
• identification of modifiable risk factors for disease, and
• design of interventions for health behavior change.

Conclusions

Big data denotes our capacity to gain insights from (in relative terms!) large amounts of data that we could not have had by just looking at samples.

Our difficulty in working with data has shaped our methods in the small data age.

As these limitations with respect to data diminish, we will have to rethink and adjust our scientific methods.

In return, we will gain a wealth of new insights, perhaps leading towards a new golden era of scientific discovery.

The power of Big Data demands, however, that we are also aware of its limitations and the significant dangers of abusing it.

Ref: Mayer-Schönberger V, Cukier K. Big data: A revolution that will transform how we live, work, and think. Houghton Mifflin Harcourt; 2013. Chapter 2.

Q&A