Painting the Future of Big Data with Apache Spark and MongoDB

James Kerr, Senior Solutions Architect

Conquering Data Proliferation


Part 2 in the Data Management Series

• Data integration
• Capture data changes
• Engaging with your data

Part 1: From Relational to MongoDB
Part 2: Conquering Data Proliferation
Part 3: Bulletproof Data Management

Agenda

• Today's Problem
• Systems of Engagement
• Single View of…
• Changing Data
• Summary

Today's Problem

Result

• Data walled off in "silos"
• Can't get a complete picture
• Have to "swivel chair" from system to system
• Hard to find new avenues to add value
• Frustrated ops
• Frustrated customers

Example

• 20+ million Veterans in the US today
• 250,000+ employees at Veterans Affairs
• $3.9 billion for IT in 2015 budget

• What happens when a Veteran has to change their address with the VA?

• How does a doctor see a single view of a Veteran's health record?

Systems of Engagement

Next Big Wave of Change

Today's Systems of Record were yesterday's Systems of Engagement!

Enterprise IT Transition From
• Systems of Record

To the Next Stage
• Systems of Engagement

Definition

• Incorporate technologies which encourage peer interactions
• More decentralized
• More options for infrastructure, especially cloud
• Enable new / faster interactions

Notional Architecture

[Diagram labels: Systems of Engagement; Data Services; Data Processing (Integration, Analytics, etc.); Systems of Record; Master Data; Raw Data; Integrated Data]

Many Complexities to Tackle

• Data Extraction (ETL)
• Change Data Capture (CDC)
• Data Governance
• Data Lineage
  – Versioning
  – Merging changes
• Security / Entitlements

Focus for Today

• Data Extraction (ETL)
• Change Data Capture (CDC)
• Data Governance
• Data Lineage
  – Versioning
  – Merging changes
• Security / Entitlements

Getting Started

Don't Boil the Ocean

• Information is often spread across multiple systems of record
• Start with a read-only view of that information
• Target high value/impact data – "moments of engagement"

Example – Single View of a Health Record

• Veteran's view
• Doctor's view
• Case worker's view

Single View Architecture

[Diagram labels: Systems of Engagement; Data Services; Data Processing (Integration, Analytics, etc.); Systems of Record; Master Data; Raw Data; Integrated Data; ETL; record; record]

Single View – Why MongoDB?

• Dynamic schema
• Rich querying
• Aggregation framework
• High scale/performance
• Auto-sharding
• Map-reduce capability (Native MR or Hadoop Connector)
• Enterprise Security Features

Systems of Record Data Model

• Continuity of Care Record (CCR) XML docs
• Pulled some examples from http://googlehealthsamples.googlecode.com/svn/trunk/CCR_samples

...
<Immunizations>
  <Immunization>
    <CCRDataObjectID>BB0022</CCRDataObjectID>
    <DateTime>
      <Type>
        <Text>Start date</Text>
      </Type>
      <ExactDateTime>1998-06-13T05:00:00Z</ExactDateTime>
    </DateTime>
    <Source>
      <Actor>
        <ActorID>Jane Smith</ActorID>
        <ActorRole>
          <Text>Ordering clinician</Text>
        </ActorRole>
      </Actor>
    </Source>
...

Systems of Record Data Model

...
<Medications>
  <Medication>
    <CCRDataObjectID>52</CCRDataObjectID>
    <DateTime>
      <Type>
        <Text>Prescription Date</Text>
      </Type>
      <ExactDateTime>2007-03-09T12:00:00Z</ExactDateTime>
    </DateTime>
    <Type>
      <Text>Medication</Text>
    </Type>
    <Source>
      <Actor>
        <ActorID>Rx History Supplier</ActorID>
      </Actor>
    </Source>
    <Product>
      <ProductName>
        <Text>TIZANIDINE HCL 4 MG TABLET TEV</Text>
        <Code>
          <Value>-1</Value>
          <CodingSystem>omi-coding</CodingSystem>
          <Version>2005</Version>
...

Engagement Data Model

• Leverage dynamic schema / flexible data model
• Use an envelope/wrapper pattern

[Diagram labels: Source Data; Master Data / Common Data Model; Metadata; Integrated Data; Metadata]

Data Flow

1. Read most recent CCRs from each source system
2. Create a source document for each CCR in our system of engagement database
   1. Transform XML to JSON for the source data
   2. Record the system and date in the metadata
   3. Pull out the patient's identifying information to the common data
   4. Generate an Id for the raw file
3. Store the original CCR XML into GridFS
4. After each source document is created, update the integrated document for the patient
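Step 2 of the flow can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the deck's actual ETL code: `element_to_dict` and `build_source_document` are hypothetical helpers, the patient identifier is passed in rather than extracted from the CCR, and a random UUID stands in for the id of the raw file that step 3 would store in GridFS.

```python
import uuid
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def element_to_dict(elem):
    """Recursively convert an XML element into nested dicts (hypothetical helper).

    Leaf elements become stripped text; repeated child tags become lists.
    """
    children = list(elem)
    if not children:
        return (elem.text or "").strip()
    result = {}
    for child in children:
        value = element_to_dict(child)
        if child.tag in result:
            # A tag seen more than once is promoted to a list.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


def build_source_document(ccr_xml, system, patient_id):
    """Wrap one CCR fragment in a meta / common / source envelope."""
    root = ET.fromstring(ccr_xml.strip())
    return {
        "meta": {"system": system, "lastUpdate": datetime.now(timezone.utc)},
        "common": {"patient": patient_id},
        "source": {root.tag: element_to_dict(root)},
        # Assumption: stand-in for the id of the raw XML file kept in GridFS.
        "raw_id": str(uuid.uuid4()),
    }


ccr = """
<Immunizations>
  <Immunization>
    <CCRDataObjectID>BB0022</CCRDataObjectID>
    <DateTime>
      <Type><Text>Start date</Text></Type>
      <ExactDateTime>1998-06-13T05:00:00Z</ExactDateTime>
    </DateTime>
  </Immunization>
</Immunizations>
"""

# "patient-123" is a made-up identifier used only for this example.
doc = build_source_document(ccr, system="EHR1", patient_id="patient-123")
print(doc["source"]["Immunizations"]["Immunization"]["CCRDataObjectID"])
```

Keeping the converted XML under `source` untouched, and copying only the identifying fields into `common`, means each source system can keep its own shape while queries go against the shared fields.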

Engagement Data Model - Metadata

{
  _id : ObjectId("556b92b83f7e775b8e92b30a"),
  meta : {
    system : "EHR1",
    lastUpdate : ISODate(...),
    ...
  },
  common : { ... },
  source : { ... },
  raw_id : "..."
}

Engagement Data Model - Source

{
  _id : ObjectId("556b92b83f7e775b8e92b30a"),
  ...
  source : {
    ...
    Immunizations : {
      Immunization : {
        CCRDataObjectID : "BB0022",
        DateTime : {
          Type : { Text : "Start date" },
          ExactDateTime : "1998-06-13T05:00:00Z"
        },
        Source : {
          Actor : {
            ActorID : "Jane Smith",
            ActorRole : { Text : "Ordering clinician" }
          }
        },
        ...
      }
    },
    ...
  },
  ...
}

Engagement Data Model - Common

{
  _id : ObjectId("556b92b83f7e775b8e92b30a"),
  ...
  common : {
    patient : "D6E5D510-592D-C613-DB46..."
  },
  ...
}

Engagement Data Model - Integrated

{
  _id : ObjectId("556b92b83f7e775b8e92b30d"),
  ...
  meta : {
    lastUpdate : ISODate(...),
    integrated : [
      { _id : ObjectId("...a") },
      { _id : ObjectId("...b") }
    ]
  },
  common : { ... },
  ...
}

Engagement Data Model - Integrated

{
  _id : ObjectId("556b92b83f7e775b8e92b30d"),
  ...
  common : {
    patient : "D6E5D510-592D-C613-DB46...",
    CCRs : [
      {
        ...
        Medication : {
          Product : {
            ProductName : "TIZANIDINE HCL 4 MG TABLET TEV"
          }
        }
        ...
      },
      {
        ...
        Immunizations : { ... },
        ...
      }
    ]
  }
  ...
}
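One way to picture the update of the integrated document is the sketch below. It works on plain Python dicts shaped like the examples above rather than on a live database; `integrate` is a hypothetical helper, and the short string ids and "patient-123" are made-up values for illustration only.

```python
from datetime import datetime, timezone


def integrate(integrated, source_doc):
    """Fold one source document into a patient's integrated document.

    Both arguments are plain dicts; pass None for the first source
    document seen for a patient.
    """
    if integrated is None:
        integrated = {
            "meta": {"integrated": []},
            "common": {"patient": source_doc["common"]["patient"], "CCRs": []},
        }
    ref = {"_id": source_doc["_id"]}
    if ref not in integrated["meta"]["integrated"]:
        # Record which source documents contributed, then add the data itself.
        integrated["meta"]["integrated"].append(ref)
        integrated["common"]["CCRs"].append(source_doc["source"])
    integrated["meta"]["lastUpdate"] = datetime.now(timezone.utc)
    return integrated


medication = {
    "_id": "a",
    "common": {"patient": "patient-123"},
    "source": {"Medication": {"Product": {
        "ProductName": "TIZANIDINE HCL 4 MG TABLET TEV"}}},
}
immunization = {
    "_id": "b",
    "common": {"patient": "patient-123"},
    "source": {"Immunizations": {}},
}

view = integrate(None, medication)
view = integrate(view, immunization)
view = integrate(view, immunization)  # applying the same source again changes nothing
print(len(view["common"]["CCRs"]))
```

Checking the list of contributing ids before appending keeps the operation safe to repeat, which matters if an ETL run is restarted partway through.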

Engage!

Single View Enables New Interactions

• Deliver faster
• Deliver to new applications (mobile, etc.)
• Improve services
• New analytics

Changing Data

• Now that data is easy to get to, users will want to make changes
• With a single view in place, changes can be pushed back to the source systems of record
• Remember the change of address scenario?

Example – Change of Address

• Enter in different systems
• Call different parts of the organization
• What if you have dependents that live with you?

Capture Data Changes

[Diagram labels: Systems of Engagement; Data Services; Data Processing (Integration, Analytics, etc.); Systems of Record; Master Data; Raw Data; Integrated Data; ETL; Bus (Apache Kafka); record; record; record]

Engagement Data Model - Metadata

{
  _id : ObjectId("556c1122c9c8f48313553be5"),
  meta : {
    system : "PatientRecords",
    lastUpdate : ISODate(...),
    version : 2,
    lineage : { ... },
    ...
  },
  common : { ... },
  source : { ... }
}

Engagement Data Model - Source

{
  _id : ObjectId("556b92b83f7e775b8e92b30a"),
  ...
  source : {
    patientId : "D6E5D510-592D-C613-DB46...",
    address1 : "John Smith",
    address2 : null,
    city : "New York",
    state : "NY",
    zip : "10007"
  },
  ...
}

Engagement Data Model - Common

{
  _id : ObjectId("556b92b83f7e775b8e92b30a"),
  ...
  common : {
    patient : "D6E5D510-592D-C613-DB46...",
    address : {
      addr1 : "John Smith",
      city : "New York",
      state : "NY",
      zip : "10007"
    }
  },
  ...
}

Systems of Record Data Model

• Address records can be in different systems
• Each system can be notified of the change to the record

Update Process

1. User accesses an application to change their address
2. User updates their address in the System of Engagement
3. The address change is broadcast to any Systems of Record that have registered
4. An adapter applies the address change to the System of Record in an application-specific manner
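The register / broadcast / adapt steps above can be sketched with a small in-memory publish-subscribe class. This is a simplified stand-in, not a real message-bus client: `ChangeBus`, both adapters, the two record stores, and the sample address are all hypothetical names and values chosen for the example.

```python
from collections import defaultdict


class ChangeBus:
    """In-memory stand-in for a message bus (hypothetical, for illustration)."""

    def __init__(self):
        self._adapters = defaultdict(list)

    def register(self, topic, adapter):
        """A System of Record registers an adapter for one kind of change."""
        self._adapters[topic].append(adapter)

    def publish(self, topic, change):
        """Broadcast a change to every adapter registered for the topic."""
        for adapter in self._adapters[topic]:
            adapter(change)
        return len(self._adapters[topic])


# Two pretend Systems of Record that store addresses in different shapes.
benefits_records = {}
health_records = {}


def benefits_adapter(change):
    """Store the address as a single formatted line."""
    a = change["address"]
    benefits_records[change["patient"]] = (
        f'{a["addr1"]}, {a["city"]}, {a["state"]} {a["zip"]}'
    )


def health_adapter(change):
    """Store the address under this system's own field names."""
    a = change["address"]
    health_records[change["patient"]] = {
        "street": a["addr1"],
        "city": a["city"],
        "region": a["state"],
        "postal_code": a["zip"],
    }


bus = ChangeBus()
bus.register("address", benefits_adapter)
bus.register("address", health_adapter)

notified = bus.publish("address", {
    "patient": "patient-123",
    "address": {"addr1": "1 Example St", "city": "New York",
                "state": "NY", "zip": "10007"},
})
print(notified, benefits_records["patient-123"])
```

The System of Engagement only needs to know the common address shape; each adapter owns the translation into its own system's format.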

Tracking Changes

• Add basic document versioning to track what changed when
• Prefer the separate "current" and "history" collections approach
  – current contains the last updated version
  – history contains all previous versions
• Can query history to see the lineage

Engagement Data Model – Current

{
  _id : ObjectId("556c1122c9c8f48313553be5"),
  meta : {
    system : "PatientRecords",
    lastUpdate : ISODate(...),
    version : 2,
    lineage : {
      event : "update",
      source : "ProfileApp",
    },
    ...
  },
  ...
}

Engagement Data Model - History

{
  _id : { id : ObjectId(...), v : 1 },
  meta : {
    system : "PatientRecords",
    lastUpdate : ISODate(...),
    version : 1,
    lineage : {
      event : "update",
      source : "PatientRecords",
    },
    ...
  },
  ...
}
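A minimal sketch of the "current" plus "history" approach follows, using two Python dicts as stand-ins for the two collections. The `save_version` helper, the "rec-1" id, the "create" event name, and the sample ZIP codes are assumptions made for this example. The compound `{id, v}` history key mirrors the document above.

```python
import copy
from datetime import datetime, timezone

current = {}  # stand-in for the "current" collection, keyed by _id
history = {}  # stand-in for the "history" collection, keyed by (id, version)


def save_version(doc_id, changes, event, source):
    """Apply changes to a document, archiving the prior version first."""
    old = current.get(doc_id)
    if old is None:
        new = {"_id": doc_id, "meta": {}}
        version = 1
    else:
        old_version = old["meta"]["version"]
        # Copy the outgoing version into history under a compound id.
        archived = copy.deepcopy(old)
        archived["_id"] = {"id": doc_id, "v": old_version}
        history[(doc_id, old_version)] = archived
        new = copy.deepcopy(old)
        version = old_version + 1
    new.update(copy.deepcopy(changes))
    new["meta"].update({
        "lastUpdate": datetime.now(timezone.utc),
        "version": version,
        "lineage": {"event": event, "source": source},
    })
    current[doc_id] = new
    return new


save_version("rec-1", {"common": {"address": {"zip": "10007"}}},
             event="create", source="PatientRecords")
save_version("rec-1", {"common": {"address": {"zip": "94105"}}},
             event="update", source="ProfileApp")

print(current["rec-1"]["meta"]["version"])
print(history[("rec-1", 1)]["common"]["address"]["zip"])
```

Against a real database the archive and the update would be two separate writes, so their order and failure handling would need more thought than this sketch gives them.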

Result – New Possibilities

• Change address in one place!
• Other value-add processes can be triggered by changes
• Example: Automated outreach
  – health and benefits centers in new location
  – help moving
• Extend address change to Veteran’s dependents

Next Steps

Keep going

• Keep adding valuable processes to improve or provide new services
• Phase out legacy if desired
  – Part 1 – From Relational to MongoDB
• Improve data governance
  – Part 3 – Bulletproof Data Management
• Reduce costs

Summary

• Systems of Engagement give users new ways to interact with data
• You can start small and add value quickly
• MongoDB enables Systems of Engagement
  – Dynamic schema
  – Fast, flexible querying, analysis, & aggregation
  – High performance
  – Scalable
  – Secure

References

• Systems of Engagement and the Future of Enterprise IT: A Sea Change in Enterprise IT
  http://www.aiim.org/futurehistory
• Systems of Engagement & the Enterprise
  http://www-01.ibm.com/software/ebusiness/jstart/systemsofengagement/
• Geoffrey Moore - The Future of Enterprise IT
  http://www.slideshare.net/SAPanalytics/geoffrey-moore-the-future-of-enterprise-it
• Ask Asya
  http://askasya.com/post/trackversions
  http://askasya.com/post/revisitversions