Painting the Future of Big Data with Apache Spark and MongoDB

James Kerr, Senior Solutions Architect

[email protected]

Conquering Data Proliferation


Part 2 in the Data Management Series

• Part 1: From Relational to MongoDB
• Part 2: Conquering Data Proliferation (data integration, capturing data changes, engaging with your data)
• Part 3: Bulletproof Data Management


Agenda

• Today's Problem
• Systems of Engagement
• Single View of…
• Changing Data
• Summary

Today's Problem


Result

• Data walled off in "silos"
• Can't get a complete picture
• Have to "swivel chair" from system to system
• Hard to find new avenues to add value
• Frustrated ops
• Frustrated customers


Example

• 20+ million Veterans in the US today
• 250,000+ employees at Veterans Affairs
• $3.9 billion for IT in the 2015 budget

• What happens when a Veteran has to change their address with the VA?

• How does a doctor see a single view of a Veteran's health record?

Systems of Engagement


Next Big Wave of Change

Today's Systems of Record were yesterday's Systems of Engagement!

Enterprise IT is transitioning from Systems of Record to the next stage: Systems of Engagement


Definition

• Incorporate technologies that encourage peer interactions
• More decentralized
• More infrastructure options, especially cloud
• Enable new and faster interactions


Notional Architecture


[Figure: notional architecture. Components: Systems of Record; data processing (integration, analytics, etc.); Master Data, Raw Data, and Integrated Data; Data Services.]


Many Complexities to Tackle

• Data Extraction (ETL)
• Change Data Capture (CDC)
• Data Governance
• Data Lineage
  – Versioning
  – Merging changes
• Security / Entitlements


Focus for Today

• Data Extraction (ETL)
• Change Data Capture (CDC)
• Data Governance
• Data Lineage
  – Versioning
  – Merging changes
• Security / Entitlements

Getting Started


Don't Boil the Ocean

• Information is often spread across multiple systems of record
• Start with a read-only view of that information
• Target high value/impact data: "moments of engagement"


Example – Single View of a Health Record

• Veteran's view
• Doctor's view
• Case worker's view


Single View Architecture


[Figure: single view architecture. Records from the Systems of Record are loaded by ETL into Master Data, Raw Data, and Integrated Data, exposed through Data Services for analytics, etc.]


Single View – Why MongoDB?

• Dynamic schema
• Rich querying
• Aggregation framework
• High scale/performance
• Auto-sharding
• Map-reduce capability (native MapReduce or Hadoop Connector)
• Enterprise security features


Systems of Record Data Model

• Continuity of Care Record (CCR) XML documents
• Pulled some examples from http://googlehealthsamples.googlecode.com/svn/trunk/CCR_samples

...
<Immunizations>
  <Immunization>
    <CCRDataObjectID>BB0022</CCRDataObjectID>
    <DateTime>
      <Type>
        <Text>Start date</Text>
      </Type>
      <ExactDateTime>1998-06-13T05:00:00Z</ExactDateTime>
    </DateTime>
    <Source>
      <Actor>
        <ActorID>Jane Smith</ActorID>
        <ActorRole>
          <Text>Ordering clinician</Text>
        </ActorRole>
      </Actor>
    </Source>
...






...
<Medications>
  <Medication>
    <CCRDataObjectID>52</CCRDataObjectID>
    <DateTime>
      <Type>
        <Text>Prescription Date</Text>
      </Type>
      <ExactDateTime>2007-03-09T12:00:00Z</ExactDateTime>
    </DateTime>
    <Type>
      <Text>Medication</Text>
    </Type>
    <Source>
      <Actor>
        <ActorID>Rx History Supplier</ActorID>
      </Actor>
    </Source>
    <Product>
      <ProductName>
        <Text>TIZANIDINE HCL 4 MG TABLET TEV</Text>
        <Code>
          <Value>-1</Value>
          <CodingSystem>omi-coding</CodingSystem>
          <Version>2005</Version>
...


Engagement Data Model

• Leverage dynamic schema / flexible data model
• Use an envelope/wrapper pattern

[Figure: envelope/wrapper pattern. Source Data is wrapped with Master Data / Common Data Model fields and Metadata; Integrated Data also carries Metadata.]


Data Flow

1. Read the most recent CCRs from each source system
2. Create a source document for each CCR in the system of engagement database:
   1. Transform the XML to JSON for the source data
   2. Record the system and date in the metadata
   3. Pull the patient's identifying information out into the common data
   4. Generate an ID for the raw file
3. Store the original CCR XML in GridFS
4. After each source document is created, update the integrated document for the patient
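Steps 2 and 3 can be sketched in Python with only the standard library. This is a minimal illustration, not the presenter's code: it assumes the patient's identifier sits in a <Patient><ActorID> element of the CCR, strips XML namespaces, ignores attributes, and leaves out the actual MongoDB/GridFS writes. The helpers local_name, xml_to_dict, find_patient_id, and build_source_document are hypothetical names.

```python
import uuid
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def local_name(tag):
    """Strip an XML namespace, e.g. '{urn:astm-org:CCR}Patient' -> 'Patient'."""
    return tag.split("}", 1)[-1]


def xml_to_dict(elem):
    """Convert an element to nested dicts; leaf elements become their text.

    Repeated child tags (e.g. several <Medication> elements) become lists.
    Attributes are ignored in this simplified sketch.
    """
    children = list(elem)
    if not children:
        return (elem.text or "").strip()
    result = {}
    for child in children:
        key, value = local_name(child.tag), xml_to_dict(child)
        if key not in result:
            result[key] = value
        elif isinstance(result[key], list):
            result[key].append(value)
        else:
            # Second occurrence of a tag: promote the entry to a list
            result[key] = [result[key], value]
    return result


def find_patient_id(root):
    """Assumption: the patient ID is the <ActorID> inside <Patient>."""
    for elem in root.iter():
        if local_name(elem.tag) == "Patient":
            for child in elem:
                if local_name(child.tag) == "ActorID":
                    return (child.text or "").strip()
    return None


def build_source_document(ccr_xml, system):
    """Wrap one CCR in the envelope: meta, common, source, raw_id."""
    root = ET.fromstring(ccr_xml)
    return {
        # Step 2.2: which system of record, and when
        "meta": {"system": system, "lastUpdate": datetime.now(timezone.utc)},
        # Step 2.3: identifying information in the common data
        "common": {"patient": find_patient_id(root)},
        # Step 2.1: XML converted to a JSON-like structure
        "source": xml_to_dict(root),
        # Step 2.4: ID for the raw file; step 3 would store the original
        # XML under this ID (e.g. in GridFS), which is not shown here
        "raw_id": str(uuid.uuid4()),
    }
```

Repeated elements such as multiple <Medication> entries become arrays, which maps naturally onto MongoDB's document model; a production converter would also need to decide how to keep XML attributes.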


Engagement Data Model - Metadata

{
  _id : ObjectId("556b92b83f7e775b8e92b30a"),
  meta : {
    system : "EHR1",
    lastUpdate : ISODate(...),
    ...
  },
  common : { ... },
  source : { ... },
  raw_id : "..."
}


Engagement Data Model - Source

{
  ...
  source : {
    ...
    Immunizations : {
      Immunization : {
        CCRDataObjectID : "BB0022",
        DateTime : {
          Type : { Text : "Start date" },
          ExactDateTime : "1998-06-13T05:00:00Z"
        },
        Source : {
          Actor : {
            ActorID : "Jane Smith",
            ActorRole : { Text : "Ordering clinician" }
          }
        },
        ...
      }
    },
    ...
  },
  ...
}


Engagement Data Model - Common

{
  ...
  common : {
    patient : "D6E5D510-592D-C613-DB46..."
  },
  ...
}


Engagement Data Model - Integrated

{
  _id : ObjectId("556b92b83f7e775b8e92b30d"),
  ...
  meta : {
    lastUpdate : ISODate(...),
    integrated : [
      { _id : ObjectId("...a") },
      { _id : ObjectId("...b") }
    ]
  },
  common : { ... },
  ...
}


Engagement Data Model - Integrated

{
  _id : ObjectId("556b92b83f7e775b8e92b30d"),
  ...
  common : {
    patient : "D6E5D510-592D-C613-DB46...",
    CCRs : [
      {
        ...
        Medication : {
          Product : {
            ProductName : "TIZANIDINE HCL 4 MG TABLET TEV"
          }
        },
        ...
      },
      {
        ...
        Immunizations : { ... },
        ...
      }
    ]
  },
  ...
}
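The integrated document can be sketched as a Python function that folds one patient's source documents together: meta.integrated collects the _id of each contributing source document and common.CCRs collects each source's converted content. This is a hedged illustration; integrate is a hypothetical helper that rebuilds the whole document in memory rather than updating it in place.

```python
from datetime import datetime, timezone


def integrate(patient_id, source_docs):
    """Build the integrated document for one patient from source documents.

    meta.integrated references every contributing source document by _id;
    common.CCRs holds each source's converted CCR content, in input order.
    """
    mine = [d for d in source_docs if d["common"]["patient"] == patient_id]
    return {
        "meta": {
            "lastUpdate": datetime.now(timezone.utc),
            "integrated": [{"_id": d["_id"]} for d in mine],
        },
        "common": {
            "patient": patient_id,
            "CCRs": [d["source"] for d in mine],
        },
    }
```

Rebuilding keeps the sketch simple; against a real collection, the same effect could be reached incrementally after each new source document with an upsert that uses $push.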

Engage!


Single View Enables New Interactions

• Deliver faster
• Deliver to new applications (mobile, etc.)
• Improve services
• New analytics


Changing Data

• Now that data is easy to get to, users will want to make changes

• With a single view, users can change data held in the source systems of record

• Remember the change of address scenario?


Example – Change of Address

• Enter the change in different systems
• Call different parts of the organization
• What if you have dependents who live with you?


Capture Data Changes


[Figure: capturing data changes. The single view architecture (Systems of Record, ETL, Master Data, Raw Data, Integrated Data, Data Services, analytics, etc.) with an Apache Kafka bus added to carry record changes.]


Engagement Data Model - Metadata

{
  _id : ObjectId("556c1122c9c8f48313553be5"),
  meta : {
    system : "PatientRecords",
    lastUpdate : ISODate(...),
    version : 2,
    lineage : { ... },
    ...
  },
  common : { ... },
  source : { ... }
}


Engagement Data Model - Source

{
  ...
  source : {
    patientId : "D6E5D510-592D-C613-DB46...",
    address1 : "John Smith",
    address2 : null,
    city : "New York",
    state : "NY",
    zip : "10007"
  },
  ...
}


Engagement Data Model - Common

{
  ...
  common : {
    patient : "D6E5D510-592D-C613-DB46...",
    address : {
      addr1 : "John Smith",
      city : "New York",
      state : "NY",
      zip : "10007"
    }
  },
  ...
}
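The mapping from the PatientRecords source fields to the common address can be sketched as a small Python function. Field names come from the two documents above, except addr2, which is a hypothetical name for where a non-null address2 would go; to_common_address is likewise an illustrative helper.

```python
def to_common_address(source):
    """Map a PatientRecords-style source document onto the common model.

    addr2 is a hypothetical field name; null optional fields are dropped,
    mirroring the common example, which has no address2 entry.
    """
    address = {
        "addr1": source["address1"],
        "addr2": source.get("address2"),
        "city": source["city"],
        "state": source["state"],
        "zip": source["zip"],
    }
    # Omit empty optional fields rather than storing nulls
    address = {k: v for k, v in address.items() if v is not None}
    return {"patient": source["patientId"], "address": address}
```

Each system of record would get its own mapping like this one, so the common model stays stable while source schemas differ.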



• Address records can be in different systems
• Each system can be notified of the change to the record


Update Process

1. User accesses an application to change their address

2. User updates their address in the System of Engagement

3. The address change is broadcast to any Systems of Record that have registered

4. An adapter applies the address change to each System of Record in an application-specific manner
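Steps 3 and 4 describe a publish/subscribe pattern. The Python sketch below uses an in-memory ChangeBus class as a stand-in for the Apache Kafka bus in the architecture; it is not the Kafka client API. The topic name used with it and both adapters are hypothetical examples of systems of record that store an address differently.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Event = dict
Adapter = Callable[[Event], None]


class ChangeBus:
    """In-memory stand-in for a message bus such as Apache Kafka.

    Systems of record register an adapter for a topic (step 3);
    publish() hands each change event to every registered adapter.
    """

    def __init__(self) -> None:
        self._adapters: Dict[str, List[Adapter]] = defaultdict(list)

    def register(self, topic: str, adapter: Adapter) -> None:
        self._adapters[topic].append(adapter)

    def publish(self, topic: str, event: Event) -> int:
        """Deliver the event; return how many adapters received it."""
        adapters = self._adapters.get(topic, [])
        for adapter in adapters:
            adapter(event)
        return len(adapters)


def one_line_adapter(store: dict) -> Adapter:
    """Hypothetical legacy system that keeps the address as one string."""
    def apply(event: Event) -> None:
        a = event["address"]
        store[event["patient"]] = f'{a["addr1"]}, {a["city"]}, {a["state"]} {a["zip"]}'
    return apply


def structured_adapter(store: dict) -> Adapter:
    """Hypothetical system that keeps the address fields separately."""
    def apply(event: Event) -> None:
        store[event["patient"]] = dict(event["address"])
    return apply
```

For example, registering both adapters on a hypothetical "address-changes" topic and publishing one event updates both stores, each in its own format. The System of Engagement only publishes the change; knowledge of each target schema lives in its adapter.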


Tracking Changes

• Add basic document versioning to track what changed when

• Prefer the separate "current" and "history" collections approach
  – current contains the last updated version
  – history contains all previous versions

• Can query history to see the lineage
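The current/history approach can be sketched in Python with a dict and a list standing in for the two collections. The compound history _id uses a hypothetical id field next to v, since the slides do not name that field; update_with_history and lineage are illustrative helpers, not a MongoDB API.

```python
import copy


def update_with_history(current, history, doc_id, changes, source):
    """Apply top-level field changes to a document, archiving the old version.

    current: dict mapping _id -> latest document (the "current" collection)
    history: list of prior versions (the "history" collection)
    """
    old = current[doc_id]
    archived = copy.deepcopy(old)
    # Compound key: original _id (hypothetical field name "id") plus version
    archived["_id"] = {"id": doc_id, "v": old["meta"]["version"]}
    history.append(archived)

    new = copy.deepcopy(old)
    new.update(changes)  # e.g. {"common": {...}}; replaces whole fields
    new["meta"]["version"] = old["meta"]["version"] + 1
    new["meta"]["lineage"] = {"event": "update", "source": source}
    current[doc_id] = new
    return new


def lineage(history, current, doc_id):
    """Return (version, source) pairs for one document, oldest first."""
    versions = [h for h in history if h["_id"]["id"] == doc_id]
    versions.sort(key=lambda h: h["_id"]["v"])
    versions.append(current[doc_id])
    return [(d["meta"]["version"], d["meta"]["lineage"]["source"]) for d in versions]
```

Against real collections these are two separate writes (an insert into history, then an update of current), so a production version has to decide how to handle a failure between them.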


Engagement Data Model – Current

{
  _id : ObjectId("556c1122c9c8f48313553be5"),
  meta : {
    version : 2,
    lineage : {
      event : "update",
      source : "ProfileApp"
    },
    ...
  },
  ...
}


Engagement Data Model - History

{
  _id : { ObjectId(...), v : 1 },
  meta : {
    version : 1,
    lineage : {
      event : "update",
      source : "PatientRecords"
    },
    ...
  },
  ...
}


Result – New Possibilities

• Change address in one place!
• Other value-add processes can be triggered by changes
• Example: automated outreach
  – health and benefits centers in the new location
  – help with moving
• Extend the address change to the Veteran's dependents

Next Steps


Keep going

• Keep adding valuable processes to improve existing services or provide new ones
• Phase out legacy systems if desired
  – Part 1: From Relational to MongoDB
• Improve data governance
  – Part 3: Bulletproof Data Management
• Reduce costs


Summary

• Systems of Engagement give users new ways to interact with data
• You can start small and add value quickly
• MongoDB enables Systems of Engagement
  – Dynamic schema
  – Fast, flexible querying, analysis, and aggregation
  – High performance
  – Scalable
  – Secure






Questions?

