Data Management 2: Conquering Data Proliferation


Transcript of Data Management 2: Conquering Data Proliferation

Page 1: Data Management 2: Conquering Data Proliferation

James Kerr, Solutions Architect

[email protected]

Conquering Data Proliferation

Page 2: Data Management 2: Conquering Data Proliferation


Part 2 In The Data Management Series

Data integration

Capture data changes

Engaging with your data

Part 1: From Relational to MongoDB

Part 2: Conquering Data Proliferation

Part 3: Bulletproof Data Management

Page 3: Data Management 2: Conquering Data Proliferation

Agenda

• Today's Problem
• Systems of Engagement
• Single View of…
• Changing Data
• Summary

Page 4: Data Management 2: Conquering Data Proliferation

Today's Problem

Page 5: Data Management 2: Conquering Data Proliferation


Enterprises Today

Page 6: Data Management 2: Conquering Data Proliferation

Result

• Data walled off in "silos"
• Can't get a complete picture
• Have to "swivel chair" system to system
• Hard to find new avenues to add value
• Frustrated ops
• Frustrated customers

Page 7: Data Management 2: Conquering Data Proliferation

Example

• 20+ million Veterans in the US today
• 250,000+ employees at Veterans Affairs
• $3.9 billion for IT in 2015 budget

• What happens when a Veteran has to change their address with the VA?

• How does a doctor see a single view of a Veteran's health record?

Page 8: Data Management 2: Conquering Data Proliferation

Systems of Engagement

Page 9: Data Management 2: Conquering Data Proliferation


Big Wave of Change Happening

Today's Systems of Record were yesterday's Systems of Engagement!

Enterprise IT Transition From
• Systems of Record

To the Next Stage
• Systems of Engagement

Page 10: Data Management 2: Conquering Data Proliferation

Definition

• Incorporate technologies which encourage peer interactions
• More decentralized
• More options for infrastructure, especially cloud
• Enable new / faster interactions

Page 11: Data Management 2: Conquering Data Proliferation

Notional Architecture

[Architecture diagram. Labels: Systems of Engagement; Data Services; Data Processing (Integration, Analytics, etc.); Systems of Record; Master Data; Raw Data; Integrated Data]

Page 12: Data Management 2: Conquering Data Proliferation

Many Complexities to Tackle

• Data Extraction (ETL)
• Change Data Capture (CDC)
• Data Governance
• Data Lineage
  – Versioning
  – Merging changes
• Security / Entitlements

Page 13: Data Management 2: Conquering Data Proliferation

Focus for Today

• Data Extraction (ETL)
• Change Data Capture (CDC)
• Data Governance
• Data Lineage
  – Versioning
  – Merging changes
• Security / Entitlements

Page 14: Data Management 2: Conquering Data Proliferation

Getting Started

Page 15: Data Management 2: Conquering Data Proliferation

Don't Boil the Ocean

• Information is often spread across multiple systems of record
• Start with a read-only view of that information
• Target high value/impact data – "moments of engagement"

Page 16: Data Management 2: Conquering Data Proliferation

Example – Single View of a Health Record

• Veteran's view
• Doctor's view
• Case worker's view

Page 17: Data Management 2: Conquering Data Proliferation

Single View Architecture

[Architecture diagram. Labels: Systems of Engagement; Data Services; Data Processing (Integration, Analytics, etc.); Systems of Record; Master Data; Raw Data; Integrated Data; ETL; record]

Page 18: Data Management 2: Conquering Data Proliferation

Single View – Why MongoDB?

• Dynamic schema
• Rich querying
• Aggregation framework
• High scale/performance
• Auto-sharding
• Map-reduce capability (Native MR or Hadoop Connector)
• Enterprise Security Features

Page 19: Data Management 2: Conquering Data Proliferation

Systems of Record Data Model

• Continuity of Care Record (CCR) XML docs
• Pulled some examples from http://googlehealthsamples.googlecode.com/svn/trunk/CCR_samples

...
<Immunizations>
  <Immunization>
    <CCRDataObjectID>BB0022</CCRDataObjectID>
    <DateTime>
      <Type>
        <Text>Start date</Text>
      </Type>
      <ExactDateTime>1998-06-13T05:00:00Z</ExactDateTime>
    </DateTime>
    <Source>
      <Actor>
        <ActorID>Jane Smith</ActorID>
        <ActorRole>
          <Text>Ordering clinician</Text>
        </ActorRole>
      </Actor>
    </Source>
...

Page 20: Data Management 2: Conquering Data Proliferation

Systems of Record Data Model

...
<Medications>
  <Medication>
    <CCRDataObjectID>52</CCRDataObjectID>
    <DateTime>
      <Type>
        <Text>Prescription Date</Text>
      </Type>
      <ExactDateTime>2007-03-09T12:00:00Z</ExactDateTime>
    </DateTime>
    <Type>
      <Text>Medication</Text>
    </Type>
    <Source>
      <Actor>
        <ActorID>Rx History Supplier</ActorID>
      </Actor>
    </Source>
    <Product>
      <ProductName>
        <Text>TIZANIDINE HCL 4 MG TABLET TEV</Text>
        <Code>
          <Value>-1</Value>
          <CodingSystem>omi-coding</CodingSystem>
          <Version>2005</Version>
...

Page 21: Data Management 2: Conquering Data Proliferation

Engagement Data Model

• Leverage dynamic schema / flexible data model
• Use an envelope/wrapper pattern

Source Data

Master Data / Common Data Model

Metadata

Integrated Data

Metadata

Page 22: Data Management 2: Conquering Data Proliferation

Data Flow

1. Read most recent CCRs from each source system
2. Create a source document for each CCR in our system of engagement database
   1. Transform XML to JSON for the source data
   2. Record the system and date in the metadata
   3. Pull out the patient's identifying information to the common data
   4. Generate an Id for the raw file
3. Store the original CCR XML into GridFS
4. After each source document is created, update the integrated document for the patient
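Step 2 of the flow above can be sketched in Python. This is a minimal illustration, not the presenter's implementation. The helper names `element_to_dict` and `build_source_document` are hypothetical, as is the use of a UUID string for `raw_id`. The patient identifier is passed in because extracting it is specific to each source system. Real CCR files declare an XML namespace, which this sketch does not strip. Writing to MongoDB and storing the raw XML in GridFS are left out so the example runs on the standard library alone.

```python
import uuid
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def element_to_dict(elem):
    """Recursively convert an XML element into nested dicts and strings.

    Leaf elements become their stripped text. Repeated child tags are
    collected into a list. Attributes are ignored in this sketch.
    """
    children = list(elem)
    if not children:
        return (elem.text or "").strip()
    result = {}
    for child in children:
        value = element_to_dict(child)
        if child.tag in result:
            # Second occurrence of a tag: promote the value to a list.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


def build_source_document(ccr_xml, system, patient_id):
    """Wrap one CCR in the envelope: meta, common, source, raw_id."""
    root = ET.fromstring(ccr_xml)
    return {
        "meta": {"system": system, "lastUpdate": datetime.now(timezone.utc)},
        "common": {"patient": patient_id},
        "source": element_to_dict(root),
        # Assumed: an identifier under which the original XML is stored.
        "raw_id": str(uuid.uuid4()),
    }
```

Calling `build_source_document(xml_text, "EHR1", patient_id)` yields a dictionary with the same four top-level sections as the source documents shown on the following slides.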

Page 23: Data Management 2: Conquering Data Proliferation

Engagement Data Model - Metadata

{
  _id : ObjectId("556b92b83f7e775b8e92b30a"),
  meta : {
    system : "EHR1",
    lastUpdate : ISODate(...),
    ...
  },
  common : { ... },
  source : { ... },
  raw_id : "..."
}

Page 24: Data Management 2: Conquering Data Proliferation

Engagement Data Model - Source

{
  _id : ObjectId("556b92b83f7e775b8e92b30a"),
  ...
  source : {
    ...
    Immunizations : {
      Immunization : {
        CCRDataObjectID : "BB0022",
        DateTime : {
          Type : { Text : "Start date" },
          ExactDateTime : "1998-06-13T05:00:00Z"
        },
        Source : {
          Actor : {
            ActorID : "Jane Smith",
            ActorRole : { Text : "Ordering clinician" }
          }
        },
        ...
      }
    },
    ...
  }
}

Page 25: Data Management 2: Conquering Data Proliferation

Engagement Data Model - Common

{
  _id : ObjectId("556b92b83f7e775b8e92b30a"),
  ...
  common : {
    patient : "D6E5D510-592D-C613-DB46..."
  },
  ...
}

Page 26: Data Management 2: Conquering Data Proliferation

Engagement Data Model - Integrated

{
  _id : ObjectId("556b92b83f7e775b8e92b30d"),
  ...
  meta : {
    lastUpdate : ISODate(...),
    integrated : [
      { _id : ObjectId("...a") },
      { _id : ObjectId("...b") }
    ]
  },
  common : { ... }
  ...
}

Page 27: Data Management 2: Conquering Data Proliferation

Engagement Data Model - Integrated

{
  _id : ObjectId("556b92b83f7e775b8e92b30d"),
  ...
  common : {
    patient : "D6E5D510-592D-C613-DB46...",
    CCRs : [
      {
        ...
        Medication : {
          Product : {
            ProductName : "TIZANIDINE HCL 4 MG TABLET TEV"
          }
        }
        ...
      },
      {
        ...
        Immunizations : { ... },
        ...
      }
    ]
  }
  ...
}
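One way the integrated document could be maintained as source documents arrive is sketched below in Python. The function name `update_integrated` is hypothetical, and so are two design choices: embedding each source document's `source` section as one entry in `common.CCRs`, and skipping a source `_id` that is already listed in `meta.integrated`. The slides show the resulting shape but not the merge rules, so treat this as an assumption-laden sketch that uses plain dictionaries in place of a database.

```python
from datetime import datetime, timezone


def update_integrated(integrated, source_doc):
    """Fold one source document into the patient's integrated document.

    `integrated` may be None when this is the first source document seen
    for the patient. Returns the new or updated integrated document.
    """
    patient = source_doc["common"]["patient"]
    if integrated is None:
        integrated = {
            "meta": {"integrated": []},
            "common": {"patient": patient, "CCRs": []},
        }
    if integrated["common"]["patient"] != patient:
        raise ValueError("source document belongs to a different patient")

    refs = integrated["meta"]["integrated"]
    if not any(ref["_id"] == source_doc["_id"] for ref in refs):
        # Record which source document contributed, then embed its data.
        refs.append({"_id": source_doc["_id"]})
        integrated["common"]["CCRs"].append(source_doc["source"])
    integrated["meta"]["lastUpdate"] = datetime.now(timezone.utc)
    return integrated
```

Keeping the list of contributing `_id` values in `meta.integrated` makes the merge safe to repeat, which helps if the same source document is processed twice.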

Page 28: Data Management 2: Conquering Data Proliferation

Engage!

Page 29: Data Management 2: Conquering Data Proliferation

Single View Enables New Interactions

• Deliver faster
• Deliver to new applications (mobile, etc.)
• Improve services
• New analytics

Page 30: Data Management 2: Conquering Data Proliferation

Changing Data

• Now that data is easy to get to, users will want to make changes

• With a single view in place, changes can be pushed back to the source systems of record

• Remember the change of address scenario?

Page 31: Data Management 2: Conquering Data Proliferation

Example – Change of Address

• Enter in different systems
• Call different parts of the organization
• What if you have dependents that live with you?

Page 32: Data Management 2: Conquering Data Proliferation

Capture Data Changes

[Architecture diagram. Labels: Systems of Engagement; Data Services; Data Processing (Integration, Analytics, etc.); Systems of Record; Master Data; Raw Data; Integrated Data; ETL; Bus (Apache Kafka); record]

Page 33: Data Management 2: Conquering Data Proliferation

Engagement Data Model - Metadata

{
  _id : ObjectId("556c1122c9c8f48313553be5"),
  meta : {
    system : "PatientRecords",
    lastUpdate : ISODate(...),
    version : 2,
    lineage : { ... },
    ...
  },
  common : { ... },
  source : { ... }
}

Page 34: Data Management 2: Conquering Data Proliferation

Engagement Data Model - Source

{
  _id : ObjectId("556c1122c9c8f48313553be5"),
  ...
  source : {
    patientId : "D6E5D510-592D-C613-DB46...",
    address1 : "John Smith",
    address2 : null,
    city : "New York",
    state : "NY",
    zip : "10007"
  },
  ...
}

Page 35: Data Management 2: Conquering Data Proliferation

Engagement Data Model - Common

{
  _id : ObjectId("556c1122c9c8f48313553be5"),
  ...
  common : {
    patient : "D6E5D510-592D-C613-DB46...",
    address : {
      addr1 : "John Smith",
      city : "New York",
      state : "NY",
      zip : "10007"
    }
  },
  ...
}
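The step from the source section to the common section can be sketched as a per-system field mapping. The Python below is an illustration under assumptions: the mapping table, the `addr2` key, and the function name `to_common` are hypothetical, and dropping null fields is inferred from the fact that `address2 : null` has no counterpart in the common document shown above.

```python
# Hypothetical mapping from one source system's field names to the
# common model's field names; each source system would have its own.
PATIENT_RECORDS_ADDRESS_MAP = {
    "address1": "addr1",
    "address2": "addr2",
    "city": "city",
    "state": "state",
    "zip": "zip",
}


def to_common(source, field_map):
    """Build the `common` section from a source record, skipping nulls."""
    address = {
        common_key: source[source_key]
        for source_key, common_key in field_map.items()
        if source.get(source_key) is not None
    }
    return {"patient": source["patientId"], "address": address}
```

Keeping the mapping as data rather than code means a new source system can be added by supplying another table.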

Page 36: Data Management 2: Conquering Data Proliferation

Systems of Record Data Model

• Address records can be in different systems
• Each system can be notified of the change to the record

Page 37: Data Management 2: Conquering Data Proliferation


Update Process

1. User accesses an application to change their address

2. User updates their address in the System of Engagement

3. The address change is broadcast to any Systems of Record that have registered

4. An adapter applies the address change to the System of Record in an application-specific manner
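Steps 3 and 4 follow a publish/subscribe shape, which can be sketched with an in-memory stand-in for the bus. Everything named here is hypothetical: `ChangeBus`, the `"address"` topic, and the two adapters are illustrations only, and the returned strings stand in for whatever system-specific write a real adapter would perform. A production setup would use a real message bus such as the Apache Kafka shown in the architecture slide.

```python
from collections import defaultdict


class ChangeBus:
    """In-memory stand-in for the message bus between the systems."""

    def __init__(self):
        self._adapters = defaultdict(list)

    def register(self, topic, adapter):
        """A System of Record registers an adapter for a kind of change."""
        self._adapters[topic].append(adapter)

    def publish(self, topic, change):
        """Deliver the change to every adapter registered for the topic."""
        return [adapter(change) for adapter in self._adapters[topic]]


def ehr_adapter(change):
    # Hypothetical: this system stores the address as one line of text.
    a = change["address"]
    return f"EHR1: {a['addr1']}, {a['city']}, {a['state']} {a['zip']}"


def billing_adapter(change):
    # Hypothetical: this system only cares about the postal code.
    return f"Billing: zip={change['address']['zip']}"
```

Each System of Record registers once with `bus.register("address", adapter)`. The System of Engagement then calls `bus.publish("address", change)` without needing to know which systems are listening.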

Page 38: Data Management 2: Conquering Data Proliferation

Tracking Changes

• Add basic document versioning to track what changed when
• Prefer the separate "current" and "history" collections approach
  – current contains the last updated version
  – history contains all previous versions
• Can query history to see the lineage

(See http://askasya.com/post/revisitversions)
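The "current" and "history" approach can be sketched with two dictionaries standing in for the two collections. This is a simplified Python illustration, not the approach as implemented in the referenced post: the function names `save_version` and `lineage` are hypothetical, and a real database would also need care around the ordering or atomicity of the two writes, which is not modelled here.

```python
import copy
from datetime import datetime, timezone

current = {}   # _id -> latest version of each document
history = {}   # (_id, version) -> every earlier version


def save_version(doc_id, changes, event, source):
    """Apply `changes` to the current document, archiving the old one."""
    old = current.get(doc_id)
    if old is None:
        new, version = {"_id": doc_id}, 1
    else:
        # Move the previous version to history under a compound key,
        # mirroring an { id, v } style _id for history documents.
        history[(doc_id, old["meta"]["version"])] = old
        new, version = copy.deepcopy(old), old["meta"]["version"] + 1
    new.update(changes)
    new["meta"] = {
        **new.get("meta", {}),
        "version": version,
        "lastUpdate": datetime.now(timezone.utc),
        "lineage": {"event": event, "source": source},
    }
    current[doc_id] = new
    return new


def lineage(doc_id):
    """Return (version, lineage) pairs, oldest first, including current."""
    docs = [d for (i, _), d in history.items() if i == doc_id]
    if doc_id in current:
        docs.append(current[doc_id])
    docs.sort(key=lambda d: d["meta"]["version"])
    return [(d["meta"]["version"], d["meta"]["lineage"]) for d in docs]
```

Reads of the latest state touch only `current`, so they stay simple, while `lineage` answers the "what changed when, and from where" question by walking `history`.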

Page 39: Data Management 2: Conquering Data Proliferation

Engagement Data Model – Current

{
  _id : ObjectId("556c1122c9c8f48313553be5"),
  meta : {
    system : "PatientRecords",
    lastUpdate : ISODate(...),
    version : 2,
    lineage : {
      event : "update",
      source : "ProfileApp",
    },
    ...
  },
  ...
}

Page 40: Data Management 2: Conquering Data Proliferation

Engagement Data Model - History

{
  _id : {
    id : ObjectId("556c1122c9c8f48313553be5"),
    v : 1
  },
  meta : {
    system : "PatientRecords",
    lastUpdate : ISODate(...),
    version : 1,
    lineage : {
      event : "update",
      source : "PatientRecords",
    },
    ...
  },
  ...
}

Page 41: Data Management 2: Conquering Data Proliferation

Result – New Possibilities

• Change address in one place!
• Other value-add processes can be triggered by changes
• Example: Automated outreach
  – health and benefits centers in new location
  – help moving
• Extend address change to Veteran's dependents

Page 42: Data Management 2: Conquering Data Proliferation

Next Steps

Page 43: Data Management 2: Conquering Data Proliferation

Keep going

• Keep adding valuable processes to improve or provide new services
• Phase out legacy if desired
  – Part 1 – From Relational to MongoDB
• Improve data governance
  – Part 3 – Bulletproof Data Management
• Reduce costs
• Innovate

Page 44: Data Management 2: Conquering Data Proliferation

Summary

• Systems of Engagement give users new ways to interact with data
• You can start small and add value quickly
• MongoDB enables Systems of Engagement
  – Dynamic schema
  – Fast, flexible querying, analysis, & aggregation
  – High performance
  – Scalable
  – Secure

Page 45: Data Management 2: Conquering Data Proliferation

References

• Systems of Engagement and the Future of Enterprise IT: A Sea Change in Enterprise IT
  http://www.aiim.org/futurehistory
• Systems of Engagement & the Enterprise
  http://www-01.ibm.com/software/ebusiness/jstart/systemsofengagement/
• Geoffrey Moore - The Future of Enterprise IT
  http://www.slideshare.net/SAPanalytics/geoffrey-moore-the-future-of-enterprise-it
• Ask Asya
  http://askasya.com/post/trackversions
  http://askasya.com/post/revisitversions