Painting the Future of Big Data with Apache Spark and MongoDB

  • Date posted: 11-Aug-2015
  • Category: Technology

Transcript of Painting the Future of Big Data with Apache Spark and MongoDB

  1. Conquering Data Proliferation. James Kerr, Senior Solutions Architect, james.kerr@mongodb.com
  2. Part 2 in the Data Management Series: data integration; capturing data changes; engaging with your data. The series: Part 1, From Relational to MongoDB; Part 2, Conquering Data Proliferation; Part 3, Bulletproof Data Management
  3. Agenda: Today's Problem; Systems of Engagement; Single View of Changing Data; Summary
  4. Today's Problem
  5. (no text on slide)
  6. Result: data walled off in "silos"; can't get a complete picture; have to "swivel chair" from system to system; hard to find new avenues to add value; frustrated ops; frustrated customers
  7. Example: 20+ million Veterans in the US today; 250,000+ employees at Veterans Affairs; $3.9 billion for IT in the 2015 budget. What happens when a Veteran has to change their address with the VA? How does a doctor see a single view of a Veteran's health record?
  8. Systems of Engagement
  9. Next Big Wave of Change: Today's Systems of Record were yesterday's Systems of Engagement! Enterprise IT is transitioning from Systems of Record to the next stage, Systems of Engagement
  10. Definition: incorporate technologies that encourage peer interactions; more decentralized; more options for infrastructure, especially cloud; enable new and faster interactions
  11. Notional Architecture (diagram labels): Systems of Engagement; Data Services; Data Processing (integration, analytics, etc.); Systems of Record; Master Data; Raw Data; Integrated Data
  12. Many Complexities to Tackle: data extraction (ETL); change data capture (CDC); data governance; data lineage; versioning; merging changes; security / entitlements
  13. Focus for Today: data extraction (ETL); change data capture (CDC); data governance; data lineage; versioning; merging changes; security / entitlements
  14. Getting Started
  15. Don't Boil the Ocean: information is often spread across multiple systems of record; start with a read-only view of that information; target high-value, high-impact data ("moments of engagement")
  16. Example: Single View of a Health Record: Veteran's view; doctor's view; case worker's view
  17. Single View Architecture (diagram labels): Systems of Engagement; Data Services; Data Processing (integration, analytics, etc.); Systems of Record; Master Data; Raw Data; Integrated Data; ETL; records
  18. Single View: Why MongoDB? Dynamic schema; rich querying; aggregation framework; high scale and performance; auto-sharding; map-reduce capability (native MR or Hadoop Connector); enterprise security features
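  19. Systems of Record Data Model: Continuity of Care Record (CCR) XML docs. Pulled some examples from http://googlehealthsamples.googlecode.com/svn/trunk/CCR_samples
     ...
     <Immunization>
       <CCRDataObjectID>BB0022</CCRDataObjectID>
       <DateTime>
         <Type><Text>Start date</Text></Type>
         <ExactDateTime>1998-06-13T05:00:00Z</ExactDateTime>
       </DateTime>
       <Source>
         <Actor>
           <ActorID>Jane Smith</ActorID>
           <ActorRole><Text>Ordering clinician</Text></ActorRole>
         </Actor>
       </Source>
       ...
     </Immunization>
     ...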
  20. Systems of Record Data Model (values from a medication entry in a CCR XML doc): ... 52; Prescription Date; 2007-03-09T12:00:00Z; Medication; Rx History Supplier; TIZANIDINE HCL 4 MG TABLET TEV; -1; omi-coding; 2005 ...
  21. Engagement Data Model: leverage the dynamic schema / flexible data model; use an envelope (wrapper) pattern. A source document wraps Metadata, Master Data / Common Data Model, and Source Data; an integrated document wraps Metadata and Integrated Data
  22. Data Flow
     1. Read the most recent CCRs from each source system
     2. Create a source document for each CCR in our system of engagement database
        a. Transform XML to JSON for the source data
        b. Record the system and date in the metadata
        c. Pull out the patient's identifying information to the common data
        d. Generate an ID for the raw file
     3. Store the original CCR XML in GridFS
     4. After each source document is created, update the integrated document for the patient
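Step 2 of the data flow on slide 22 can be sketched in plain Python. This is a minimal illustration, not the presenter's code: `element_to_dict`, `build_source_document`, and `SAMPLE_CCR` are hypothetical names, the sample XML is a cut-down stand-in without the namespace a real CCR file would carry, the `Patient/ActorID` path is an assumption about where the patient identifier lives, and a UUID stands in for the ID that storing the raw file in GridFS would return.

```python
import uuid
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Hypothetical, heavily trimmed CCR-like document (no XML namespace).
SAMPLE_CCR = """
<ContinuityOfCareRecord>
  <Patient><ActorID>patient-001</ActorID></Patient>
  <Body>
    <Immunizations>
      <Immunization>
        <CCRDataObjectID>BB0022</CCRDataObjectID>
        <DateTime>
          <Type><Text>Start date</Text></Type>
          <ExactDateTime>1998-06-13T05:00:00Z</ExactDateTime>
        </DateTime>
      </Immunization>
    </Immunizations>
  </Body>
</ContinuityOfCareRecord>
"""


def element_to_dict(elem):
    """Convert an XML element to nested dicts; leaf elements become strings."""
    children = list(elem)
    if not children:
        return (elem.text or "").strip()
    result = {}
    for child in children:
        value = element_to_dict(child)
        if child.tag in result:
            # A repeated tag is collected into a list.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


def build_source_document(ccr_xml, system):
    """Wrap one CCR in the envelope: meta, common, source, raw_id."""
    root = ET.fromstring(ccr_xml)
    return {
        # 2b. Record the system and date in the metadata.
        "meta": {"system": system, "lastUpdate": datetime.now(timezone.utc)},
        # 2c. Pull the patient's identifier into the common data (assumed path).
        "common": {"patient": root.findtext("Patient/ActorID")},
        # 2a. XML transformed to a JSON-like structure.
        "source": element_to_dict(root),
        # 2d. Placeholder for the ID of the raw file (GridFS is not used here).
        "raw_id": str(uuid.uuid4()),
    }


doc = build_source_document(SAMPLE_CCR, "EHR1")
print(doc["common"], doc["meta"]["system"])
```

The resulting dictionary has the same top-level keys as the source documents shown on the following slides, so it could be handed to a driver's insert call unchanged.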
  23. Engagement Data Model - Metadata
     {
       _id : ObjectId("556b92b83f7e775b8e92b30a"),
       meta : {
         system : "EHR1",
         lastUpdate : ISODate(...),
         ...
       },
       common : { ... },
       source : { ... },
       raw_id : "..."
     }
  24. Engagement Data Model - Source
     {
       _id : ObjectId("556b92b83f7e775b8e92b30a"),
       ...
       source : {
         ...
         Immunizations : {
           Immunization : {
             CCRDataObjectID : "BB0022",
             DateTime : {
               Type : { Text : "Start date" },
               ExactDateTime : "1998-06-13T05:00:00Z"
             },
             Source : {
               Actor : {
                 ActorID : "Jane Smith",
                 ActorRole : { Text : "Ordering clinician" }
               }
             },
             ...
           },
           ...
         }
       }
     }
  25. Engagement Data Model - Common
     {
       _id : ObjectId("556b92b83f7e775b8e92b30a"),
       ...
       common : {
         patient : "D6E5D510-592D-C613-DB46..."
       },
       ...
     }
  26. Engagement Data Model - Integrated
     {
       _id : ObjectId("556b92b83f7e775b8e92b30d"),
       ...
       meta : {
         lastUpdate : ISODate(...),
         integrated : [
           { _id : ObjectId("...a") },
           { _id : ObjectId("...b") }
         ]
       },
       common : { ... },
       ...
     }
  27. Engagement Data Model - Integrated
     {
       _id : ObjectId("556b92b83f7e775b8e92b30d"),
       ...
       common : {
         patient : "D6E5D510-592D-C613-DB46...",
         CCRs : [
           {
             ...
             Medication : {
               Product : { ProductName : "TIZANIDINE HCL 4 MG TABLET TEV" }
             },
             ...
           },
           {
             ...
             Immunizations : { ... },
             ...
           }
         ]
       },
       ...
     }
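The integrated document on slides 26 and 27 can be derived from the source documents of one patient. The sketch below is an assumption about how that merge might look, using plain dictionaries in place of MongoDB documents; `integrate` and the sample `_id` values are hypothetical, and only the field names (`meta.integrated`, `common.patient`, `common.CCRs`) come from the slides.

```python
from datetime import datetime, timezone


def integrate(source_docs, patient_id):
    """Build one integrated document from all source documents of a patient."""
    matching = [d for d in source_docs if d["common"]["patient"] == patient_id]
    return {
        "meta": {
            "lastUpdate": datetime.now(timezone.utc),
            # Keep a pointer to every source document that was merged in.
            "integrated": [{"_id": d["_id"]} for d in matching],
        },
        "common": {
            "patient": patient_id,
            # One entry per source CCR, in the order the sources were read.
            "CCRs": [d["source"] for d in matching],
        },
    }


# Hypothetical source documents; the _id values are placeholders.
sources = [
    {"_id": "a", "common": {"patient": "patient-001"},
     "source": {"Medication": {"Product": {"ProductName": "TIZANIDINE HCL 4 MG TABLET TEV"}}}},
    {"_id": "b", "common": {"patient": "patient-001"},
     "source": {"Immunizations": {"Immunization": {"CCRDataObjectID": "BB0022"}}}},
    {"_id": "c", "common": {"patient": "patient-002"}, "source": {}},
]

integrated = integrate(sources, "patient-001")
print(integrated["meta"]["integrated"])
```

Rebuilding the whole integrated document after each new source document keeps the sketch simple; an incremental update of the `CCRs` array would be an alternative design.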
  28. Engage!
  29. Single View Enables New Interactions: deliver faster; deliver to new applications (mobile, etc.); improve services; new analytics
  30. Changing Data: now that data is easy to get to, users will want to make changes. With a single view, you can change data in the source systems of record. Remember the change of address scenario?
  31. Example: Change of Address: enter it in different systems; call different parts of the organization. What if you have dependents who live with you?
  32. Capture Data Changes (diagram labels): Systems of Engagement; Data Services; Data Processing (integration, analytics, etc.); Systems of Record; Master Data; Raw Data; Integrated Data; ETL; Bus (Apache Kafka); records
  33. Engagement Data Model - Metadata
     {
       _id : ObjectId("556c1122c9c8f48313553be5"),
       meta : {
         system : "PatientRecords",
         lastUpdate : ISODate(...),
         version : 2,
         lineage : { ... },
         ...
       },
       common : { ... },
       source : { ... }
     }
  34. Engagement Data Model - Source
     {
       _id : ObjectId("556b92b83f7e775b8e92b30a"),
       ...
       source : {
         patientId : "D6E5D510-592D-C613-DB46...",
         address1 : "John Smith",
         address2 : null,
         city : "New York",
         state : "NY",
         zip : "10007"
       },
       ...
     }
  35. Engagement Data Model - Common
     {
       _id : ObjectId("556b92b83f7e775b8e92b30a"),
       ...
       common : {
         patient : "D6E5D510-592D-C613-DB46...",
         address : {
           addr1 : "John Smith",
           city : "New York",
           state : "NY",
           zip : "10007"
         }
       },
       ...
     }
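Slides 34 and 35 show the same address in its source shape and in the common data model. A minimal sketch of that mapping follows; `to_common` is a hypothetical helper, the `addr2` name is an assumption (the slide only shows `addr1`), and dropping null fields is inferred from the fact that the common document omits the null `address2`.

```python
def to_common(source):
    """Map a source-system address record to the common data model."""
    address = {
        "addr1": source.get("address1"),
        "addr2": source.get("address2"),  # assumed name; not shown on the slide
        "city": source.get("city"),
        "state": source.get("state"),
        "zip": source.get("zip"),
    }
    return {
        "patient": source["patientId"],
        # Leave out fields the source system did not fill in.
        "address": {k: v for k, v in address.items() if v is not None},
    }


common = to_common({
    "patientId": "patient-001",  # placeholder identifier
    "address1": "1 Example St",
    "address2": None,
    "city": "New York",
    "state": "NY",
    "zip": "10007",
})
print(common)
```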
  36. Systems of Record Data Model: address records can be in different systems; each system can be notified of the change to the record
  37. Update Process
     1. User accesses an application to change their address
     2. User updates their address in the System of Engagement
     3. The address change is broadcast to any Systems of Record that have registered
     4. An adapter applies the address change to the System of Record in an application-specific manner
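Steps 3 and 4 of the update process on slide 37 amount to a publish/subscribe arrangement. The deck's architecture diagram places Apache Kafka in the bus role; the sketch below swaps in a tiny in-memory bus so it runs without any broker. `ChangeBus`, the two adapters, and the two "systems of record" (plain dictionaries) are all hypothetical.

```python
from collections import defaultdict


class ChangeBus:
    """In-memory stand-in for a message bus: topic -> registered adapters."""

    def __init__(self):
        self._adapters = defaultdict(list)

    def register(self, topic, adapter):
        self._adapters[topic].append(adapter)

    def publish(self, topic, change):
        # Step 3: broadcast the change to every registered system of record.
        for adapter in self._adapters[topic]:
            adapter(change)


# Two hypothetical systems of record with different address layouts.
billing_db = {}
pharmacy_db = {}


def billing_adapter(change):
    # Step 4: apply the change in this system's own field names.
    addr = change["address"]
    billing_db[change["patient"]] = {
        "street": addr["addr1"],
        "city": addr["city"],
        "state": addr["state"],
        "postal_code": addr["zip"],
    }


def pharmacy_adapter(change):
    # This system stores the address as a single mailing line.
    addr = change["address"]
    pharmacy_db[change["patient"]] = (
        f'{addr["addr1"]}, {addr["city"]}, {addr["state"]} {addr["zip"]}'
    )


bus = ChangeBus()
bus.register("address", billing_adapter)
bus.register("address", pharmacy_adapter)
bus.publish("address", {
    "patient": "patient-001",
    "address": {"addr1": "1 Example St", "city": "New York", "state": "NY", "zip": "10007"},
})
print(billing_db, pharmacy_db)
```

Keeping the system-specific mapping inside each adapter means the System of Engagement only has to know the common address shape.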
  38. Tracking Changes: add basic document versioning to track what changed and when. Prefer the approach with separate "current" and "history" collections: current contains the last updated version; history contains all previous versions. You can query history to see the lineage
  39. Engagement Data Model - Current
     {
       _id : ObjectId("556c1122c9c8f48313553be5"),
       meta : {
         system : "PatientRecords",
         lastUpdate : ISODate(...),
         version : 2,
         lineage : {
           event : "update",
           source : "ProfileApp"
         },
         ...
       },
       ...
     }
  40. Engagement Data Model - History
     {
       _id : { id : ObjectId(...), v : 1 },
       meta : {
         system : "PatientRecords",
         lastUpdate : ISODate(...),
         version : 1,
         lineage : {
           event : "update",
           source : "PatientRecords"
         },
         ...
       },
       ...
     }
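The "current" and "history" approach from slides 38 to 40 can be sketched with two in-memory containers standing in for the two collections. This is an illustration under stated assumptions, not the presenter's implementation: `save_version` is a hypothetical helper, the compound history key `{id, v}` is an assumed reading of the history slide, and a real deployment would also have to decide how to keep the two writes consistent with each other.

```python
import copy
from datetime import datetime, timezone

current = {}   # stand-in for the "current" collection: _id -> latest version
history = []   # stand-in for the "history" collection: all previous versions


def save_version(doc_id, changes, event, source):
    """Apply changes to a document, archiving the version being replaced."""
    previous = current.get(doc_id)
    if previous is not None:
        # Move the outgoing version to history under a compound key.
        archived = copy.deepcopy(previous)
        archived["_id"] = {"id": doc_id, "v": previous["meta"]["version"]}
        history.append(archived)
        new_doc = copy.deepcopy(previous)
        version = previous["meta"]["version"] + 1
    else:
        new_doc = {"_id": doc_id, "meta": {}, "common": {}}
        version = 1
    new_doc["common"].update(changes)
    new_doc["meta"].update({
        "lastUpdate": datetime.now(timezone.utc),
        "version": version,
        # Lineage records what kind of change this was and where it came from.
        "lineage": {"event": event, "source": source},
    })
    current[doc_id] = new_doc
    return new_doc


save_version("p1", {"address": {"city": "Boston"}}, "create", "PatientRecords")
save_version("p1", {"address": {"city": "New York"}}, "update", "ProfileApp")
print(current["p1"]["meta"]["version"], [h["_id"] for h in history])
```

Reading the lineage of a document is then a matter of filtering `history` on the `id` part of the key and sorting by `v`.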
  41. Result: New Possibilities. Change address in one place! Other value-add processes can be triggered by changes. Example: automated outreach from health and benefits centers in the new location; help with moving. Extend the address change to the Veteran's dependents
  42. Next Steps
  43. Keep going: keep adding valuable processes to improve or provide new services; phase out legacy if desired (Part 1, From Relational to MongoDB); improve data governance (Part 3, Bulletproof Data Management); reduce costs
  44. Summary: Systems of Engagement give users new ways to interact with data. You can start small and add value quickly. MongoDB enables Systems of Engagement: dynamic schema; fast, flexible querying, analysis, and aggregation; high performance; scalable; secure
  45. References
     Systems of Engagement and the Future of Enterprise IT: A Sea Change in Enterprise IT. http://www.aiim.org/futurehistory
     Systems of Engagement & the Enterprise. http://www-01.ibm.com/software/ebusiness/jstart/systemsofengagement/
     Geoffrey Moore, The Future of Enterprise IT. http://www.slideshare.net/SAPanalytics/geoffrey-moore-the-future-of-enterprise-it
     Ask Asya. http://askasya.com/post/trackversions and http://askasya.com/post/revisitversions
  46. Questions? james.kerr@mongodb.com