University of Crete Department of Computer Science ΗΥ-561 Web Data Management XML Data Archiving...


Page 1: University of Crete Department of Computer Science ΗΥ-561 Web Data Management XML Data Archiving Konstantinos Kouratoras.

University of Crete
Department of Computer Science

ΗΥ-561 Web Data Management

XML Data Archiving

Konstantinos Kouratoras

Page 2: What is the problem?

• Most research is on database content
• Updates usually overwrite the existing state
• Research on database history is needed
• Scientific evidence is lost
• The basis of findings cannot be verified

Page 3: Why is this interesting?

• History of the data
• Scientific research
  • SWISS-PROT (protein sequence)
  • OMIM (human genes and genetic disorders)
• Great deal of manual labour
• Continuous changes
• Access to old versions

Page 4: First Approach

• Object matching across versions
• Change descriptions
• Archive space
• Efficient history queries

Page 5: Proposed technique (1/2)

Based on:
• Hierarchical data
• Key-structured databases
• Accretive databases

Page 6: Proposed technique (2/2)

• Merging versions into one hierarchy
• Elements stored once
• Timestamps
  • Sequence of versions
  • Time intervals
  • Inheritance
• Keys for element identification
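The timestamp bullets above can be illustrated with a minimal Python sketch. It assumes versions are numbered 1, 2, 3, ..., that a timestamp is the set of version numbers in which an element exists, and that an element whose timestamp equals its parent's stores nothing and inherits instead. The function names are hypothetical and not taken from the slides.

```python
def to_intervals(versions):
    """Collapse a set of version numbers into sorted (start, end) intervals."""
    intervals = []
    for v in sorted(versions):
        if intervals and v == intervals[-1][1] + 1:
            intervals[-1] = (intervals[-1][0], v)  # extend the current run
        else:
            intervals.append((v, v))               # start a new run
    return intervals


def format_timestamp(versions):
    """Render a timestamp compactly, e.g. {1, 2, 3, 5} becomes '1-3,5'."""
    return ",".join(
        str(a) if a == b else f"{a}-{b}" for a, b in to_intervals(versions)
    )


def stored_timestamp(node_versions, parent_versions):
    """Return None when the node can inherit its parent's timestamp
    (assumed inheritance rule), otherwise the compact timestamp to store."""
    if set(node_versions) == set(parent_versions):
        return None
    return format_timestamp(node_versions)


print(format_timestamp({1, 2, 3, 5}))          # 1-3,5
print(stored_timestamp({1, 2, 3}, {1, 2, 3}))  # None
print(stored_timestamp({2, 3}, {1, 2, 3}))     # 2-3
```

Storing intervals rather than individual version numbers keeps timestamps short for elements that persist across many consecutive versions.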

Page 7: Example

Page 8: XML Model (1/3)

• Node values
  • T-node: data value
  • A-node: attribute name, attribute value
  • E-node (internal node): tag name
    • List of values of E and T children
    • Set of values of A children
• Node value equality
  • Nodes agree on their value
• Path expression
  • Sequence of node names
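A small sketch of value equality as described above, using Python's standard `xml.etree.ElementTree`. Element and text children are compared as an ordered list and attributes as a set. Stripping surrounding whitespace from text is an assumption of this sketch, and the sample `emp` elements are made up for illustration.

```python
import xml.etree.ElementTree as ET


def node_value(elem):
    """Value of an E-node: tag name, set of A children, list of E and T children."""
    attrs = frozenset(elem.attrib.items())       # A children: order is irrelevant
    children = []
    text = (elem.text or "").strip()
    if text:
        children.append(("T", text))             # leading T child
    for child in elem:
        children.append(("E", node_value(child)))
        tail = (child.tail or "").strip()
        if tail:
            children.append(("T", tail))         # T child after an E child
    return (elem.tag, attrs, tuple(children))    # E and T children: order matters


def value_equal(a, b):
    """Two nodes are value-equal when they agree on their value."""
    return node_value(a) == node_value(b)


a = ET.fromstring('<emp id="1" dept="x"><name>Ann</name><sal>50</sal></emp>')
b = ET.fromstring('<emp dept="x" id="1"><name>Ann</name><sal>50</sal></emp>')
c = ET.fromstring('<emp id="1" dept="x"><sal>50</sal><name>Ann</name></emp>')

print(value_equal(a, b))  # True: attribute order does not matter
print(value_equal(a, c))  # False: child element order does matter
```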

Page 9: XML Model (2/3)

• Key
  • Pair of path expressions (Q, {P1, …, Pk})
    • Q: target set of nodes
    • {P1, …, Pk}: key constraints for Q
• Relative key
  • Description dependent on ancestor node key
  • Weak entities

Page 10: XML Model (3/3)

Keys for previous example
• (/, (db, {}))
  • At most one db element at the root
• (/db, (address, {}))
  • At most one address under the db node
• (/db, (emp, {id}))
  • Every employee within a db element can be uniquely identified by its id subelement
• (/db/emp, (name, {})), (/db/emp, (sal, {})), (/db/emp, (tel, {}))
  • There can be at most one name, sal and tel node for each employee
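The keys listed above can be checked mechanically. The following Python sketch encodes each key as (context path, (target tag, key paths)) and reports violations in a document. The helper names and the two sample documents are assumptions for illustration; only the six keys come from the slide.

```python
import xml.etree.ElementTree as ET

# The six keys from the slide, as (context path, (target tag, key paths)).
KEYS = [
    ("/", ("db", [])),
    ("/db", ("address", [])),
    ("/db", ("emp", ["id"])),
    ("/db/emp", ("name", [])),
    ("/db/emp", ("sal", [])),
    ("/db/emp", ("tel", [])),
]


def context_nodes(root, context):
    """Nodes selected by an absolute path; [None] stands for the document root."""
    if context == "/":
        return [None]
    steps = context.strip("/").split("/")
    if steps[0] != root.tag:
        return []
    nodes = [root]
    for step in steps[1:]:
        nodes = [child for node in nodes for child in node.findall(step)]
    return nodes


def key_value(node, key_paths):
    """Tuple of key path values; the empty tuple when the key set is {}."""
    return tuple((node.findtext(p) or "").strip() for p in key_paths)


def violations(root, keys=KEYS):
    """List (context, target, key value) for every duplicated key value."""
    found = []
    for context, (target, key_paths) in keys:
        for ctx in context_nodes(root, context):
            if ctx is None:
                targets = [root] if root.tag == target else []
            else:
                targets = ctx.findall(target)
            seen = set()
            for t in targets:
                value = key_value(t, key_paths)
                if value in seen:
                    found.append((context, target, value))
                seen.add(value)
    return found


good = ET.fromstring(
    "<db><emp><id>1</id><name>Ann</name><sal>50</sal></emp>"
    "<emp><id>2</id><name>Bob</name></emp></db>"
)
bad = ET.fromstring(
    "<db><emp><id>1</id><name>Ann</name></emp>"
    "<emp><id>1</id><name>Bob</name><name>Rob</name></emp></db>"
)

print(violations(good))  # []
print(violations(bad))   # [('/db', 'emp', ('1',)), ('/db/emp', 'name', ())]
```

A key with an empty set of key paths yields the empty tuple for every target, so a second target under the same context is reported, which matches the "at most one" reading on the slide.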

Page 11: Components (1/4)

Archiver components overview

[Diagram: Archiver. Labels: Archive; Keys; New version; Annotate Keys; Annotate Keys, Timestamps; Nested Merge; New Archive]

Page 12: Components (2/4)

• Annotate keys
  • Elements annotated with key values
  • Uniquely identified nodes
    • Path from root to node
    • Key annotation
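As a rough sketch of key annotation, the Python code below labels every keyed element with its path from the root plus its key values, so that each label identifies one node. The `KEY_PATHS` table, the `{id=1}` label syntax and the sample document are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Key paths per element path (hypothetical encoding of the db/emp keys).
KEY_PATHS = {
    "/db": [],
    "/db/address": [],
    "/db/emp": ["id"],
    "/db/emp/name": [],
    "/db/emp/sal": [],
    "/db/emp/tel": [],
}


def annotate(elem, parent_path="", parent_label="", out=None):
    """Collect one key-annotated label per keyed element, in document order."""
    if out is None:
        out = []
    path = f"{parent_path}/{elem.tag}"
    if path not in KEY_PATHS:
        return out                      # unkeyed content, e.g. the id subelement
    pairs = [
        f"{k}={(elem.findtext(k) or '').strip()}" for k in KEY_PATHS[path]
    ]
    annotation = ",".join(pairs)
    label = f"{parent_label}/{elem.tag}"
    if annotation:
        label += f"{{{annotation}}}"    # e.g. /db/emp{id=1}
    out.append(label)
    for child in elem:
        annotate(child, path, label, out)
    return out


doc = ET.fromstring(
    "<db><emp><id>1</id><name>Ann</name></emp>"
    "<emp><id>2</id><name>Bob</name></emp></db>"
)
for label in annotate(doc):
    print(label)
```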

Page 13: Components (3/4)

• Nested merge
  • Identify corresponding elements
  • Merge elements
  • Update sets of timestamps
  • Nodes with no corresponding element
    • Simply added
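The merge steps above can be sketched in a few lines of Python. This is a simplified in-memory model, not the archiver's actual data layout: a version is a nested dict whose keys are key-annotated labels, leaf content is a string, and every archive node keeps an explicit set of version numbers (timestamp inheritance is left out to keep the sketch short). All names and the sample data are hypothetical.

```python
def new_node():
    """Archive node: timestamp set, keyed children, and timestamped leaf values."""
    return {"t": set(), "kids": {}, "vals": {}}


def nested_merge(node, version, i):
    """Merge version number i into the archive node, in place."""
    node["t"].add(i)                              # update the set of timestamps
    if isinstance(version, dict):
        for label, sub in version.items():
            # The label identifies the corresponding element; if the archive
            # has none, a fresh node is simply added.
            child = node["kids"].setdefault(label, new_node())
            nested_merge(child, sub, i)
    else:
        # Leaf content: each distinct value is stored once with its timestamps.
        node["vals"].setdefault(version, set()).add(i)


v1 = {"db": {"emp{id=1}": {"name": "Ann", "sal": "50"}}}
v2 = {"db": {"emp{id=1}": {"name": "Ann", "sal": "60"},
             "emp{id=2}": {"name": "Bob"}}}

archive = new_node()
nested_merge(archive, v1, 1)
nested_merge(archive, v2, 2)

emp1 = archive["kids"]["db"]["kids"]["emp{id=1}"]
print(emp1["kids"]["name"]["vals"])  # {'Ann': {1, 2}}
print(emp1["kids"]["sal"]["vals"])   # {'50': {1}, '60': {2}}
```

Because elements are matched by key rather than by position, the salary change shows up as two timestamped values under the same employee instead of as a deleted and a re-inserted element.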

Page 14: Components (4/4)

Page 15: Experimental Results (1/2)

• Competing techniques
  • Incremental diff
  • Cumulative diff
• Compression methods
  • gzip (text)
  • XMill (XML)
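To make the two diff-based baselines concrete, here is a toy Python comparison using the standard `difflib` and `gzip` modules. It assumes that an incremental repository stores the first version plus a delta between each pair of consecutive versions, and that a cumulative repository stores the first version plus a delta from it to every later version. The synthetic versions are invented, so the sizes say nothing about the reported experiments.

```python
import difflib
import gzip


def text_size(lines):
    """Size in bytes of a version stored as plain text."""
    return len("\n".join(lines).encode("utf-8"))


def diff_size(old, new):
    """Size in bytes of a unified diff that turns old into new."""
    delta = "\n".join(difflib.unified_diff(old, new, lineterm=""))
    return len(delta.encode("utf-8"))


def incremental_size(versions):
    """First version plus one delta per consecutive pair (assumed layout)."""
    pairs = zip(versions, versions[1:])
    return text_size(versions[0]) + sum(diff_size(a, b) for a, b in pairs)


def cumulative_size(versions):
    """First version plus one delta from it to each later version (assumed layout)."""
    first = versions[0]
    return text_size(first) + sum(diff_size(first, v) for v in versions[1:])


def gzip_size(lines):
    """Size in bytes after gzip compression of the text form."""
    return len(gzip.compress("\n".join(lines).encode("utf-8")))


# Synthetic data: each version revises one more line than the previous one.
versions = [[f"line {j}" for j in range(10)]]
for k in range(1, 4):
    nxt = list(versions[-1])
    nxt[k - 1] += " (rev)"
    versions.append(nxt)

print("incremental:", incremental_size(versions))
print("cumulative: ", cumulative_size(versions))
print("gzip of last version:", gzip_size(versions[-1]))
```

On data like this, where changes accumulate, each cumulative delta repeats the earlier changes, so the cumulative repository ends up larger than the incremental one.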

Page 16: Experimental Results (2/2)

Page 17: Efficient Retrievals (1/2)

• Version retrieval
  • Binary tree for each node x with children as leaves
    • Timestamp
    • Archive offset
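One plausible reading of this index, sketched in Python: the leaves hold each child's timestamp and archive offset, every internal node holds the union of the timestamps below it, and a lookup for version v skips any subtree whose timestamp does not contain v. The union rule for internal nodes, the dict layout and the sample offsets are assumptions of this sketch, not details stated on the slide.

```python
def build(leaves):
    """Build a binary tree bottom-up from (versions, archive offset) leaves."""
    nodes = [
        {"t": frozenset(t), "offset": off, "left": None, "right": None}
        for t, off in leaves
    ]
    if not nodes:
        return None
    while len(nodes) > 1:
        paired = []
        for k in range(0, len(nodes), 2):
            if k + 1 < len(nodes):
                left, right = nodes[k], nodes[k + 1]
                paired.append({
                    "t": left["t"] | right["t"],   # union of both subtrees
                    "offset": None,
                    "left": left,
                    "right": right,
                })
            else:
                paired.append(nodes[k])            # odd node moves up unchanged
        nodes = paired
    return nodes[0]


def offsets_for(tree, version):
    """Archive offsets of the children of x that exist in the given version."""
    if tree is None or version not in tree["t"]:
        return []                                  # prune the whole subtree
    if tree["left"] is None:
        return [tree["offset"]]                    # leaf: one child of x
    return offsets_for(tree["left"], version) + offsets_for(tree["right"], version)


# Hypothetical children of one node x: (versions present, byte offset in archive).
children = [({1, 2}, 100), ({2}, 180), ({1}, 240), ({1, 2, 3}, 300), ({3}, 360)]
tree = build(children)

print(offsets_for(tree, 2))  # [100, 180, 300]
print(offsets_for(tree, 3))  # [300, 360]
```

The benefit of the tree over a linear scan appears when x has many children and only a few of them exist in the requested version.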

Page 18: Efficient Retrievals (2/2)

• Temporal history retrieval
  • Find keyed node x
  • Set of keyed children
  • Archive offset, timestamp offset
  • Sort list
  • Repeat for each keyed node
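A hedged sketch of how such sorted lists might be used: for each keyed node, keep a list of its keyed children sorted by label, each entry carrying an archive offset and a timestamp offset, and follow a key path with one binary search per level. The index contents, offsets and function names below are invented for illustration.

```python
from bisect import bisect_left

# Hypothetical index: keyed node path -> sorted list of
# (child label, archive offset, timestamp offset).
INDEX = {
    "": [("db", 0, 4)],
    "/db": [("emp{id=1}", 20, 28), ("emp{id=2}", 140, 148)],
    "/db/emp{id=1}": [("name", 40, 46), ("sal", 70, 75)],
    "/db/emp{id=2}": [("name", 160, 166)],
}


def find_child(parent_path, label):
    """Binary search the sorted child list of one keyed node."""
    entries = INDEX.get(parent_path, [])
    pos = bisect_left(entries, (label,))   # (label,) sorts just before (label, ...)
    if pos < len(entries) and entries[pos][0] == label:
        return entries[pos]
    return None


def locate(labels):
    """Follow a key path; return (archive offset, timestamp offset) or None."""
    path = ""
    entry = None
    for label in labels:
        entry = find_child(path, label)
        if entry is None:
            return None
        path = f"{path}/{label}"
    if entry is None:
        return None
    return entry[1], entry[2]


print(locate(["db", "emp{id=1}", "sal"]))  # (70, 75)
print(locate(["db", "emp{id=3}"]))         # None
```

With the timestamp offset in hand, the history of the element can be read directly from the archive without scanning its siblings.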

Page 19: Conclusion

• Efficient archiving technique
• Meaningful change descriptions
• Space overhead comparable to diff approach
  • OMIM archive for a year
    • Less than 1.12 times the space of last version
    • Less than 1.08 times the size of incremental diff
    • 40% compression with XML compression tool
• Works well with XML compression
• Basic operations with single pass
• XML output (further use)

Page 20: Xarch (1/2)

• Archiving tool
• Extends archiving technique
  • Sort elements by key
  • External merge sort
• Query language
  • Version retrieval
  • History tracking
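Sorting elements by key with an external merge sort can be sketched as follows in Python: sort small chunks in memory, write each as a run file, then merge the runs with `heapq.merge`. This is a generic external merge sort, not Xarch's actual implementation. The tab-separated record format, the tiny `chunk_size` and the sample records are assumptions, and keys and payloads are assumed to contain no tabs or newlines.

```python
import heapq
import os
import tempfile


def external_sort(records, chunk_size=2):
    """Sort (key, payload) string pairs via sorted runs on disk and a k-way merge."""
    run_paths = []
    chunk = []

    def flush():
        """Write the current chunk to disk as one sorted run."""
        if not chunk:
            return
        chunk.sort()
        fd, path = tempfile.mkstemp(suffix=".run", text=True)
        with os.fdopen(fd, "w", encoding="utf-8") as run:
            run.writelines(chunk)
        run_paths.append(path)
        chunk.clear()

    for key, payload in records:
        chunk.append(f"{key}\t{payload}\n")   # one record per line
        if len(chunk) >= chunk_size:
            flush()
    flush()

    files = [open(path, encoding="utf-8") for path in run_paths]
    try:
        merged = heapq.merge(*files)          # reads the runs lazily, line by line
        return [tuple(line.rstrip("\n").split("\t", 1)) for line in merged]
    finally:
        for f in files:
            f.close()
        for path in run_paths:
            os.remove(path)


records = [
    ("emp{id=3}", "Cy"),
    ("emp{id=1}", "Ann"),
    ("address", "Heraklion"),
    ("emp{id=4}", "Di"),
    ("emp{id=2}", "Bob"),
]
for key, payload in external_sort(records):
    print(key, payload)
```

Once both the archive and the incoming version are sorted by key, corresponding elements can be matched in a single sequential pass over each.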

Page 21: Xarch (2/2)

Query language example