University of Crete Department of Computer Science ΗΥ-561 Web Data Management XML Data Archiving...
University of Crete
Department of Computer Science
ΗΥ-561 Web Data Management
XML Data Archiving
Konstantinos Kouratoras
What is the problem?
ΗΥ-561 XML Data Archiving – Konstantinos Kouratoras Slide 1
• Most research focuses on database content
• Updates usually overwrite the existing state
• Need for research on database history
  • Scientific evidence is lost
  • The basis of findings cannot be verified
Why is this interesting?
• History of the data
• Scientific research
  • SWISS-PROT (protein sequences)
  • OMIM (human genes and genetic disorders)
  • Great deal of manual labour
  • Continuous changes
• Access to old versions
First Approach
• Object matching across versions
• Change descriptions
• Archive space
• Efficient history queries
Proposed technique (1/2)
Based on:
• Hierarchical data
• Key-structured databases
• Accretive databases
Proposed technique (2/2)
• Merging versions into one hierarchy
  • Elements stored once
  • Timestamps
    • Sequence of versions
    • Time intervals
    • Inheritance
• Keys for element identification
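The timestamp bullets above can be illustrated with a minimal sketch. It assumes that a timestamp is a set of version numbers stored as inclusive intervals, and that a node without its own timestamp inherits the one of its parent. The names `ArchiveNode`, `to_intervals` and `effective_timestamps`, and the sample data, are hypothetical and not taken from the slides.

```python
from dataclasses import dataclass, field
from typing import Optional


def to_intervals(versions):
    """Compress a set of version numbers into sorted, inclusive (start, end) intervals."""
    intervals = []
    for v in sorted(versions):
        if intervals and v == intervals[-1][1] + 1:
            intervals[-1] = (intervals[-1][0], v)  # extend the current run
        else:
            intervals.append((v, v))  # start a new run
    return intervals


@dataclass
class ArchiveNode:
    label: str
    timestamp: Optional[set] = None  # None: inherit the parent's timestamp
    children: list = field(default_factory=list)


def effective_timestamps(node, inherited):
    """Map each label to the intervals of versions in which that node exists."""
    own = node.timestamp if node.timestamp is not None else inherited
    result = {node.label: to_intervals(own)}
    for child in node.children:
        result.update(effective_timestamps(child, own))
    return result


# Hypothetical archive: the employee appears in version 2, the salary changes in version 4.
emp = ArchiveNode("emp", {2, 3, 4, 5}, [
    ArchiveNode("name"),              # no own timestamp: inherits from emp
    ArchiveNode("sal=50", {2, 3}),
    ArchiveNode("sal=60", {4, 5}),
])
root = ArchiveNode("db", {1, 2, 3, 4, 5}, [emp])
summary = effective_timestamps(root, set())
print(summary)
```

Only nodes whose lifetime differs from that of their parent need to carry a timestamp, which is where the space saving of inheritance would come from under these assumptions.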
Example
XML Model (1/3)
• Node values
  • T-node: data value
  • A-node: attribute name, attribute value
  • E-node (internal node): tag name
    • List of values of E and T children
    • Set of values of A children
• Node value equality
  • Nodes agree on their value
• Path expression
  • Sequence of node names
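The value definitions above can be sketched as a small recursive function. This is an illustration under assumptions: the class names `TNode`, `ANode`, `ENode` and the tuple encoding of values are hypothetical. E and T children are compared as an ordered list, A children as an unordered set.

```python
from dataclasses import dataclass, field


@dataclass
class TNode:
    text: str


@dataclass
class ANode:
    name: str
    value: str


@dataclass
class ENode:
    tag: str
    children: list = field(default_factory=list)    # E and T children, order matters
    attributes: list = field(default_factory=list)  # A children, order does not matter


def value(node):
    """Return a hashable value for a node, following the slide's definitions."""
    if isinstance(node, TNode):
        return ("T", node.text)
    if isinstance(node, ANode):
        return ("A", node.name, node.value)
    return (
        "E",
        node.tag,
        tuple(value(c) for c in node.children),        # list of child values
        frozenset(value(a) for a in node.attributes),  # set of attribute values
    )


def value_equal(a, b):
    """Two nodes are value-equal when they agree on their value."""
    return value(a) == value(b)


left = ENode("tel", [TNode("123")], [ANode("type", "home"), ANode("cc", "30")])
right = ENode("tel", [TNode("123")], [ANode("cc", "30"), ANode("type", "home")])
print(value_equal(left, right))
```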
XML Model (2/3)
• Key
  • Pair of path expressions (Q, {P1, …, Pk})
    • Q: target set of nodes
    • {P1, …, Pk}: key constraints on Q
• Relative key
  • Description dependent on ancestor node key
  • Weak entities
XML Model (3/3)
Keys for previous example
• (/, (db, {}))
  • At most one db element at the root
• (/db, (address, {}))
  • At most one address under the db node
• (/db, (emp, {id}))
  • Every employee within a db element can be uniquely identified by his id subelement
• (/db/emp, (name, {})), (/db/emp, (sal, {})), (/db/emp, (tel, {}))
  • There can be at most one name, sal and tel node for each employee
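As a rough illustration of what these keys require, the sketch below checks them against a document with Python's `xml.etree.ElementTree`. The encoding of each key as a `(context path, target tag, key paths)` triple, the `violations` helper and the two sample documents are assumptions made for this example. The example document from the slides is not reproduced in this transcript. The check covers uniqueness only and does not verify that every key path exists.

```python
import xml.etree.ElementTree as ET

# Keys from the slide, written as (context path, target tag, key paths).
KEYS = [
    ("/", "db", []),
    ("/db", "address", []),
    ("/db", "emp", ["id"]),
    ("/db/emp", "name", []),
    ("/db/emp", "sal", []),
    ("/db/emp", "tel", []),
]


def violations(xml_text):
    """Return (context, target, key value) for every duplicated key value."""
    doc = ET.Element("document")  # virtual node standing for "/"
    doc.append(ET.fromstring(xml_text))
    problems = []
    for context, target, key_paths in KEYS:
        rel = context.strip("/")
        contexts = doc.findall(rel) if rel else [doc]
        for ctx in contexts:
            seen = set()
            for node in ctx.findall(target):
                # An empty key path set gives the key value (), so a second
                # node with the same tag under the same context is a violation.
                key_value = tuple(node.findtext(p) for p in key_paths)
                if key_value in seen:
                    problems.append((context, target, key_value))
                seen.add(key_value)
    return problems


GOOD = ("<db><address>1 Main St</address>"
        "<emp><id>1</id><name>Ann</name><sal>50</sal></emp>"
        "<emp><id>2</id><name>Bob</name></emp></db>")
BAD = ("<db><emp><id>1</id><name>Ann</name></emp>"
       "<emp><id>1</id><name>Bob</name></emp></db>")

print(violations(GOOD))
print(violations(BAD))
```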
Components (1/4)
Archiver components overview
[Figure: archiver components diagram. Recoverable labels: Keys, Archive, New version, Annotate Keys, "Annotate Keys, Timestamps", Nested Merge, Archiver, New Archive]
Components (2/4)
• Annotate keys
  • Annotation of elements with key values
  • Uniquely identified nodes
    • Path from root to node
    • Key annotation
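A minimal sketch of key annotation, assuming the keys of the earlier example: each keyed element receives a label built from its tag and key values, and the chain of labels from the root identifies it uniquely. The `KEY_PATHS` table, the label syntax `emp{id=1}` and the `annotate` function are hypothetical choices for illustration.

```python
import xml.etree.ElementTree as ET

# (parent path, tag) -> key paths; a hypothetical encoding of the example keys.
KEY_PATHS = {
    ("", "db"): [],
    ("/db", "address"): [],
    ("/db", "emp"): ["id"],
    ("/db/emp", "name"): [],
    ("/db/emp", "sal"): [],
    ("/db/emp", "tel"): [],
}


def annotate(elem, parent_path="", parent_label=""):
    """Yield (identifying label path, element) for elem and its keyed descendants."""
    key_paths = KEY_PATHS.get((parent_path, elem.tag))
    if key_paths is None:
        return  # unkeyed node: treated here as part of its parent's content
    key = ",".join(f"{p}={elem.findtext(p)}" for p in key_paths)
    label = f"{parent_label}/{elem.tag}"
    if key:
        label += "{" + key + "}"
    yield label, elem
    for child in elem:
        yield from annotate(child, f"{parent_path}/{elem.tag}", label)


doc = ET.fromstring("<db><emp><id>1</id><name>Ann</name></emp></db>")
labels = [label for label, _ in annotate(doc)]
print(labels)
```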
Components (3/4)
• Nested merge
  • Identify corresponding elements
  • Merge elements
  • Update sets of timestamps
  • Nodes with no corresponding element
    • Simply added
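The merge steps above can be sketched as follows. This is a simplified illustration, not the archiver itself: versions are assumed to be nested dictionaries already keyed by key-annotated labels, every node stores an explicit timestamp set (no inheritance), and the names `new_node` and `nested_merge` are hypothetical.

```python
def new_node():
    """An archive node: versions it exists in, keyed children, and leaf values."""
    return {"time": set(), "children": {}, "values": {}}


def nested_merge(archive, version, number):
    """Merge one version into the archive in place and return the archive."""
    archive["time"].add(number)
    for label, content in version.items():
        # The corresponding element is found by its key label;
        # a node with no counterpart in the archive is simply added.
        child = archive["children"].setdefault(label, new_node())
        if isinstance(content, dict):
            nested_merge(child, content, number)
        else:
            child["time"].add(number)
            # Each distinct leaf value is stored once with its own timestamp.
            child["values"].setdefault(content, set()).add(number)
    return archive


v1 = {"db": {"emp{id=1}": {"sal": "50"}}}
v2 = {"db": {"emp{id=1}": {"sal": "60"}, "emp{id=2}": {"sal": "40"}}}

archive = new_node()  # virtual root above db
nested_merge(archive, v1, 1)
nested_merge(archive, v2, 2)

db = archive["children"]["db"]
print(db["children"]["emp{id=1}"]["children"]["sal"]["values"])
```

An element missing from a new version needs no action in this sketch: its timestamp set is simply not extended with the new version number.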
Components (4/4)
Experimental Results (1/2)
• Competing techniques
  • Incremental diff
  • Cumulative diff
• Compression methods
  • gzip (text)
  • XMill (XML)
Experimental Results (2/2)
Efficient Retrievals (1/2)
• Version retrieval
  • Binary tree for each node x with children as leaves
    • Timestamp
    • Archive offset
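One way to read the binary tree above is sketched below. The slide only states that the children of x are the leaves and that timestamps and archive offsets are stored. That each internal node holds the union of the timestamps beneath it, so that whole subtrees can be skipped, is an assumption of this sketch, as are the names `TreeNode`, `build` and `offsets_for` and the sample offsets.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TreeNode:
    versions: frozenset               # versions in which anything below exists
    offset: Optional[int] = None      # archive offset, set on leaves only
    left: Optional["TreeNode"] = None
    right: Optional["TreeNode"] = None


def build(leaves):
    """Build a balanced binary tree over a non-empty list of (versions, offset) leaves."""
    if len(leaves) == 1:
        versions, offset = leaves[0]
        return TreeNode(frozenset(versions), offset=offset)
    mid = len(leaves) // 2
    left, right = build(leaves[:mid]), build(leaves[mid:])
    return TreeNode(left.versions | right.versions, left=left, right=right)


def offsets_for(tree, version):
    """Return the archive offsets of the children of x that exist in the given version."""
    if version not in tree.versions:
        return []  # prune: nothing below this node belongs to the version
    if tree.offset is not None:
        return [tree.offset]
    return offsets_for(tree.left, version) + offsets_for(tree.right, version)


# Hypothetical children of x: (versions in which the child exists, archive offset).
tree = build([({1, 2}, 0), ({2}, 120), ({1}, 250), ({3}, 400)])
print(offsets_for(tree, 1))
```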
Efficient Retrievals (2/2)
• Temporal history retrieval
  • Find keyed node x
  • Set of keyed children
    • Archive offset, timestamp offset
  • Sort list
  • Repeat for each keyed node
Conclusion
• Efficient archiving technique
  • Meaningful change descriptions
  • Space overhead comparable to the diff approach
• OMIM archive for a year
  • Less than 1.12 times the space of the last version
  • Less than 1.08 times the size of the incremental diff
  • 40% compression with an XML compression tool
• Works well with XML compression
• Basic operations in a single pass
• XML output (for further use)
Xarch (1/2)
• Archiving tool
  • Extends the archiving technique
    • Sort elements by key
    • External merge sort
• Query language
  • Version retrieval
  • History tracking
Xarch (2/2)
Query language example