Posted on 15-Jan-2015
Painless OO <-> XML with XML::Pastor
Joel Bernstein - LPW 2008
It’s all Greek to me
schema (pl. schemata): σχήμα (skhēma), shape, plan
I do not like XML: people use it wrong
• Apple Property Lists
• Tag soup
• Data transfer format vs data storage format
How many of you?
• Use XML
• Hate XML
• Like XML
Do you write XML
• By hand?
• Programmatically?
• Schemata?
• Validation?
• Transformation?
XML::Pastor is for all of you.
XML is hard, right? Some hard things:
• Roundtripping data
• Manipulating XML via DOM API
• Preserving element sibling order, comments, XML entities etc.
Solution: tools should make both the syntax and the details of XML manipulation invisible
XML::Pastor
• I didn’t write it
• Written by Ayhan Ulusoy
• Available on CPAN
• Abstracts away some of the pain of XML
What does it do?
• Generates Perl code from W3C XML Schema (XSD)
• Roundtrip and validate XML to/from Perl without loss of schema information
• Lets you program without caring about XML structure
Parsing with Pastor
• Parse the entire XML document into an XML::LibXML DOM object
• Convert XML DOM tree into native Perl objects
• Throw away DOM, no longer needed
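The three parsing steps above can be sketched in Python with the standard library's ElementTree. This is an illustration of the idea only, not Pastor's Perl API; the `Album` class and the document are hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# Hypothetical album document, standing in for the talk's Album example
ALBUM_XML = """<album title="Example Album">
  <track>First Track</track>
  <track>Second Track</track>
</album>"""

@dataclass
class Album:
    title: str
    tracks: list = field(default_factory=list)

def parse_album(xml_text: str) -> Album:
    # 1. Parse the whole document into an in-memory tree (the "DOM")
    root = ET.fromstring(xml_text)
    # 2. Convert the tree into a native object
    album = Album(title=root.get("title"),
                  tracks=[t.text for t in root.findall("track")])
    # 3. The tree goes out of scope here; only the native object survives
    return album

album = parse_album(ALBUM_XML)
print(album.title, len(album.tracks))  # Example Album 2
```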
Reasons to not use XML::Pastor
• When you have no XML Schema
• Although several tools can infer XML schemata from documents
• It’s a code-generator
• No stream parsing
XML::Pastor Code Generation
• Write out static code to a tree of .pm files
• Write out static code to a single .pm file
• Create code in a scalar in memory
• Create code and eval() it for use
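The last two modes, building code as a string in memory and then evaluating it, can be sketched in Python with `exec()`. The tiny "model" dict and the generated class shape are hypothetical; Pastor itself emits Perl packages.

```python
# Hypothetical minimal model: class name -> list of child element names
MODEL = {"Album": ["title", "artist"], "Track": ["name", "length"]}

def generate_source(model: dict) -> str:
    """Emit Python source for one simple class per type in the model."""
    lines = []
    for cls, fields in model.items():
        lines.append(f"class {cls}:")
        lines.append(f"    FIELDS = {fields!r}")
        args = ", ".join(f"{f}=None" for f in fields)
        lines.append(f"    def __init__(self, {args}):")
        for f in fields:
            lines.append(f"        self.{f} = {f}")
        lines.append("")
    return "\n".join(lines)

source = generate_source(MODEL)   # code held in a string in memory
namespace = {}
exec(source, namespace)           # evaluate it so the classes can be used
Album = namespace["Album"]
a = Album(title="Example Album", artist="Example Artist")
print(a.title)  # Example Album
```

Writing `source` to one or more files instead of calling `exec()` corresponds to the static, offline modes.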
Warning, boring bit
How Pastor works: code generation
• Parse schemata into a schema model
• Perl data structures containing all the global elements, types, attributes, ...
• “Resolve” the model: determine class names, resolve references, etc.
• Create boilerplate code, then write it out or eval() it
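A toy version of the "parse schemata into a model, then resolve it" steps, in Python. It handles only global elements and named complexTypes with a sequence of child elements; the schema and the class-name prefix are hypothetical, and real schemata need far more (imports, groups, attributes, derivation).

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical toy schema
XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="album" type="AlbumType"/>
  <xs:complexType name="AlbumType">
    <xs:sequence>
      <xs:element name="title" type="xs:string"/>
      <xs:element name="track" type="xs:string" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>"""

def build_model(xsd_text):
    root = ET.fromstring(xsd_text)
    model = {"elements": {}, "types": {}}
    # Global elements are direct children of <xs:schema>
    for el in root.findall(f"{XS}element"):
        model["elements"][el.get("name")] = el.get("type")
    # Named complex types and the child elements they declare
    for ct in root.findall(f"{XS}complexType"):
        children = [(c.get("name"), c.get("type"), c.get("maxOccurs", "1"))
                    for c in ct.iter(f"{XS}element")]
        model["types"][ct.get("name")] = children
    return model

def resolve(model, class_prefix="MyApp::Data::"):
    # "Resolve": give each global element a class name and look up its type
    return {class_prefix + name: model["types"].get(type_name, type_name)
            for name, type_name in model["elements"].items()}

resolved = resolve(build_model(XSD))
print(resolved)
```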
How Pastor works: code generation pt. 2
How Pastor works: generated classes
• Each generated class (i.e. type) has classdata “XmlSchemaType” containing the schema model
• If the class isa SimpleType, it may contain restriction facets
• If the class isa ComplexType, it will contain info about its child elements and attributes
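The idea of per-class schema data carrying restriction facets can be sketched in Python with a class attribute. The attribute name `XML_SCHEMA_TYPE`, the `is_valid` method and the `CountryCode` type are hypothetical; only three XSD facets are checked.

```python
import re

class SimpleType:
    # Class-level schema data, loosely mirroring a per-class "XmlSchemaType"
    XML_SCHEMA_TYPE = {"base": "xs:string", "facets": {}}

    def __init__(self, value):
        self.value = value

    def is_valid(self):
        facets = self.XML_SCHEMA_TYPE["facets"]
        v = self.value
        if "enumeration" in facets and v not in facets["enumeration"]:
            return False
        # XSD patterns are implicitly anchored, hence fullmatch
        if "pattern" in facets and not re.fullmatch(facets["pattern"], v):
            return False
        if "maxLength" in facets and len(v) > facets["maxLength"]:
            return False
        return True

class CountryCode(SimpleType):
    # Hypothetical restriction of xs:string: exactly two upper-case letters
    XML_SCHEMA_TYPE = {"base": "xs:string",
                       "facets": {"pattern": "[A-Z]{2}", "maxLength": 2}}

print(CountryCode("FR").is_valid(), CountryCode("France").is_valid())  # True False
```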
How Pastor works: in use
• If the classes were generated offline, “use” them; if generated online, they are already loaded
• These classes have methods to create objects and to retrieve/save them from/to XML
• Manipulate/query data using the OO API to complexType fields
• Validate modified objects against the schema
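The load, modify, validate, save cycle can be sketched in Python with a hand-written bound class. `City`, `from_xml`, `to_xml` and `is_valid` are hypothetical names, and `is_valid` is a simple stand-in for real schema validation.

```python
import xml.etree.ElementTree as ET

class City:
    """Hypothetical bound class for <city code="..."><name/><population/></city>."""
    ATTRIBUTES = ["code"]
    ELEMENTS = ["name", "population"]

    def __init__(self, **fields):
        for f in self.ATTRIBUTES + self.ELEMENTS:
            setattr(self, f, fields.get(f))

    @classmethod
    def from_xml(cls, text):
        root = ET.fromstring(text)
        fields = {a: root.get(a) for a in cls.ATTRIBUTES}
        fields.update({e: root.findtext(e) for e in cls.ELEMENTS})
        return cls(**fields)

    def to_xml(self):
        root = ET.Element("city", {a: getattr(self, a) for a in self.ATTRIBUTES
                                   if getattr(self, a) is not None})
        for e in self.ELEMENTS:
            if getattr(self, e) is not None:
                ET.SubElement(root, e).text = str(getattr(self, e))
        return ET.tostring(root, encoding="unicode")

    def is_valid(self):
        # Stand-in for schema validation: required fields set, population numeric
        return self.code is not None and self.name is not None and (
            self.population is None or str(self.population).isdigit())

city = City.from_xml('<city code="XYZ"><name>Example City</name></city>')
city.population = 1000            # modify through the OO API, not the DOM
assert city.is_valid()
print(city.to_xml())
# <city code="XYZ"><name>Example City</name><population>1000</population></city>
```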
Very simple Album XML demo
Album XML document
Album XML schema
Pastorize creates Perl classes from Album XML schema:
The resulting code tree looks like:
Roundtrip and modify XML data using Pastor:
The result!
Real world Pastor
Moose::Role for Pastor
Country XML
Dynamic XML::Pastor usage
Query the Country object
Modify elements and attributes with uniform syntax
NodeArray syntax
Create new City data and combine with existing Country object
Validate modified data against the stored schema
Turn Pastor objects back into XML, or transform to XML::LibXML DOM
Simple D::HA object
Rekeying data
Rekeying data deeper
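As a generic illustration of rekeying (not Pastor's own syntax), here is a Python sketch that indexes a list of records by one field, then groups them by a nested field. The records and both helper functions are hypothetical.

```python
from collections import defaultdict

# Hypothetical records, as they might come out of a bound <country> document
cities = [
    {"code": "AAA", "name": "Alpha", "region": {"name": "North"}},
    {"code": "BBB", "name": "Beta", "region": {"name": "South"}},
    {"code": "CCC", "name": "Gamma", "region": {"name": "North"}},
]

def rekey(items, key):
    """Index a list by a field; later duplicates overwrite earlier ones."""
    return {item[key]: item for item in items}

def rekey_deep(items, path):
    """Group a list by a nested field, e.g. path=("region", "name")."""
    groups = defaultdict(list)
    for item in items:
        value = item
        for step in path:
            value = value[step]
        groups[value].append(item)
    return dict(groups)

by_code = rekey(cities, "code")
print(by_code["BBB"]["name"])                      # Beta
by_region = rekey_deep(cities, ("region", "name"))
print([c["name"] for c in by_region["North"]])     # ['Alpha', 'Gamma']
```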
XML::Pastor Scope
• Good for “data XML”
• Unsuitable for “mixed markup”
• e.g. XHTML
• Unsuitable for “huge” documents
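Why mixed markup resists field-style binding can be shown in a few lines of Python: in XHTML-like content the meaning lives in the interleaving of text and elements, which a one-field-per-child mapping throws away. The snippet is hypothetical.

```python
import xml.etree.ElementTree as ET

p = ET.fromstring("<p>Hello <b>bold</b> and <i>italic</i> text</p>")

# A naive one-field-per-child binding keeps only the child elements...
naive = {child.tag: child.text for child in p}
print(naive)  # {'b': 'bold', 'i': 'italic'}

# ...but the text between them matters. ElementTree keeps it in .text
# (before the first child) and .tail (after each child).
pieces = [p.text] + [piece for child in p for piece in (child.text, child.tail)]
print("".join(pieces))  # Hello bold and italic text
```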
XML::Pastor supported XML Schema features
• Simple and Complex Types
• Global Elements
• Groups, Attributes, AttributeGroups
• Derive simpleTypes by extension
• Derive complexTypes by restriction
• W3C built-in Types, Unions, Lists
• (Most) Restriction Facets for Simple types
• External Schema import, include, redefine
XML::Pastor known limitations
• Mixed elements unsupported
• Substitution groups unsupported
• ‘any’ and ‘anyAttribute’ elements unsupported
• Encodings (only UTF-8 officially supported)
• Default values for attributes - help needed
XML Data Binding
• Binding XML documents to objects specifically designed for the data in those documents
• Allows e.g. data-centric applications to manipulate data more naturally than by using DOM API
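The contrast between DOM access and data binding, sketched in Python with `xml.dom.minidom`. The order document and the `Order` class are hypothetical; the point is that after binding, code works with typed fields instead of node lists and text nodes.

```python
from dataclasses import dataclass
from xml.dom import minidom

# Hypothetical sales order document
ORDER = "<order id='42'><customer>ACME</customer><total>99.50</total></order>"

# DOM style: navigate elements, node lists and text nodes by hand
doc = minidom.parseString(ORDER)
total_dom = float(doc.documentElement.getElementsByTagName("total")[0].firstChild.data)

# Data-binding style: a class designed for this data, with typed fields
@dataclass
class Order:
    id: int
    customer: str
    total: float

def bind(dom_doc):
    el = dom_doc.documentElement

    def text(tag):
        return el.getElementsByTagName(tag)[0].firstChild.data

    return Order(id=int(el.getAttribute("id")), customer=text("customer"),
                 total=float(text("total")))

order = bind(doc)
print(order.total == total_dom)  # True
```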
Sales Order XML
Sales Order XML Logical data model
XML DOM
How this makes me feel:
Other XML modules
• XML::Twig
• XML::Compile
• XML::Simple
• XML::Smart
XML::Twig
• Manipulates XML directly
• Code using it is closely coupled to the document structure
• Optimised for processing huge documents as trees
• No schemata, no validation
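The idea behind processing huge documents piece by piece, roughly what XML::Twig does twig by twig, has a Python analogue in `ElementTree.iterparse`. This is a sketch of the general technique, not Twig's API; the `<city>` element name is hypothetical.

```python
import io
import xml.etree.ElementTree as ET

def count_cities(stream):
    """Handle one <city> subtree at a time, emptying each once processed."""
    count = 0
    for event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "city":
            count += 1      # process the small subtree here
            elem.clear()    # drop its children and text; the empty shell stays under the root
    return count

# Stand-in for a large file; iterparse also accepts a filename
data = io.BytesIO(b"<country>" + b"<city><name>x</name></city>" * 1000 + b"</country>")
print(count_cities(data))  # 1000
```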
XML::Compile
• Originally designed to deal with SOAP envelopes and WSDL documents
• Different approach but similar goals to Pastor - processes XML based on XSD into Perl data structures
• More like XML::Simple with Schema support
XML::Compile pt. 2
• Schema support incomplete
• Shaky support for imports, includes
• Include restriction on targetNamespace
• I haven’t used it yet but it looks good
XML::Simple
• Working roundtrip binding for simple cases
• e.g. XMLout(XMLin($file)) works
• Simple API
• Produces single deep data structure
• Gotchas with element multiplicity
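The element-multiplicity gotcha, reproduced with a naive XMLin-style converter in Python (the converter is hypothetical, not XML::Simple itself): the same element becomes a list when it repeats and a plain scalar when it appears once, so consuming code must handle both shapes. XML::Simple's `ForceArray` option exists to work around exactly this.

```python
import xml.etree.ElementTree as ET

def simple_in(xml_text):
    """Naive XMLin-style conversion: repeated tags become lists, single ones scalars."""
    root = ET.fromstring(xml_text)
    out = {}
    for child in root:
        if child.tag in out:
            prev = out[child.tag]
            out[child.tag] = (prev if isinstance(prev, list) else [prev]) + [child.text]
        else:
            out[child.tag] = child.text
    return out

two = simple_in("<album><track>A</track><track>B</track></album>")
one = simple_in("<album><track>A</track></album>")
print(two["track"])  # ['A', 'B']
print(one["track"])  # A  (a string, not a one-element list)
```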
XML::Simple pt. 2
• No schemata, no validation
• Can be teamed with a SAX parser
• More suitable for configuration files?
XML::Smart
• Similar implementation to XML::Pastor
• Uses tie() and lots of crac^H^H^H^Hmagic
• Gathers structure information from XML instance, rather than schema
• No code generation!
XML::Smart pt. 2
• No schemata, so no schema validation
• Based on Object::MultiType - overloaded objects as HASH, ARRAY, SCALAR, CODE & GLOB
• Like Pastor, overloads array/hashref access to the data - promotes decoupling
• Reasonable docs, a growing community
Any questions?
Thanks for coming. See you next year
Bonus material (if we have enough time)
XML Schema Inference
• Create an XML schema from an XML document instance
• Every document has an (implicit) schema
• Tools like Relaxer and Trang, as well as the .NET Framework’s System.Xml.Schema.XmlSchemaInference class, can all infer XML schemata from document instances
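A very small taste of schema inference in Python: walk one instance and record, per element name, which children occur and whether any of them repeat (i.e. would need `maxOccurs` greater than 1). This is a hypothetical sketch; real tools such as Trang also infer types, optionality and context-dependent content models.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def infer_structure(xml_text):
    """Per element name: child tags seen, and whether each ever repeats."""
    schema = {}
    for elem in ET.fromstring(xml_text).iter():
        counts = Counter(child.tag for child in elem)
        entry = schema.setdefault(elem.tag, {})
        for tag, n in counts.items():
            entry[tag] = entry.get(tag, False) or n > 1   # True means "repeats"
    return schema

doc = ("<country><name>X</name>"
       "<city><name>A</name></city><city><name>B</name></city></country>")
print(infer_structure(doc))
# {'country': {'name': False, 'city': True}, 'name': {}, 'city': {'name': False}}
```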
Schema diff