Painless OO XML with XML::Pastor

Posted on 15-Jan-2015

description

An introduction to XML::Pastor, comparison with other modules etc

Transcript of Painless OO XML with XML::Pastor

Painless OO <-> XML with XML::Pastor

Joel Bernstein - LPW 2008

It’s all Greek to me

schema (pl. schemata): σχήμα (skhēma), shape, plan

I do not like XML

People use it wrong

• Apple Property Lists

• Tag soup

• Data transfer format vs data storage format

How many of you?

• Use XML

• Hate XML

• Like XML

Do you write XML

• By hand?

• Programmatically?

• Schemata?

• Validation?

• Transformation?

XML::Pastor is for all of you.

XML is hard, right?

Some hard things:

• Roundtripping data

• Manipulating XML via DOM API

• Preserving element sibling order, comments, XML entities etc.
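To make the roundtripping problem concrete, here is a minimal sketch. It uses Python's standard library rather than Perl, purely as an illustration of the general issue; the input document is invented for the example. A plain parse followed by a serialise quietly drops the comment.

```python
import xml.etree.ElementTree as ET

# Hypothetical input, invented for this sketch.
original = "<album><!-- remastered in 2008 --><title>Blue</title></album>"

# Parse the document, then serialise it straight back without changes.
tree = ET.fromstring(original)
roundtripped = ET.tostring(tree, encoding="unicode")

print(original)
print(roundtripped)
# The default parser discards comments, so the roundtrip is lossy.
print("comment survived:", "remastered" in roundtripped)
```

Element order and text survive here, but anything the parser treats as optional (comments in this case) is lost unless the tool is told to keep it.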

Solution

Tools should make both the syntax and the details of the manipulation of XML invisible

XML::Pastor

• I didn’t write it

• Written by Ayhan Ulusoy

• Available on CPAN

• Abstracts away some of the pain of XML

What does it do?

• Generates Perl code from W3C XML Schema (XSD)

• Roundtrip and validate XML to/from Perl without loss of schema information

• Lets you program without caring about XML structure

Parsing with Pastor

• Parse entire XML into XML::LibXML::DOM object

• Convert XML DOM tree into native Perl objects

• Throw away DOM, no longer needed
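The three steps above can be sketched roughly as follows. This is Python with only the standard library, not XML::Pastor's API; the `Country` and `City` classes, the `bind_country` helper and the sample document are all hypothetical and hand-written here, whereas Pastor would generate its classes from a schema.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List

# Hypothetical document, invented for this sketch.
XML = """<country code="at">
  <name>Austria</name>
  <city code="VIE"><name>Vienna</name></city>
  <city code="SZG"><name>Salzburg</name></city>
</country>"""


@dataclass
class City:
    code: str
    name: str


@dataclass
class Country:
    code: str
    name: str
    cities: List[City] = field(default_factory=list)


def bind_country(xml_text: str) -> Country:
    # Step 1: parse the entire document into a tree.
    root = ET.fromstring(xml_text)
    # Step 2: copy the data out of the tree into native objects.
    country = Country(
        code=root.get("code"),
        name=root.findtext("name"),
        cities=[City(c.get("code"), c.findtext("name"))
                for c in root.findall("city")],
    )
    # Step 3: the tree is not returned, so it can be garbage collected.
    return country


country = bind_country(XML)
print(country.name, [city.name for city in country.cities])
```

After binding, calling code works with plain attributes and lists and never touches the parser's tree.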

Reasons to not use XML::Pastor

• When you have no XML Schema

• Although several tools can infer XML schemata from documents

• It’s a code-generator

• No stream parsing

XML::Pastor Code Generation

• Write out static code to tree of .pm files

• Write out static code to single .pm file

• Create code in a scalar in memory

• Create code and eval() it for use

Warning, boring bit

How Pastor works: Code generation

• Parse schemata into schema model

• Perl data structures containing all the global elements, types, attributes, ...

• “Resolve” Model - determine class names, resolve references, etc

• Create boilerplate code, write out / eval
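A toy version of this pipeline, assuming a hand-written schema model instead of a parsed XSD, might look like the sketch below. It is Python, not Perl, and none of the names (`SCHEMA_MODEL`, `resolve_class_name`, `generate_source`, the `Data` prefix) come from XML::Pastor; they exist only to show the model, resolve, generate and eval stages in miniature.

```python
# Hypothetical "schema model": schema type name -> field names.
# A real generator would build this by parsing the schema.
SCHEMA_MODEL = {
    "city": ["code", "name"],
    "country": ["code", "name", "city"],
}


def resolve_class_name(type_name, prefix="Data"):
    """'Resolve' step: decide which class name a schema type maps to."""
    return prefix + type_name.capitalize()


def generate_source(model):
    """Emit boilerplate class source for every type, as one string."""
    lines = []
    for type_name, fields in model.items():
        lines.append(f"class {resolve_class_name(type_name)}:")
        lines.append(f"    FIELDS = {tuple(fields)!r}")
        lines.append("    def __init__(self, **values):")
        lines.append("        for f in self.FIELDS:")
        lines.append("            setattr(self, f, values.get(f))")
        lines.append("")
    return "\n".join(lines)


# The generated code lives in a string in memory ...
source = generate_source(SCHEMA_MODEL)

# ... and can be written to a file for later use, or evaluated immediately.
namespace = {}
exec(source, namespace)

vienna = namespace["DataCity"](code="VIE", name="Vienna")
print(vienna.name)
```

Writing `source` to disk corresponds to generating static code ahead of time; the `exec` call corresponds to generating and loading the classes at run time.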

How Pastor works: Code Generation pt. 2

How Pastor works: Generated classes

• Each generated class (i.e. type) has classdata “XmlSchemaType” containing schema model

• If the class isa SimpleType it may contain restriction facets

• If the class isa ComplexType it will contain info about child elements and attributes
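As a rough illustration of class-level schema data carrying restriction facets, consider the sketch below. It is Python and not Pastor's generated code: the `FACETS` dictionary only loosely plays the role of the "XmlSchemaType" classdata, and the `SimpleType` base class, the `CountryCode` type and the `validation_errors` method are assumptions made for the example.

```python
import re


class SimpleType:
    # Hypothetical class-level schema info; subclasses override it.
    FACETS = {}

    def __init__(self, value):
        self.value = value

    def validation_errors(self):
        """Check the value against each facet the class declares."""
        errors = []
        facets = self.FACETS
        if "maxLength" in facets and len(self.value) > facets["maxLength"]:
            errors.append("longer than maxLength")
        if "pattern" in facets and not re.fullmatch(facets["pattern"], self.value):
            errors.append("does not match pattern")
        if "enumeration" in facets and self.value not in facets["enumeration"]:
            errors.append("not in enumeration")
        return errors


class CountryCode(SimpleType):
    # Facets as they might be copied from a schema's restriction element.
    FACETS = {"pattern": r"[a-z]{2}", "maxLength": 2}


print(CountryCode("at").validation_errors())
print(CountryCode("AUT").validation_errors())
```

Because the facets live on the class, every instance can be validated without going back to the schema document.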

How Pastor works: In use

• If the classes were generated offline, “use” them; if they were generated online, they are already loaded

• These classes have methods to create, retrieve, save object to/from XML

• Manipulate/query data using OO API to complexType fields

• Validate modified objects against schema

Very simple Album XML demo

Album XML document

Album XML schema

Pastorize creates Perl classes from Album XML schema:

Resulting code tree like:

Roundtrip and modify XML data using Pastor:

The result!

Real world Pastor

Moose::Role for Pastor

Country XML

Dynamic XML::Pastor usage

Query the Country object

Modify elements and attributes with uniform syntax

NodeArray syntax

Create new City data and combine with existing Country object

Validate modified data against the stored schema

Turn Pastor objects back into XML, or transform to XML::LibXML DOM

Simple D::HA object

Rekeying data

Rekeying data deeper

XML::Pastor Scope

• Good for “data XML”

• Unsuitable for “mixed markup”

• e.g. XHTML

• Unsuitable for “huge” documents

XML::Pastor Supported XML Schema Features

• Simple and Complex Types

• Global Elements

• Groups, Attributes, AttributeGroups

• Derive simpleTypes by extension

• Derive complexTypes by restriction

• W3C built-in Types, Unions, Lists

• (Most) Restriction Facets for Simple types

• External Schema import, include, redefine

XML::Pastor known limitations

• Mixed elements unsupported

• Substitution groups unsupported

• ‘any’ and ‘anyAttribute’ elements unsupported

• Encodings (only UTF-8 officially supported)

• Default values for attributes - help needed

XML Data Binding

• Binding XML documents to objects specifically designed for the data in those documents

• Allows e.g. data-centric applications to manipulate data more naturally than by using DOM API

Sales Order XML

Sales Order XML Logical data model

XML DOM

How this makes me feel:

Other XML modules

• XML::Twig

• XML::Compile

• XML::Simple

• XML::Smart

XML::Twig

• Manipulates XML directly

• Code that uses it is closely coupled to document structure

• Optimised for processing huge documents as trees

• No schemata, no validation

XML::Compile

• Original design rationale is to deal with SOAP envelopes and WSDL documents

• Different approach but similar goals to Pastor - processes XML based on XSD into Perl data structures

• More like XML::Simple with Schema support

XML::Compile pt. 2

• Schema support incomplete

• Shaky support for imports, includes

• Include restriction on targetNamespace

• I haven’t used it yet but it looks good

XML::Simple

• Working roundtrip binding for simple cases

• e.g. XMLout(XMLin($file)) works

• Simple API

• Produces single deep data structure

• Gotchas with element multiplicity
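The multiplicity gotcha is easiest to see in a small example. The sketch below is Python and is not XML::Simple's implementation; `to_plain` is a hypothetical helper that folds XML into plain data the naive way, and the album documents are invented. The shape of the result depends on how many times an element happens to occur.

```python
import xml.etree.ElementTree as ET


def to_plain(elem):
    """Naively fold an element into dicts and strings.

    A child seen once becomes a single value; a child seen more than
    once becomes a list. That is where the surprise comes from.
    """
    children = list(elem)
    if not children:
        return elem.text
    out = {}
    for child in children:
        value = to_plain(child)
        if child.tag in out:
            if not isinstance(out[child.tag], list):
                out[child.tag] = [out[child.tag]]
            out[child.tag].append(value)
        else:
            out[child.tag] = value
    return out


one = to_plain(ET.fromstring("<album><track>Intro</track></album>"))
two = to_plain(ET.fromstring(
    "<album><track>Intro</track><track>Outro</track></album>"))

print(type(one["track"]).__name__)
print(type(two["track"]).__name__)
```

Code written against the two-track document breaks on the one-track document. XML::Simple's `ForceArray` option exists to address exactly this; a schema-driven binding avoids the problem because the schema says which elements may repeat.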

XML::Simple pt. 2

• No schemata, no validation

• Can be teamed with a SAX parser

• More suitable for configuration files?

XML::Smart

• Similar implementation to XML::Pastor

• Uses tie() and lots of crac^H^H^H^Hmagic

• Gathers structure information from XML instance, rather than schema

• No code generation!

XML::Smart pt. 2

• No schemata, so no schema validation

• Based on Object::MultiType - overloaded objects as HASH, ARRAY, SCALAR, CODE & GLOB

• Like Pastor, overloads array/hashref access to the data - promotes decoupling

• Reasonable docs, some community growing

Any questions?

Thanks for coming

See you next year

Bonus Material

If we have enough time

XML Schema Inference

• Create an XML schema from an XML document instance

• Every document has an (implicit) schema

• Tools like Relaxer and Trang, as well as the System.Xml.Serializer of the .NET Framework, can all infer XML Schemata from document instances
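A very reduced form of schema inference can be sketched in a few lines. This is Python, it is not how Relaxer or Trang work internally, and the `infer` function and sample document are hypothetical. It only records, per element name, which attributes were seen and the largest number of times each child appeared under one parent.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical document instance, invented for this sketch.
XML = """<country code="at">
  <name>Austria</name>
  <city code="VIE"><name>Vienna</name></city>
  <city code="SZG"><name>Salzburg</name></city>
</country>"""


def infer(elem, model=None):
    """Walk the tree and accumulate a crude per-element description."""
    model = {} if model is None else model
    entry = model.setdefault(elem.tag, {"attributes": set(), "children": {}})
    entry["attributes"].update(elem.attrib)
    # Count how often each child tag occurs under this one parent.
    counts = Counter(child.tag for child in elem)
    for tag, n in counts.items():
        entry["children"][tag] = max(entry["children"].get(tag, 0), n)
    for child in elem:
        infer(child, model)
    return model


model = infer(ET.fromstring(XML))
for tag, info in model.items():
    print(tag, sorted(info["attributes"]), info["children"])
```

A real inference tool goes much further (data types, optional versus required content, namespaces), and a single instance can only ever suggest a schema, not prove one.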

Schema diff