Painless OO XML with XML::Pastor - 2009 Remix

Posted on 15-Jan-2015


Description

How to build Perl classes with roundtrip data binding to XML, painlessly, using W3C XML Schema and XML::Pastor.

Slides from a previous revision of this talk are online at: http://www.slideshare.net/joelbernstein/painless-oo-xml-with-xmlpastorq-presentation/

I will be presenting an expanded, more practical, 2009 version of this talk. Now with more code and less theory!

- XML is hard, right? Some things which are hard.
- XML data binding
- Comparisons of modules
  - XML::Twig
  - XML::Smart
  - XML::Simple
  - XML::Pastor
- Pastor howto
- XML schema inference
  - Trang, Relaxer
- Relaxer howto
- The future?

For more information on XML::Pastor see: http://search.cpan.org/~aulusoy/XML-Pastor/

Relaxer download: http://www.relaxer.jp/download/relaxer-1.0.zip

Relaxer book (in Japanese): http://www.amazon.co.jp/exec/obidos/ASIN/4894715279/

Trang: http://www.thaiopensource.com/download/trang-20030619.zip

Transcript of Painless OO XML with XML::Pastor - 2009 Remix

Painless OO <-> XML with XML::Pastor

(2009 remix)

Joel Bernstein, YAPC::EU 2009

It’s all Greek to me

schema: σχήμα (skhēma), shape, plan

How many of you?

• Use XML

• Hate XML

• Like XML

A Confession

• I do not like XML

• People use it wrong

XML Data Binding

• Binding XML documents to objects specifically designed for the data in those documents.

• I often have to do this.

XML is hard, right? Some hard things:

• Roundtripping data

• Manipulating XML via DOM API

• Preserving element sibling order, comments, XML entities etc.
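You can see why sibling order and repeated elements are hard to roundtrip without any XML at all. Below is a minimal core-Perl sketch that assumes a hypothetical <album> element whose children are listed in document order. A naive binding with one hash key per element name silently loses the first <track> and forgets the order. A list of [name, value] pairs keeps everything. This is only an illustration, not how XML::Pastor stores data.

```perl
use strict;
use warnings;

# Children of a hypothetical <album> element, in document order.
my @children = (
    [ title  => 'Album title' ],
    [ track  => 'Track one' ],
    [ track  => 'Track two' ],
    [ artist => 'Artist name' ],
);

# Naive binding: one hash entry per element name. The second <track>
# overwrites the first, and hash key order is unrelated to document order.
my %as_hash;
$as_hash{ $_->[0] } = $_->[1] for @children;

# Order-preserving binding: keep the list of [name, value] pairs.
my @as_pairs = @children;

printf "hash keeps %d of %d children\n", scalar(keys %as_hash), scalar(@children);
print join(', ', map { "$_->[0]=$_->[1]" } @as_pairs), "\n";
```

Comments and entities are harder still, because they have no natural slot in either structure.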

Typical horrendous XML document

Sales Order XML: logical data model

XML DOM

I shouldn’t need to care about this

How this makes me feel:

Fundamental problem

• I don’t think in elements and attributes

• I think about my data, not how it’s stored

• This is Perl. DWIM.

Solution

Tools should make both the syntax and the details of the manipulation of XML invisible

Do you write XML

• By hand?

• Programmatically?

• Schemata?

• Validation?

• Transformation?

XML::Pastor is for all of you.

XML::Pastor

• Available on CPAN

• Abstracts away some of the pain of XML

• Ayhan Ulusoy is the author

• I am just a user

What does it do?

• Generates Perl code from W3C XML Schema (XSD)

• Roundtrip and validate XML to/from Perl without loss of schema information

• Lets you program without caring about XML structure

pastorize

• Automates the codegen process

• Conceptually similar to DBIx::Class::Schema::Loader

• TMTOWTDI: offline or runtime

• Works on multiple XSDs (caveat: watch for collisions)

pastorize in use:

    pastorize --mode offline --style multiple \
        --destination /tmp/lib/perl \
        --class_prefix MyApp::Data \
        /some/path/to/schema.xsd

Very simple contrived Album XML demo

Album XML document

Album XML schema

Pastorize the Album XML schema:

The resulting code tree looks like:

Modify some XML

Roundtrip and modify XML data using Pastor:

    # Load XML
    # Accessors
    # Modify
    # Write XML

The result!

Real world Pastor


    $HASH1 = {
        1 => 'Vodafone UK',
        2 => 'O2 UK',
        3 => 'Orange UK',
        4 => 'T-Mobile UK',
        8 => 'Hutchinson 3 UK'
    };

Country XML

Dynamic schema parsing of Country XML

Query the Country object

Modify elements and attributes with uniform syntax

Manipulate array-like data

Create new City data and combine with existing Country object

Validate modified data against the stored schema

Turn Pastor objects back into XML, or transform to XML::LibXML DOM

Parsing with Pastor

• Parse the entire XML document into an XML::LibXML DOM object

• Convert XML DOM tree into native Perl objects

• Throw away DOM, no longer needed
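The three parsing steps above can be sketched in core Perl. This is a toy that rests on two assumptions. First, a nested hash stands in for the XML::LibXML DOM that Pastor actually builds. Second, the generic Node class is hypothetical; real Pastor objects are instances of the classes generated from the schema.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a parsed DOM: each node has a name,
# attributes and child nodes. Pastor would hold an XML::LibXML
# document here; this is plain Perl data.
my $dom = {
    name     => 'country',
    attrs    => { code => 'gb' },
    children => [
        { name => 'city', attrs => { name => 'London' },    children => [] },
        { name => 'city', attrs => { name => 'Edinburgh' }, children => [] },
    ],
};

# Hypothetical generic "native object" class.
package Node;
sub new {
    my ($class, %args) = @_;
    my $self = { %args };
    return bless $self, $class;
}

package main;

# Recursively turn DOM nodes into objects: attributes become fields and
# child elements become arrays of objects keyed by element name.
sub dom_to_object {
    my ($node) = @_;
    my %fields = %{ $node->{attrs} };
    for my $child (@{ $node->{children} }) {
        push @{ $fields{ $child->{name} } }, dom_to_object($child);
    }
    return Node->new(_element => $node->{name}, %fields);
}

my $country = dom_to_object($dom);
undef $dom;    # nothing refers to the DOM any more, so drop it

print $country->{code}, ': ',
    join(', ', map { $_->{name} } @{ $country->{city} }), "\n";
```

Once every node has been converted, nothing refers to the DOM, so it can be thrown away. The whole document must still fit in memory during conversion, which is why this style suits whole documents rather than streams.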

Reasons to not use XML::Pastor

• When you have no XML Schema

• Although several tools can infer XML schemata from documents

• It’s a code generator

• No stream parsing

XML::Pastor Scope

• Good for “data XML”

• Unsuitable for “mixed markup”

• e.g. XHTML

• Unsuitable for “huge” documents

XML::Pastor known limitations

• Mixed elements unsupported

• Substitution groups unsupported

• ‘any’ and ‘anyAttribute’ elements unsupported

• Encodings (only UTF-8 officially supported)

• Default values for attributes - help needed

Other XML modules

• XML::Twig

• XML::Compile

• XML::Simple

• XML::Smart

XML::Twig

• Manipulates XML directly

• Code that uses it is closely coupled to the document structure

• Optimised for processing huge documents as trees

• No schemata, no validation

XML::Compile

• Originally designed to deal with SOAP envelopes and WSDL documents

• Different approach but similar goals to Pastor: processes XML into Perl data structures based on an XSD

• More like XML::Simple with Schema support

XML::Compile pt. 2

• Schema support incomplete

• Shaky support for imports, includes

• Include restriction on targetNamespace

• I haven’t used it yet but it looks good

XML::Simple

• Working roundtrip binding for simple cases

• e.g. XMLout(XMLin($file)) works

• Simple API

• Produces a single deep data structure

• Gotchas with element multiplicity

XML::Simple pt. 2

• No schemata, no validation

• Can be teamed with a SAX parser

• More suitable for configuration files?
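Here is the element multiplicity gotcha. With its default options, XML::Simple returns a plain value when an element occurs once and an array reference when it occurs more than once, so consumers have to handle both shapes. Its ForceArray option is the usual cure. The core-Perl sketch below shows the defensive pattern. The two structures are hand-written to mimic XMLin output for a <country> with one and with two <city> children; they were not captured from a real run.

```perl
use strict;
use warnings;

# Hand-written to mimic XMLin() results with default options for
#   <country><city>London</city></country>                        (one child)
#   <country><city>London</city><city>Edinburgh</city></country>  (two children)
my $one = { city => 'London' };
my $two = { city => [ 'London', 'Edinburgh' ] };

# Normalise "missing, one value, or an array ref of values" into a list.
sub as_list {
    my ($value) = @_;
    return ()      unless defined $value;
    return @$value if ref $value eq 'ARRAY';
    return ($value);
}

for my $country ($one, $two) {
    my @cities = as_list($country->{city});
    printf "%d city element(s): %s\n", scalar(@cities), join(', ', @cities);
}
```

Every consumer of the data needs a helper like this, which is exactly the kind of XML detail a schema-driven binding hides.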

XML::Smart

• Similar implementation to XML::Pastor

• Uses tie() and lots of crac^H^H^H^Hmagic

• Gathers structure information from XML instance, rather than schema

• No code generation!

XML::Smart pt. 2

• No schemata, so no schema validation

• Based on Object::MultiType - overloaded objects as HASH, ARRAY, SCALAR, CODE & GLOB

• Like Pastor, overloads array/hashref access to the data - promotes decoupling

• Reasonable docs, and a growing community
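Core Perl's overload pragma is enough to show the idea behind Object::MultiType-style objects: a single object answers hash access with its attribute values and array access with its child elements. This is a minimal sketch. The Element class is hypothetical and is neither XML::Smart's nor XML::Pastor's actual implementation. Each object is a blessed scalar reference, so the overloaded %{} and @{} handlers can reach the real data without recursing into themselves.

```perl
use strict;
use warnings;

# Hypothetical element class. Objects are blessed scalar refs so that the
# %{} and @{} overloads can dereference the underlying data directly.
package Element;
use overload
    '%{}'    => sub { ${ $_[0] }->{attrs} },       # $el->{name} reads attributes
    '@{}'    => sub { ${ $_[0] }->{children} },    # $el->[0] reads child elements
    '""'     => sub { ${ $_[0] }->{name} },        # "$el" gives the element name
    fallback => 1;

sub new {
    my ($class, $name, $attrs, @children) = @_;
    my $data = { name => $name, attrs => $attrs || {}, children => [@children] };
    return bless \$data, $class;
}

package main;

my $country = Element->new(country => { code => 'gb' },
    Element->new(city => { name => 'London' }),
    Element->new(city => { name => 'Edinburgh' }),
);

print "$country code=$country->{code}\n";
$country->[1]{name} = 'Glasgow';    # same syntax for reading and writing
print join(', ', map { $_->{name} } @$country), "\n";
```

Calling code only ever sees hashes and arrays, so it stays decoupled from how the element is really stored.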

Any questions?

Thanks for coming. See you next year!

http://search.cpan.org/dist/XML-Pastor/

Bonus Material

If we have enough time

XML::Pastor Supported XML Schema Features

• Simple and Complex Types

• Global Elements

• Groups, Attributes, AttributeGroups

• Derive simpleTypes by extension

• Derive complexTypes by restriction

• W3C built-in Types, Unions, Lists

• (Most) Restriction Facets for Simple types

• External Schema import, include, redefine

XML Schema Inference

• Create an XML schema from an XML document instance

• Every document has an (implicit) schema

• Tools like Relaxer and Trang, as well as the System.Xml.Schema.XmlSchemaInference class in the .NET Framework, can all infer XML schemata from document instances
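To make "every document has an implicit schema" concrete, here is a deliberately tiny inference pass in core Perl. It assumes a hypothetical document that has already been parsed into nested [name, {attributes}, children...] arrays. For each element it records only the attributes, the child elements, and whether a child repeats (a stand-in for maxOccurs). Real tools such as Trang also infer datatypes, optionality and content models; none of that is attempted here.

```perl
use strict;
use warnings;

# Hypothetical document instance as nested [ name, {attributes}, @children ].
my $doc = [ 'country', { code => 'gb' },
    [ 'city', { name => 'London' } ],
    [ 'city', { name => 'Edinburgh' } ],
    [ 'currency', {} ],
];

# element name => { attrs => { name => 1 }, children => { name => maxOccurs } }
my %schema;

sub infer {
    my ($node) = @_;
    my ($name, $attrs, @children) = @$node;
    my $decl = $schema{$name} ||= { attrs => {}, children => {} };
    $decl->{attrs}{$_} = 1 for keys %$attrs;

    # Count how often each child element occurs under this parent and
    # keep the largest count seen so far.
    my %seen;
    $seen{ $_->[0] }++ for @children;
    for my $child (keys %seen) {
        my $max = $decl->{children}{$child} || 0;
        $decl->{children}{$child} = $seen{$child} > $max ? $seen{$child} : $max;
    }
    infer($_) for @children;
}

infer($doc);

for my $el (sort keys %schema) {
    my $d    = $schema{$el};
    my @kids = map { $d->{children}{$_} > 1 ? "$_ (repeats)" : $_ }
               sort keys %{ $d->{children} };
    printf "%s: attributes [%s], children [%s]\n",
        $el, join(',', sort keys %{ $d->{attrs} }), join(', ', @kids);
}
```

Feeding more instances through infer() widens the result, which is why inferred schemata are only as good as the sample documents.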

Simple D::HA object

Rekeying data

Rekeying data deeper

Warning, boring bit

XML::Pastor Code Generation

• Write out static code to tree of .pm files

• Write out static code to single .pm file

• Create code in a scalar in memory

• Create code and eval() it for use

How Pastor works: Code generation

• Parse schemata into schema model

• Perl data structures containing all the global elements, types, attributes, ...

• “Resolve” the model: determine class names, resolve references, etc.

• Create boilerplate code, write out / eval
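The code generation steps can be imitated in a few lines of core Perl. This toy rests on heavy assumptions: the %model hash stands in for a parsed XSD, and the generated constructor and accessors are invented for this sketch rather than taken from the code XML::Pastor emits. It shows only the "create code in a scalar, then eval() it" path. Writing $code out to .pm files instead would correspond to the offline mode.

```perl
use strict;
use warnings;

# 1. A hypothetical, drastically simplified schema model: one complex
#    type with its attributes and child elements.
my %model = (
    City => { attributes => ['name'], elements => ['population'] },
);

# 2. "Resolve" the model: derive a Perl class name for each type.
my $prefix = 'MyApp::Data::';    # same prefix as the pastorize example

# 3. Generate boilerplate code into a scalar.
my $code = '';
for my $type (sort keys %model) {
    my @fields = (@{ $model{$type}{attributes} }, @{ $model{$type}{elements} });
    $code .= "package $prefix$type;\n";
    $code .= "sub new { my (\$class, %args) = \@_; my \$self = { %args }; return bless \$self, \$class }\n";
    for my $f (@fields) {
        # Combined getter/setter accessor for each field.
        $code .= "sub $f { my \$self = shift; \$self->{$f} = shift if \@_; return \$self->{$f} }\n";
    }
}

# 4. eval() the generated code so the classes exist in this process.
eval "$code; 1" or die "generated code failed to compile: $@";

my $city = MyApp::Data::City->new(name => 'London');
$city->population(1234);
print $city->name, ' ', $city->population, "\n";
```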

How Pastor works: Generated classes

• Each generated class (i.e. type) has classdata “XmlSchemaType” containing the schema model

• If the class isa SimpleType it may contain restriction facets

• If the class isa ComplexType it will contain info about child elements and attributes
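Here is a core-Perl sketch of what class data holding the schema model, together with restriction facets, makes possible. Several pieces are assumptions made for illustration: the class name, the shape of the hash returned by XmlSchemaType, and the validate method. XML::Pastor's real schema model is richer, and its validation API may differ.

```perl
use strict;
use warnings;

# Hypothetical generated simpleType class. The class carries its schema
# information as class data; the shape of this hash is an assumption.
package MyApp::Data::Type::CountryCode;

my $schema_type = {
    name   => 'CountryCode',
    base   => 'string',
    # XSD patterns must match the whole value, hence the \A...\z anchors.
    facets => { length => 2, pattern => qr/\A[a-z]+\z/ },
};
sub XmlSchemaType { return $schema_type }

# Return 1 if $value satisfies every facet present, 0 otherwise.
sub validate {
    my ($class, $value) = @_;
    my $facets = $class->XmlSchemaType->{facets};
    return 0 if defined $facets->{length}  && length($value) != $facets->{length};
    return 0 if defined $facets->{pattern} && $value !~ $facets->{pattern};
    return 1;
}

package main;

for my $code (qw(gb GBR g1)) {
    printf "%-3s %s\n", $code,
        MyApp::Data::Type::CountryCode->validate($code) ? 'valid' : 'invalid';
}
```

Because the facets live in the class rather than in the caller, any modified value can be rechecked against the schema at any time.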

How Pastor works: In use

• If the classes were generated offline, “use” them; if they were generated at runtime, they are already loaded

• These classes have methods to create, retrieve and save objects to/from XML

• Manipulate/query data using an OO API to complexType fields

• Validate modified objects against schema
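The "save object to XML" direction comes down to walking an object's fields and escaping the values correctly. This minimal core-Perl sketch assumes a hypothetical plain-hash object, plus caller-supplied lists that say which fields are attributes and which are child elements. Pastor takes that information from the schema instead. This is not XML::Pastor's serializer.

```perl
use strict;
use warnings;

# Escape the five characters that are special in XML text and attributes.
sub xml_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;    # must run first so later entities are not re-escaped
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    $s =~ s/'/&apos;/g;
    return $s;
}

# Serialise a hypothetical object: fields named in $attrs become
# attributes, fields named in $elements become child elements, in the
# order given; undefined fields are skipped.
sub to_xml_string {
    my ($tag, $obj, $attrs, $elements) = @_;
    my $xml = "<$tag";
    $xml .= sprintf ' %s="%s"', $_, xml_escape($obj->{$_})
        for grep { defined $obj->{$_} } @$attrs;
    $xml .= '>';
    $xml .= sprintf '<%s>%s</%s>', $_, xml_escape($obj->{$_}), $_
        for grep { defined $obj->{$_} } @$elements;
    return "$xml</$tag>";
}

my $city = { name => 'Town & "Gown"', population => 1234 };
print to_xml_string('city', $city, ['name'], ['population']), "\n";
```

Using explicit field lists keeps element order deterministic, which matters for the roundtripping problems described earlier.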
