XML Schema Design: Form Follows Function

A disagreement on a recent project got me thinking about some of the more subtle aspects of schema design.  In this particular case, there were two different opinions on the best way to represent supertypes and subtypes.  Since the schema was based on an abstract UML model, supertypes and subtypes were everywhere.  I suspect that, with the trend towards model-driven development, others will encounter this same issue more and more.  So some perspective based on experience might be helpful.
The options for representing supertypes/subtypes essentially boiled down to these:
  • Inheritance via xsd:extension.  Elements that are superclasses are defined in the regular way, then subclasses extend these, inheriting from the superclass.  The advantages of this approach include 1) it follows normal object-oriented principles, which have proven to be one of the most important advances in software development and fit naturally into the way people deal with complexity (“cars and trucks are both vehicles”), 2) it results in unambiguous representations in Java, for example when JAXB is used, 3) it has an unambiguous SOAP representation, which is becoming more important as web services and SOA become more prevalent, 4) schema understanding is facilitated since subclasses explicitly declare their parent, 5) instance processing is facilitated since subclass instances are forced to declare their type using the xsi:type attribute whenever the element is declared with the superclass type, 6) extensibility is facilitated since subclasses can be added without editing the superclass, and 7) it is the default for many UML-to-XSD generators such as the one provided in IBM’s Rational suite.
  • Composition via xsd:choice.  Elements that are subclasses are defined in the regular way, then superclasses are composed with a choice of elements for various subclasses.  The advantages of this approach are 1) it builds naturally upon the xsd:choice construct that is common in manually-constructed schemas so a lot of people are familiar with it, 2) it naturally gives you a wrapper element around subclass attributes that makes schema-unaware document processing with XSLT simpler (though wrappers can be added in the inheritance approach too), 3) it facilitates schema understanding since superclasses explicitly declare their children, and 4) subclass instances aren’t required to have an xsi:type declaration.  Some might claim that this approach avoids some schema understandability issues that arise if elements are declared with a superclass type and then the instance overridden with a subclass type (see Obasanjo, “Is Complex Type Derivation Unnecessary?”), but I prefer to follow the simple rule, “If it hurts, don’t do it.”
  • Aspects via xsd:group.  Superclass elements are defined as xsd:groups, and subclasses are created by including these groups.  For a straightforward superclass/subclass implementation, it’s not clear that there are any advantages to this approach: it’s somewhat akin to manually implementing inheritance rather than using the XML Schema features designed for that purpose.
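To make the three options concrete, here is a minimal sketch of each in one small schema. Everything in it is my own illustration rather than anything from the project: the vehicle/car/truck names, the make, doors, and payloadKg fields, and the absence of a target namespace are all assumptions chosen to keep the example short.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical vehicle/car/truck model; all names are illustrative only. -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- 1. Inheritance via xsd:extension: each subtype names its parent. -->
  <xsd:complexType name="VehicleType">
    <xsd:sequence>
      <xsd:element name="make" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>

  <xsd:complexType name="CarType">
    <xsd:complexContent>
      <xsd:extension base="VehicleType">
        <xsd:sequence>
          <xsd:element name="doors" type="xsd:positiveInteger"/>
        </xsd:sequence>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>

  <xsd:complexType name="TruckType">
    <xsd:complexContent>
      <xsd:extension base="VehicleType">
        <xsd:sequence>
          <xsd:element name="payloadKg" type="xsd:decimal"/>
        </xsd:sequence>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>

  <!-- Declared with the supertype; an instance selects a subtype via xsi:type. -->
  <xsd:element name="vehicle" type="VehicleType"/>

  <!-- 2. Composition via xsd:choice: the supertype lists its subtypes. -->
  <xsd:complexType name="CarDetails">
    <xsd:sequence>
      <xsd:element name="doors" type="xsd:positiveInteger"/>
    </xsd:sequence>
  </xsd:complexType>

  <xsd:complexType name="TruckDetails">
    <xsd:sequence>
      <xsd:element name="payloadKg" type="xsd:decimal"/>
    </xsd:sequence>
  </xsd:complexType>

  <xsd:element name="vehicleByChoice">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="make" type="xsd:string"/>
        <!-- Exactly one wrapper element carries the subtype-specific content. -->
        <xsd:choice>
          <xsd:element name="car" type="CarDetails"/>
          <xsd:element name="truck" type="TruckDetails"/>
        </xsd:choice>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>

  <!-- 3. Aspects via xsd:group: shared content is included by reference. -->
  <xsd:group name="VehicleGroup">
    <xsd:sequence>
      <xsd:element name="make" type="xsd:string"/>
    </xsd:sequence>
  </xsd:group>

  <xsd:element name="carByGroup">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:group ref="VehicleGroup"/>
        <xsd:element name="doors" type="xsd:positiveInteger"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>

</xsd:schema>
```

Under those same assumptions, a car instance in the inheritance style has to carry an xsi:type declaration:

```xml
<vehicle xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="CarType">
  <make>Volvo</make>
  <doors>4</doors>
</vehicle>
```

The composition style needs no xsi:type, because the wrapper element itself identifies the subtype:

```xml
<vehicleByChoice>
  <make>Volvo</make>
  <car>
    <doors>4</doors>
  </car>
</vehicleByChoice>
```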
In the end, the best approach probably depends on what you intend to do with the XML documents.  From the document processor’s perspective, composition via xsd:choice is the natural solution for marked-up text that will be processed as a document, while the software developer will tend to prefer inheritance via xsd:extension for XML that will be manipulated by software as data.  Using xsd:group may be a compromise that appeases both camps but doesn’t make either happy.  That last option is perhaps more interesting in another context given the rise of aspect-oriented programming, but that’s a topic for another blog.