XML Schema Design: Form Follows Function

A disagreement on a recent project got me thinking about some of the more subtle aspects of schema design.  In this particular case, there were two different opinions on the best way to represent supertypes and subtypes.  Since the schema was based on an abstract UML model, supertypes and subtypes were everywhere.  I suspect that, with the trend towards model-driven development, others will encounter this same issue more and more.  So some perspective based on experience might be helpful.
The options for representing supertypes/subtypes essentially boiled down to these:
  • Inheritance via xsd:extension.  Elements that are superclasses are defined in the regular way, then subclasses extend these, inheriting from the superclass.  The advantages of this approach include 1) it follows normal object-oriented principles, which have proven to be one of the most important advances in software development and fit naturally into the way people deal with complexity (“cars and trucks are both vehicles”), 2) it results in unambiguous representations in Java, for example when JAXB is used, 3) it has an unambiguous SOAP representation, which is becoming more important as web services and SOA become more prevalent, 4) schema understanding is facilitated since subclasses explicitly declare their parent, 5) instance processing is facilitated since subclasses are forced to declare their type using the xsi:type attribute, 6) extensibility is facilitated since subclasses can be added without editing the superclass, and 7) it is the default for many UML-to-XSD generators such as the one provided in IBM’s Rational suite.
  • Composition via xsd:choice.  Elements that are subclasses are defined in the regular way, then superclasses are composed with a choice of elements for various subclasses.  The advantages of this approach are 1) it builds naturally upon the xsd:choice construct that is common in manually-constructed schemas so a lot of people are familiar with it, 2) it naturally gives you a wrapper element around subclass attributes that makes schema-unaware document processing with XSLT simpler (though wrappers can be added in the inheritance approach too), 3) it facilitates schema understanding since superclasses explicitly declare their children, and 4) subclass instances aren’t required to have an xsi:type declaration.  Some might claim that this approach avoids some schema understandability issues that arise if elements are declared with a superclass type and then the instance overridden with a subclass type (see Obasanjo, “Is Complex Type Derivation Unnecessary?”), but I prefer to follow the simple rule, “If it hurts, don’t do it.”
  • Aspects via xsd:group.  Superclass elements are defined as xsd:groups, and subclasses are created by including these groups.  For a straightforward superclass/subclass implementation, it’s not clear that there are any advantages to this approach: it’s somewhat akin to manually implementing inheritance rather than using the XML Schema features designed for that purpose.
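To make the three options a little more concrete, here is a minimal sketch built around a hypothetical Vehicle/Car/Truck model.  The type and element names are invented for illustration and are not taken from the project schema.  The fragments are held as Python strings so the standard library can confirm they are well-formed and report which construct each one relies on; that is not schema validation, which would need an XSD-aware tool.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Option 1: the subtype extends the supertype (hypothetical names throughout).
INHERITANCE = """
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:complexType name="Vehicle">
    <xsd:sequence><xsd:element name="vin" type="xsd:string"/></xsd:sequence>
  </xsd:complexType>
  <xsd:complexType name="Car">
    <xsd:complexContent>
      <xsd:extension base="Vehicle">
        <xsd:sequence><xsd:element name="doors" type="xsd:int"/></xsd:sequence>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>
  <xsd:element name="vehicle" type="Vehicle"/>
</xsd:schema>
"""

# Option 2: the supertype offers a choice of subtype wrapper elements.
COMPOSITION = """
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:complexType name="Car">
    <xsd:sequence><xsd:element name="doors" type="xsd:int"/></xsd:sequence>
  </xsd:complexType>
  <xsd:complexType name="Truck">
    <xsd:sequence><xsd:element name="payload" type="xsd:int"/></xsd:sequence>
  </xsd:complexType>
  <xsd:complexType name="Vehicle">
    <xsd:sequence>
      <xsd:element name="vin" type="xsd:string"/>
      <xsd:choice>
        <xsd:element name="car" type="Car"/>
        <xsd:element name="truck" type="Truck"/>
      </xsd:choice>
    </xsd:sequence>
  </xsd:complexType>
  <xsd:element name="vehicle" type="Vehicle"/>
</xsd:schema>
"""

# Option 3: the shared content lives in a named group that each subtype includes.
GROUP = """
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:group name="VehicleFields">
    <xsd:sequence><xsd:element name="vin" type="xsd:string"/></xsd:sequence>
  </xsd:group>
  <xsd:complexType name="Car">
    <xsd:sequence>
      <xsd:group ref="VehicleFields"/>
      <xsd:element name="doors" type="xsd:int"/>
    </xsd:sequence>
  </xsd:complexType>
  <xsd:element name="car" type="Car"/>
</xsd:schema>
"""


def q(tag):
    """Qualify a local tag name with the XML Schema namespace."""
    return f"{{{XSD_NS}}}{tag}"


def summarize(schema_text):
    """Parse a schema fragment and list the subtype mechanisms it uses."""
    root = ET.fromstring(schema_text)
    return {
        "extends": [e.get("base") for e in root.iter(q("extension"))],
        "choices": [e.get("name")
                    for c in root.iter(q("choice"))
                    for e in c.findall(q("element"))],
        "groups": [g.get("ref") for g in root.iter(q("group")) if g.get("ref")],
    }


for label, text in [("inheritance", INHERITANCE),
                    ("composition", COMPOSITION),
                    ("group", GROUP)]:
    print(label, summarize(text))
```

Note where the knowledge sits in each fragment: with xsd:extension the subtype names its parent, with xsd:choice the supertype names its children, and with xsd:group the relationship is only implied by a shared reference.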
In the end, the best approach probably depends on what you intend to do with the XML documents.  From the document processor’s perspective, composition via xsd:choice is the natural solution for marked-up text that will be processed as a document, while the software developer will tend to prefer inheritance via xsd:extension for XML that will be manipulated by software as data.  Using xsd:group may be a compromise that appeases both camps but doesn’t make either happy.  That last option is perhaps more interesting in another context given the rise of aspect-oriented programming, but that’s a topic for another blog.
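The difference is easiest to see from the consuming side.  The sketch below assumes a hypothetical fleet of vehicles (all element and type names are invented for illustration) and uses only the Python standard library, with no schema awareness.  In the inheritance style the subtype has to be read from the xsi:type attribute; in the composition style it is simply the name of the wrapper element.

```python
import xml.etree.ElementTree as ET

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"

# Inheritance style: every element is a <vehicle>; xsi:type names the subtype.
INHERITED = f"""
<fleet xmlns:xsi="{XSI_NS}">
  <vehicle xsi:type="Car"><vin>A1</vin><doors>4</doors></vehicle>
  <vehicle xsi:type="Truck"><vin>B2</vin><payload>900</payload></vehicle>
</fleet>
"""

# Composition style: a wrapper element inside <vehicle> names the subtype.
COMPOSED = """
<fleet>
  <vehicle><vin>A1</vin><car><doors>4</doors></car></vehicle>
  <vehicle><vin>B2</vin><truck><payload>900</payload></truck></vehicle>
</fleet>
"""


def kinds_by_xsi_type(doc):
    """Return (vin, subtype) pairs, reading the subtype from xsi:type."""
    root = ET.fromstring(doc)
    return [(v.findtext("vin"), v.get(f"{{{XSI_NS}}}type"))
            for v in root.findall("vehicle")]


def kinds_by_wrapper(doc):
    """Return (vin, subtype) pairs, reading the subtype from the wrapper tag."""
    root = ET.fromstring(doc)
    pairs = []
    for v in root.findall("vehicle"):
        # The first child that is not the shared <vin> field is the wrapper.
        wrapper = next(child for child in v if child.tag != "vin")
        pairs.append((v.findtext("vin"), wrapper.tag))
    return pairs


print(kinds_by_xsi_type(INHERITED))
print(kinds_by_wrapper(COMPOSED))
```

Both functions recover the same information.  The wrapper version is the kind of lookup that is easy to express in schema-unaware XSLT, while the xsi:type version maps directly onto a class hierarchy in a data-binding tool.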

Is Oracle XML DB Ready for Prime Time?

Recent experience suggests not.  At least not if you want to mix XML and relational data.

Lately we’ve tried to generate XML records from about 10 relational tables using Oracle’s XQuery with the ora:view() function.  That attempt failed with an indefinite hang when processing more than a few hundred records (Oracle bug 8944761).  Using an explicit cursor in PL/SQL got us around that one, albeit slowly: 20 records/second is not my idea of fast when you have 400,000 records to process.  Next we tried to use XQuery and ora:view() to merge about 60 code table lookups into XML records (kinda nice to let users see both codes and descriptions, right?).  The performance of roughly three lookups per second was very disappointing.  
 
While the performance of pure relational/SQL and pure XML/XQuery operations appears good, mixing relational data into an XQuery with ora:view() appears to be a recipe for performance problems.
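To put those rates in perspective, here is a quick back-of-the-envelope calculation.  The record count and rates are the ones quoted above; the idea that every record needs all 60 lookups is my assumption for the sake of the estimate, not a measured figure.

```python
RECORDS = 400_000         # records to process
CURSOR_RATE = 20          # records/second with the explicit PL/SQL cursor
LOOKUP_RATE = 3           # code table lookups/second via ora:view()
LOOKUPS_PER_RECORD = 60   # assumption: each record needs all 60 lookups


def hours(seconds):
    """Convert a duration in seconds to hours."""
    return seconds / 3600


cursor_hours = hours(RECORDS / CURSOR_RATE)
seconds_per_record = LOOKUPS_PER_RECORD / LOOKUP_RATE

print(f"Cursor run: about {cursor_hours:.1f} hours")
print(f"Lookups: about {seconds_per_record:.0f} seconds per record")
```

Under that assumption, the lookup step alone would take far longer per record than the cursor-based generation does.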

Achieving High Availability with Oracle

Applications that require high availability need some form of redundancy to protect against failures.  For those with an Oracle database, the first thing that comes to mind is often Oracle Real Application Clusters (RAC).  But is this the best solution?

Oracle RAC allows multiple active DBMS servers to access a single database.  If one server goes down, traffic is automatically routed to the remaining server within a matter of seconds.  Although Oracle has made deploying RAC easier in recent releases (10g and 11g), it still adds a significant amount of complexity and cost.  More importantly, it does not protect against storage system failures: if you lose your SAN, NAS, or JBOD storage, you lose your database.
Oracle DataGuard allows an active database to be replicated to another hot-standby or read-only copy.  If the active server or its storage goes down, the standby server is made active and traffic is routed to it within a few minutes.  DataGuard is included in the Oracle Database Enterprise Edition license at no extra cost, and can replicate to either a local instance or to a remote instance (for full disaster recovery).  More importantly, it protects against both server and storage system failures.  It also can play a useful role in migrating to new server hardware or software.
If your business can tolerate a few minutes of outage, DataGuard is probably a good place to start.  If you need to further reduce the impact of a DBMS server outage from minutes to seconds, you should consider adding RAC.

BPS Supports the Surge on Grants.gov

The American Recovery and Reinvestment Act resulted in a surge of Federal grant activity, and consequently a surge in the load on the Grants.gov system.  To help the Grants.gov program office prepare the system for this increased load, BPS provided detailed performance testing/analysis and made clear, concrete recommendations.

Although the system strained under loads early on, implementation of the recommendations significantly increased system capacity, and the system successfully handled the heaviest loads.  BPS staff worked nights and weekends to test system enhancements off-hours to help minimize the impact to the user community while ensuring the stability, functionality, and performance of the system.

Our hats are off to the dedicated Federal staff and the system integration contractor staff who also put in many late hours…it was a great team effort and ultimately successful.

BPS Supports Launch of NARA’s Presidential Libraries

On January 20, 2009, the presidential administration officially changed hands.  In the background, millions of presidential records from the Bush administration were being transferred to the National Archives and Records Administration (NARA): documents, email, photographs, and databases of all kinds.  The transfer was performed with little fanfare, because NARA was ready for this huge onslaught of records, including many times the number of electronic records received from past administrations.

A year earlier, the chances of success did not look so good.  NARA was struggling with implementation of the baseline Electronic Records Archive (ERA), a large and complex system designed for the controlled transfer of records.  It would be perhaps another six months before the system was ready to transfer Federal records and could even begin the modifications needed to support the unique processes and formats associated with presidential records.  The January 20 deadline was clearly unachievable.

NARA engaged with Lockheed Martin to identify an alternate path to success.  Part of the solution would be a novel storage system with embedded search capabilities that would support the presidential library requirements.  That still left, however, development of a full-scale enterprise web application with record management and case management capabilities, which would have to be completed in parallel with (and in less time than) the baseline ERA system.  Working under Lockheed, BPS proposed and prototyped an alternative architecture that leveraged a content management system and an open source application framework to dramatically reduce development time, while providing users a high degree of functionality and flexibility.  The system also exploited XML as a common format to provide uniform access to a wide variety of records.

Working hand-in-hand with a small team of capable Lockheed staff and NARA representatives, BPS completed initial demonstration capabilities within four months and achieved production capabilities a few months later.  BPS staff flew to Texas to provide training to end users, and all was ready by the transition date.  The rest, as they say, is history…well preserved in the Bush 43 Presidential Library.

BPS Announces Launch of Continuous Testing Framework

How can test and assurance keep up with the trend toward rapid development and continuous integration? With a continuous testing framework, naturally.

As the Grants.gov system development approached critical milestones, the rate of new “builds” to deploy fixes and enhancements increased dramatically.  Even for small changes, however, thorough testing was required to ensure not only that the new functions worked, but that others were not inadvertently broken.  And yet the testing needed to be accomplished in much less time than before.

To respond to this challenge, BPS developed an innovative solution that used open source continuous integration tools and applied them to automated test cases.  The result: a dramatic decrease in the time to conduct system testing and an increase in test quality.  This continuous testing framework can benefit any medium-to-large scale application development effort…contact us to see how we might be able to help your project.