XML Application Design

A markup language created with XML is called an XML application. MathML, SOAP, XSL, SVG, and XHTML are all XML applications. “Application” is a pretty lousy term considering its common software usage, but we’re stuck with it!

A number of XML technology components exist to make designing XML applications easier. All of these components arose out of recognizing common needs among many XML applications. Using the standard XML components in your XML application designs can save work and make things more accessible to a wider audience. This essay presents some guidelines for designing XML applications and gives an overview of the standard XML components.

Naming Conventions

One of your first considerations when designing an XML application is deciding how you’ll name the elements and attributes. At the very least, you’ll want to use a consistent naming convention. There’s not an industry consensus on naming conventions by any means, but favoring UpperCamel case for elements and lowerCamel case for attributes is a reasonable guideline based on recent XML standards development.

The following table summarizes naming conventions across a number of successful XML applications. Note that the XML applications get relatively older as you move to the right in the table.

Element names   | UpperCamel       | lowerCamel                     | lower-dash
Attribute names | lowerCamel       | lowerCamel                     | lower-dash
Applications    | SOAP, XAML, WS-* | XML Schema, RelaxNG, WSDL, SVG | XSL, DocBook, ebXML, MathML, XInclude, XLink

Neither the CAPS_AND_UNDERSCORES nor the loweruntogether naming convention is recommended, though both exist in the wild. You’ll find these foul creatures especially if you’re pulling data from legacy systems and databases without case-sensitive object names. The world is case-sensitive…so get used to it! And we certainly aren’t forced to jam names into eight characters or fewer anymore!

When you choose your naming style, you need to be aware of the context of your XML application. If your XML application works with applications like XSL or DocBook, then it may be advantageous to copy their lower-dash style for your names. CamelCase is used in most web services-related technologies. If you don’t want your WS-Swan specification looking like a WS-UglyDuckling, then you’re best off choosing CamelCase.

Verbose XML Markup

XML is often criticized for being bulky and verbose. Long element names don’t help this situation, and it’s not uncommon for your markup to outweigh your data! But don’t worry unduly about the length of element and attribute names in your XML. Clarity of expression should be your primary goal when selecting names. Favor terms from your XML application’s business and problem domains.

Element names with clear meanings should always be preferred over terse or cryptic names because they make working with the XML easier. At the cost of a little bulkiness, your code gains a lot in clarity and maintainability with good names.

Let’s consider the potential performance impact of bulky XML markup; you’ll find it’s not that bad. The length of names doesn’t necessarily affect the memory footprint of a DOM. Many DOM implementations keep a name table behind the scenes that stores each distinct name only once, so a long name repeated thousands of times costs little extra memory. XML files will be larger on disk if commonly used names are lengthy, but disk space is cheap and plentiful these days.
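To make the name-table idea concrete, here is a minimal Python sketch. The `NameTable` class and `count_names` function are hypothetical illustrations, not the API of any particular DOM implementation. The sketch compares how many name occurrences a document contains with how many distinct names a table would actually need to store.

```python
import xml.etree.ElementTree as ET


class NameTable:
    """Hypothetical minimal name table: each distinct name is stored only once."""

    def __init__(self):
        self._names = {}

    def add(self, name):
        # Return the shared copy of the name, storing it the first time it is seen.
        return self._names.setdefault(name, name)

    def __len__(self):
        return len(self._names)


def count_names(xml_text):
    """Return (total name occurrences, distinct names held by the table)."""
    table = NameTable()
    occurrences = 0
    for elem in ET.fromstring(xml_text).iter():
        table.add(elem.tag)
        occurrences += 1
        for attr_name in elem.attrib:
            table.add(attr_name)
            occurrences += 1
    return occurrences, len(table)


movie = '<Movie genre="Comedy"><Title>Young Frankenstein</Title></Movie>'
catalog = "<Catalog>" + movie * 500 + "</Catalog>"
print(count_names(catalog))  # (1501, 4)
```

In this sample, 1,501 name occurrences collapse to just four stored strings, however long those names are.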

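The amount of memory on modern machines allows you to get away with holding fairly large XML documents in memory, but you always need to be mindful of memory usage. Batch processing large XML documents need not consume a lot of memory. With a streaming SAX approach, the parser reports each element to your code as an event and keeps no tree. Memory use therefore depends on the state your handler chooses to keep, not on the size of the document.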
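The following Python sketch shows the SAX style with the standard `xml.sax` module. The handler keeps only a movie count and a small genre tally, so memory stays flat as the catalog grows. The `GenreCounter` class and the sample catalog are illustrative assumptions.

```python
import xml.sax


class GenreCounter(xml.sax.ContentHandler):
    """Tallies genre attributes while streaming; only the tallies stay in memory."""

    def __init__(self):
        super().__init__()
        self.movies = 0
        self.genres = {}

    def startElement(self, name, attrs):
        # Called once per start tag; no element tree is ever built.
        if name == "Movie":
            self.movies += 1
            genre = attrs.get("genre")
            if genre is not None:
                self.genres[genre] = self.genres.get(genre, 0) + 1


def count_genres(xml_bytes):
    handler = GenreCounter()
    xml.sax.parseString(xml_bytes, handler)
    return handler.movies, handler.genres


doc = b'<Catalog><Movie genre="Comedy"/><Movie genre="Horror"/><Movie genre="Comedy"/></Catalog>'
print(count_genres(doc))  # (3, {'Comedy': 2, 'Horror': 1})
```

For a real batch job you would call `xml.sax.parse()` with a file name or stream instead of `parseString()`, so the document is never loaded into memory as a whole.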

Transferring large XML documents over networks may be cause for concern, but it’s a concern that can usually be addressed with compression. Text compression algorithms take advantage of repeating runs of characters and XML markup is full of repeating runs of characters. XML compresses well. A number of software and hardware products exist for compressing XML over networks.
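You can check the “XML compresses well” claim yourself with Python’s `gzip` module. The catalog below is synthetic and highly repetitive, so treat the ratio it prints as an illustration rather than a benchmark. Real documents with more varied content will compress less dramatically.

```python
import gzip


def compression_ratio(xml_text):
    """Return compressed size divided by original size for a UTF-8 encoded XML string."""
    raw = xml_text.encode("utf-8")
    return len(gzip.compress(raw)) / len(raw)


# Synthetic catalog: the repeated tag names are exactly what the compressor exploits.
records = "".join(
    f"<Movie><Title>Movie {i}</Title><Genre>Comedy</Genre></Movie>" for i in range(1000)
)
doc = f"<Catalog>{records}</Catalog>"
print(f"compressed to {compression_ratio(doc):.1%} of the original size")
```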

Besides the physical impact of bulky XML, there is its emotional impact. Don’t let the bulk of XML markup turn you off from using it! When you first learn XSL transforms, for example, the verbosity of the markup is quite frustrating to read and write. Setting up a single template with an xsl:choose structure, or just calling a template with a few parameters, takes a considerable amount of typing. Compared to traditional programming languages, XSL is downright fat. However, once you become familiar with XSL, you learn not to even see the bulky names of the XSL elements. In fact, you’ll come to appreciate the economy of expression XSL has for processing XML.

XML editing tools have reached a level of maturity that relieves much of the human burden of bulky XML. XML editors with automatic tag completion are commonplace, and some can use your DTDs or XML schemas to make editing a breeze.

Elements and Attributes

Favor elements in your XML applications because they are more open to change over time. You will very seldom create an XML application correctly the first time, so flexibility is important in your design.

The following movie catalog sample makes the case for preferring elements to attributes:

<?xml version="1.0" ?>
<Catalog>
  <Movie genre="Comedy">
    <Title>Young Frankenstein</Title>
  </Movie>
</Catalog>

A genre value must be selected from a fixed list of genres for each Movie element in your movie catalog. Suppose the need arises to assign multiple genres to a single movie but you didn’t allow for this in your original design. Because you can’t have multiple attributes with the same name for a single element, a genre attribute would limit you to one genre per movie. Sure, you could always choose to delimit a list of values within a single genre attribute, but your data becomes less structured and more difficult to manipulate if you do.

As the following sample shows, one or more child Genre elements for each Movie element would provide a more open design than the single attribute approach would:

<?xml version="1.0" ?>
<Catalog>
  <Movie>
    <Title>Young Frankenstein</Title>
    <Genre>Comedy</Genre>
    <Genre>Seasonal-Halloween</Genre>
  </Movie>
</Catalog>
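The difference shows up as soon as code reads the data. In this Python sketch using `xml.etree.ElementTree`, the attribute design needs a delimiter convention that every consumer must know and apply. A space-separated list is assumed here purely for illustration. The element design hands each genre to the program as its own node.

```python
import xml.etree.ElementTree as ET

# Attribute design: several genres squeezed into one attribute using a
# space-delimited convention (a hypothetical choice every consumer must know).
ATTRIBUTE_STYLE = (
    '<Catalog><Movie genre="Comedy Seasonal-Halloween">'
    "<Title>Young Frankenstein</Title></Movie></Catalog>"
)

# Element design: one Genre child per genre, mirroring the Genre sample.
ELEMENT_STYLE = (
    "<Catalog><Movie><Title>Young Frankenstein</Title>"
    "<Genre>Comedy</Genre><Genre>Seasonal-Halloween</Genre></Movie></Catalog>"
)


def genres_from_attribute(movie):
    # The parser sees a single string; splitting it is up to every consumer.
    return movie.get("genre", "").split()


def genres_from_elements(movie):
    # The parser already delivers each genre as a separate node.
    return [genre.text for genre in movie.findall("Genre")]


attr_movie = ET.fromstring(ATTRIBUTE_STYLE).find("Movie")
elem_movie = ET.fromstring(ELEMENT_STYLE).find("Movie")
print(genres_from_attribute(attr_movie))  # ['Comedy', 'Seasonal-Halloween']
print(genres_from_elements(elem_movie))   # ['Comedy', 'Seasonal-Halloween']
```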

For a more sophisticated design, you may even choose to provide a container element for genres that groups the child Genre elements.

<?xml version="1.0" ?>
<Catalog>
  <Movie>
    <Title>Young Frankenstein</Title>
    <Genres>
      <Genre>Comedy</Genre>
      <Genre>Seasonal-Halloween</Genre>
    </Genres>
  </Movie>
</Catalog>

Code and validation schemes are often simplified by the use of container elements. It’s common to write code that filters or extracts portions of an XML document during processing and container elements provide a convenient hook for such operations.
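Here is a small Python sketch of the container acting as a hook, using `xml.etree.ElementTree` and the Genres sample above. Filtering out all genre data takes one `remove()` call per movie instead of one call per Genre element. The `strip_genres` function is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

CATALOG = """<Catalog>
  <Movie>
    <Title>Young Frankenstein</Title>
    <Genres>
      <Genre>Comedy</Genre>
      <Genre>Seasonal-Halloween</Genre>
    </Genres>
  </Movie>
</Catalog>"""


def strip_genres(root):
    """Remove every Genres container and return how many containers were removed."""
    removed = 0
    # Snapshot the Movie elements before mutating the tree.
    for movie in list(root.iter("Movie")):
        genres = movie.find("Genres")
        if genres is not None:
            movie.remove(genres)  # one call drops the whole group of Genre children
            removed += 1
    return removed


root = ET.fromstring(CATALOG)
print(strip_genres(root))     # 1
print(root.find(".//Genre"))  # None
```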

Appropriate Metadata

Units of measure, currency types, and similar qualifiers often need to be added to XML applications as metadata. It can be difficult to decide what metadata to include in your XML application. As a general guideline, use metadata sparingly and only when it makes an important contribution to the interpretation of the data.

Having a Price element that includes a currency="USD" attribute is certainly a textbook-ready example of metadata, but such metadata is only appropriate when you’ll be mixing currencies in the XML. If you’ll only occasionally use a currency other than USD, give the attribute a default value of "USD" in your XML Schema or DTD so that it is optional in the common case.
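A consumer that doesn’t validate against the schema won’t see the schema’s default. Python’s `xml.etree.ElementTree`, for instance, does not apply XML Schema defaults. This sketch therefore applies the same default in code. `DEFAULT_CURRENCY` and `read_price` are hypothetical names, and the default must be kept in step with whatever your schema declares.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Mirrors the default assumed to be declared in the schema or DTD.
DEFAULT_CURRENCY = "USD"

PRICES = '<Prices><Price>9.99</Price><Price currency="EUR">8.50</Price></Prices>'


def read_price(price_elem):
    """Return (amount, currency), treating a missing currency attribute as the default."""
    return Decimal(price_elem.text), price_elem.get("currency", DEFAULT_CURRENCY)


for price in ET.fromstring(PRICES).findall("Price"):
    print(read_price(price))
```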

Some metadata is inherent in the structure of an XML document and therefore should not be repeated in your XML applications. The order and count of elements and the parent/child relationships between nested elements, for example, do not need to be restated as element metadata.

The coding cost of maintaining additional metadata that could be inferred from the structure of the XML document can be high. Imagine a case where you’re maintaining both the total count and the ordinal position of each item in a list. Under this scenario, any insertion into or deletion from the list requires not only the simple operation on the new or deleted node but also an update of the total count and of the ordinal positions of all subsequent items. That’s a bit of work not to be undertaken lightly!
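The alternative is to derive count and position from the structure whenever you need them. In this Python sketch with `xml.etree.ElementTree`, inserting an item is a single operation, and nothing else has to be patched afterwards. The `describe` helper and the fruit list are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

FRUIT = "<List><Item>Apple</Item><Item>Banana</Item></List>"


def describe(list_elem):
    """Derive (position, count, text) from document structure instead of stored metadata."""
    items = list_elem.findall("Item")
    return [(pos, len(items), item.text) for pos, item in enumerate(items, start=1)]


fruit = ET.fromstring(FRUIT)
cherry = ET.Element("Item")
cherry.text = "Cherry"
fruit.insert(0, cherry)  # no counts or positions to update afterwards
print(describe(fruit))  # [(1, 3, 'Cherry'), (2, 3, 'Apple'), (3, 3, 'Banana')]
```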

Validation Issues

Consider validation during design. Some markup language constructs can be a challenge to validate, especially with XML Schema. It’s helpful to be familiar with common usage patterns of the validation mechanisms you’re going to use when designing your markup language.

All the validation methods can be, for lack of a better word, relaxed by writing them in less restrictive ways. A List element, for example, could simply be declared as having mixed content. Unfortunately, you’d then likely get none of the benefits of validation for that element.

To avoid surprises, it’s often best to design your markup language by writing the validation code first in an iterative fashion with prototype XML test data. Use the validation code as your design tool. If you wait to consider validation at the end of your design, you risk running into frustrating validation traps.

A frequent validation design problem involves ID values. You must be aware of the simple value limitations placed on IDs.

IDs must begin with an alphabetic character or with an underscore.
IDs may not begin with a number.
IDs must be unique within the entire XML document.

The numeric restriction on IDs is the most common trap, because good unique numeric IDs are often available, especially when you’re taking data from databases. To take advantage of numeric IDs, you must either declare the value as something other than an ID type in your schema, losing some validation power, or prefix the numeric value with an acceptable character.
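The prefixing option is easy to automate when you export database rows. This Python sketch checks candidates against an ASCII-only approximation of the NCName rule that XML Schema uses for ID values. The real rule also admits many non-ASCII characters, so treat the pattern as an assumption. `key_to_id` is a hypothetical helper. Giving each entity type its own prefix also keeps customer and category keys from colliding.

```python
import re

# ASCII-only approximation of the NCName rule for ID values (an assumption).
NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def key_to_id(prefix, key):
    """Turn a numeric database key into an ID-safe value such as 'cust1001'."""
    candidate = f"{prefix}{key}"
    if not NCNAME.fullmatch(candidate):
        raise ValueError(f"{candidate!r} is not a legal ID value")
    return candidate


print(key_to_id("cust", 1001))  # cust1001
print(key_to_id("cat", 1001))   # cat1001
```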

Also beware of duplicate IDs from disparate elements, such as a customer and a category that both carry the ID “C01”. If you have bad IDs, validating parsers will barf errors on you. The ID constraint checking performed by validating parsers is a valuable tool, so don’t shy away from ID and IDREF usage in your designs.
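When you want to catch these problems before a validating parser does, for example in an export script, a quick pre-check is straightforward. This Python sketch flags values that break the rules listed above. It uses the same ASCII-only NCName approximation as before. It also assumes the ID attribute is literally named `id`, which is a hypothetical stand-in for whatever attributes your schema declares as type ID.

```python
import re
import xml.etree.ElementTree as ET

NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")  # ASCII-only approximation


def check_ids(xml_text, id_attr="id"):
    """Report illegal or duplicate ID values across the whole document."""
    problems = []
    first_owner = {}
    for elem in ET.fromstring(xml_text).iter():
        value = elem.get(id_attr)
        if value is None:
            continue
        if not NCNAME.fullmatch(value):
            problems.append(f"{elem.tag}: {value!r} is not a legal ID")
        if value in first_owner:
            problems.append(f"{elem.tag}: {value!r} already used by {first_owner[value]}")
        else:
            first_owner[value] = elem.tag
    return problems


DOC = '<Store><Customer id="C01"/><Category id="C01"/><Order id="1001"/></Store>'
for problem in check_ids(DOC):
    print(problem)
```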

Building Blocks

The remainder of this essay introduces the standard XML components available for use in your XML application designs.

XInclude

XInclude provides an element-based mechanism for pulling content into an XML document. An XML parser that supports XInclude processing will read XInclude directives while parsing and process the included content as if it were part of the source document. Here’s what the XInclude element looks like in action:

<?xml version="1.0" ?>
<List name="Fruit List"
    xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="items.xml"/>
</List>
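Python’s standard library implements a subset of XInclude in `xml.etree.ElementInclude`. This sketch processes the sample above. To keep it runnable without files, it assumes a hypothetical in-memory `FILES` dictionary standing in for items.xml, supplied through a custom `loader`.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

MAIN = """<List name="Fruit List"
    xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="items.xml"/>
</List>"""

# Hypothetical in-memory stand-in for items.xml.
FILES = {"items.xml": "<Item>Apple</Item>"}


def loader(href, parse, encoding=None):
    """Return a parsed element for parse="xml", or a plain string for parse="text"."""
    text = FILES[href]
    return ET.fromstring(text) if parse == "xml" else text


root = ET.fromstring(MAIN)
ElementInclude.include(root, loader=loader)  # replaces xi:include with the loaded Item
print(ET.tostring(root, encoding="unicode"))
```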

Even if the toolset you’re using doesn’t natively support XInclude, it’s often easy to provide support with a modified stream reader. See the MVP project’s XInclude-enabled XmlTextReader classes if you’re using Microsoft .NET.

Some XML applications that pre-date the XInclude specification developed their own include mechanisms, such as XSL’s xsl:include element. XInclude processing, by contrast, occurs in the parser, upstream from your XML application’s own processing. As long as your parser supports XInclude, your application can take advantage of it without any additional coding on your part.

XInclude can pull in text or XML. When including XML, you may be able to further use the mechanisms provided by XPointer to pull in just about any subset of an XML document.
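The parse attribute selects between the two modes. With parse="text", the included resource is inserted as character data rather than parsed as markup. The Python sketch below uses `xml.etree.ElementInclude` again, with a hypothetical in-memory note.txt. Note that the ampersand arrives as plain text; with parse="xml" it would have been a parse error.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

NOTE = (
    '<Note xmlns:xi="http://www.w3.org/2001/XInclude">'
    '<xi:include href="note.txt" parse="text"/></Note>'
)

# Hypothetical in-memory stand-in for note.txt.
FILES = {"note.txt": "Buy apples & pears"}


def loader(href, parse, encoding=None):
    """Return a parsed element for parse="xml", or a plain string for parse="text"."""
    text = FILES[href]
    return ET.fromstring(text) if parse == "xml" else text


root = ET.fromstring(NOTE)
ElementInclude.include(root, loader=loader)
print(root.text)  # Buy apples & pears
```

Python’s implementation does not evaluate XPointer expressions, so selecting a subset of an included document needs a fuller XInclude processor.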

XLink

XLink provides an attribute-based resource linking mechanism. You can use XLink to create simple links that work like HTML anchor tags. XLink also allows for more sophisticated bidirectional linking and even graph representation. Some graph structures aren’t easy to represent in XML, and XLink has a decent and thorough approach to representing them. It’s XLink’s approach to linking that’s valuable to your XML applications, not just a common set of link attribute names. Here’s what simple XLink attributes look like:

<?xml version="1.0" ?>
<List name="Fruit List"
    xmlns:xlink="http://www.w3.org/1999/xlink">
  <Item xlink:type="simple" xlink:href="http://www.apple.com/">Apple</Item>
</List>
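Because the XLink attributes live in a namespace, a processor finds them by namespace URI rather than by the xlink prefix. This Python sketch collects every element that carries an xlink:href. The `simple_links` helper is hypothetical. For brevity it ignores xlink:type, which a complete XLink processor would check.

```python
import xml.etree.ElementTree as ET

XLINK = "{http://www.w3.org/1999/xlink}"

DOC = """<List name="Fruit List"
    xmlns:xlink="http://www.w3.org/1999/xlink">
  <Item xlink:type="simple" xlink:href="http://www.apple.com/">Apple</Item>
</List>"""


def simple_links(xml_text):
    """Collect (element name, href, link text) for every element carrying xlink:href."""
    links = []
    for elem in ET.fromstring(xml_text).iter():
        href = elem.get(XLINK + "href")
        if href is not None:
            links.append((elem.tag, href, (elem.text or "").strip()))
    return links


print(simple_links(DOC))  # [('Item', 'http://www.apple.com/', 'Apple')]
```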

Many XML applications in the wild make use of XLink, including SVG and DocBook. If you’re a savvy XML application builder and you have link-like things to do, you ought to consider supporting XLink linking.

One nice thing about linking being attribute-based is that you can have any element carry link information. If you have an XML application representing shapes in a diagram, then each shape element can carry link information using XLink. An example of where more sophisticated links might be useful is in a UML model represented as XML. You could use XLink attributes to add labels describing the nature, direction, or other properties of a link between diagram components.
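To suggest how the UML idea might look, this Python sketch turns an XLink extended link into a list of labeled graph edges. The Model, Relations, Class, and Edge element names are hypothetical; only the xlink:* attributes follow XLink 1.0. The sketch follows the XLink rule that an arc with no from or to attribute applies to every label in the link. It also simplifies in one respect: it handles only locator participants and ignores local resources.

```python
import xml.etree.ElementTree as ET
from itertools import product

XLINK = "{http://www.w3.org/1999/xlink}"

MODEL = """<Model xmlns:xlink="http://www.w3.org/1999/xlink">
  <Relations xlink:type="extended">
    <Class xlink:type="locator" xlink:href="#Animal" xlink:label="animal"/>
    <Class xlink:type="locator" xlink:href="#Dog" xlink:label="dog"/>
    <Edge xlink:type="arc" xlink:from="dog" xlink:to="animal" xlink:title="inherits from"/>
  </Relations>
</Model>"""


def arcs_to_edges(extended):
    """Expand the arcs of one extended link into (from href, to href, title) edges."""
    by_label = {}
    for child in extended:
        if child.get(XLINK + "type") == "locator":
            by_label.setdefault(child.get(XLINK + "label"), []).append(child.get(XLINK + "href"))
    edges = []
    for child in extended:
        if child.get(XLINK + "type") != "arc":
            continue
        # A missing from/to attribute stands for every label in this link.
        from_labels = [child.get(XLINK + "from")] if child.get(XLINK + "from") else list(by_label)
        to_labels = [child.get(XLINK + "to")] if child.get(XLINK + "to") else list(by_label)
        sources = [h for label in from_labels for h in by_label.get(label, [])]
        targets = [h for label in to_labels for h in by_label.get(label, [])]
        title = child.get(XLINK + "title")
        edges.extend((src, dst, title) for src, dst in product(sources, targets))
    return edges


relations = ET.fromstring(MODEL).find("Relations")
print(arcs_to_edges(relations))  # [('#Dog', '#Animal', 'inherits from')]
```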

XPointer

XPointer provides a language for addressing structures within an XML document. XPointer was designed to be used in conjunction with XLink and XInclude for tasks beyond simple URI linking. XPointer can address individual characters within an XML document and can be used to form text or element selection ranges.

The XPointer specification is divided into several parts that independently define different kinds of target addressing called XPointer Schemes. The XPointer Framework recommendation establishes how XPointer Schemes should be implemented to participate in XPointer expressions. Multiple schemes may be combined to make up the content of a single XPointer expression. Here’s an overview of several XPointer Schemes:

The XPointer xpointer() Scheme provides for XPath-based addressing. The expression xpointer(/List/Item[2]) would identify the second Item element in a List. The xpointer() Scheme provides extension functions to basic XPath expressions that allow for additional types of range selections.

The XPointer xmlns() Scheme allows for namespace prefix mapping in XPointer expressions. The expression xmlns(lh=http://liquidhub.com/SimpleList) maps the lh prefix to a namespace URI. In the previous XML Namespaces essay we discussed why namespace-to-prefix mapping is important. The xpointer() and xmlns() schemes must be used together to properly handle namespaces.
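The following Python sketch shows how xmlns() and xpointer() parts cooperate. Parts are read left to right: xmlns() parts build up the prefix map, and the first xpointer() part that finds something wins. The sketch hands the path to ElementTree’s limited XPath support instead of a real XPointer engine. It assumes scheme data contains no parentheses, which rules out XPath function calls and the XPointer escaping rules. The `evaluate` and `qualify` helpers are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# One scheme(data) part; assumes the data holds no parentheses.
PART = re.compile(r"([A-Za-z_][\w.-]*)\(([^()]*)\)")


def qualify(name, namespaces):
    """Turn 'prefix:local' into ElementTree's '{uri}local' form."""
    if ":" in name:
        prefix, local = name.split(":", 1)
        return f"{{{namespaces[prefix]}}}{local}"
    return name


def evaluate(pointer, root):
    """Evaluate xmlns() and xpointer() parts left to right; the first hit wins."""
    namespaces = {}
    for scheme, data in PART.findall(pointer):
        if scheme == "xmlns":
            prefix, uri = data.split("=", 1)
            namespaces[prefix.strip()] = uri.strip()
        elif scheme == "xpointer":
            # Split '/lh:List/lh:Item[2]' into the root step and the remaining path.
            first, _, rest = data.strip().lstrip("/").partition("/")
            if qualify(first, namespaces) != root.tag:
                continue
            found = root.find(rest, namespaces) if rest else root
            if found is not None:
                return found
    return None


DOC = """<List xmlns="http://liquidhub.com/SimpleList">
  <Item>Apple</Item>
  <Item>Banana</Item>
</List>"""
POINTER = "xmlns(lh=http://liquidhub.com/SimpleList)xpointer(/lh:List/lh:Item[2])"
print(evaluate(POINTER, ET.fromstring(DOC)).text)  # Banana
```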

The XPointer element() Scheme provides a somewhat funky form of element addressing based on XML IDs and child positions. Here are three samples that should give you a feel for element() Scheme addressing: element(targetID), element(/1/2), and element(targetID/2). The first addresses the element whose ID is “targetID”. The second addresses the second child element of the document element. The last addresses the second child element of the element with an ID equal to “targetID”. ID addressing only works in a validating parser context where attributes can be identified as being of type ID.
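The element() addressing rules are simple enough to sketch directly. This Python resolver handles the three forms above. Without a validating parser there is no way to know which attribute is really of type ID, so the `id_attr` parameter is a hypothetical stand-in. The `resolve_element_scheme` name is also illustrative.

```python
import xml.etree.ElementTree as ET


def resolve_element_scheme(data, root, id_attr="id"):
    """Resolve element() scheme data such as 'targetID', '/1/2' or 'targetID/2'."""
    first, *steps = data.split("/")
    if first:
        # Leading name: start at the element whose (assumed) ID attribute matches.
        node = next((e for e in root.iter() if e.get(id_attr) == first), None)
    else:
        # Leading '/': the first step must be 1, which selects the document element.
        if not steps or steps.pop(0) != "1":
            return None
        node = root
    for step in steps:
        if node is None:
            return None
        children = list(node)  # child elements only, counted from 1
        index = int(step)
        if not 1 <= index <= len(children):
            return None
        node = children[index - 1]
    return node


DOC = """<List>
  <Item id="first"><Part name="a"/><Part name="b"/></Item>
  <Item id="targetID"><Part name="c"/><Part name="d"/></Item>
</List>"""
root = ET.fromstring(DOC)
print(resolve_element_scheme("/1/2", root).get("id"))          # targetID
print(resolve_element_scheme("targetID/2", root).get("name"))  # d
```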

xml:base, xml:space, xml:lang, xml:id

Another kind of standard building block is the set of xml:* attributes. These attributes can be used on any element in your XML application and belong to the XML namespace, http://www.w3.org/XML/1998/namespace. Unlike the other extension components, the XML namespace does not need to be explicitly declared when using these attributes, though the attributes themselves need to be declared in your XML Schema or DTD for validation.

The xml:base attribute works similarly to HTML’s BASE tag: it establishes a context for resolving relative URIs. The xml:base attribute affects XLink relative URI references as well.
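xml:base values nest. Each one is resolved against the base in effect for its parent, and a relative reference is then resolved against the innermost result. This Python sketch walks from the document element down to the target using `urllib.parse.urljoin`. The example.com URIs and the `resolve_href` helper are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XML_NS = "{http://www.w3.org/XML/1998/namespace}"

DOC = """<List xml:base="http://example.com/fruit/">
  <Group xml:base="citrus/">
    <Item href="orange.xml"/>
  </Group>
</List>"""


def resolve_href(root, target, href, document_uri=""):
    """Resolve href against the xml:base values on target and its ancestors."""
    parents = {child: parent for parent in root.iter() for child in parent}
    chain = []
    node = target
    while node is not None:
        chain.append(node)
        node = parents.get(node)
    base = document_uri
    for node in reversed(chain):  # apply xml:base from the outermost element inward
        declared = node.get(XML_NS + "base")
        if declared is not None:
            base = urljoin(base, declared)
    return urljoin(base, href)


root = ET.fromstring(DOC)
item = root.find(".//Item")
print(resolve_href(root, item, item.get("href")))  # http://example.com/fruit/citrus/orange.xml
```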

In the XML White Space essay we discussed how the xml:space attribute can be used on any element to describe how white space should be handled within that element. The value must be either preserve or default. It signals to the applications processing the XML whether white space within that element should be preserved.

XML applications containing resources for multiple languages can take advantage of the xml:lang attribute. Its values are language tags built from ISO 639 language codes, optionally followed by a subtag such as an ISO 3166 country code, as in xml:lang="en-US". Together with XML’s Unicode support, the xml:lang attribute adds to XML’s strength for internationalization.
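Both xml:space and xml:lang apply to the element that carries them and to its descendants, unless a descendant overrides them. This Python sketch finds the value in effect for an element by walking up its ancestors. The `inherited` helper and the sample messages are illustrative assumptions. Note that ElementTree exposes the attributes under the XML namespace URI without any declaration.

```python
import xml.etree.ElementTree as ET

XML_NS = "{http://www.w3.org/XML/1998/namespace}"

DOC = """<Messages xml:lang="en-US">
  <Message>Color</Message>
  <Message xml:lang="en-GB">Colour</Message>
  <Code xml:space="preserve">  indented  </Code>
</Messages>"""


def inherited(root, target, name, default=None):
    """Return xml:<name> from target or the nearest ancestor that declares it."""
    parents = {child: parent for parent in root.iter() for child in parent}
    node = target
    while node is not None:
        value = node.get(XML_NS + name)
        if value is not None:
            return value
        node = parents.get(node)
    return default


root = ET.fromstring(DOC)
color, colour = root.findall("Message")
print(inherited(root, color, "lang"))                           # en-US
print(inherited(root, colour, "lang"))                          # en-GB
print(inherited(root, root.find("Code"), "space", "default"))  # preserve
```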

The xml:id attribute is simply a proposed common name for ID type attributes. Using xml:id as a standard ID type attribute would enable ID behavior and restrictions outside of a validating parser context. XLink, XPointer, and XInclude could all use ID references on well-formed XML without requiring DTD or Schema validation. The xml:id specification was still in the standards pipeline when this essay was written; it has since become a W3C Recommendation.
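Because xml:id is recognized by name alone, even a non-validating tool can build an ID index. This Python sketch does that with ElementTree and rejects duplicates. It is a simplification: full xml:id processing also normalizes whitespace in the value and checks that it is a legal NCName, which this hypothetical `index_xml_ids` helper skips.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

DOC = """<List>
  <Item xml:id="apple">Apple</Item>
  <Item xml:id="banana">Banana</Item>
</List>"""


def index_xml_ids(root):
    """Map xml:id values to elements, rejecting duplicates; no DTD or schema needed."""
    index = {}
    for elem in root.iter():
        value = elem.get(XML_ID)
        if value is None:
            continue
        if value in index:
            raise ValueError(f"duplicate xml:id {value!r}")
        index[value] = elem
    return index


ids = index_xml_ids(ET.fromstring(DOC))
print(ids["banana"].text)  # Banana
```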

Designing an XML application requires the same skills as designing class libraries or relational database models. Well-designed XML applications follow established patterns, use common components, and have meaningful names for things that correspond to the business or problem domain.

References

XML Linking Language (XLink) Version 1.0: http://www.w3.org/TR/xlink/

XPointer Framework: http://www.w3.org/TR/xptr-framework/

XPointer xpointer() Scheme: http://www.w3.org/TR/xptr-xpointer/

XPointer element() Scheme: http://www.w3.org/TR/xptr-element/

XPointer xmlns() Scheme: http://www.w3.org/TR/xptr-xmlns/

XML Base: http://www.w3.org/TR/xmlbase/

XML 1.1 Specification, 2.10 White Space Handling: http://w3c.org/TR/2004/REC-xml11-20040204/#sec-white-space

XML 1.1 Specification, 2.12 Language Identification: http://www.w3.org/TR/2004/REC-xml11-20040204/#sec-lang-tag

ISO 639 Language Codes: http://www.w3.org/WAI/ER/IG/ert/iso639.htm

xml:id Version 1.0: http://www.w3.org/TR/xml-id/

XML Inclusions (XInclude) Version 1.0: http://www.w3.org/TR/xinclude/

MVP.XML Project: XInclude.NET Module: http://mvp-xml.sourceforge.net/xinclude/