XML Application Design
A markup language created with XML is called an XML application. MathML, SOAP, XSL, SVG and XHTML are all XML applications. Application is a pretty lousy term considering its common software usage, but we’re stuck with it!
A number of XML technology components exist to make designing XML applications easier. All of these components arose out of recognizing common needs among many XML applications. Using the standard XML components in your XML application designs can save work and make things more accessible to a wider audience. This essay presents some guidelines for designing XML applications and gives an overview of the standard XML components.
Naming Conventions
One of your first considerations when designing an XML application is deciding how you’ll name the elements and attributes. At the very least, you’ll want to use a consistent naming convention. There’s not an industry consensus on naming conventions by any means, but favoring UpperCamel case for elements and lowerCamel case for attributes is a reasonable guideline based on recent XML standards development.
The following table summarizes naming conventions across a number of successful XML applications. Note that the XML applications get relatively older as you move to the right in the table.
Element | UpperCamel | lowerCamel | lower-dash
Attribute | lowerCamel | lowerCamel | lower-dash
Application | SOAP, XAML, WS-* | XML Schema, RelaxNG, WSDL, SVG | XSL, DocBook, ebXML, MathML, XInclude, XLink
When you choose your naming style, you need to be aware of the context of your XML application. If your XML application works with applications like XSL or DocBook, then it may be advantageous to copy their lower-dash style for your names. CamelCase is used in most web services-related technologies. If you don’t want your WS-Swan specification looking like a WS-UglyDuckling, then you’re best off choosing CamelCase.
Verbose XML Markup
XML is often criticized for being bulky and verbose. Long element names don’t help this situation and it’s not uncommon to have your markup outweigh your data! But don't unduly worry about the length of element and attribute names in your XML. Clarity of expression should be your primary goal when selecting names. Favor the use of terms from your XML application’s business and problem domains.
Let’s consider the potential performance impacts of bulky XML markup. You’ll find it’s not that bad. The lengths of names don’t necessarily affect the memory footprint of a DOM, because name tables behind the scenes in DOM implementations likely keep memory use down by storing each distinct name once. XML document file sizes will be larger on disk if commonly used names are lengthy, but disk space is rarely the limiting resource these days.
The amount of memory on modern machines allows you to get away with holding fairly large XML documents in memory, but you need to always be mindful of memory usage. Batch processing large XML documents need not consume a lot of memory. Hardly any memory is consumed with a SAX approach.
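To make the SAX point concrete, here is a minimal sketch using Python's standard xml.sax module. The ElementCounter class and count_elements function are hypothetical names chosen for illustration. The handler sees one event at a time and keeps only a small dictionary of counts, so memory use does not grow with the size of the document.

```python
import io
import xml.sax


class ElementCounter(xml.sax.ContentHandler):
    """Counts start tags by name without building a document tree."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        # Called once per start tag; nothing from earlier events is retained.
        self.counts[name] = self.counts.get(name, 0) + 1


def count_elements(stream):
    """Stream-parse a file-like object and return {element name: count}."""
    handler = ElementCounter()
    xml.sax.parse(stream, handler)
    return handler.counts


sample = io.BytesIO(
    b"<Catalog><Movie><Title>Young Frankenstein</Title></Movie></Catalog>"
)
print(count_elements(sample))
```

The same handler would work unchanged on a multi-gigabyte file opened in binary mode, which is the batch processing case described above.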
Transferring large XML documents over networks may be cause for concern, but it’s a concern that can usually be addressed with compression. Text compression algorithms take advantage of repeating runs of characters and XML markup is full of repeating runs of characters. XML compresses well. A number of software and hardware products exist for compressing XML over networks.
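A rough way to see this for yourself is to compress some repetitive markup with Python's standard gzip module. The compression_ratio function and the synthetic catalog below are assumptions made for illustration. Repeating one identical record a thousand times overstates what you'd see with real data, so treat the number as a demonstration of the principle rather than a benchmark.

```python
import gzip


def compression_ratio(xml_bytes):
    """Return compressed size divided by original size (smaller is better)."""
    compressed = gzip.compress(xml_bytes)
    return len(compressed) / len(xml_bytes)


# Synthetic test data: the same Movie record repeated many times.
movie = b"<Movie><Title>Young Frankenstein</Title><Genre>Comedy</Genre></Movie>"
catalog = b"<Catalog>" + movie * 1000 + b"</Catalog>"

print(f"original bytes: {len(catalog)}")
print(f"compressed/original: {compression_ratio(catalog):.3f}")
```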
Beyond the physical impact of bulky XML, there is the emotional impact. Don’t let the bulk of XML markup turn you off from using it! When you first learn XSL transforms, for example, the verbosity of the markup is quite frustrating to read and write. The amount of typing required to set up a single template with an xsl:choose structure, or just to call a template with a few parameters, is considerable. Compared to traditional programming languages, XSL is downright fat. However, after becoming familiar with XSL, you learn to not even see the bulky names of the XSL elements. In fact, you’ll come to appreciate the economy of expression XSL has for processing XML.
The XML editing tools have reached a level of maturity that alleviates much of the human burden of bulky XML. XML editors with automatic tag completion are commonplace. Some XML editors can leverage your DTDs or XML schemas to make editing a breeze.

Elements and Attributes
Favor elements in your XML applications because they are more open to change over time. You will very seldom create an XML application correctly the first time, so flexibility is important in your design.
The following movie catalog sample makes the case for preferring elements to attributes:
<?xml version="1.0" ?>
<Catalog>
  <Movie genre="Comedy">
    <Title>Young Frankenstein</Title>
  </Movie>
</Catalog>
A genre value must be selected from a fixed list of genres for each Movie element in your movie catalog. Suppose the need arises to assign multiple genres to a single movie but you didn’t allow for this in your original design. Because you can’t have multiple attributes with the same name for a single element, a genre attribute would limit you to one genre per movie. Sure, you could always choose to delimit a list of values within a single genre attribute, but your data becomes less structured and more difficult to manipulate if you do.
As the following sample shows, one or more child Genre elements for each Movie element would provide a more open design than the single attribute approach would:
<?xml version="1.0" ?>
<Catalog>
  <Movie>
    <Title>Young Frankenstein</Title>
    <Genre>Comedy</Genre>
    <Genre>Seasonal-Halloween</Genre>
  </Movie>
</Catalog>
For a more sophisticated design, you may even choose to provide a container element for genres that groups the child Genre elements.
<?xml version="1.0" ?>
<Catalog>
  <Movie>
    <Title>Young Frankenstein</Title>
    <Genres>
      <Genre>Comedy</Genre>
      <Genre>Seasonal-Halloween</Genre>
    </Genres>
  </Movie>
</Catalog>
Code and validation schemes are often simplified by the use of container elements. It’s common to write code that filters or extracts portions of an XML document during processing and container elements provide a convenient hook for such operations.
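As a small illustration of the container acting as a hook, the following sketch uses Python's standard xml.etree.ElementTree to pull the genres for each movie. The genres_by_title function is a hypothetical helper, not part of any library. Notice that the code asks for the Genres container once and then works only with its children.

```python
import xml.etree.ElementTree as ET

CATALOG = """<Catalog>
  <Movie>
    <Title>Young Frankenstein</Title>
    <Genres>
      <Genre>Comedy</Genre>
      <Genre>Seasonal-Halloween</Genre>
    </Genres>
  </Movie>
</Catalog>"""


def genres_by_title(catalog_xml):
    """Map each movie title to the list of genres inside its Genres container."""
    root = ET.fromstring(catalog_xml)
    result = {}
    for movie in root.findall("Movie"):
        title = movie.findtext("Title")
        container = movie.find("Genres")  # the single hook for all genres
        if container is None:
            result[title] = []
        else:
            result[title] = [g.text for g in container.findall("Genre")]
    return result


print(genres_by_title(CATALOG))
```

Copying, removing, or replacing all of a movie's genres is likewise a single operation on the container rather than a loop over scattered siblings.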
Appropriate Metadata
Units of measure, currency type, and other data types often need to be added to XML applications as metadata. It can be difficult to decide what metadata to include in your XML application. As a general guideline, use metadata sparingly and only when it makes an important contribution to the interpretation of the data.
Having a Price element that includes a currency="USD" attribute is certainly a textbook-ready example of metadata, but such metadata is only appropriate when you’ll be mixing currency types in the XML. If you’ll be infrequently using non-USD currency, give the attribute a default value of "USD" in your XML Schema or DTD so its use is optional in the general case.
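Here is a sketch of the default value idea using a DTD. The Prices wrapper element and the currencies function are assumptions made for this example. It relies on the expat-based parser in Python's standard library, which applies attribute defaults declared in an internal DTD subset. Other parsers, or DTDs held in external files, may behave differently, so check your own toolchain.

```python
import xml.etree.ElementTree as ET

# The ATTLIST declaration gives currency a default value of "USD".
PRICES = """<?xml version="1.0"?>
<!DOCTYPE Prices [
  <!ELEMENT Prices (Price+)>
  <!ELEMENT Price (#PCDATA)>
  <!ATTLIST Price currency CDATA "USD">
]>
<Prices>
  <Price>19.99</Price>
  <Price currency="EUR">17.50</Price>
</Prices>"""


def currencies(xml_text):
    """Return (amount, currency) pairs, including defaulted currencies."""
    root = ET.fromstring(xml_text)
    return [(p.text, p.get("currency")) for p in root.findall("Price")]


print(currencies(PRICES))
```

The first Price carries no currency attribute in the markup, yet the application still sees one, which keeps the common case free of clutter.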
Some metadata is inherent in the structure of an XML document and therefore should not be included in your XML applications. The order and count of elements and the parent/child relationship between nested elements, for example, do not need to be repeated in element metadata.
The coding cost of maintaining additional metadata that could be inferred from the structure of the XML document can be high. Imagine a case where you’re maintaining both the total count and ordinal position of each item in a list of items. Under this scenario, any insert or delete from the list requires not only the simple operation on the new or deleted node, but also the updating of both the total count and any ordinal positions for all subsequent items. That’s a bit of work not to be undertaken lightly!
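The alternative is to derive count and position from the structure whenever you need them. In this sketch the List and Item element names and both helper functions are assumptions made for illustration. An insert touches only the new node, and the positions of the other items come out right without any bookkeeping.

```python
import xml.etree.ElementTree as ET


def positions(root):
    """Return (position, total, text) for each Item, derived from structure."""
    items = root.findall("Item")
    return [(i, len(items), item.text) for i, item in enumerate(items, start=1)]


def insert_item(root, index, text):
    """Insert a new Item at a zero-based index; no other node is modified."""
    item = ET.Element("Item")
    item.text = text
    root.insert(index, item)


fruit = ET.fromstring("<List><Item>Apple</Item><Item>Cherry</Item></List>")
insert_item(fruit, 1, "Banana")
print(positions(fruit))
```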
Validation Issues
Consider validation during design. Some markup language constructs can be a challenge to validate, especially with XML Schema. It’s helpful to be familiar with common usage patterns of the validation mechanisms you’re going to use when designing your markup language.
All the validation methods can be, for lack of a better word, relaxed, by writing them in less restrictive ways. A List element could simply be declared as having mixed content, for example. Unfortunately, you’d likely get none of the benefits of validation under this usage scenario.
To avoid surprises, it’s often best to design your markup language by writing the validation code first in an iterative fashion with prototype XML test data. Use the validation code as your design tool. If you wait to consider validation at the end of your design, you risk running into frustrating validation traps.
A frequent validation design problem involves ID values. You must be aware of the simple value limitations placed on IDs.
- IDs must begin with an alphabetic character or with an underscore.
- IDs may not begin with a number.
- IDs must be unique within the entire XML document.
The numeric restriction on IDs is the most common trap, because good unique numeric IDs are often available, especially when you’re taking data from databases. To take advantage of the numeric IDs, you either have to declare the value not to be an ID type in your schema, losing some validation power, or to prefix the numeric value with an acceptable character.
Also beware of duplicate IDs from disparate elements, such as a customer and a category that both have an ID of “C01”. If you have bad IDs, validating parsers will barf errors on you. The ID constraint checking performed by validating parsers is a valuable tool, so don’t shy away from ID and IDREF usage in your designs.
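The ID rules above can be checked before the data ever reaches a validating parser. The following sketch is deliberately simplified: the pattern covers only ASCII names, while the real XML Name rules allow many more characters, and all three function names are hypothetical. It also shows the prefixing trick for numeric database keys. Using a different prefix per kind of record has the side benefit of avoiding collisions like the “C01” case.

```python
import re
from collections import Counter

# Simplified ASCII-only approximation of the XML rules for ID values:
# start with a letter or underscore, then letters, digits, '.', '_' or '-'.
ID_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def is_valid_id(value):
    """True if the value looks like an acceptable ID under the simplified rule."""
    return ID_PATTERN.fullmatch(value) is not None


def to_id(prefix, key):
    """Build an ID from a record-type prefix and a (possibly numeric) key."""
    candidate = f"{prefix}{key}"
    if not is_valid_id(candidate):
        raise ValueError(f"cannot form a valid ID from {candidate!r}")
    return candidate


def duplicate_ids(ids):
    """Return the sorted ID values that occur more than once in a document."""
    return sorted(value for value, n in Counter(ids).items() if n > 1)


print(is_valid_id("42"), to_id("cust-", 42), duplicate_ids(["C01", "P07", "C01"]))
```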

Building Blocks
The remainder of this essay introduces the standard XML components available for use in your XML application designs.
XInclude
XInclude provides an element-based mechanism for pulling content into an XML document. An XML parser that supports XInclude processing will read XInclude directives while parsing and process the included content as if it were part of the source document. Here’s what the XInclude element looks like in action:
<?xml version="1.0" ?>
<List name="Fruit List"
      xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="items.xml"/>
</List>
Some XML applications that pre-date the XInclude specification developed custom include mechanisms, XSL’s xsl:include element for example. Because XInclude processing occurs in the parser, upstream from your XML application processing, your application can take advantage of XInclude without any additional coding on your part, as long as your parser supports XInclude.
XInclude can pull in text or XML. When including XML, you may be able to further use the mechanisms provided by XPointer to pull in just about any subset of an XML document.
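Not every toolkit performs inclusion inside the parser. Python's standard library, for example, offers it as an explicit step through the xml.etree.ElementInclude module. The sketch below expands the Fruit List sample. The contents of items.xml and the in-memory loader are assumptions made so the example runs without touching the file system. A real application would usually let the default loader read the file.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

LIST_XML = """<List name="Fruit List"
      xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="items.xml"/>
</List>"""

# Hypothetical stand-in for the file system: href -> document text.
DOCUMENTS = {"items.xml": "<Items><Item>Apple</Item><Item>Banana</Item></Items>"}


def memory_loader(href, parse, encoding=None):
    """Loader for ElementInclude that serves documents from DOCUMENTS."""
    if parse == "xml":
        return ET.fromstring(DOCUMENTS[href])
    return DOCUMENTS[href]  # parse="text": return the raw string


def expand(xml_text):
    """Parse the document and replace xi:include elements with their targets."""
    root = ET.fromstring(xml_text)
    ElementInclude.include(root, loader=memory_loader)
    return root


print([item.text for item in expand(LIST_XML).iter("Item")])
```

After expansion the rest of the application sees ordinary Item elements and never needs to know that they came from another document.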
XLink
XLink provides an attribute-based resource linking mechanism. You can use XLink to create simple links that work like HTML anchor tags. XLink also allows for more sophisticated bi-directional linking or even graph representation. Some graph structures aren’t easy to represent in XML, and XLink has a decent and thorough approach to representing them. It’s XLink’s approach to linking that’s valuable to your XML applications, not just a common set of link attribute names. Here’s what simple XLink attributes look like:
<?xml version="1.0" ?>
<List name="Fruit List"
      xmlns:xlink="http://www.w3.org/1999/xlink">
  <Item xlink:href="http://www.apple.com/">Apple</Item>
</List>
Many XML applications in the wild make use of XLink, including SVG and DocBook. If you’re a savvy XML application builder and you have link-like things to do, you ought to consider supporting XLink linking.
One nice thing about linking being attribute-based is that you can have any element carry link information. If you have an XML application representing shapes in a diagram, then each shape element can carry link information using XLink. An example of where more sophisticated links might be useful is in a UML model represented as XML. You could use XLink attributes to add labels describing the nature, direction, or other properties of a link between diagram components.
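Because the link lives in a namespaced attribute, generic code can find links without knowing anything about the elements that carry them. This sketch assumes a made-up diagram vocabulary (Diagram, Shape, Label) and example.com URLs. Only the XLink namespace URI comes from the specification.

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"
HREF = f"{{{XLINK_NS}}}href"  # ElementTree's {namespace}local form

DIAGRAM = """<Diagram xmlns:xlink="http://www.w3.org/1999/xlink">
  <Shape name="start" xlink:href="http://www.example.com/start"/>
  <Shape name="note"/>
  <Label xlink:href="http://www.example.com/help">Help</Label>
</Diagram>"""


def collect_links(xml_text):
    """Return (element name, link target) for every element with xlink:href."""
    root = ET.fromstring(xml_text)
    return [(el.tag, el.get(HREF)) for el in root.iter() if HREF in el.attrib]


print(collect_links(DIAGRAM))
```

The lookup matches on the namespace URI rather than the xlink prefix, so documents that bind the namespace to a different prefix are handled the same way.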
XPointer
XPointer provides a language for addressing structures within an XML document. XPointer was designed to be used in conjunction with XLink and XInclude for tasks beyond simple URI linking. XPointer can address individual characters within an XML document and can be used to form text or element selection ranges.
The XPointer specification is divided into several parts that independently define different kinds of target addressing called XPointer Schemes. The XPointer Framework recommendation establishes how XPointer Schemes should be implemented to participate in XPointer expressions. Multiple schemes may be combined to make up the content of a single XPointer expression. Here’s an overview of several XPointer Schemes:
The XPointer xpointer() Scheme provides for XPath-based addressing. The expression xpointer(/List/Item[2]) would identify the second Item element in a List. The xpointer() Scheme provides extension functions to basic XPath expressions that allow for additional types of range selections.
The XPointer xmlns() Scheme allows for namespace prefix mapping in XPointer expressions. The expression xmlns(lh=http://liquidhub.com/SimpleList) maps the lh prefix to a namespace URI. In the previous XML Namespaces essay we discussed why namespace-to-prefix mapping is important. The xpointer() and xmlns() schemes must be used together to properly handle namespaces.
The XPointer element() Scheme provides a funky form of element addressing based on XML IDs and child positions. Here are three samples that should give you a feel for element() Scheme addressing: element(targetID), element(/1/2), element(targetID/2). The last sample addresses the second child element of an element with an ID equal to “targetID”. ID addressing only works in a validating parser context where attributes can be identified to be of type ID.
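To show how the position-based part of element() addressing reads, here is a minimal resolver for child sequences such as /1/2. It is a sketch, not a conforming XPointer processor: it ignores the ID forms entirely, and the resolve_child_sequence name is hypothetical. Each number is a one-based position among child elements, and the first step selects the document element, so it must be 1.

```python
import xml.etree.ElementTree as ET


def resolve_child_sequence(root, sequence):
    """Resolve a child sequence like '/1/2' against the document element.

    Returns the addressed element, or None if any step is out of range.
    """
    steps = [int(step) for step in sequence.strip("/").split("/")]
    if steps[0] != 1:
        return None  # a document has exactly one root element
    node = root
    for step in steps[1:]:
        children = list(node)  # child elements in document order
        if step < 1 or step > len(children):
            return None
        node = children[step - 1]
    return node


fruit = ET.fromstring("<List><Item>Apple</Item><Item>Banana</Item></List>")
print(resolve_child_sequence(fruit, "/1/2").text)
```

For this document, element(/1/2) and xpointer(/List/Item[2]) address the same Item, one by position alone and the other by name and position.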
xml:base, xml:space, xml:lang, xml:id
Another kind of standard building block is the set of xml:* attributes. These attributes can be used on any elements in your XML application and are part of the XML namespace, http://www.w3.org/XML/1998/namespace. Unlike the other extension components, the XML namespace does not need to be explicitly declared when using these attributes, though the attributes themselves need to be declared in your XML Schema or DTD for validation.
The xml:base attribute works similarly to HTML’s BASE tag: it establishes a context for relative URI resolution. The xml:base attribute affects XLink relative URI references as well.
In the XML White Space essay we discussed how the xml:space attribute can be used on any element to describe how white space should be handled within that element. The value must be either preserve or default and acts as a hint to the XML parser on how to handle white space.
XML applications containing resources for multiple languages can take advantage of the xml:lang attribute. This attribute’s values are built from ISO 639 language codes, optionally followed by an ISO 3166 country code, as in xml:lang="en-US". Together with XML’s Unicode support, the xml:lang attribute adds to XML’s strength for internationalization.
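The following sketch shows two things about xml:lang in practice: the xml prefix works without a namespace declaration, and a language set on an ancestor applies to descendants that don't override it. The Messages and Greeting elements and the effective_languages function are assumptions made for illustration. ElementTree does not track inheritance for you, so the code carries the inherited value down as it walks the tree.

```python
import xml.etree.ElementTree as ET

# The xml prefix is always bound to this namespace; no xmlns:xml is needed.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

MESSAGES = """<Messages xml:lang="en-US">
  <Greeting>Hello</Greeting>
  <Greeting xml:lang="fr-FR">Bonjour</Greeting>
</Messages>"""


def effective_languages(element, inherited=None):
    """Yield (tag, language) pairs, inheriting xml:lang from ancestors."""
    lang = element.get(XML_LANG, inherited)
    yield element.tag, lang
    for child in element:
        yield from effective_languages(child, lang)


print(list(effective_languages(ET.fromstring(MESSAGES))))
```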
The xml:id attribute is simply a proposed common name for ID type attributes. Using xml:id as a standard ID type attribute would enable ID behavior and restrictions outside of a validating parser context. XLink, XPointer, and XInclude could all use ID references on well-formed XML without requiring DTD or Schema validation. The xml:id specification was still in the standards pipeline when this essay was written.

Designing an XML application requires the same skills as designing class libraries or relational database models. Well-designed XML applications follow established patterns, use common components, and have meaningful names for things that correspond to the business or problem domain.
References
- XML Linking Language (XLink) Version 1.0: http://www.w3.org/TR/xlink/
- XPointer Framework: http://www.w3.org/TR/xptr-framework/
- XPointer xpointer() Scheme: http://www.w3.org/TR/xptr-xpointer/
- XPointer element() Scheme: http://www.w3.org/TR/xptr-element/
- XPointer xmlns() Scheme: http://www.w3.org/TR/xptr-xmlns/
- XML Base: http://www.w3.org/TR/xmlbase/
- XML 1.1 Specification, 2.10 White Space Handling: http://w3c.org/TR/2004/REC-xml11-20040204/#sec-white-space
- XML 1.1 Specification, 2.12 Language Identification: http://www.w3.org/TR/2004/REC-xml11-20040204/#sec-lang-tag
- ISO 639 Language Codes: http://www.w3.org/WAI/ER/IG/ert/iso639.htm
- xml:id Version 1.0: http://www.w3.org/TR/xml-id/
- XML Inclusions (XInclude) Version 1.0: http://www.w3.org/TR/xinclude/
- MVP.XML Project: XInclude.NET Module: http://mvp-xml.sourceforge.net/xinclude/