XSL Identity Transforms

On this site, XSL means XSLT. Don’t be alarmed.

Many developers have a hard time getting started with XSL. One difficulty lies in the fact that XSL favors a recursive processing style. XML well-formedness guarantees that an XML document can be represented as a tree structure, and recursion is ideal for working with tree structures. Recursive thinking doesn’t come naturally to most people. You have to work hard to “get it.” Perhaps the same is true for XSL.

XML is an increasingly fundamental part of the technology landscape. XSL is a powerful way to manipulate XML and developers should be familiar with such a useful tool. Confidence in transforming XML documents with XSL is as important to a developer’s career as confidence in querying relational databases with SQL. This essay aims to show how powerful and elegant XSL’s recursive approach to transforming XML documents can be. The variations on simple identity transforms presented here embrace recursion and may give you a new way of thinking about XSL.

Pull vs. Push

In the simplest XSL transforms, a single template like the one below pulls content from an XML document into the transform output:

1 |<NameTag xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
2 |   xsl:version="1.0">
3 |   My name is:
4 |   <b><xsl:value-of select="/Customer/Name" /></b>
5 |</NameTag>

Any literal content in the template, like the “My name is:” text on line three and the <b> tags on line four, is simply copied to the output. The xsl:value-of instruction within the <b> tags on line four pulls content from the source XML document into the result:

<NameTag>
   My name is:
   <b>Sam Page</b>
</NameTag>
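
For reference, a source document along the following lines would produce that result. This is a minimal sketch inferred from the /Customer/Name path in the template; the essay does not show the actual input, so only the Customer and Name elements are assumed here.

```xml
<?xml version="1.0"?>
<!-- Assumed input: only Customer and Name are implied by the template -->
<Customer>
   <Name>Sam Page</Name>
</Customer>
```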

A pull approach is very straightforward, but when pull-style templates get large, they become hard to maintain. The pull approach is also poorly suited to document-style XML input, where content can appear in any order and at any depth.

The following style sheet demonstrates a push approach:

1 |<xsl:stylesheet version="1.0"
2 |   xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
3 |
4 |   <!-- Customer template -->
5 |   <xsl:template match="/Customer">
6 |         <NameTag>
7 |               <xsl:apply-templates select="Name" />
8 |         </NameTag>
9 |   </xsl:template>
10|
11|   <!-- Name template -->
12|   <xsl:template match="Name">
13|         My name is:
14|         <b><xsl:value-of select="." /></b>
15|   </xsl:template>
16|
17|</xsl:stylesheet>

Visualize XSL

Apply-templates throws every node matching its select expression up into the air, and the best matching template catches each node, establishing a new context. Once the catching templates are finished and all thrown nodes have been caught, context returns to the caller and processing continues.

Processing begins with the Customer template. The XSL processor establishes a context within the source document at the root Customer element. The xsl:apply-templates instruction on line seven selects the Customer element’s Name child, the Name template matches it, and the processor establishes a new context at that Name element. Expressions within a template always evaluate relative to the current context. The context-changing instructions, xsl:apply-templates and xsl:for-each, push the context around the source XML document. When a template finishes processing, control returns to the calling template (pop!) and context reverts to its previous state.

The xsl:call-template instruction does not change context; it is used chiefly to encapsulate code in named templates, much like a subroutine.
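
A minimal sketch of a named template follows. The template name FormatName is an illustrative assumption, not part of the style sheets above. Because xsl:call-template leaves the context untouched, the select="." inside the named template still refers to the Name element that was current at the call.

```xml
<xsl:stylesheet version="1.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

   <!-- Name template: the context node here is a Name element -->
   <xsl:template match="Name">
      <xsl:call-template name="FormatName" />
   </xsl:template>

   <!-- Hypothetical named template: invoked by name rather than by
        matching, so it runs with the caller's context node -->
   <xsl:template name="FormatName">
      <b><xsl:value-of select="." /></b>
   </xsl:template>

</xsl:stylesheet>
```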

The benefits of the push approach are not necessarily apparent with such a simple example, but as transformation requirements get more complex, the push approach shines.

With pull processing, context stands in one place and you must reach throughout the XML input document for content. With push processing, context jumps around the XML input document at your direction, allowing simple, local content selection. Understanding context is the key to understanding XSL.

The Identity Transform

An identity transform is a push processing style sheet that reproduces its input as its output. On the face of it, that doesn’t sound like a very useful transformation. But identity transforms provide the basis of a whole class of useful transformations.

The style sheet below shows a typical recursive implementation of an identity transform using XSL’s shallow-copy instruction, xsl:copy:

<?xml version="1.0" ?>
<xsl:stylesheet version="1.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

   <!-- IdentityTransform -->
   <xsl:template match="/ | @* | node()">
      <xsl:copy>
         <xsl:apply-templates select="@* | node()" />
      </xsl:copy>
   </xsl:template>

</xsl:stylesheet>

This identity transform makes a depth-first traversal of the entire XML document, copying every node it visits (elements, attributes, text, comments, and processing instructions) as it goes. It’s a slick little piece of code!

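The following list summarizes the XPath patterns and node tests frequently used when writing identity transforms:

   /                          the root node of the document
   @*                         any attribute node
   node()                     any child node, whatever its type
   *                          any element node
   text()                     any text node
   comment()                  any comment node
   processing-instruction()   any processing instruction node

The node() test matches all the node types below it in the list above; that’s why our identity template is so spare.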

Sometimes you’ll see identity transforms whose match pattern is @*|* instead of @*|node(). Such templates drop comments and processing instructions but copy all elements and attributes. Text nodes still reach the output, provided the xsl:apply-templates call selects them, because a built-in XSL processor template copies text. Ever noticed how a style sheet gone astray tends to dump all the text to output? That’s the built-in text template at work.
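
The built-in template rules defined by the XSLT 1.0 specification behave as if the following templates were part of every style sheet (the mode-specific variants are left out). They are written out here only as an illustration; any template you supply takes precedence over them.

```xml
<xsl:stylesheet version="1.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

   <!-- Elements and the root node: keep walking, output nothing -->
   <xsl:template match="*|/">
      <xsl:apply-templates />
   </xsl:template>

   <!-- Text and attribute nodes: write the string value to the output -->
   <xsl:template match="text()|@*">
      <xsl:value-of select="." />
   </xsl:template>

   <!-- Comments and processing instructions: output nothing -->
   <xsl:template match="processing-instruction()|comment()" />

</xsl:stylesheet>
```

A style sheet with no templates of its own therefore walks every element and emits only the text, which is the “text dump” described above.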

The identity transform does not produce a byte-exact copy of its XML input. For example, it may expand an empty-element tag into a start-tag and end-tag pair or change white space, depending on your XML toolset. It could also change the encoding of the document and expand entity references. But identity transforms do produce a copy that is semantically equivalent to its XML input.
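
Some of these serialization details can be requested with the xsl:output element, which only takes effect when the processor itself serializes the result. A sketch, with attribute values chosen purely for illustration:

```xml
<xsl:stylesheet version="1.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

   <!-- Request XML output encoded as UTF-8, with no added indentation -->
   <xsl:output method="xml" encoding="UTF-8" indent="no" />

   <!-- IdentityTransform -->
   <xsl:template match="/ | @* | node()">
      <xsl:copy>
         <xsl:apply-templates select="@* | node()" />
      </xsl:copy>
   </xsl:template>

</xsl:stylesheet>
```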

Variations

Where the identity transform gets interesting is when you create additional special-case templates in your style sheet. During XSL processing, each template is assigned a match priority so that only the most specific matching template is used. The @* and node() patterns in the identity transform have a low default priority (-0.5), while a pattern that names an element, such as para, defaults to 0. If a more specific template match is found during the transform’s recursive walk of the XML document, that template is used instead of the identity template.
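
Priority matters most when a special-case template uses a bare node test itself. The comment() test has the same default priority as node(), so the two templates would conflict; the specification treats that as an error that processors may recover from by picking the template that occurs last. Setting the priority attribute removes the ambiguity. A sketch that strips every comment, with the priority value of 1 chosen arbitrarily:

```xml
<xsl:stylesheet version="1.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

   <!-- IdentityTransform: node() has a default priority of -0.5 -->
   <xsl:template match="/ | @* | node()">
      <xsl:copy>
         <xsl:apply-templates select="@* | node()" />
      </xsl:copy>
   </xsl:template>

   <!-- comment() also defaults to -0.5, so an explicit priority
        makes this empty template win and drops every comment -->
   <xsl:template match="comment()" priority="1" />

</xsl:stylesheet>
```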

All of the following templates are additions to the base identity transform style sheet listed above.

You can remove a single element or prune an entire branch from an XML document with a single-line empty template added to your style sheet:

<xsl:template match="header" />

Empty XSL templates eat XML content during an identity transform. Because the template above contains no further xsl:apply-templates calls, the recursion that led to the header element rewinds without continuing to the header element’s child nodes.

Renaming is fun with an identity transform. Renaming all the para elements to p elements in an XML document can be accomplished with an identity transform and the following template:

<!-- Rename para to p -->
<xsl:template match="para">
   <p>
      <xsl:apply-templates select="@* | node()" />
   </p>
</xsl:template>

Note the similarity to the base identity transform template. In this case, the xsl:copy instruction has been replaced with a literal p element. The xsl:apply-templates call continues the recursive walk among the para element’s attributes and child nodes. Renaming elements with an identity transform is a lot less work than the equivalent operation with the DOM API.
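
Attributes can be renamed in much the same way, with xsl:attribute standing in for the literal element. In this sketch the attribute names id and ref are hypothetical, chosen only for illustration:

```xml
<xsl:stylesheet version="1.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

   <!-- IdentityTransform -->
   <xsl:template match="/ | @* | node()">
      <xsl:copy>
         <xsl:apply-templates select="@* | node()" />
      </xsl:copy>
   </xsl:template>

   <!-- Rename the (hypothetical) id attribute to ref, keeping its value -->
   <xsl:template match="@id">
      <xsl:attribute name="ref">
         <xsl:value-of select="." />
      </xsl:attribute>
   </xsl:template>

</xsl:stylesheet>
```

This works because attributes come before child nodes in document order, so the new attribute is added to the copied element before any content is.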

You can strip all attributes from a particular element by simply not including the @* in the recursive xsl:apply-templates call as line four shows below:

1 |<!-- Strip all attributes from Product elements -->
2 |<xsl:template match="Product">
3 |   <xsl:copy>
4 |         <xsl:apply-templates select="node()" />
5 |   </xsl:copy>
6 |</xsl:template>
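
To drop one attribute everywhere and keep the rest, an empty template that names the attribute is enough. A sketch, assuming a hypothetical style attribute:

```xml
<xsl:stylesheet version="1.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

   <!-- IdentityTransform -->
   <xsl:template match="/ | @* | node()">
      <xsl:copy>
         <xsl:apply-templates select="@* | node()" />
      </xsl:copy>
   </xsl:template>

   <!-- Remove the (hypothetical) style attribute wherever it appears -->
   <xsl:template match="@style" />

</xsl:stylesheet>
```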

Identity transforms can help you quickly build subsets of XML documents. In the following example, only customers in a certain zip code are copied to output:

<!-- Copy only customers in the 90210 zip code -->
<xsl:template match="Customer">
   <xsl:if test="Address/ZipCode='90210'">
      <xsl:copy>
         <xsl:apply-templates select="@* | node()" />
      </xsl:copy>
   </xsl:if>
</xsl:template>

There are many ways to achieve this filtering without an identity transform, but as the filtering criteria become more complex, this approach has numerous advantages. Replacing the xsl:if above with an xsl:choose construct lets you express much more complicated filtering, and express it more clearly, than an equivalent XPath expression would.
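
One possible xsl:choose version is sketched below. The second branch, the MissingZipCode element, and the Name child it reads are assumptions added for illustration; only the 90210 test comes from the example above.

```xml
<xsl:stylesheet version="1.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

   <!-- IdentityTransform -->
   <xsl:template match="/ | @* | node()">
      <xsl:copy>
         <xsl:apply-templates select="@* | node()" />
      </xsl:copy>
   </xsl:template>

   <!-- Copy customers in 90210, flag customers with no zip code,
        and drop everyone else -->
   <xsl:template match="Customer">
      <xsl:choose>
         <xsl:when test="Address/ZipCode='90210'">
            <xsl:copy>
               <xsl:apply-templates select="@* | node()" />
            </xsl:copy>
         </xsl:when>
         <xsl:when test="not(Address/ZipCode)">
            <!-- Hypothetical marker element for incomplete records -->
            <MissingZipCode>
               <xsl:value-of select="Name" />
            </MissingZipCode>
         </xsl:when>
         <!-- Any other customer produces no output -->
         <xsl:otherwise />
      </xsl:choose>
   </xsl:template>

</xsl:stylesheet>
```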

Creative variations on identity transforms allow XML documents to be entirely re-structured, sorted, grouped, flattened, expanded, filtered, decorated and otherwise transmogrified!
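
Sorting, for instance, takes only one extra template. The sketch below assumes a hypothetical Customers wrapper element whose Customer children each have a Name child; neither assumption comes from the examples above.

```xml
<xsl:stylesheet version="1.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

   <!-- IdentityTransform -->
   <xsl:template match="/ | @* | node()">
      <xsl:copy>
         <xsl:apply-templates select="@* | node()" />
      </xsl:copy>
   </xsl:template>

   <!-- Copy the wrapper, then push its Customer children in Name order -->
   <xsl:template match="Customers">
      <xsl:copy>
         <xsl:apply-templates select="@*" />
         <xsl:apply-templates select="Customer">
            <xsl:sort select="Name" />
         </xsl:apply-templates>
      </xsl:copy>
   </xsl:template>

</xsl:stylesheet>
```

Note that this template selects only Customer children, so any other child nodes of Customers (including white space and comments) are left out of the result. Each Customer is still copied by the identity template.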

Attributes to Elements

A useful transform that’s very similar to the identity transform is one that converts attributes to elements. This simple transform has applications in documentation, validation, and code generation.

<?xml version="1.0" ?>
<xsl:stylesheet version="1.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

   <!-- Elements -->
   <xsl:template match="/ | node()">
      <xsl:copy>
         <xsl:apply-templates select="@* | node()" />
      </xsl:copy>
   </xsl:template>

   <!-- Attributes -->
   <xsl:template match="@*">
      <xsl:element name="attribute">
         <xsl:attribute name="name">
            <xsl:value-of select="local-name()" />
         </xsl:attribute>
         <xsl:value-of select="." />
      </xsl:element>
   </xsl:template>

</xsl:stylesheet>

If you were to expand the template above to include special rules for all the node types, you could create a style sheet that reveals the structure of an XML document as viewed from the XSL processor’s perspective. This is a good learning exercise.
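
One way to start that exercise is sketched here. The result element names (text, comment, processing-instruction, attribute) are arbitrary choices for illustration, and the explicit priorities are needed because text(), comment(), and processing-instruction() share the default priority of node().

```xml
<xsl:stylesheet version="1.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

   <!-- Elements (and the root node) are copied as before -->
   <xsl:template match="/ | node()">
      <xsl:copy>
         <xsl:apply-templates select="@* | node()" />
      </xsl:copy>
   </xsl:template>

   <!-- Attributes become attribute elements -->
   <xsl:template match="@*">
      <attribute name="{local-name()}">
         <xsl:value-of select="." />
      </attribute>
   </xsl:template>

   <!-- Text nodes, including white-space-only ones, are wrapped -->
   <xsl:template match="text()" priority="1">
      <text><xsl:value-of select="." /></text>
   </xsl:template>

   <!-- Comments become comment elements -->
   <xsl:template match="comment()" priority="1">
      <comment><xsl:value-of select="." /></comment>
   </xsl:template>

   <!-- Processing instructions keep their target in a name attribute -->
   <xsl:template match="processing-instruction()" priority="1">
      <processing-instruction name="{name()}">
         <xsl:value-of select="." />
      </processing-instruction>
   </xsl:template>

</xsl:stylesheet>
```

If the input has comments or processing instructions outside its document element, this sketch emits more than one top-level element, so the result is not a well-formed document in that case.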

Some other good learning exercises worth attempting are creating an XML representation of the Infoset of an XML document, pretty printing an XML document, or producing the C14N representation of an XML document. All of these transforms can be based on an identity transform. I’ll warn you that perfect results may not be attainable, but you can get close, and you will gain a better understanding of XSL and your XML toolset along the way. In the future, I’ll post my attempts at these transforms.

Developers must embrace recursion in XSL transformations in order to really “get” what XSL is all about. The identity transform is the king of recursive XSL transforms. In just three statements, the basic identity transform can copy any XML document. With simple variations on the identity transform, you can make quite complex transformations of XML documents. This style of XSL programming is also integral to creating XSL transformation pipelines, the subject of the next essay.
