XSL Pipeline Processing
Pipeline processing is a powerful XSL programming technique that leads to programs that are much easier to maintain and enhance. Complex transformations can be achieved by chaining together a series of simple XSL transforms. This essay demonstrates the value of a pipeline processing approach along with some implementation specifics.
Developers familiar with the power of pipeline operations central to the UNIX operating system know how simple, modular tools can be chained together to accomplish a wide variety of complex tasks.
XSL pipelines offer the same advantage for XML transformation. Where UNIX pipelines are based around standard input and output of lines of text, XSL pipelines rely on the structure of well-formed XML between stages.
The Ideal Transform Rule
Sometimes the XML you need to transform is poorly suited to the output you’re trying to produce. Sometimes that output is quite complex in its own right. In these situations, it’s advantageous to break a transform into two stages. The first stage produces an “ideal input” for the second stage. To paraphrase Einstein, the second stage therefore becomes “as simple as possible, and no simpler.”
There are many reasons an XML input may not be ideal. Data pulled from legacy systems or databases may have an awful structure or an antiquated naming convention that makes your code difficult to understand. It’s not uncommon to have many processes share a large XML structure with each process only requiring a small subset of the data. A pre-processing XSL transform can eliminate these problems with ugly XML. Never deal with ugly XML!
Work from ideal input when writing complex style sheets.
In a two-stage transform, the first stage is usually simple because it only deals with the restructuring move from input to ideal. The second stage is simple because you’ve tailored the ideal input for its operation with the first stage. Both transforms benefit from not trying to accomplish both restructuring and final output at once. Simple transforms are desirable because they’re easier to write, understand, and maintain.
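As a rough sketch of what a first stage might look like, the style sheet below pulls a small, cleanly named subset out of a legacy-style document. The element names (CUST_REC, CUST_NO, CUST_NM, Customers, Customer, Name) are hypothetical and chosen only for illustration.

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Stage one: build the "ideal input" for the next stage.
       All element names here are made up for the example. -->
  <xsl:template match="/">
    <Customers>
      <xsl:for-each select="//CUST_REC">
        <!-- Keep only what the second stage needs, under readable names -->
        <Customer id="{CUST_NO}">
          <Name><xsl:value-of select="CUST_NM" /></Name>
        </Customer>
      </xsl:for-each>
    </Customers>
  </xsl:template>

</xsl:stylesheet>
```

The second stage can then be written against Customers, Customer, and Name without knowing anything about the legacy structure.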
Multi-stage transforms can be assembled as batch files, through API calls, or by a single XSL style sheet using an intermediate result tree fragment. The following example illustrates the use of a result tree fragment stored in a variable:
1 |<xsl:stylesheet version="1.0"
2 |  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
3 |  xmlns:msxsl="urn:schemas-microsoft-com:xslt">
4 |
5 |  <!-- Stage One Result Tree Fragment -->
6 |  <xsl:variable name="sorted-names">
7 |    <SortedNames>
8 |      <xsl:apply-templates select="//Name" mode="s1">
9 |        <xsl:sort select="." />
10|      </xsl:apply-templates>
11|    </SortedNames>
12|  </xsl:variable>
13|
14|  <!-- Stage One Name Template -->
15|  <xsl:template match="Name" mode="s1">
16|    <xsl:copy-of select="." />
17|  </xsl:template>
18|
19|  <!-- Stage Two Root Template -->
20|  <xsl:template match="/">
21|    <xsl:apply-templates
22|      select="msxsl:node-set($sorted-names)//Name" />
23|  </xsl:template>
24|
25|  <!-- Stage Two Name Template -->
26|  <xsl:template match="Name">
27|    My name is:
28|    <b><xsl:value-of select="." /></b><br />
29|  </xsl:template>
30|
31|</xsl:stylesheet>
In the example above, an extract containing only sorted Name elements is created as a first stage, then a set of HTML-formatted name labels is produced from the extract in the second stage. This sample is not sufficiently complex to demonstrate the benefits of multiple stage transforms in general, but it does demonstrate the use of the result tree fragment mechanism.
When working with result tree fragments, you need to use an extension function provided by your XSL processor. The inability to navigate result tree fragments was an oversight in the XSLT 1.0 specification. All the major XSL processor implementations have created extension functions to handle result tree fragments because the feature is too useful to ignore. XSLT 2.0 removes the restriction: a variable holding a temporary tree can be processed directly, without an extension function.
This sample shows the Microsoft XML toolset result tree fragment solution in particular, but the other implementations are very similar.
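To give a sense of how similar the other implementations are, here is a sketch of the same two-stage idea for a processor that supports the EXSLT common module (libxslt is one such processor). The exsl prefix is an arbitrary choice, and the style sheet is a condensed variation on the sample rather than a drop-in replacement.

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:exsl="http://exslt.org/common"
  exclude-result-prefixes="exsl">

  <!-- Stage one: sorted copy of every Name element -->
  <xsl:variable name="sorted-names">
    <SortedNames>
      <xsl:for-each select="//Name">
        <xsl:sort select="." />
        <xsl:copy-of select="." />
      </xsl:for-each>
    </SortedNames>
  </xsl:variable>

  <!-- Stage two: exsl:node-set() plays the role of msxsl:node-set() -->
  <xsl:template match="/">
    <xsl:apply-templates
      select="exsl:node-set($sorted-names)//Name" />
  </xsl:template>

  <xsl:template match="Name">
    <b><xsl:value-of select="." /></b><br />
  </xsl:template>

</xsl:stylesheet>
```

Only the namespace declaration and the function prefix differ from the Microsoft version.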
Use xsl:copy-of to put the contents of a variable containing a result tree fragment into your output. Emit an xsl:comment before and after the call to separate that output from the rest of your transform if need be.
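One way to set the copied fragment apart is to emit marker comments before and after the xsl:copy-of call, since xsl:comment itself may only contain text. The sketch below assumes a variable like the sample's $sorted-names; the Debug wrapper element and the comment text are made up for the example.

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Intermediate result to be inspected -->
  <xsl:variable name="sorted-names">
    <SortedNames>
      <xsl:for-each select="//Name">
        <xsl:sort select="." />
        <xsl:copy-of select="." />
      </xsl:for-each>
    </SortedNames>
  </xsl:variable>

  <xsl:template match="/">
    <!-- Debug is a hypothetical wrapper so the output has one root -->
    <Debug>
      <xsl:comment> begin sorted-names </xsl:comment>
      <xsl:copy-of select="$sorted-names" />
      <xsl:comment> end sorted-names </xsl:comment>
    </Debug>
  </xsl:template>

</xsl:stylesheet>
```

Copying a result tree fragment this way needs no extension function; the extension is only required when you want to select into the fragment.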
To use an extension function, first you must include the extension function namespace in your style sheet. The xmlns:msxsl namespace declaration on line three in the sample above accomplishes this. The extension function msxsl:node-set(), as seen on line 22, is only available when the extension namespace has been declared. The node-set() function establishes a context within the result tree fragment during processing instead of in the input XML document.
Any number of result tree fragments may be created and processed in a single style sheet. Multiple style sheets can always be combined into a single style sheet using result tree fragments. However, this technique should be used sparingly for XSL pipeline processing because combined style sheets are often considerably more complex and therefore less maintainable.
The sample’s use of the mode attribute when building the result tree fragment is not entirely necessary, but it’s often helpful. Modes segregate templates with match patterns that would otherwise conflict during processing. When the mode is changed during processing, only templates in the current mode match.
Modes are used to make multiple passes over an XML document producing different outputs. For example, a single style sheet may produce both a table of contents and the body of a report in two passes over the body of the report.
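The two-pass report case could be sketched as follows. The input vocabulary (Report, Section, Title, Para) and the toc mode name are hypothetical, chosen only to show two templates with the same match pattern living in different modes.

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/Report">
    <html>
      <body>
        <!-- Pass one: table of contents, in the "toc" mode -->
        <ul>
          <xsl:apply-templates select="Section" mode="toc" />
        </ul>
        <!-- Pass two: report body, in the default mode -->
        <xsl:apply-templates select="Section" />
      </body>
    </html>
  </xsl:template>

  <!-- Same match pattern as below, but a different mode: no conflict -->
  <xsl:template match="Section" mode="toc">
    <li><xsl:value-of select="Title" /></li>
  </xsl:template>

  <xsl:template match="Section">
    <h2><xsl:value-of select="Title" /></h2>
    <xsl:apply-templates select="Para" />
  </xsl:template>

  <xsl:template match="Para">
    <p><xsl:value-of select="." /></p>
  </xsl:template>

</xsl:stylesheet>
```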
In the multi-stage sample above, mode is used to create the $sorted-names variable containing the result tree fragment beginning on line five. The output of the templates matched in the s1 mode is accumulated in the variable as a result tree fragment. The s1 mode is entered and exited within the xsl:variable tags via the xsl:apply-templates call with the mode attribute on line eight. The xsl:apply-templates call on line 21, in default mode, moves processing context to the result tree fragment, allowing the stage two name template to match.
XSL pipelines are powerful because they are easily extended to accommodate additional functionality. Consider the following simple pipeline:
A dataset is extracted from a database as Dataset.xml. This XML is transformed into an intermediate table XML format by a style sheet that decorates the data with column headers, alignment, and other formatting hints specific to the display of this dataset. Finally, the generic HtmlTable.xsl style sheet produces an HTML table from the intermediate table XML. The wisdom of the intermediate table XML format will be revealed shortly.
When the dataset gets large, it’s natural to want to add paging and sorting to the implementation. With a pipeline approach, this simply means inserting some additional stages into the pipeline:
Both the sorting and paging style sheets need parameters. Sort needs a column name and direction, and page needs a page size and page number. How you provide these parameters is up to your specific implementation, but parameterized transforms are a typical component of XSL pipelines.
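A parameterized sort stage might look something like the sketch below. The intermediate table format is not spelled out in this essay, so the Rows, Row, and Cell elements, the column attribute, and the parameter names and defaults are all assumptions made for the example.

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Supplied by the caller; the defaults are placeholders -->
  <xsl:param name="sort-column" select="'Name'" />
  <xsl:param name="sort-order" select="'ascending'" />

  <!-- Copy everything that is not handled by a more specific template -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" />
    </xsl:copy>
  </xsl:template>

  <!-- Re-emit the rows in the requested order (hypothetical format) -->
  <xsl:template match="Rows">
    <xsl:copy>
      <xsl:apply-templates select="@*" />
      <xsl:apply-templates select="Row">
        <!-- order is an attribute value template, so it can take a parameter -->
        <xsl:sort select="Cell[@column = $sort-column]"
                  order="{$sort-order}" />
      </xsl:apply-templates>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

With the .NET classes, values for such parameters would typically be passed through the XsltArgumentList argument of the Transform call.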
The Sort.xsl and Page.xsl style sheets are written against and produce the intermediate table XML format. This makes the style sheets more modular and reusable. AHA! By sharing an intermediate format we get three reusable style sheets out of this pipeline implementation. Pipelines like this one are a valuable addition to the developer’s toolkit you bring to every project.
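Because a paging stage reads and writes the same intermediate format, it can stay very small. The following is only a sketch: the Rows and Row element names and the page-size and page-number parameter names are assumed, not taken from an actual Page.xsl.

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Supplied by the caller; the defaults are placeholders -->
  <xsl:param name="page-size" select="10" />
  <xsl:param name="page-number" select="1" />

  <!-- 1-based positions of the first and last row on the requested page -->
  <xsl:variable name="first"
    select="($page-number - 1) * $page-size + 1" />
  <xsl:variable name="last"
    select="$page-number * $page-size" />

  <!-- Everything outside the row list passes through unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" />
    </xsl:copy>
  </xsl:template>

  <!-- Keep only the rows that fall on the requested page -->
  <xsl:template match="Rows">
    <xsl:copy>
      <xsl:apply-templates select="@*" />
      <xsl:apply-templates
        select="Row[position() &gt;= $first and position() &lt;= $last]" />
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Since the output has the same shape as the input, this stage can be inserted before or after other stages that speak the same format.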
The pipeline stage style sheets are typically based on the identity transform. Stages may change the structure of the data, filter the data, or decorate the data by adding elements or attributes. Variations on identity transforms keep the stages simple.
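For reference, a minimal identity-based stage is sketched below: one template copies everything, and a second, more specific template overrides it. The InternalNotes element being filtered out is a hypothetical name.

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity rule: copies every attribute and node unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" />
    </xsl:copy>
  </xsl:template>

  <!-- Filter: an empty template drops the matched element and its content -->
  <xsl:template match="InternalNotes" />

</xsl:stylesheet>
```

With the second template removed, the style sheet reproduces its input, which makes it a convenient starting point for a new stage.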
Imagine how easy it would be to add another stage to this pipeline that flags rows meeting certain criteria with a highlight or checkmark attribute. Such a style sheet could form the basis for searching or selecting rows in the result set for other operations.
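Such a flagging stage could be sketched along these lines. The Row and Cell element names, the highlight attribute value, and the search-text parameter are all assumptions for the sake of the example.

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Text to look for; an empty value flags nothing -->
  <xsl:param name="search-text" select="''" />

  <!-- Identity rule for everything except rows -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" />
    </xsl:copy>
  </xsl:template>

  <!-- Decorate matching rows with a highlight attribute -->
  <xsl:template match="Row">
    <xsl:copy>
      <xsl:apply-templates select="@*" />
      <!-- Attributes must be added before any child nodes are copied -->
      <xsl:if test="$search-text != '' and Cell[contains(., $search-text)]">
        <xsl:attribute name="highlight">true</xsl:attribute>
      </xsl:if>
      <xsl:apply-templates select="node()" />
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

A later stage, such as the HTML table style sheet, can then decide how a highlighted row should be rendered.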
When developing pipelines, a key performance guideline is to create the smallest subset of the XML document as early as possible in the pipeline. For example, if a filter is going to select only five out of a hundred records, then that filter ought to be as early in the pipeline as possible. By reducing the size of the XML flowing through the pipeline, performance can be improved all around.
Pipelines may become slow for a variety of reasons including heavy usage, excessively large XML, or poorly written stages. In general, you will be surprised at how well a pipeline approach performs in practice. But if you do encounter performance problems with pipelines, you’ll find they are well structured for optimization.
Simple timings reveal which pipeline stages are running slowly. Consider rewriting slow single stages as DOM operations. DOM operations are more work but can lead to big gains for certain kinds of transforms. Also consider combining similar stages into a single transform if the complexity doesn’t become unreasonable.
Inefficient XPath expressions and poor style sheet processing flow are other common performance problems. Taking advantage of keys and caching intermediate results in variables are helpful XSL performance improvement techniques. Future essays will be devoted to XSL performance.
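As a small illustration of both techniques, the sketch below uses xsl:key for a lookup that might otherwise be written as a repeated //Customer[@id = ...] search, and holds the result in a variable. The Customer, Order, and OrderList elements and their attributes are hypothetical.

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Index Customer elements by their id attribute -->
  <xsl:key name="customer-by-id" match="Customer" use="@id" />

  <xsl:template match="/">
    <OrderList>
      <xsl:apply-templates select="//Order" />
    </OrderList>
  </xsl:template>

  <xsl:template match="Order">
    <!-- Look the customer up once and reuse the result -->
    <xsl:variable name="customer"
      select="key('customer-by-id', @customer)" />
    <Order total="{@total}">
      <xsl:value-of select="$customer/Name" />
    </Order>
  </xsl:template>

</xsl:stylesheet>
```

How much this helps depends on the processor and the size of the document, so it is worth timing before and after.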
Microsoft XSL Processing Pipelines
Prefer the read-only XPathDocument class in your pipeline implementations. Load and transform operations are much faster.
There are many ways to implement an XSL transform pipeline with the Microsoft .NET XML services. Use XmlWriter-based classes for IO, XPathDocument classes as a transform source, and the XslTransform class to perform the transforms.
The diagram below illustrates the data flows between the .NET XML classes commonly used for pipelines:
Note the following features indicated by the flows:
The following C# code fragment shows how to implement a pipeline:
// namespaces used by this fragment
using System.Xml;
using System.Xml.XPath;
using System.Xml.Xsl;

// load the input document and style sheets
XPathDocument docIn = new XPathDocument( "list.xml" );
XslTransform xslStageA = new XslTransform( );
xslStageA.Load( "a.xsl" );
XslTransform xslStageB = new XslTransform( );
xslStageB.Load( "b.xsl" );
XslTransform xslStageC = new XslTransform( );
xslStageC.Load( "c.xsl" );
XmlUrlResolver res = new XmlUrlResolver( );

// three stage pipeline, null XsltArgumentList
// each Transform call returns an XmlReader over that stage's output
XmlReader xpipe;
xpipe = xslStageA.Transform( docIn, null, res );
docIn = new XPathDocument( xpipe );

xpipe = xslStageB.Transform( docIn, null, res );
docIn = new XPathDocument( xpipe );

// the last stage writes straight to a file
XmlTextWriter docOut = new
    XmlTextWriter( "out.xml", System.Text.Encoding.UTF8 );
xslStageC.Transform( docIn, null, docOut, res );
docOut.Close( );  // flush and release the output file
The xpipe variable references the XmlReader created by each call to Transform. I’ve found that letting the XslTransform class handle the creation of the XmlReader performs well, though I haven’t benchmarked this against a user-managed alternative. The result is loaded from the XmlReader into a new XPathDocument for each stage. The last stage targets an XmlTextWriter to send the output of the transform directly to a text file. With ASP.NET you may choose to target the HTTP output stream associated with your page response if you’re creating HTML.
This sample pipeline implementation is unfortunately rather dumbed-down. Error handling, a parameter facility, and a set of classes to encapsulate pipeline functionality are beyond the scope of what I wanted to include here. The XslPipe project will provide a robust .NET pipeline implementation.
An XSL pipeline processing approach has considerable advantages. With changing business requirements, pipeline processing enables stages to be modified or added as new features are requested. It’s easier to change an XSL file than to change code and recompile an application.
Development of a pipeline can proceed incrementally, adding stages and delivering functionality in an iterative process typical of modern project lifecycle methodologies. The project technical lead can stub out a pipeline with identity transforms for each stage early in the project, allowing developers to flesh out the stages during development. For project managers, pipeline stages also provide a natural partitioning of tasks among a team of developers.
In future essays, pipelines will be used for a variety of code generation tasks and XSL demonstrations. The XslPipe project will also implement a pipeline processor with an accompanying pipeline specification language.
In the meantime, two Java XSL pipeline projects worth checking out are Norman Walsh’s SXPipe and the Apache Cocoon project. Cocoon allows for very sophisticated XML pipelines, including non-XSL generator stages that produce XML from databases and web service requests. Build your own pipeline system with batch files and start playing with pipelines!
- Microsoft .NET Framework System.Xml Reference
- Microsoft .NET Framework System.Xml.XPath Reference
- SXPipe: Simple XML Pipelines
- Apache Cocoon Project