AxKit -- integrated XML processing solution

Jan Pazdziora

Scheduled: Iterative Software room on Friday at 16:00-16:20 (paper # 35)

Abstract: AxKit is a package that integrates flexible XML processing management into the Apache/mod_perl environment. It comes with a wide range of configuration options and programming hooks that allow fine-tuning of the processes run on input XML data. Originally written and maintained by Matt Sergeant, it adapts quickly to the changing world of XML tools and adds support for most of the popular ones.

This presentation will introduce the AxKit architecture and the steps performed during data processing. Simple setups will be shown, but the focus will be on explaining the concepts behind the various stages and hooks. An overview of available processors (XSLT, XSP) and of current trends will also be given.


XML is a good format

XML (Extensible Markup Language) can be used to store structured information.

But people prefer other formats for their daily work, namely HTML or PDF.

XML can be transformed into other formats in many ways. Usually the input XML, a transformation engine and a stylesheet are involved.

Let's store data in XML and process it into other formats on demand.

AxKit for XML transformations and delivery

Written by Matt Sergeant, it makes on-the-fly XML transformation easy. Easier.

It supports many transformation and generation engines: LibXSLT and Sablotron for XSLT (Extensible Stylesheet Language Transformations), XPathScript for transformations written in Perl in a more or less declarative way, and XSP for XML generation with code embedded in the XML source and with extensions in the form of taglibs, to name a few.

It also bundles other utility functions, like character set conversion or gzip compression of the output.

Its main purpose is to provide a framework in which the order of transformations and the type of processing can be configured without programming, using configuration options or processing instructions.

Let's start delivering XML data

AxKit works with Apache in a mod_perl environment, so these had better be working first; a couple of other modules will also be required during installation. Installing through perl -MCPAN -e shell will try to fetch the prerequisites, but some adjustment may be needed, especially if external libraries are installed in nonstandard places.

AxKit is then enabled and configured using

	PerlModule AxKit
in httpd.conf, which loads the whole thing. For certain files, directories or locations, AxKit is then enabled using
	SetHandler perl-script
	PerlHandler AxKit
and we'd better also say which transformation engine should be applied for a given style type:
	AxAddStyleMap text/xslt Apache::AxKit::Language::LibXSLT

If our source XML file contains the processing instruction

	<?xml-stylesheet href="file.xslt" type="text/xslt"?>
AxKit will know that the LibXSLT library should be invoked to do the transformation, using the stylesheet file.xslt.
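
Other engines are mapped in the same way, one AxAddStyleMap line per style type. The lines below are a minimal sketch, assuming the language modules named here are installed alongside AxKit. The type strings on the left are only labels; they have to agree with the type used in the processing instruction or elsewhere in the configuration.

	AxAddStyleMap text/xsl Apache::AxKit::Language::Sablot
	AxAddStyleMap application/x-xpathscript Apache::AxKit::Language::XPathScript
	AxAddStyleMap application/x-xsp Apache::AxKit::Language::XSP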

Many configuration options

AxKit comes with a large number of configuration options that can be used to alter its transformation behavior. We can use the httpd.conf file to say which processors and which transformations should be run, based on the URI or media of the request, or on the DOCTYPE, DTD or other features of the source XML file.

Should the available configuration options fail to provide enough flexibility for our setup, we can always resort to extending AxKit with our own modules, which are then deployed through the configuration.
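
As a rough illustration, such selection rules might look like the following. All stylesheet paths, the URI pattern and the media name are made up for this sketch, and the directives are listed together only for comparison; in a real setup each would normally sit in its own Files, Directory or Location section.

	# Default stylesheet for every document handled by AxKit
	AxAddProcessor text/xslt /stylesheets/default.xslt
	# Selected by the request URI (a regular expression)
	AxAddURIProcessor text/xslt /stylesheets/book.xslt "book.*\.xml$"
	# Selected by the public identifier in the DOCTYPE
	AxAddDocTypeProcessor text/xslt /stylesheets/docbook.xslt "-//OASIS//DTD DocBook XML V4.1.2//EN"
	# Selected by the DTD named in the source file
	AxAddDTDProcessor text/xslt /stylesheets/docbook.xslt /dtds/docbook.dtd
	# Selected by the name of the root element
	AxAddRootProcessor text/xslt /stylesheets/article.xslt article
	# Selected by the media of the request
	<AxMediaType handheld>
		AxAddProcessor text/xslt /stylesheets/handheld.xslt
	</AxMediaType>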

Steps in request processing

  1. A request comes to Apache, which recognizes that it should be handled by AxKit.
  2. AxKit is invoked and the request object is passed to it.
  3. The default environment is set for the request, including media and style.
  4. Processing instructions from the source XML are fetched.
  5. Processors are run in the proper order.
  6. Final actions (like character set conversions) are done on the output, which is then sent back to the client.

AxKit handles caching of transformation results, so if the requested series of transformations was done before, the stored result is passed back to the client. You can switch off the caching feature.
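
Both the cache and the final output actions are controlled from httpd.conf. A minimal sketch, with the character set chosen purely as an example:

	# Switch caching off, for instance while stylesheets are being developed
	AxNoCache On
	# Final actions on the output: fix the character set and compress
	AxOutputCharset iso-8859-2
	AxGzipOutput On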

Since several processors can be run on one source to achieve a step-by-step transformation, care must be taken with the output of each one. If the next processor expects well-formed XML, the previous one has to provide it. On the other hand, AxKit also makes it possible to put into the chain transformations that do not work on XML at all and are pure processing of a text stream.
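
A chain is typically written as several processor lines, which are run in the order given. The sketch below assumes that style maps exist for both types and that XSP pages use the .xsp extension; the stylesheet path is made up, and NULL stands in for the stylesheet argument that XSP does not need, which is worth checking against the installed AxKit version.

	<Files *.xsp>
		# First run the XSP page itself; it must produce well-formed XML
		AxAddProcessor application/x-xsp NULL
		# Then transform the generated XML with XSLT
		AxAddProcessor text/xslt /stylesheets/to_html.xslt
	</Files>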

XSLT and XPathScript transformations

XSLT is a W3C standard for the transformation of XML documents. Stylesheets contain templates for elements, attributes and other parts of the input XML; each template holds output markup mixed with XSLT instructions that can process the input. External processors, Sablotron or LibXSLT, are called to do the actual work.

XPathScript includes Perl code inside the output markup, and this code can declaratively specify actions to be taken on parts of the input XML, based on element names.

XSP for XML generation

XSP is a templating scheme that uses special XML elements to include dynamic content in static markup. Inside the <xsp:logic> and <xsp:expr> elements, arbitrary Perl code and expressions can be included. Special sets of elements, defined by taglibs, can be used to reach functionality from external modules while avoiding Perl code in XSP pages.
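
Taglibs are made available to XSP pages through the configuration. A minimal sketch, assuming that the XSP language module is already mapped with AxAddStyleMap and that the AxKit::XSP::Util taglib has been installed separately from CPAN:

	AxAddXSPTaglib AxKit::XSP::Util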

And it's written in Perl

AxKit is written in Perl and runs in the mod_perl environment. Custom modules can be added to extend its functionality, both to support more transformers or data producers and to do specialized fine-tuning tasks. Modules run under mod_perl, which gives them full access to Apache's internals.

The choice of tools bundled with AxKit is already wide, and Matt is planning more. From DocBook presentation of source XML documents to cascades of transformations of dynamic content, the framework is worth considering.