Hands on with Java XML filter pipelines

Ignored by many

In this tutorial we will step through this little understood area of XML processing. Java XML Filters are part of the SAX API for XML processing.

Java XML filters are objects that can filter (or process) data within a pipeline of filters, wrapped around a core XML processing application. The SAX API is a standard part of the JAXP (Java API for XML Processing), which has itself been provided as part of the Java environment since Java 2 SDK 1.4.0 and can be used in a wide range of XML processing applications.

In the remainder of this tutorial we will first introduce the concept of Java XML filters and examine how a generic Java XML Filter can be written.

The XMLFilter interface

Although the XMLFilter interface is part of the SAX API for processing XML documents, it's ignored by many Java XML tutorials and overlooked by most Java developers. An XMLFilter is a sub-interface of the XMLReader class; as such it is very like the XMLReader except that it obtains its events from another XML reader rather than a primary source like an XML document, file or database. As such, it is a primary component in the JAXP pipeline architecture. That is, it can sit within a pipeline, receiving XML data from another XML processing element and passing the results of its processing onto another XML processing element if required.

Assuming you have a distribution of SAX, or a version of Java containing the XML APIs, you can look at the included classes; the one you want is org.xml.sax.XMLFilter. You should also ensure that you have the SAX helper classes, found in the org.xml.sax.helpers; package. In that package, you will want to focus on the org.xml.sax.helpers.XMLFilterImpl class.

The XMLFilter interface methods

If you examine the XMLFilter, you'll find that it extends the org.xml.sax.XMLReader interface, and adds two new methods:

1. public void setParent(XMLReader parent); this method allows the application to link the filter to a parent reader (which may be another filter). The argument may not be null.

2. public XMLReader getParent(); this method allows the application to query the parent reader (which may be another filter). It is generally a bad idea to perform any operations on the parent reader directly: they should all pass through this filter.

This probably doesn't look like much; however it is very significant as the ability to link one filter to another (via the setParent method) allows the "pipeline" to be constructed. Of course, you also get all the other XMLReader methods such as startElement(), endElement(), etc. In each of these methods, you can operate upon the input XML data before an application, or the next filter in the pipeline, gets to it.

This means that in the following diagram you get to modify data before an application gets that data using a standard XML framework. Note that as you have all of the SAX callback methods that an XMLReader does, you can work with the elements, the attributes, the prefix mappings, and anything else that SAX can work with.

Filter data for an XML Application

The key here is that the XML form the source, can be filtered (pre-processed) before being accessed by the application, without modification to that application, and in a standard manner. Essentially, the first filter reads the XML data, processes it if necessary and pushes it onto the next filter. Filter 3 then does the same and pushes it onto the XML application. Of course you could do the same thing by writing your own custom XML pre-processors, the point here is that a standard framework is available within which to do this.

Sponsored: Network DDoS protection