Feeds

Hands on with Java XML filter pipelines

Ignored by many

Boost IT visibility and business value

Setting up a pipeline

Once you have your filter set up and compiled, you need to create a pipeline for processing your XML. This should move from input XML document to the filter to the reader. You may even have multiple filters, stacked upon each other. As long as input comes first, and your reader (with application-specific callbacks) comes last, things work fine.

The following listings show how to set up a program to use filters. To do this we have defined a simple XML application (called SimpleXMLApplication). This is a standard SAX Content Handler (it extends the DefaultHandler to obtain the default behaviour). This application merely echoes, to the standard output, the XML passed to it (with suitable indentation).

sax content handler

The XMLFilterTest test harness class links the SimpleXMLFilter and the SimpleXMLAPplication together. Notice that because the one or more filters must sit between input source and the reader, all the operations that you would normally invoke on the reader are invoked on the filter. It then delegates any data that passes through the filter to the reader.

p>java pipeline simple application XMLFilterTest

Also note that we obtain the XMLReader form the root parse and set that as the "parent" of the filter. We then link the filter with the application by making the application the content and document handler of the filter.

The effect of running the XMLFilterTest on a simple XML document is presented below:

effect of running the XMLFilterTest

Data Pollution

However, a word of warning if you use a filter to remove some elements form the data input to the next element in the pipeline. It is all too easy to pollute the data being sent on. For example, consider the case where you don't delegate in the startElement() method for certain data, but forget to do the same in endElement(). The result would be that some elements would never be reported as starting, but would be reported to the reader as ending. This would cause, in the best case, program errors, and in the worst case, data loss or corruption in your application.

Real world use

As an example of a tool that makes extensive use of pipelines of filters, consider the DeltaXML XML diff tool. This uses XML filters to pre and post process XML data before performing XML comparisons, synchronization operations and patches. To download a time limited copy of DeltaXML see the DeltaXML web site here. ®

The essential guide to IT transformation

More from The Register

next story
Munich considers dumping Linux for ... GULP ... Windows!
Give a penguinista a hug, the Outlook's not good for open source's poster child
The Return of BSOD: Does ANYONE trust Microsoft patches?
Sysadmins, you're either fighting fires or seen as incompetents now
Intel's Raspberry Pi rival Galileo can now run Windows
Behold the Internet of Things. Wintel Things
Microsoft cries UNINSTALL in the wake of Blue Screens of Death™
Cache crash causes contained choloric calamity
Eat up Martha! Microsoft slings handwriting recog into OneNote on Android
Freehand input on non-Windows kit for the first time
Time to move away from Windows 7 ... whoa, whoa, who said anything about Windows 8?
Start migrating now to avoid another XPocalypse – Gartner
You'll find Yoda at the back of every IT conference
The piss always taking is he. Bastard the.
prev story

Whitepapers

5 things you didn’t know about cloud backup
IT departments are embracing cloud backup, but there’s a lot you need to know before choosing a service provider. Learn all the critical things you need to know.
Implementing global e-invoicing with guaranteed legal certainty
Explaining the role local tax compliance plays in successful supply chain management and e-business and how leading global brands are addressing this.
Build a business case: developing custom apps
Learn how to maximize the value of custom applications by accelerating and simplifying their development.
Rethinking backup and recovery in the modern data center
Combining intelligence, operational analytics, and automation to enable efficient, data-driven IT organizations using the HP ABR approach.
Next gen security for virtualised datacentres
Legacy security solutions are inefficient due to the architectural differences between physical and virtual environments.