XML shows promise, but …
Don't underestimate its problems
Extensible Markup Language (XML) is everywhere, writes Steven J. Vaughan-Nichols. It's the basis of the new middleware, Web services. It's the format of choice for both Microsoft's new Office products and .NET services. And every Java vendor worth its salt is working with it. Indeed, Steve Holbrook, IBM's program director for emerging e-business standards, says, "XML is the most important protocol since HTML."
Strong words, but he's not the only one who sees it that way. XML use is growing fast. Gartner predicts that the share of corporate data held in XML will grow from about 2 percent in 2000 to 60 percent by 2004. Exact numbers are hard to come by, though, since, as Ronald Schmelzer, an analyst at the XML research firm ZapThink, says, "XML is so pervasive that it's already everywhere. Eventually, it will even be in dishwashers."
Gartner analysts say that three broad business applications are driving XML usage: enterprise application integration (EAI), extranet data interchange, and Web services. XML drives these applications because it gives developers an easy-to-use, universal middleware for network applications. Universal data interchange has long been the Holy Grail for network programmers, and XML looks set to finally deliver it.
Alan Zeichick, editor in chief of SD Times, a software development newspaper, explains that XML has become "increasingly attractive as the first serious cross-platform and cross-application document format. The use of XML as a common denominator, when coupled with its self-describing attributes, makes it irresistible for solving new data-sharing problems. The only real competitor to XML is SQL, and when coupled together (i.e., SQL query, XML results), it's wonderful."
How wonderful is it really?
But according to some, it's not wonderful at all. XML and its close relative, Web services, come with three major problems: fat, bandwidth-eating formats; poor security; and high server loads.
All these issues are addressable, but don't fool yourself. XML may be easy to write, but it could prove hard, very hard, to deploy successfully.
Many industry figures, such as Steve Vinoski, Iona's chief architect and vice president of platform technologies, are worried about Web services performance. The main reason is that XML, unlike older, less popular binary interoperability standards such as CORBA's Internet Inter-ORB Protocol (IIOP) and Microsoft's Distributed Component Object Model (DCOM), is text-based, which means that data transmitted in XML format is bulky.
How much bulkier than binary? Schmelzer says XML formatting can fatten up a file by as much as 10 to 20 times. That's a lot of bytes to transfer even at Gigabit Ethernet speeds.
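To get a feel for where the extra bytes come from, here is a minimal Python sketch that encodes one hypothetical order record (the field names and values are invented for illustration) both as a fixed binary layout with the standard `struct` module and as plain XML. In this toy case the XML comes out roughly six times larger; longer tag names, namespaces, an XML declaration, and indentation push the ratio higher, while a real binary protocol such as IIOP adds some framing of its own.

```python
import struct

# Hypothetical record: an order ID, a quantity, and a price.
# The field names and layout are illustrative assumptions, not a real schema.
order_id, quantity, price = 1234567, 3, 19.99

# Binary encoding: a fixed layout both sides agree on in advance.
# "<IHd" = little-endian uint32 + uint16 + float64 = 4 + 2 + 8 = 14 bytes.
binary = struct.pack("<IHd", order_id, quantity, price)

# XML encoding of the same values: every field carries an opening and a closing tag.
xml = (
    "<order>"
    f"<orderId>{order_id}</orderId>"
    f"<quantity>{quantity}</quantity>"
    f"<price>{price}</price>"
    "</order>"
).encode("utf-8")

print(len(binary), len(xml), round(len(xml) / len(binary), 1))  # 14 83 5.9
```

The binary form is only compact because the receiver already knows the layout; the XML form spends its extra bytes on being self-describing.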
And that's before the Extensible Stylesheet Language (XSL) baggage is included. When XML data is meant to be displayed, it is typically transmitted with an XSL file. This file, according to David A. E. Wall, chief software architect for Yozons, "specifies how to transform the data you provided into the corresponding HTML. Arguably, the XSL will also be bigger than the HTML you show because of the statements that select the corresponding data elements from the XML, so both the XML and the XSL individually will be bigger than the original HTML alone. Of course, the XML data is much more useful to a program than the HTML, but the HTML is much more useful to a human looking at the data."
The size problem doesn't stop there. Wall goes on to say that if your XML data format -- not the content -- is to be validated, you will need a document type definition (DTD), and if you want additional validation capabilities, you need an XML Schema file. These must also be accessible in order to validate the data format, again increasing the amount of information to be transferred with the XML data. Schema validation is useful, but it also creates a management burden. Schemas change over time, so every version must be kept, and each XML document exchanged must stay paired with the schema (or DTD) version that was in force when it was sent.
All this means that sending 1K of data could take 30K or more. This in turn puts a terrible burden on network throughput. How much of a burden?
Schmelzer comments that XML "is not very efficient from a processing, network, or storage" standpoint, and that its use keeps growing. By 2006, he says, XML alone may account for 25 percent of corporate network traffic.
One way of dealing with this network load is to use XML accelerators. For example, DataPower Technology uses LZW compression to shrink XML data streams and reduce network traffic. DataPower's director of product management, Kieran Taylor, explains, "XML is text-based and verbose, and this creates several performance bottlenecks. Parsing data, transforming data, and processing XML in the application server costs a lot in terms of MIPS." Thus the market for XML accelerators is born. Other companies, such as F5, Forum Systems, and Sarvega, are also in this market.
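LZW (Lempel-Ziv-Welch) suits XML because the same tag names recur over and over, and the algorithm learns repeated strings as it goes. What follows is a minimal textbook LZW sketch in Python, not DataPower's implementation; the sample records and the fixed-width size estimate are assumptions made for illustration (real LZW variants usually widen their codes gradually as the dictionary grows).

```python
def lzw_compress(data: bytes) -> list[int]:
    """Textbook LZW: emit the code of the longest known prefix, then learn prefix + next byte."""
    dictionary = {bytes([i]): i for i in range(256)}  # start with all single-byte strings
    next_code = 256
    current = b""
    codes = []
    for value in data:
        candidate = current + bytes([value])
        if candidate in dictionary:
            current = candidate  # keep extending the match
        else:
            codes.append(dictionary[current])
            dictionary[candidate] = next_code  # remember the new, longer string
            next_code += 1
            current = bytes([value])
    if current:
        codes.append(dictionary[current])
    return codes


def lzw_decompress(codes: list[int]) -> bytes:
    """Rebuild the same dictionary on the fly to invert lzw_compress."""
    if not codes:
        return b""
    dictionary = {i: bytes([i]) for i in range(256)}
    next_code = 256
    previous = dictionary[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == next_code:
            entry = previous + previous[:1]  # code refers to the entry being defined right now
        else:
            raise ValueError(f"invalid LZW code: {code}")
        out.append(entry)
        dictionary[next_code] = previous + entry[:1]
        next_code += 1
        previous = entry
    return b"".join(out)


# Hypothetical XML stream: 100 small records that repeat the same tags.
records = b"".join(b"<item><sku>%d</sku><qty>1</qty></item>" % i for i in range(100))
codes = lzw_compress(records)

# Rough size estimate, assuming every code is stored at one fixed bit width.
width = max(codes).bit_length()
compressed_size = (len(codes) * width + 7) // 8

assert lzw_decompress(codes) == records  # lossless round trip
print(len(records), compressed_size)
```

Compression only attacks the network side of the problem; as Taylor notes, the receiver still has to spend cycles decompressing and then parsing the full text.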
But XML accelerators can't solve XML's entire weight problem. Wall explains that since "XML is a textual data format, that means lots of conversions need to take place. To add two numbers sent via XML, the receiver would have to parse the XML, convert the string numbers into integers or floating point numbers, perform the arithmetic, and then convert the answer back into a string. This also has to take place for dates and times if they will be used in a computation (e.g., add 7 days to a value, or check if a date is later than another date), and issues about time zones all rear their ugly heads. Also, most binary data is converted to BASE64, meaning that every three bytes of the source is converted to four characters, a 33% expansion in data's size, not including the overhead of the tags themselves." This, of course, puts a strain on the server, which must store, translate, and process the XML.
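The conversions Wall describes can be sketched with Python's standard library (3.7 or later, for `datetime.fromisoformat`). The message format and values below are hypothetical; the point is that every value arrives as text and must be converted before use, that a date is only comparable once its UTC offset is parsed, and that Base64 turns every 3 bytes into 4 characters.

```python
import base64
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone

# Hypothetical message; the element names and values are illustrative only.
message = """<calc>
  <a>40</a>
  <b>2.5</b>
  <due>2002-10-28T17:00:00-05:00</due>
</calc>"""

root = ET.fromstring(message)

# 1. Numbers arrive as strings: parse, convert, compute, then convert back to text.
total = float(root.findtext("a")) + float(root.findtext("b"))
total_text = str(total)  # "42.5", ready to be written into a reply element

# 2. Dates are strings too; the UTC offset must be parsed for comparisons to be correct.
due = datetime.fromisoformat(root.findtext("due"))
extended_text = (due + timedelta(days=7)).isoformat()
# 17:00 at UTC-5 is 22:00 UTC, so this is later than 20:00 UTC, even though
# comparing the bare clock times "17:00" and "20:00" would suggest otherwise.
is_later = due > datetime(2002, 10, 28, 20, 0, tzinfo=timezone.utc)

# 3. Binary data is usually Base64-encoded: every 3 bytes become 4 characters.
payload = bytes(range(256)) * 3          # 768 bytes of arbitrary binary data
encoded = base64.b64encode(payload)      # 1,024 ASCII characters
expansion = len(encoded) / len(payload)  # 4/3, about 33% larger before any tags

print(total_text, extended_text, is_later, f"{expansion:.3f}")
# 42.5 2002-11-04T17:00:00-05:00 True 1.333
```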
Yet another problem is that there's no way to pull out a single string of data, say the contents of a field, from an XML document without having to retrieve and parse the entire document. There already are, for better or worse, databases like Software AG's Tamino that store data in XML format, but even on today's RAM-crammed, 2GHz processors, they aren't going to be fast.
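Because XML has no built-in index, a reader cannot seek straight to one field; it has to parse its way there from the start of the document. The minimal sketch below uses Python's standard-library streaming parser, `xml.etree.ElementTree.iterparse`, which is the best case: it can stop as soon as the field turns up, but every element ahead of it must still be parsed, and when the field sits near the end, that is effectively the whole document. The helper `find_first` and the sample document are hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def find_first(xml_bytes: bytes, tag: str):
    """Stream-parse xml_bytes; return (text of first <tag>, elements parsed to reach it)."""
    parsed = 0
    # iterparse yields each element as its end tag is read, so we can stop early,
    # but only after everything in front of the target has been parsed.
    for _event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("end",)):
        parsed += 1
        if elem.tag == tag:
            return elem.text, parsed
    return None, parsed


# Hypothetical document: 1,000 rows followed by the one field we actually want.
rows = "".join(f"<row><id>{i}</id><name>item{i}</name></row>" for i in range(1000))
doc = f"<doc>{rows}<status>open</status></doc>".encode("utf-8")

print(find_first(doc, "status"))  # ('open', 3001): 3,000 row elements parsed first
print(find_first(doc, "id"))      # ('0', 1): cheap only because it comes first
```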
Even XML supporters confess to its performance problems. Microsoft XML Web services project manager Philip DesAutels admits that "there's a cost to everything," and that the cost for XML-SOAP-based Web services is performance. Still, from where he sits, with XML, "you trade performance for highly flexible protocols."
In addition to storage and performance issues, XML has a real security problem -- it doesn't have any. Many companies, including Entrust, Microsoft, RSA Security, VeriSign, and webMethods, have tried to layer encryption and digital signatures on top of XML. Their efforts haven't gone far.
Microsoft, IBM, Sun, and others are working together in the Web Services Interoperability Organization (WS-I) to promote Web services standards, including WS-Security, which is now being developed by the Organization for the Advancement of Structured Information Standards (OASIS). But even WS-Security, probably the most mature XML security specification in play, can't do the job by itself. It needs to be implemented with X.509 certificates, which in turn need a public key infrastructure (PKI).
Other efforts, like OASIS' Security Assertion Markup Language (SAML), an XML framework for exchanging authentication and authorization information, are also works in progress.
All this bodes well for the future, but none of these projects is really ready to deploy. So if you want to secure Web services today, you must turn to server- and client-side encryption such as Secure Sockets Layer (SSL), which will at least hide XML communications on the network.
But SSL comes with its own burden. Multiple SSL sessions take up substantial processor time and add yet another layer of data to already bursting-at-the-seams XML files. DesAutels points out that the cost lands on both server and client, since with each transaction "you're taking SSL up and down."
Identification and authentication might be an even bigger security problem with XML. Can you tell who's asking for XML data? Is the requester entitled to it? To make authentication work properly, you have to implement an enterprise directory service based on Microsoft's Active Directory, Lightweight Directory Access Protocol (LDAP), or Novell's eDirectory, and get it to work with your applications. If you want to extend directory services beyond your local intranet to the Internet, you have to adopt the seriously flawed Microsoft Passport or the unproven Project Liberty.
Of course, the use of SSL complicates working with directory services as well. If your message is encrypted, it first has to be decrypted to see whether the sending person or program has the proper rights to access the data. If it turns out that they don't, you've just wasted more processing time.
Put it all together and it becomes clear that processors that work with native XML are going to be overwhelmed. XML and SSL accelerators will become a necessity. Authentication engines might also be called for. And even with their aid, you'd be well advised to give the machines running your XML application servers every bit of RAM and MHz you can lay your hands on. They'll need it.
The bottom line is that while XML makes a fine transfer format and provides the basis for extremely easy cross-network programming, it is in no way a miracle cure for your programming ills.
To make XML, and the Web services based upon it, work, you must be ready to devote extremely powerful hardware resources, from high-speed networks to high-powered servers, to your efforts. Successful XML programming may be easy, but successful XML deployment will be hard.
Steven J. Vaughan-Nichols is editor in chief and resident cyber cynic of Practical Technology.