XML - past, present and future

Chewing the fat with MS's Jean Paoli

Last week I had the pleasure of meeting up with Jean Paoli of Microsoft. In November, Jean was presented with the XML Cup 2004 to recognise his lifelong work in XML and its precursor SGML. The meeting gave me an opportunity to hear about the fascinating history of XML and understand some of its importance to Microsoft and the industry.

Jean Paoli was one of the leading members of the original XML working party, and he had been working with SGML since 1985. SGML was a mark-up language designed mainly to allow manufacturers to pass complex design documents around. It worked very well at that task but never found its way into the mainstream of computing. Its biggest problem was its size: the specification was about a thousand pages and there was only one parser that implemented the complete standard. The other problem was that it was document-centric rather than data-centric.

When Jean joined Microsoft in April 1996, officially to help develop IE4, it was a good chance to put into practice ideas that had floated around the SGML community for several years. Jean helped set up the first W3C committee for XML and by the end of the year 80 per cent of the standard was complete. Jean found that his knowledge and understanding of the power of SGML and mark-up languages in general, combined with the Microsoft engineers’ passion for and understanding of simplicity and ease of use, enabled him to define XML. The XML specification was less than five per cent of the size of the SGML specification but in many ways more powerful.

Defining XML was Jean’s night job and during the day he helped develop Internet Explorer 4.0. The two came together when XML support was included in IE4 at its launch at the end of 1997. This was the time of the IE-Netscape wars, and that discussion rather overshadowed the really important new bit of IE, which was the XML support. Included in IE4 was the implementation of CDF (the precursor of RSS), which was the first use of XML. The importance of CDF was that it showed the power of XML to transport data from one environment to another in such a way that the producers and consumers did not need to have any direct knowledge of each other's environments.
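To make that last point concrete, here is a minimal sketch in Python of a producer and a consumer that share nothing except an agreed XML vocabulary. The element and attribute names (`channel`, `item`, `title`, `href`) and the example URL are invented for illustration and are not taken from the CDF specification.

```python
import xml.etree.ElementTree as ET


def produce_feed(items):
    """Producer side: serialise (title, url) pairs into an XML string.

    The element names are hypothetical, chosen only for this sketch.
    """
    channel = ET.Element("channel")
    for title, url in items:
        item = ET.SubElement(channel, "item", {"href": url})
        ET.SubElement(item, "title").text = title
    return ET.tostring(channel, encoding="unicode")


def consume_feed(xml_text):
    """Consumer side: recover the pairs knowing only the agreed vocabulary."""
    root = ET.fromstring(xml_text)
    return [(item.findtext("title"), item.get("href"))
            for item in root.iter("item")]


# The producer could run on any platform; only the text crosses the boundary.
feed = produce_feed([("XML - past, present and future",
                      "http://example.com/xml")])
print(consume_feed(feed))
```

The consumer never sees the producer's data structures, language or operating system. It depends only on the text of the document and the names both sides agreed on.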

The amazing thing about this story is the speed at which it happened; less than two years from a standards committee being set up, to product coming out in the market, is unusual. This happened because the requirement was well understood and Bill Gates recognised its importance and gave it his backing.

XML is now embedded in most of Microsoft’s products and is central to all of its strategy. And, as they say... the rest is history.

I asked Jean about WordML. When it was first announced, it seemed very Office-centric to me, and I felt that it should have been a more generalised document mark-up language. Jean explained that the raison d’être for WordML is archiving Word documents. There is a real problem with documents that have to be kept for a long time (think of birth certificates) if they are stored in internal Word format. The problem is that in 30 years' time, let alone 100 years from now, they will probably be unreadable as the software will have moved on. So there is a need to be able to store these documents in a vendor- and software-neutral format, and that is what WordML is designed to do. The schema definition is open source so that anyone can write a parser at any time to read and format the documents. To do this, WordML has to support all the functionality and the quirkiness of Word, and hence the WordML schema is by definition Word-centric.

On the other hand, what is more generally important is Office's support of any XML schema. This is an area that has quietly grown up, and the first tech conference on the subject last week attracted more than 500 delegates.

© IT-analysis.com
