
Writing history with Microsoft's Office lock-in

No XML please, we're arbitrary


Sometimes, very small decisions can have a very big impact on how people work in the future. So join us on a journey into the future: a story that begins with a little fudge.

In a little-noticed move, Microsoft has slid on its commitment to produce open standard file formats for its Office products.

By maintaining a proprietary binary format that frequently changes, Microsoft has kept the exit costs high for potential defectors. However, Microsoft has for a long time touted its investment in XML as a sign of its commitment to openness.

Remember that XML has always had a "feature" which distinguishes it from SGML, its much more complicated publishing predecessor. SGML insisted on leaving nothing to chance. An XML parser, by contrast, can happily munch its way through a "well formed" XML document without any DTD (Document Type Definition) or schema, and it will leave alone the many elements that have never been defined anywhere.

"Well formed" means that the document will parse without errors - it doesn't mean that the document will make any sense.
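To make the distinction concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. That parser checks well-formedness only and never validates against a DTD or schema. The memo document and its `VULTURE` element are invented for the example.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if text parses as XML. This says nothing about validity or meaning."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Hypothetical document: <VULTURE> is declared nowhere (no DTD, no schema),
# yet the parser accepts it because the tags nest and close correctly.
doc = "<memo><VULTURE>Top Secret</VULTURE><body>Hello</body></memo>"

# Mismatched closing tag: this one is not well formed.
broken = "<memo><VULTURE>Top Secret</memo>"

print(is_well_formed(doc))     # True
print(is_well_formed(broken))  # False

# The undefined element is simply carried along as data.
print(ET.fromstring(doc).find("VULTURE").text)  # Top Secret
```

The parser is satisfied in the first case. Whether anyone else can make sense of a `VULTURE` element is a separate question that well-formedness cannot answer.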

Some of our schemas are missing

Microsoft has made a curious choice. It has backed away from implementing an OASIS-defined industry standard by flying a populist flag. Microsoft will offer "freedom" to its users by letting them roll their own schemas.

Microsoft has done so by playing a six-cup shell game. There will be six versions of Microsoft Office 2003, but only two will support user-defined schemas. Can you guess under which two cups the schemas are hiding?

We'll tell you. Office Enterprise and Office Professional. As Joe Wilcox notes in this article, it's the first time such important functionality has been isolated in one variant of the suite.

For the rest of the time, you will be using Microsoft's own schema, WordML. But this is only open in the sense that XML is open.

So when you read a statement from Redmond (via Joe) that, "...when you are using Word in Office XP or the Standard version of Office 2003, the WordML--Microsoft's XML schema, which is 100 percent compliant with industry standards for XML--is saving the formatting of the Word doc," you can hear the sound of a wooden nose growing [*].

A splendid summary of the state of affairs can be found at XML Deviant, a column penned by Kendall Grant Clark.

Clark cites Mike Champion, who asks, "what is the point of storing data in XML if the schema [WordML] is so hideous and proprietary that no one can use it without proprietary API support?"

So in the future, you may be faced with two flavors of nonsense. XML Word documents that have been mangled by Microsoft's XML-creation tools, and XML Word documents that have been mangled by users who add their own non-standard entities (such as our Top Secret "VULTURE" tag).

Put your hands where we can see them

Now then. Microsoft argues, with some justification, that its binary Office format is superior technology to "open" and interoperable Unix file systems. The Unix people have barely got round to discussing a Peace Process for Metadata. Microsoft offers a richer format: it supports multiple data streams, and allows all kinds of interesting compound documents to be created.

But if Microsoft had taken note of the responsibilities that go with the power it wields, it would have documented the format and submitted it to a recognized standards body. It could then compete on its own skills as the best implementer of its home-grown format.

No XML please, we're arbitrary

(Kendall's must-read column goes on to other areas, such as the quality of WordML and the influence Microsoft, as a producer of XML content, will wield over the language. That is an interesting discussion in itself.)

The user-defined schemas come with a very curious choice of name.

Forgive us for taking part in what looks like a semantic Jihad in recent weeks (yes, there are other useful ways of looking at the world), but sometimes the choice of language tells us a lot.

Microsoft calls these user-defined schemas "arbitrary schemas".

Remember me not

A very telling quote in Joe's piece comes from Jean Paoli, XML tribal elder and Microsoft's man in XML-land.

Paoli appears to have given up the pretense of Microsoft using XML as a document format at all.

"I'm out of the business of creating formats. Our focus on Office is on data exchange."

Data exchange. There's a good subject.

Let's add the factor "time" into the context. It's already quite hard for you to read EBCDIC documents, unless you have terminal access to an IBM mainframe - or the right IBM mainframe - as there were several EBCDICs and not all were compatible with each other. (Sound familiar?)
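Python's standard codecs happen to include several EBCDIC code pages, among them `cp037` (US/Canada) and `cp500` (International). That makes the compatibility problem easy to demonstrate. The following is a minimal sketch: text is written with one code page and read back with another. The sample string is arbitrary.

```python
# Encode some text with the EBCDIC "International" code page (cp500).
text = "[Hello]"
data = text.encode("cp500")

# Letters occupy the same byte positions in both code pages...
same_letters = "Hello".encode("cp500") == "Hello".encode("cp037")

# ...but some punctuation, such as square brackets, does not. Reading the
# bytes back with the US/Canada code page (cp037) therefore changes them.
as_cp500 = data.decode("cp500")
as_cp037 = data.decode("cp037")

# Treating the bytes as an ASCII-compatible encoding gives unrelated characters.
as_latin1 = data.decode("latin-1")

print(same_letters)     # True
print(as_cp500)         # [Hello]
print(as_cp037)
print(repr(as_latin1))
```

The bytes survive intact. What is lost is the knowledge of which code page they were written in, and without that knowledge the text cannot be read reliably.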

Simon Phipps, who works for Sun but here is speaking for himself, makes an important point:

"We continue to live in a world where all our know-how is locked into binary files in an unknown format. If our documents are our corporate memory, Microsoft still has us all condemned to Alzheimer's."

He has identified that if we want our data to live on, we need Microsoft to live on too, to help us read it.

So regarding data exchange, who is exchanging what with whom here?

We need our history and our historians. And by ensuring data formats are vendor-specific, we're already defining the constraints under which future historians will operate. ®

[*] Creative readers are encouraged to submit entries for what this may sound like, please - no files larger than 35kb.

