Having looked at music formats and instant messaging protocols, this final installment of a short series on open formats covers what may be the most ubiquitous of digital file formats: office documents. Spreadsheets, presentations, desktop databases, and the common text document hold most of the business information of our age.
In North America, at least, most of this information lives inside a set of patent-protected, binary (which makes them difficult to reverse-engineer), and undocumented file formats. The Microsoft Office formats, the best known of which is the Microsoft Word format, are used to store millions (billions?) of documents, from personal journals to government legislation.
For those creating these documents, the problem is inherently disguised. If you create a Microsoft Word document, then you must have access to Microsoft Office and can therefore open, read, and modify the document. The problem arises when you don’t have access to a copy of Microsoft Office. This may be due to financial limitations, or it may be because you are running on a platform that is not supported by Microsoft. No matter how much money you have, you can’t buy a copy of Microsoft Office for Linux.
The frustration of receiving Microsoft Word documents as email attachments led Richard Stallman, founder of the Free Software Foundation, to write a brief manifesto covering the perils of this proprietary format.
The essential problem with a proprietary document format like Microsoft Word is that a private corporation owns the ability to access the works you have created. While it’s not likely that Microsoft is going to deny you access to your Microsoft Word-formatted love letters and chili recipes tomorrow, they do theoretically hold that right.
Confusion and Optimism
The closed binary format in Microsoft Office has been enormously broad in its reach. However, the life of this format is limited. Microsoft recently announced plans to move to a documented format that could potentially be accessed through non-Microsoft means.
The meaning of this announcement has yet to be truly understood. Some see this as the end of the proprietary Microsoft format and a great victory for freedom and openness, as millions of documents will be created in an openly documented format. Others are more cynical, citing licensing issues that will limit what people can do with the formats.
It seems clear, though, that while the legal issues around the new Microsoft formats remain disputed, their technical architecture (basically XML in Zip files) will be much more easily accessible regardless of whether access is endorsed by Microsoft or not.
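To make "XML in Zip files" a little more concrete, here is a minimal Python sketch using only the standard library. It is not Microsoft's tooling, and the member names (`document.xml`, `notes.txt`) are hypothetical placeholders rather than the real layout of the new Office formats. The helper `list_xml_parts` and the sample builder are my own illustrative names. The point is only that any tool that can read Zip and XML can at least look inside such a package.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def list_xml_parts(package):
    """Return {member name: root tag} for each well-formed XML member of a Zip package."""
    parts = {}
    with zipfile.ZipFile(package) as archive:
        for name in archive.namelist():
            if not name.endswith(".xml"):
                continue  # skip images, thumbnails, and other non-XML members
            try:
                root = ET.fromstring(archive.read(name))
            except ET.ParseError:
                continue  # not well-formed XML, ignore it in this sketch
            parts[name] = root.tag
    return parts


def build_sample_package():
    """Build a tiny in-memory Zip with hypothetical member names, for demonstration only."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("document.xml", "<document><p>Hello</p></document>")
        archive.writestr("notes.txt", "not XML")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for name, tag in list_xml_parts(build_sample_package()).items():
        print(name, tag)
```

Pointing `list_xml_parts` at a real file path instead of the in-memory sample should work the same way, assuming the file really is a Zip archive. Contrast that with a closed binary format, where even listing the parts of a document requires reverse-engineering.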
I don’t clearly understand the issues around this yet myself. The Microsoft community/weblog site, Channel 9, posted a video interview about the new Office formats with Jean Paoli. The video makes the Microsoft engineer’s enthusiasm for openness obvious. Ironically, however, it requires proprietary Windows Media technologies for playback.
Alternatives and Workarounds
As Ogg Vorbis is to MP3, and as Jabber is to MSN/ICQ/AIM, so OpenDocument is to Microsoft Office formats. OpenDocument is a new set of standard office file formats for text documents, spreadsheets, presentations, and charts. This open and standard format is the default format in the forthcoming OpenOffice.org 2.0 office suite, but could theoretically be implemented by other applications as well.
Saving your documents in the OpenDocument format means that no one owns the ability to access your works. While the specifications aren’t perfect (I was dismayed to hear complaints about the spreadsheet component), it remains a critical standard.
As is the case with instant messaging protocols, the move from proprietary to open office file formats can be eased with the help of transitional software. The OpenOffice.org suite (both the 1.x and upcoming 2.0 versions) can open, edit, and save the main Microsoft Office formats quite well. Using OpenOffice.org, I can easily open any Microsoft Word attachments I might get in my email.
For those who are still stuck with Microsoft Office as an overall platform in their organization, but are looking to move away from Microsoft Windows, there are more promising options. The Wine project is a compatibility layer for running Windows applications on Linux. Especially when packaged in the Codeweavers CrossOver Office product, it makes running Microsoft Office on Linux surprisingly easy. This is obviously only a transitional aid, and not a long-term solution, but it is helpful.
Conclusion: Freedom Should Be On By Default
The core idea behind this series on open formats and protocols is that you should not be limited in access to what you have created yourself, regardless of the tools you used to create it. No one would buy a pen that produced writing that could only be read through special glasses sold by the same company. Even more so, no one would allow their governments to publish documents created by this crippled pen.
Being locked out of content that should be free or that you have legitimately purchased is bad enough. I have to use illegal software to watch DVDs (that I have bought and paid for) on my laptop. However, it is even worse when you are locked out of content that you have created yourself.
If your mom buys a computer, writes you a letter, and emails it to you in the Microsoft Word format, you have to pay Microsoft to read the letter. Of course, your mom doesn’t have to use Microsoft Office, but if it is the default word processor on her new computer, she may not realize the issue.
If you have your wedding video-recorded and it is given to you by the production company in DVD format, you can’t make copies for your family or as a backup. Again, the production company doesn’t have to use the proprietary DVD format, but it is the only one that will play in everyone’s home DVD player. [UPDATE: Several people have correctly pointed out that the proprietary DVD encryption (CSS) is optional and need not be used on personal DVDs – good point.]
For these reasons, it is not good enough that freedom be available as an option. Freedom must be on by default.
The Catch-22 of Open Formats mini-series
- The Catch-22 of Open Format Adoption, Part 1: Music
- The Catch-22 of Open Format Adoption, Part 2: Instant Messaging
- The Catch-22 of Open Format Adoption, Part 3: Office Documents (you are here)