Stéphane Rodriguez has an article about issues one hits when trying to implement or use OpenXML. They don’t have the idiotic and artificial type of arguments that lists like groklaw has created, but some of his examples feel a bit extended to make a good story.
Lets see what the summary of his issues are with my bottom line comments. Also note I’m no expert at this stuff, I’m a geek, not a word processing file format geek and I certainly don’t speak for Microsoft on these issues.
- Self-exploding spreadsheets
- Removing formulas from a spreadsheet is non trivial because there are other files with references to the forumla to update, such as the calculation chain
- You can’t rebuild the calculation chain without going through the whole document.
- While the calculation chain can be excluded it is non optimal to do so because some one who does need to understand the whole spreadsheet will have to recalculate it.
- Some ZIP libraries don’t deal efficiently with doing the sort of operations needed to manipulate these zip based documents structures
- Bottom Line 1: Invalidating the Calculation Chain should be automatic, so that simple manipulation tools work better
- Bottom Line 2: Classic engineering tradeoff, you can precalc stuff if you want, but then you have to be able to precalculate it and keep some sort of invalidating state.
- The intuition that what you type in excel is what is stored is incorrect. Excel does magic to make it more user friendly like automatically adjusting to local convention (like , instead of . in number formatting) and auto converting to a type instead of treating everything as a string or forcing the user to be explicit
- The stored number values are affected by IEEE rounding rules
- Stored values are not locale dependant (This is a bad thing?)
- Bottom Line: It’s not clear how this affects the usability or usefulness of the format to me. Maybe a different example where values that aren’t in this format (generated by a third party tool) fail in excel?
- Worksheet shared forulas are listed as “copy from Cell X” instead of having a neutral non cell reference that everything uses
- This leads to a lot more work to change a formula in one place if others reference it.
- Bottom Line: Sounds like a valid complaint to me
- VML is supposed to be deprecated but gets used in some places like comments
- 10 year old memo from Gates that has little to no bearing on the world or Microsoft today
- Bottom Line: I’m not familiar enough with the spec to know if this is an issue or not, but it sounds like comments in Excel is hard to work with and that’s bad.
- You can’t delete a part and know who relies on it without parsing through everything in the file
- Bottom Line: sounds sucky
- The functional things in the format for excel is in english (like the SUM() function)
- VML and DrawingML have a number of encoding notes to help with localization which aren’t documented well
- Applications on top of OpenXML have to localize everything themselves
- Bottom Line: Maybe I’m missing it, but this seems like a feature, my spreadsheet manipulator doesn’t have to be aware of all the possible language encoding of the word “SUM”
I’m going to cut off this post here for now (wife wants my attention 🙂 ) and maybe continue it another day
Major themes from the list so far:
- The excel format seems to be not well designed for targeted modification of existing files. You have to load an understand the whole thing and then write it all back out again. (unless you are using the custom schema stuff, but that is out of scope)
- VML interacts with parts of openXML is not well describe ways