--  Author: admin
--  Posted: 2/23/2005 11:43:00 PM

--  Office 2003 Word to XML to HTML Example
-- 10/13/03
I've been looking at [URL=http://www.w3.org/XML/]XML[/URL] files created from [URL=http://www.microsoft.com/office/preview/editions/technologies/xml.asp]Office 2003 Word[/URL] and rendering them to [URL=http://www.w3.org/MarkUp/]HTML[/URL] via [URL=http://www.w3.org/TR/xslt]XSLT[/URL] using the [URL=http://www.java.sun.com/]Java[/URL] classes described on my page, [URL=http://www.timeoutofmind.com/xmlCodeExamples/xmlTreeViewer.cfm]XML Tree Viewer[/URL].
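For anyone who wants to try the XML-to-HTML step without my viewer classes, here is a minimal sketch using the standard javax.xml.transform API. It is not the code from the XML Tree Viewer page. The class name WordXmlToHtml and the tiny inline stylesheet are made up for illustration, and it assumes the Word 2003 namespace http://schemas.microsoft.com/office/word/2003/wordml with paragraphs stored as w:p elements.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class WordXmlToHtml {
    // Hypothetical minimal stylesheet (an assumption, not the author's):
    // every w:p paragraph becomes an HTML <p> holding its text content.
    public static final String STYLESHEET =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns:w='http://schemas.microsoft.com/office/word/2003/wordml'"
      + " exclude-result-prefixes='w'>"
      + "<xsl:output method='html'/>"
      + "<xsl:template match='/'>"
      + "<html><body><xsl:apply-templates select='//w:p'/></body></html>"
      + "</xsl:template>"
      + "<xsl:template match='w:p'>"
      + "<p><xsl:value-of select='.'/></p>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the given XSLT text to the given XML text and returns the result.
    public static String transform(String xml, String xslt) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // A hand-written stand-in for a Word 2003 ".xml" file, not real Word output.
        String xml =
            "<w:wordDocument xmlns:w='http://schemas.microsoft.com/office/word/2003/wordml'>"
          + "<w:body><w:p><w:r><w:t>Hello from Word</w:t></w:r></w:p></w:body>"
          + "</w:wordDocument>";
        System.out.println(transform(xml, STYLESHEET));
    }
}
```

To run it against a real file, swap the StringReader sources for StreamSource objects built from java.io.File. A real Word export carries far more markup than this stylesheet handles, so expect to extend the templates.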

Microsoft sure adds a lot of extra (unnecessary?) data inside the XML version of those documents. My test ".doc" in its standard document format is only 24k in size; the ".xml" version is 20k. When I render the XML into a tree displayed as an HTML table, the redundant namespace information repeated for each of the document's elements causes the output to expand to 1.4MB! Another author [URL=http://www.infoworld.com/article/03/10/03/39FEofficerev_1.html]has also discovered[/URL] this same bloated structure.
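To show why a per-element tree view blows up like that, here is a rough sketch, again not my actual viewer code. The class NamespaceBloat and its table layout are hypothetical. It walks a namespace-aware DOM and writes one table row per element, optionally repeating the element's namespace URI in every row, so you can compare the two output sizes.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class NamespaceBloat {
    // Appends one <tr> per element; with withNamespace set, each row also
    // repeats the element's full namespace URI.
    static void render(Node node, boolean withNamespace, StringBuilder out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            out.append("<tr><td>").append(node.getNodeName()).append("</td>");
            if (withNamespace) {
                out.append("<td>").append(node.getNamespaceURI()).append("</td>");
            }
            out.append("</tr>\n");
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            render(child, withNamespace, out);
        }
    }

    // Parses the XML text and renders its element tree as an HTML table.
    public static String renderTable(String xml, boolean withNamespace) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // needed for getNamespaceURI() to return a value
        Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        StringBuilder out = new StringBuilder("<table>\n");
        render(doc, withNamespace, out);
        return out.append("</table>").toString();
    }

    public static void main(String[] args) throws Exception {
        // Hand-written stand-in for a Word 2003 ".xml" file.
        String xml =
            "<w:wordDocument xmlns:w='http://schemas.microsoft.com/office/word/2003/wordml'>"
          + "<w:body><w:p><w:r><w:t>Hello</w:t></w:r></w:p></w:body>"
          + "</w:wordDocument>";
        System.out.println("rows only:      " + renderTable(xml, false).length() + " chars");
        System.out.println("with namespace: " + renderTable(xml, true).length() + " chars");
    }
}
```

The source file declares the namespace once, but a rendering like this pays for the whole URI on every element, so the overhead grows with the element count rather than with the amount of actual text.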

I wonder if Microsoft's intention is to force corporations to buy expensive content management systems to deal with the bloat? It seems to me that the [URL=http://www.w3schools.com/dtd/default.asp]DTD[/URL] or [URL=http://www.w3.org/XML/Schema]XML Schema[/URL] Microsoft is using could be simplified so that the XML export contains only what the actual document requires, rather than support for every feature Word offers.

You can look at the actual files I used in this discussion by downloading the ".zip" compressed file, [URL=http://www.timeoutofmind.com/code_examples/xml/word_to_xml_to_html.zip]Office 2003 Word to XML to HTML Example Code[/URL].

After uncompressing that file, you'll have a folder containing a Word document, that same Word document saved as ".xml", and an HTML rendering of that ".xml" file.

