Business users need to interpret and evaluate their content to make the best use of it. To do so, they need to search across that content with an eye to what information might be monetized; how the content might be grouped, packaged, or syndicated; and how it will appear across various digital distribution channels.
For such activities to occur effectively across very large content collections, content must be stored in a structured format, broken down into contextually defined atomic parts (e.g. sections, sub-sections, paragraphs), so that searches can target those parts. Because it can both structure content and give it context, XML is a natural format for storing and delivering content over the internet.
Unlike PDF, binary Office files, Quark, InDesign and other presentation-oriented formats, XML can carry the data characteristics and structural information needed to search and group content effectively. Well-designed XML is contextually self-aware: it defines not only what a specific piece of content is, but also where that content sits. For example, XML may show that a particular piece of content is within a sub-section of a section of a larger object, and that it sits at the same level as other sibling objects.
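To make the idea of contextual location concrete, here is a minimal sketch in Python using the standard library's `xml.etree.ElementTree`. The sample document, its element names (`article`, `section`, `subsection`, `para`) and the `title` attribute are all hypothetical, invented for illustration; real content would follow whatever schema you use. The sketch searches paragraphs for a term and reports where each hit sits in the document hierarchy.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; element and attribute names are illustrative only.
SAMPLE = """
<article title="Market Report">
  <section title="Overview">
    <subsection title="Pricing">
      <para>Subscription pricing rose in the third quarter.</para>
      <para>Syndication fees were unchanged.</para>
    </subsection>
  </section>
  <section title="Outlook">
    <para>Pricing is expected to stabilize.</para>
  </section>
</article>
"""


def find_with_context(root, term):
    """Return (path, text) for each <para> whose text contains term."""
    # ElementTree elements have no parent pointer, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    hits = []
    for para in root.iter("para"):
        text = para.text or ""
        if term.lower() in text.lower():
            # Walk upward from the paragraph to the root, collecting titles.
            path = []
            node = parents.get(para)
            while node is not None:
                path.append(node.get("title", node.tag))
                node = parents.get(node)
            hits.append((" > ".join(reversed(path)), text))
    return hits


root = ET.fromstring(SAMPLE)
results = find_with_context(root, "pricing")
for path, text in results:
    print(f"{path}: {text}")
```

Each hit comes back with its place in the hierarchy (article, section, sub-section), which is the kind of context a flat text search over a PDF cannot provide.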
Many content creators already store their data in XML, and most content syndication is done through an XML format such as RSS or Atom. The bottom line is that XML is the glue tying content together on the internet today. Even MS Office, as of Office 2007, stores its documents natively as zipped collections of XML files under the .docx or .xlsx extension.
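As a rough illustration of the "zipped collection of XML files" point, the sketch below builds a tiny in-memory ZIP that mimics the layout of a .docx package and then reads paragraph text back out of it. The stand-in package is a simplification made up for this example (a real .docx contains additional parts and Word would not open this one); the part name `word/document.xml` and the WordprocessingML namespace are the ones real .docx files use. The helper names are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, as used inside real .docx files.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def build_stand_in_package():
    """Build a simplified in-memory ZIP laid out like a .docx (illustration only)."""
    body = (
        f'<w:document xmlns:w="{W_NS}"><w:body>'
        "<w:p><w:r><w:t>Hello from XML</w:t></w:r></w:p>"
        "</w:body></w:document>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", body)
    buf.seek(0)
    return buf


def paragraph_texts(package):
    """Return the text of each paragraph (w:p) in the main document part."""
    with zipfile.ZipFile(package) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    ns = {"w": W_NS}
    return [
        "".join(t.text or "" for t in p.iterfind(".//w:t", ns))
        for p in root.iterfind(".//w:p", ns)
    ]


texts = paragraph_texts(build_stand_in_package())
print(texts)
```

Passing the path of an actual .docx file to `paragraph_texts` in place of the in-memory stand-in should work the same way, since `zipfile.ZipFile` accepts either a path or a file-like object.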
In the next post, we'll discuss how to manage the XML in preparation for use within a Content Interpreter.