Content Interpreter Archives

October 2, 2007

Content Interpreter

Over the next few weeks, I want to experiment with fleshing out the concept of a Content Interpreter, a content logic pattern that allows an enterprise to best leverage its content's value.

Today, content creators and publishers are challenged with how to best leverage their content’s value. Many are struggling in a digital world where user-generated content is king and a few niche players without a content-creation background are vying to re-define how people consume content (hello, Google). Looking forward, early Web 3.0 speculation is that mature internet users will be seeking premium content and that the internet will become a worldwide database, requiring content that can be adapted quickly in order to plug seamlessly into new digital formats.

I'm thinking that a Content Interpreter, a system that allows content-centric enterprises to leverage their content’s value by making it easily available in comprehensible formats, would help content creators overcome this challenge. A Content Interpreter is the key to empowering content, especially when working with a large digital archive.

Next - the challenge of leveraging your content's value and more about the Content Interpreter...

October 15, 2007

The Content Interpreter Landscape

Publishers and content-driven enterprises are facing the challenge of how to best leverage their content’s value in the digital age. Many publishers and content creators are struggling to compete in a world where user-generated content is commanding as much attention as professionally-generated content, and where companies without publishing or content creation backgrounds are vying to re-define how people consume content.

Some traditional publishers have been late to realize these difficulties – which arose for them with the advent of Web 2.0 – and even more challenges and opportunities are on the horizon. Looking forward to Web 3.0, early speculation is that mature internet users will be seeking premium content, and the internet will become a worldwide database necessitating content that can be adapted quickly and plugged seamlessly into new digital formats.

Faced with current and future challenges, the goal of all publishers and content creators must be to leverage their content’s value to its fullest potential through digital distribution, syndication, and other content re-purposing strategies.

Today there are three main hurdles that all content-centric enterprises must overcome in order to thrive:

- Preparing and managing content.
- Researching and interpreting content to leverage its greatest value.
- Deploying content over digital distribution channels.

We'll be looking at these items in our upcoming posts.

October 29, 2007

Content Interpreter III: Empowering Content

Business users need to interpret and evaluate their content to best leverage it. To do so, business users need to execute searches across content with an eye to what information might be monetized, how that content might be grouped, packaged, or syndicated, and how the content will appear across various digital distribution channels.

For such activities to occur effectively across very large content collections, content must be stored in a structured format: broken down into contextually-defined atomic parts (e.g. sections, sub-sections, paragraphs) in order to facilitate effective searches. Because of its ability to contextualize and structure content, XML, as we all know, is the natural format for delivering content over the internet.

Unlike PDFs, Office, Quark, InDesign and other digital formats, the XML format contains all of the data characteristics and structural information needed to aid effective searches and group data effectively. Well-designed XML is contextually self-aware because it not only defines what a specific piece of content is – but it also defines its contextual location. For example, XML may show that a particular piece of content is within a sub-section of a section of a larger object and is in the same layer as other objects, etc.
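To make the idea of "contextually self-aware" content a little more concrete, here is a minimal sketch in Python using only the standard library. The sample document, its element names, and the `context_path` helper are all made up for illustration; the point is simply that, given any one paragraph, the surrounding XML is enough to recover where it sits and what sits beside it.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; element and attribute names are illustrative only.
SAMPLE = """
<book title="Field Guide">
  <section title="Birds">
    <subsection title="Raptors">
      <para>Hawks hunt by day.</para>
      <para>Owls hunt by night.</para>
    </subsection>
  </section>
</book>
"""

def parent_map(root):
    """ElementTree elements keep no parent pointer, so map each child to its parent."""
    return {child: parent for parent in root.iter() for child in parent}

def context_path(root, target):
    """Return the chain of ancestors leading to target, outermost first."""
    parents = parent_map(root)
    labels = []
    node = target
    while node is not None:
        label = node.tag
        if "title" in node.attrib:
            label += f"[{node.get('title')}]"
        labels.append(label)
        node = parents.get(node)  # None once we step above the root
    return " / ".join(reversed(labels))

root = ET.fromstring(SAMPLE)
owl_para = root.findall(".//para")[1]

# Where the paragraph lives in the larger object.
location = context_path(root, owl_para)

# What else sits in the same layer (same parent) as this paragraph.
siblings = [p.text for p in parent_map(root)[owl_para] if p is not owl_para]

print(location)
print(siblings)
```

A PDF or page-layout file would give you the words "Owls hunt by night." but not, without extra work, the fact that they belong to the Raptors sub-section of the Birds section.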

Many content creators are already storing their data in XML, and most content syndication is done through an XML format. The bottom line is that the glue that’s tying content together on the internet today is XML. Even MS Office 2007 now stores its documents natively in XML as zipped up collections of XML files under the .docx or .xlsx extension.
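As a rough illustration of the "zipped up collection of XML files" point, the Python sketch below builds a tiny in-memory archive with a `word/document.xml` part and reads the paragraph text back out. The archive is a stand-in of my own, not a complete Word document (a real .docx carries additional parts such as content types and relationships), and the function names are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by the document part of a .docx file.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Illustrative body only: one paragraph (w:p) holding one run (w:r) of text (w:t).
DOCUMENT_XML = (
    f'<w:document xmlns:w="{W_NS}"><w:body>'
    "<w:p><w:r><w:t>Hello from XML</w:t></w:r></w:p>"
    "</w:body></w:document>"
)

def build_stand_in_docx():
    """Create a minimal zip archive that mimics the layout of a .docx (assumption: one part only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", DOCUMENT_XML)
    return buffer.getvalue()

def read_paragraphs(docx_bytes):
    """Unzip the archive and return the text of each paragraph in the main document part."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    ns = {"w": W_NS}
    return [
        "".join(t.text or "" for t in p.iterfind(".//w:t", ns))
        for p in root.iterfind(".//w:p", ns)
    ]

paragraphs = read_paragraphs(build_stand_in_docx())
print(paragraphs)
```

The same `read_paragraphs` function should work on the bytes of a real .docx file, since it only relies on the archive containing a `word/document.xml` part.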

In the next post, we'll discuss how to manage the XML in preparation for use within a Content Interpreter.

November 8, 2007

Content Interpreter IV: Content Management

The critical prerequisite for a Content Interpreter is having the content in a structured and organized manner. There are two core components to a successful content management strategy.

The first is the versioning, storage and maintenance of the master copy of the files. This is the realm and strength of industry standard Enterprise Content Management Systems (ECMS) and strategies.

Second, content must be made accessible for content managers in a way that allows them to determine the feasibility of a new idea quickly and easily. If a product manager sees an opportunity to provide a collection of content to a partner in a particular format, he should be able to nimbly gauge this opportunity while he is still on the phone with this partner. If an editor wants to create a collection of assets to give out at a conference, she should be able to browse the content, chunk relevant sub-sections from larger publications, and create a customized publication for this specific occasion quickly and easily.
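The editor scenario above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the two source publications, the `topic` attribute, and the helper names are invented, and a real system would pull content from the repository rather than from strings in memory.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical source publications; the element and attribute names are illustrative.
SOURCES = [
    """<publication title="Annual Review">
         <section id="a1" topic="markets"><para>Market overview.</para></section>
         <section id="a2" topic="energy"><para>Energy outlook.</para></section>
       </publication>""",
    """<publication title="Tech Quarterly">
         <section id="b1" topic="energy"><para>Grid storage.</para></section>
         <section id="b2" topic="software"><para>Release notes.</para></section>
       </publication>""",
]

def find_sections(sources, topic):
    """Yield (source title, section element) for every section on the given topic."""
    for xml_text in sources:
        root = ET.fromstring(xml_text)
        for section in root.iterfind(f"./section[@topic='{topic}']"):
            yield root.get("title"), section

def build_publication(title, chunks):
    """Assemble copied sections into a new publication, recording where each came from."""
    publication = ET.Element("publication", {"title": title})
    for source_title, section in chunks:
        chunk = copy.deepcopy(section)  # copy so the source tree is left untouched
        chunk.set("source", source_title)
        publication.append(chunk)
    return publication

handout = build_publication(
    "Energy Conference Handout", find_sections(SOURCES, "energy")
)
section_ids = [section.get("id") for section in handout]
section_sources = [section.get("source") for section in handout]

print(ET.tostring(handout, encoding="unicode"))
```

Because each section is an addressable chunk with its own metadata, the custom publication is a matter of selecting and copying, not cutting and pasting out of finished layouts.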

These advanced content browsing and creation capabilities, however, are outside of the capabilities of traditional ECMS. What’s needed to meet these needs is a Content Interpreter: a system that sits on top of an ECMS or Digital Asset Management System (DAMS) and converts relevant digital content from a large repository into human-comprehensible content.

In the next post, we'll go over the conceptual architecture of the Content Interpreter.

November 30, 2007

Content Interpreter V: The Definition

In the case of XML, a Content Interpreter is an architecture that consists of an XML database that stores and indexes XML, with a business layer application that provides business intelligence and content transformation capabilities.

These specialized functions are not mature features within ECMS or DAMS. To gain these advantages it is necessary to look for a third-party package to integrate into the ECMS or DAMS. In the market, there exist a number of mature XML database solutions, including but not limited to Mark Logic and the open-source eXist.

The key points of implementing a Content Interpreter are:

1. Integration with the content repository.

It is imperative to draw a hard line between the functions that the ECMS is responsible for and those that the Content Interpreter is responsible for. For example, the ECMS should “publish” to the Content Interpreter.

2. Scalability.

The Content Interpreter should be able to handle an enterprise’s full structured content set.

3. Speed.

The Content Interpreter must be able to execute queries in real time, and it must be quick to extend with new ways of searching and grouping content.

4. Configuration.

The Content Interpreter should not be a black box but rather a transparent collection of modules that can be configured to meet new requirements.

5. Lightweight.

The Content Interpreter should be relatively portable, and it should not require an advanced engineering degree to manage and operate.


To meet these five key points of implementing a Content Interpreter, a sophisticated programming language that is simple and lightweight is required. Since the architecture is composed of an XML Database and a business logic layer, the natural choice is XQuery (XML Query Language). XQuery is a natural fit because it is built for exactly the tasks involved: combining, searching, and comparing XML data.
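To show what "combining, searching and comparing" might look like in practice, here is a small sketch. It is written in Python as a stand-in, not in XQuery itself, and the two archive documents and the function names are assumptions of mine for the sake of the example. In an XML database the same three operations would be expressed as queries and run against indexed content.

```python
import xml.etree.ElementTree as ET

# Hypothetical yearly archives; structure and attribute names are illustrative.
ARCHIVE_2006 = """<articles year="2006">
  <article topic="energy"><title>Solar basics</title></article>
  <article topic="finance"><title>Bond markets</title></article>
</articles>"""

ARCHIVE_2007 = """<articles year="2007">
  <article topic="energy"><title>Wind farms</title></article>
  <article topic="travel"><title>Rail passes</title></article>
</articles>"""

def combine(*xml_texts):
    """Combine: merge the articles of several documents into one list of records."""
    merged = []
    for xml_text in xml_texts:
        root = ET.fromstring(xml_text)
        for article in root.iterfind("article"):
            merged.append({
                "year": root.get("year"),
                "topic": article.get("topic"),
                "title": article.findtext("title"),
            })
    return merged

def search(articles, topic):
    """Search: titles of all articles on a topic, in document order."""
    return [a["title"] for a in articles if a["topic"] == topic]

def compare(articles, year_a, year_b):
    """Compare: topics that were covered in both years."""
    def topics(year):
        return {a["topic"] for a in articles if a["year"] == year}
    return sorted(topics(year_a) & topics(year_b))

catalog = combine(ARCHIVE_2006, ARCHIVE_2007)
energy_titles = search(catalog, "energy")
shared_topics = compare(catalog, "2006", "2007")

print(energy_titles)
print(shared_topics)
```

The sketch loops over every document in memory, which is fine for two small files; the argument for an XML database is that it answers the same questions over a full archive by using indexes.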

The Content Interpreter’s layers

The implementation of a Content Interpreter is more an exercise in setting up the appropriate architecture and sticking to a key set of rules than a matter of which tools and technologies are used. Looking at the Content Interpreter from the bottom up, the following outlines its four logical layers:

1. XML or structured content layer – the physical content.
2. ECMS or DAMS layer – the management of the content.
3. XML Database layer – the indexing and searching of the content.
4. Content Interpreter layer - the business application layer.
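
As a rough sketch of how these four layers might relate, the Python below models each one as a small class. Everything here is hypothetical: the class names, the in-memory dictionaries standing in for the ECMS and the XML database, and the sample content. It is meant to show the separation of responsibilities, with the repository "publishing" to the index and the interpreter only ever talking to the index.

```python
import xml.etree.ElementTree as ET

class XmlIndex:
    """Layer 3 stand-in: stores published XML and answers path queries."""
    def __init__(self):
        self._docs = {}

    def store(self, doc_id, xml_text):
        self._docs[doc_id] = ET.fromstring(xml_text)

    def query(self, path):
        """Return (doc_id, element) pairs for every match of an ElementTree path."""
        return [
            (doc_id, element)
            for doc_id, root in self._docs.items()
            for element in root.iterfind(path)
        ]

class Repository:
    """Layer 2 stand-in: keeps versioned master copies and publishes the latest."""
    def __init__(self, index):
        self._index = index
        self._versions = {}

    def save(self, doc_id, xml_text):
        # Layer 1 is the XML text itself; every save becomes a new version.
        self._versions.setdefault(doc_id, []).append(xml_text)

    def publish(self, doc_id):
        self._index.store(doc_id, self._versions[doc_id][-1])

class ContentInterpreter:
    """Layer 4 stand-in: business questions phrased against the index only."""
    def __init__(self, index):
        self._index = index

    def sections_on(self, topic):
        hits = self._index.query(f".//section[@topic='{topic}']")
        return [(doc_id, section.get("title")) for doc_id, section in hits]

index = XmlIndex()
repository = Repository(index)
interpreter = ContentInterpreter(index)

repository.save("guide", "<doc><section topic='energy' title='Draft'/></doc>")
repository.save("guide", "<doc><section topic='energy' title='Solar Outlook'/></doc>")
repository.publish("guide")

energy_sections = interpreter.sections_on("energy")
print(energy_sections)
```

Note that the interpreter never sees the draft version: versioning stays in the management layer, and only what has been published is searchable.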