OKS/CMS Integration

HOWTO

Published by: Ontopia AS
Date: 2007/10/26 08:17:20
Version: 1.4

Abstract

An early skeleton draft of a guidance document explaining the principles of OKS/CMS integration.

1. Introduction

The basic principle of the integration is to map data from the CMS into the topic map. Generally, this means creating one topic for each document in the CMS (or some subset of documents), but it may also require creating topics for other CMS objects (such as folders, users, multimedia objects, etc.). The purpose of this mapping is to allow further statements to be made inside the topic map about the objects from the CMS; usually these statements are associations.

In general, the integration task can be divided into four parts: data integration, the editorial system, presentation, and search integration.

Each of these areas is outlined briefly below.

One issue that should be carefully considered is whether or not one wants the integration to require the CMS and the OKS to be deployed in the same JVM. There are arguments both ways, but the shape of the integration is different in the two cases, so a decision about this needs to be made up front.

2. Data integration

Doing the data integration correctly requires studying the content organization of the CMS closely to understand the structure and semantics of various constructs used by the CMS. However, a general rule is that each document in the CMS (sometimes only documents that meet some criterion) will map to a topic. The title of the document becomes the name of the topic. Other metadata about the document may or may not be mapped, but date of last modification is often needed.

There are two basic patterns for the mapping of documents, which may in some cases be combined:

In both cases the topic map will most likely also contain topics for which there are no corresponding documents in the CMS, and in the first case this is a necessity, since otherwise there will be no way to classify the documents in the topic map.

Typical article information that one is likely to want to map into the topic map includes: title, ID, date published/updated, published/updated by, workflow state, folder associations, and perhaps also site associations. The general rule is that information needed in Topic Maps queries should be mapped across. Workflow state, for example, should be mapped because queries need it in order to hide articles that are not yet published.
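The mapping of article metadata described above can be sketched roughly as follows. This is only an illustration under stated assumptions: CmsArticle, ArticleTopic and ArticleMapper are hypothetical stand-ins, not classes from the OKS Engine API or from any CMS, and the property names and the "published" state value are invented for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified view of a CMS article; a real CMS API will differ.
final class CmsArticle {
    final String id, title, workflowState, modifiedBy, lastModified;

    CmsArticle(String id, String title, String workflowState,
               String modifiedBy, String lastModified) {
        this.id = id;
        this.title = title;
        this.workflowState = workflowState;
        this.modifiedBy = modifiedBy;
        this.lastModified = lastModified;
    }
}

// Hypothetical stand-in for a topic: one name plus string-valued properties.
final class ArticleTopic {
    final String name;
    final Map<String, String> properties = new LinkedHashMap<>();

    ArticleTopic(String name) { this.name = name; }
}

final class ArticleMapper {
    // The title becomes the topic name; the remaining metadata is copied
    // across so that queries can use it. Property names are assumptions.
    static ArticleTopic toTopic(CmsArticle article) {
        ArticleTopic topic = new ArticleTopic(article.title);
        topic.properties.put("cms-id", article.id);
        topic.properties.put("workflow-state", article.workflowState);
        topic.properties.put("modified-by", article.modifiedBy);
        topic.properties.put("last-modified", article.lastModified);
        return topic;
    }

    // Why workflow state is mapped: it lets the presentation layer hide
    // unpublished articles. "published" is an assumed state name.
    static boolean isVisible(ArticleTopic topic) {
        return "published".equals(topic.properties.get("workflow-state"));
    }
}
```

In a real integration the ArticleTopic stand-in would be replaced by calls to the OKS Engine API that create or update the actual topic.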

In some cases one may also want to map the folder structure of the CMS into the topic map. This will usually be done in a similar fashion to the first pattern above, in the sense that there will be folder topics, and parent-child associations between them, and document-in-folder associations to the document topics.

To implement this, look at the OKS Engine API (there is a developer's guide) and the event API of the CMS. You will typically need events like "article created", "article deleted", and "article changed".
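As a rough sketch of how the three events might drive the synchronization: the listener interface below is hypothetical (each CMS defines its own event API), and a plain in-memory map stands in for the topic map so that the example stays self-contained.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical event callbacks; a real CMS defines its own listener interface.
interface ArticleEventListener {
    void articleCreated(String articleId, String title);
    void articleChanged(String articleId, String title);
    void articleDeleted(String articleId);
}

// Keeps a stand-in for the topic map (article ID -> topic name) in sync
// with the CMS. A real implementation would call the OKS Engine API here.
final class TopicMapSync implements ArticleEventListener {
    final Map<String, String> topicNames = new HashMap<>();

    @Override
    public void articleCreated(String articleId, String title) {
        topicNames.put(articleId, title);
    }

    @Override
    public void articleChanged(String articleId, String title) {
        // Treated as an upsert, so a missed "created" event is repaired.
        topicNames.put(articleId, title);
    }

    @Override
    public void articleDeleted(String articleId) {
        topicNames.remove(articleId);
    }
}
```

Treating "article changed" as an upsert is a design choice of this sketch; it makes the handler tolerant of events that arrive out of order or were lost.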

3. Editorial system

Generally, integrating the editorial system means embedding the instance editor page of Ontopoly into the CMS's editorial interface. Typically authors will fill in the normal fields for documents, and then open a "Topic Maps" pane where the fields defined in Ontopoly are edited. This allows the user to connect the document into the topic map.

Ontopoly supports embedding its instance editor into other JSP pages, which greatly simplifies embedding the editor into a CMS. Unfortunately, this support is somewhat hairy for the time being. The integration can be done either with an iframe element in HTML, or by embedding the Ontopoly form into the CMS's own page. The form control values for Ontopoly can either be sent directly to Ontopoly by the browser on form submission (if Ontopoly gets its own form element), or be passed to the CMS, which then forwards them to Ontopoly using HTTP.
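For the variant where the CMS receives the form and passes the Ontopoly values on, a minimal sketch could look like the following. It assumes that the Ontopoly controls can be recognized by a name prefix, which is purely an assumption of this example; the actual control names and the target URL have to be taken from the Ontopoly installation. Only standard JDK classes are used.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

final class OntopolyFormForwarder {
    // Selects the controls whose names start with the given (hypothetical)
    // prefix and encodes them as an application/x-www-form-urlencoded body.
    static String encodeControls(Map<String, String> allControls, String prefix) {
        StringJoiner body = new StringJoiner("&");
        // TreeMap gives a stable, sorted order of controls.
        for (Map.Entry<String, String> e : new TreeMap<>(allControls).entrySet()) {
            if (e.getKey().startsWith(prefix)) {
                body.add(enc(e.getKey()) + "=" + enc(e.getValue()));
            }
        }
        return body.toString();
    }

    private static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    // POSTs the encoded body to the given URL and returns the HTTP status.
    // The URL is supplied by the caller; none is assumed here.
    static int forward(String ontopolyUrl, String body) throws IOException {
        HttpURLConnection conn =
            (HttpURLConnection) new URL(ontopolyUrl).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type",
            "application/x-www-form-urlencoded; charset=UTF-8");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body.getBytes(StandardCharsets.UTF_8));
        }
        return conn.getResponseCode();
    }
}
```

The CMS would call encodeControls on the submitted form parameters and then forward with the address of the Ontopoly page that processes the form.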

There also needs to be some way for the editors to access the rest of Ontopoly in order to create their ontology, but this can usually be solved simply by adding a link to the Ontopoly installation in the menu system of the CMS. In addition, one may want to add a plug-in to Ontopoly (version 3.2 onwards) that links back to the CMS.

To implement this, look at the instanceEmbedded.jsp file in apache-tomcat/webapps/ontopoly/WEB-INF/pages. A proper integration protocol will be defined, but for now this is what there is.

4. Presentation

This part varies greatly with different CMSs, and is generally straightforward as long as the IDs of CMS documents are stored in the topic map on the topics representing those documents, and/or the IDs of topics are stored on the CMS documents they represent.

The key requirement in presentation is really to be able to mix CMS-driven presentation with TM-driven presentation. Showing topics related to an article is easy (using the related-tag). What tends to be more challenging is to produce lists of articles from the topic map. In many cases one will want to run a query against the topic map to produce a list of articles, and then the CMS will need to present summary views of these articles. It is possible to produce the entire list from the topic map, but then information like abstracts and images needs to be included in the topic map.
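The query-then-summarize pattern can be illustrated with a small sketch. It assumes that the topic map query returns CMS article IDs and that the CMS can look up a summary by ID; both the ArticleListing class and the map-based lookup are hypothetical simplifications.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class ArticleListing {
    // ids: article IDs in the order the topic map query returned them.
    // summaries: stand-in for a CMS lookup from article ID to summary text.
    static List<String> render(List<String> ids, Map<String, String> summaries) {
        List<String> lines = new ArrayList<>();
        for (String id : ids) {
            String summary = summaries.get(id);
            if (summary != null) {
                // The order comes from the topic map, the text from the CMS.
                lines.add(id + ": " + summary);
            }
            // IDs unknown to the CMS are skipped rather than shown empty.
        }
        return lines;
    }
}
```

This only works if the article IDs are stored on the document topics, which is why the ID mapping mentioned at the start of this section matters.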

If going the TMRAP route, there is TMRAP documentation in the distribution. The TM/XML syntax may also be worth a look.

5. Search integration

The goal of search integration is to allow users to perform queries like "find all documents about Iraq containing the word 'shia'" (using a normal, user-friendly search interface, of course). There are two ways to approach this:

If you include the CMS fulltext search as a tolog predicate, the query described above would look something like this:

import ... as cms
select $DOC, $RELEVANCE from
  cms:fulltext($DOC, "shia", $RELEVANCE),
  is-about($DOC : work, iraq : subject),
  instance-of($DOC, document)
order by $RELEVANCE?

How to implement this is described in section 4 of the tolog predicate reference in the distribution.
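Conceptually, the query above joins the fulltext hits with the set of documents that the topic map says are about the subject. The following sketch shows that join in isolation. It is not the predicate interface of the OKS, which is described in the reference mentioned above; FulltextJoinSketch and its inputs are hypothetical, and sorting the most relevant documents first is a choice made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class FulltextJoinSketch {
    // hits: document ID -> relevance, as returned by the CMS fulltext search.
    // aboutSubject: IDs of the documents the topic map relates to the subject.
    static List<String> join(Map<String, Double> hits, Set<String> aboutSubject) {
        List<String> result = new ArrayList<>();
        for (String docId : hits.keySet()) {
            if (aboutSubject.contains(docId)) {
                result.add(docId); // document satisfies both conditions
            }
        }
        // Highest relevance first; ties are broken by document ID.
        result.sort((a, b) -> {
            int byRelevance = Double.compare(hits.get(b), hits.get(a));
            return byRelevance != 0 ? byRelevance : a.compareTo(b);
        });
        return result;
    }
}
```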