OKS/CMS Integration

HOWTO

Published by:	Ontopia AS
Date:	$Date: 2007/10/26 08:17:20 $
Version:	$Revision: 1.4 $

1. Introduction

2. Data integration

3. Editorial system

4. Presentation

5. Search integration

Abstract

An early skeleton draft of a guidance document explaining the principles of OKS/CMS integration.

1. Introduction

The basic principle of the integration is to map data from the CMS into the topic map. Generally, this means creating one topic for each document in the CMS (or for some subset of the documents), but it may also require creating topics for other CMS objects (such as folders, users, multimedia objects, etc.). The purpose of this mapping is to allow further statements to be made about the CMS objects inside the topic map, usually in the form of associations.

In general, the integration task can be divided into four parts:

Data integration between the CMS and the OKS (as outlined above).
Integration of the editorial system, so that CMS objects can be described using the topic map from within the CMS administration interface.
Integration of the presentation layer of the CMS with the topic map.
Search integration, so that searches can both run full-text queries against the content in the CMS and make use of Topic Maps information about that content.

Each of these areas is outlined briefly below.

One issue that should be carefully considered is whether or not the integration should require the CMS and the OKS to be deployed in the same JVM. There are arguments both ways, but the shape of the integration differs between the two cases, so this decision needs to be made up front.

2. Data integration

Doing the data integration correctly requires studying the content organization of the CMS closely to understand the structure and semantics of various constructs used by the CMS. However, a general rule is that each document in the CMS (sometimes only documents that meet some criterion) will map to a topic. The title of the document becomes the name of the topic. Other metadata about the document may or may not be mapped, but date of last modification is often needed.
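As a minimal sketch of this general rule, the Java code below maps a CMS document to a topic whose name is the document title, keeping the CMS ID and the date of last modification alongside it. It deliberately does not use the OKS Engine API: CmsDocument, Topic, DocumentMapper and the occurrence keys cms-id and last-modified are hypothetical stand-ins chosen to keep the example self-contained.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical CMS document; a real CMS exposes its own document API.
record CmsDocument(String id, String title, Instant lastModified) {}

// Hypothetical, heavily simplified topic with names and keyed occurrence values.
// This is NOT the OKS Engine API; see the OKS developer's guide for the real interfaces.
final class Topic {
    final String type;
    final List<String> names = new ArrayList<>();
    final Map<String, String> occurrences = new HashMap<>();

    Topic(String type) {
        this.type = type;
    }
}

final class DocumentMapper {
    // CMS ID -> topic, so that mapping a document again updates its topic instead of duplicating it
    private final Map<String, Topic> topicsByCmsId = new HashMap<>();

    Topic map(CmsDocument doc) {
        Topic topic = topicsByCmsId.computeIfAbsent(doc.id(), id -> new Topic("cms:document"));
        topic.names.clear();
        topic.names.add(doc.title());                         // document title -> topic name
        topic.occurrences.put("cms-id", doc.id());            // CMS ID, for later lookups
        topic.occurrences.put("last-modified", doc.lastModified().toString());
        return topic;
    }
}
```

Keeping a lookup from CMS ID to topic means that mapping the same document a second time updates the existing topic rather than creating a duplicate, which is what change events from the CMS will need.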

There are two basic patterns for the mapping of documents, which may in some cases be combined:

Map the documents as documents. That is, the topics in the topic map really represent the documents. There will generally be one cms:document topic type, but documents may also become instances of subclasses of this type, depending on how documents are classified in the CMS.
Map the documents as subjects. That is, each document is considered the main resource on a particular subject, and the corresponding topic represents that subject (as opposed to representing the document). This approach can be more challenging, because it is then crucial to get the topic type right in the topic map, and the CMS is unlikely to have mechanisms for indicating the topic type directly.
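In Topic Maps Data Model terms, one reasonable way to tell the two patterns apart is by how the document URL identifies the topic: as a subject locator when the topic is the document, and as a subject identifier when the document is treated as the resource that indicates the subject. The sketch below assumes that reading; MappingPattern, TopicIdentity and IdentityRules are hypothetical names, not part of the OKS.

```java
import java.util.Objects;
import java.util.Optional;

// The two mapping patterns described above.
enum MappingPattern { DOCUMENT_AS_DOCUMENT, DOCUMENT_AS_SUBJECT }

// Hypothetical summary of how a topic created from a CMS document is typed and identified.
record TopicIdentity(String type, Optional<String> subjectLocator, Optional<String> subjectIdentifier) {}

final class IdentityRules {
    static TopicIdentity identityFor(String documentUrl, MappingPattern pattern, String subjectType) {
        switch (pattern) {
            case DOCUMENT_AS_DOCUMENT:
                // The topic is the document: its URL is a subject locator, and the type is
                // cms:document (or a subclass, if the CMS classifies documents).
                return new TopicIdentity("cms:document", Optional.of(documentUrl), Optional.empty());
            case DOCUMENT_AS_SUBJECT:
                // The topic is what the document is about: the document indicates the subject,
                // so its URL is a subject identifier. The CMS rarely knows the right topic type,
                // so it must be supplied from elsewhere; refuse to guess.
                Objects.requireNonNull(subjectType, "a topic type is required for subject topics");
                return new TopicIdentity(subjectType, Optional.empty(), Optional.of(documentUrl));
            default:
                throw new IllegalArgumentException("unknown pattern: " + pattern);
        }
    }
}
```

Rejecting a missing type in the second pattern reflects the point above: when documents are mapped as subjects, an untyped topic is not a usable result.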

In both cases the topic map will most likely also contain topics that have no corresponding documents in the CMS. In the first case this is a necessity, since otherwise there would be no way to classify the documents in the topic map.

Typical information about articles that one is likely to want to map into the topic map is: title, ID, when published/updated, by whom published/updated, workflow state, folder associations, and perhaps also site associations. The general rule is that any information needed in Topic Maps queries should be mapped across. Workflow state, for example, should be mapped because it makes it possible to hide articles that have not yet been published.
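In practice the workflow-state check would usually be part of the tolog query itself. The Java sketch below only illustrates the filtering step on already-mapped data; the WorkflowState values and the ArticleTopic record are hypothetical, since every CMS defines its own states.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical workflow states; every CMS defines its own.
enum WorkflowState { DRAFT, IN_REVIEW, PUBLISHED, ARCHIVED }

// Hypothetical view of a mapped article topic: CMS ID, title and workflow state.
record ArticleTopic(String cmsId, String title, WorkflowState state) {}

final class PublicView {
    // Keep only published articles, preserving the input order.
    static List<ArticleTopic> visible(List<ArticleTopic> candidates) {
        return candidates.stream()
                .filter(a -> a.state() == WorkflowState.PUBLISHED)
                .collect(Collectors.toList());
    }
}
```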

In some cases one may also want to map the folder structure of the CMS into the topic map. This will usually be done in a similar fashion to the first pattern above, in the sense that there will be folder topics, and parent-child associations between them, and document-in-folder associations to the document topics.
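A sketch of that folder mapping, assuming a hypothetical Folder record supplied by the CMS and recording associations as plain values rather than through the OKS Engine API. The association and role type names used here (parent-child, document-in-folder, parent, child, document, folder) are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical CMS folder: its ID, subfolders, and the IDs of the documents it contains.
record Folder(String id, List<Folder> children, List<String> documentIds) {}

// Hypothetical binary association: a type plus two (role type, player) pairs.
record Association(String type, String role1, String player1, String role2, String player2) {}

final class FolderMapper {
    final List<String> folderTopics = new ArrayList<>();
    final List<Association> associations = new ArrayList<>();

    // Depth-first walk: one topic per folder, one parent-child association per
    // folder/subfolder pair, one document-in-folder association per contained document.
    void map(Folder folder) {
        folderTopics.add(folder.id());
        for (String docId : folder.documentIds()) {
            associations.add(new Association("document-in-folder", "document", docId, "folder", folder.id()));
        }
        for (Folder child : folder.children()) {
            associations.add(new Association("parent-child", "parent", folder.id(), "child", child.id()));
            map(child);
        }
    }
}
```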

To implement this, look at the OKS Engine API (there is a developer's guide) and the event API of the CMS. You will typically need events like "article created", "article deleted", and "article changed".
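The following sketch wires these three events to topic map updates. The listener interface and event types are hypothetical, as each CMS has its own event API, and a map from article ID to topic name stands in for the topic map; a real listener would make the corresponding OKS Engine API calls instead.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical CMS event types matching the events named above.
enum ArticleEventType { CREATED, CHANGED, DELETED }

record ArticleEvent(ArticleEventType type, String articleId, String title) {}

// Hypothetical listener interface; a real CMS defines its own.
interface ArticleListener {
    void onEvent(ArticleEvent event);
}

// Keeps a stand-in "topic map" (article ID -> topic name) in step with the CMS.
final class TopicMapSync implements ArticleListener {
    final Map<String, String> topicNamesByArticleId = new HashMap<>();

    @Override
    public void onEvent(ArticleEvent event) {
        switch (event.type()) {
            case CREATED:
            case CHANGED:
                // Create the topic, or update its name if it already exists.
                topicNamesByArticleId.put(event.articleId(), event.title());
                break;
            case DELETED:
                // Remove the topic; in a real topic map its associations must go too.
                topicNamesByArticleId.remove(event.articleId());
                break;
        }
    }
}
```

Handling "created" and "changed" the same way means that a duplicated event, or a change event for an article whose creation event was missed, still leaves the topic in the right state.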

3. Editorial system

Generally this means embedding the instance editor page of Ontopoly into the CMS's editorial interface. Typically authors will fill in the normal fields for documents, and then open a "Topic Maps" pane where the fields defined in Ontopoly are edited. This allows the user to connect the document to the topic map.

Ontopoly supports embedding its instance editor into other JSP pages, which greatly simplifies embedding the editor into CMS systems. Unfortunately, this support is somewhat hairy for the time being. The integration can be done either with an iframe element in HTML, or by embedding the Ontopoly form into the CMS's own page. The values of the Ontopoly form controls can either be sent directly to Ontopoly by the browser on form submission (if Ontopoly gets its own form element), or be passed to the CMS, which then forwards them to Ontopoly over HTTP.
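For the variant where the CMS forwards the values, the CMS has to pick Ontopoly's form controls out of its own submitted form and POST them on. The sketch below assumes a hypothetical naming convention (an "ontopoly." prefix on the control names) and a caller-supplied Ontopoly endpoint; the real control names and target URL are defined by Ontopoly's embedding support, not by this example. It uses only the JDK's java.net.http classes (Java 11 or later).

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

final class OntopolyForwarder {
    // Hypothetical convention: the CMS page namespaces Ontopoly's controls with this prefix.
    private static final String PREFIX = "ontopoly.";

    // Select Ontopoly's controls from the submitted CMS form, strip the prefix,
    // and encode them as application/x-www-form-urlencoded.
    static String formBody(Map<String, String> submitted) {
        return submitted.entrySet().stream()
                .filter(e -> e.getKey().startsWith(PREFIX))
                .map(e -> encode(e.getKey().substring(PREFIX.length())) + "=" + encode(e.getValue()))
                .collect(Collectors.joining("&"));
    }

    // Build (but do not send) the POST to Ontopoly; the endpoint URI is an assumption.
    static HttpRequest request(URI ontopolyEndpoint, Map<String, String> submitted) {
        return HttpRequest.newBuilder(ontopolyEndpoint)
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(formBody(submitted)))
                .build();
    }

    private static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }
}
```

The request would then be sent with HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()), and the CMS would check the response before completing its own save.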

There also needs to be some way for the editors to access the rest of Ontopoly in order to create their ontology, but this can usually be solved simply by adding a link to the Ontopoly installation in the menu system of the CMS. In addition, one may want to add a plug-in to Ontopoly (version 3.2 onwards) that links back to the CMS.

To implement this, look at the instanceEmbedded.jsp file in apache-tomcat/webapps/ontopoly/WEB-INF/pages. A proper integration protocol will be defined, but for now this is what there is.

4. Presentation

This part varies greatly with different CMSs, and is generally straightforward as long as the IDs of CMS documents are stored in the topic map on the topics representing those documents, and/or the IDs of topics are stored on the CMS documents they represent.

The key requirement in presentation is really to be able to mix CMS-driven presentation with TM-driven presentation. Showing topics related to an article is easy (using the related-tag). What tends to be more challenging is to produce lists of articles from the topic map. In many cases one will want to run a query against the topic map to produce a list of articles, and then the CMS will need to present summary views of these articles. It is possible to produce the entire list from the topic map, but then information such as abstracts and images needs to be included in the topic map.
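The mixed approach can be sketched as follows, assuming the query has already returned a list of article topics and that each topic carries the ID of its CMS document. CmsRepository, Summary and ArticleListRenderer are hypothetical; the point is that the topic map decides which articles appear and in what order, while the CMS supplies the summary content.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical summary view rendered by the CMS (abstracts, images, etc. stay in the CMS).
record Summary(String cmsId, String title, String abstractText) {}

// Hypothetical CMS lookup by document ID.
interface CmsRepository {
    Optional<Summary> summary(String cmsId);
}

final class ArticleListRenderer {
    // topicIds: result of the topic map query, in query order.
    // cmsIdByTopic: stands in for the CMS ID stored on each article topic.
    static List<Summary> summaries(List<String> topicIds, Map<String, String> cmsIdByTopic, CmsRepository cms) {
        List<Summary> out = new ArrayList<>();
        for (String topicId : topicIds) {                 // keep the query's ordering
            String cmsId = cmsIdByTopic.get(topicId);
            if (cmsId == null) continue;                  // topic with no CMS document behind it
            cms.summary(cmsId).ifPresent(out::add);       // skip IDs the CMS no longer knows
        }
        return out;
    }
}
```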

If going the TMRAP (Topic Maps Remote Access Protocol) route, see the TMRAP documentation in the distribution. The TM/XML syntax may also be worth a look.

5. Search integration

The goal of this is to allow users to perform queries like "find all documents about Iraq containing the word 'shia'" (using a normal, user-friendly search interface, of course). There are two ways to approach this:

Making the topic map drive the search. In this case you will most likely use tolog for the search, and the search function of the CMS can be made available in tolog as a predicate by implementing two Java interfaces that perform the search in the CMS and return the results in the form the OKS expects. The result will be a predicate that produces article topics. See the tolog predicate reference document for details.
Letting the CMS or the database drive the search. In this case you can use the OKS API and tolog to do the Topic Maps part of the search. However, you are on your own here.
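For the second approach, one simple way to combine the two sides is to take the CMS's full-text hits and keep only the documents that the topic map says are about the subject (for Iraq, the documents matched by is-about($DOC : work, iraq : subject), obtained for example by running a tolog query through the OKS API). The sketch below assumes both inputs are already available; Hit and CombinedSearch are hypothetical names.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical full-text hit returned by the CMS search engine.
record Hit(String documentId, double relevance) {}

final class CombinedSearch {
    // CMS full-text hits restricted to documents the topic map says are about
    // the subject, ordered by descending relevance.
    static List<Hit> search(List<Hit> fullTextHits, Set<String> documentsAboutSubject) {
        return fullTextHits.stream()
                .filter(h -> documentsAboutSubject.contains(h.documentId()))
                .sorted(Comparator.comparingDouble(Hit::relevance).reversed())
                .collect(Collectors.toList());
    }
}
```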

If the CMS full-text search is included as a tolog predicate, the query described above would look something like this:

import ... as cms
/* full-text matches in the CMS, with a relevance score per document */
select $DOC, $RELEVANCE from
  cms:fulltext($DOC, "shia", $RELEVANCE),
  /* restricted to documents that are about Iraq */
  is-about($DOC : work, iraq : subject),
  instance-of($DOC, document)
order by $RELEVANCE desc?

How to implement this is described in section 4 of the tolog predicate reference in the distribution.