LTM Change Proposal 1.3

Abstract

This document contains the initial proposal for new features to be introduced in version 1.3 of the Linear Topic Maps Notation. The purpose of this document is to invite discussion before the changes are made official.

This is $Revision: 1.1 $. (It has now been superseded by a second edition.)

1 Introduction

In version 1.3 the following changes are proposed:

Adding the #INCLUDE directive proposed but rejected for 1.2.
Support for variant names.
A #VERSION directive for declaring the version of LTM in use in an LTM file.
URI prefixes, and perhaps also more convenient references.
Support for reification.
Add a section describing deserialization of LTM documents into SAM model instances, at the same time clarifying the rules and requirements for merging.

If all of these changes are accepted, LTM 1.3 will be no less expressive than XTM 1.0. It will still not support the full SAM model, since full representation of source locators will not be supported. It is not expected that all the changes will be accepted; this document is intended more as a trial balloon than anything else.

It must be admitted that some cruft has been allowed to gather in the syntax over versions 1.0, 1.1, and 1.2. This proposal outlines that cruft and how it might be dealt with by deprecating some features and adding alternatives. A later 2.0 version might then remove the deprecated features.

2 The #INCLUDE directive

Splitting large topic maps into separate files can make their maintenance substantially easier, and at the same time opens the way for reuse of individual modules. LTM 1.2 added the #MERGEMAP directive, which allowed external topic map documents to be merged in. These documents had their own namespaces, however, which meant that the only way to merge was by subject identifier or subject address.

The #INCLUDE directive

[1] include ::= '#' 'INCLUDE' WS STRING

The STRING is a URI reference to the LTM topic map to import. This inclusion mechanism will cause the external LTM file to be treated as if its content had been included at the point of reference. The benefit is that this makes merging much easier to specify for the authors.

Issue (ltm-include-srclocs):

Is this actually quite stupid? It will cause the included topics to have source locators based on the including TM. What to do then about any "#foo" URI references in the included topic map?

Example of use: #INCLUDE "geography.ltm".
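The textual-inclusion behaviour described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the directive is assumed to stand on a line of its own, comments and strings are not analysed, and the `files` dictionary is a hypothetical stand-in for resolving the URI reference. It does nothing to answer the source locator issue raised above.

```python
import re

# Assumption: an #INCLUDE directive occupies a whole line, e.g. #INCLUDE "geography.ltm"
INCLUDE_RE = re.compile(r'^\s*#INCLUDE\s+"([^"]*)"\s*$')


def expand_includes(name, files, _active=()):
    """Return the text of files[name] with each #INCLUDE line replaced
    by the (recursively expanded) text of the file it refers to.

    `files` is a hypothetical in-memory mapping from file name to LTM
    source; a real processor would resolve URI references instead.
    """
    if name in _active:
        chain = " -> ".join(_active + (name,))
        raise ValueError("circular #INCLUDE: " + chain)
    out = []
    for line in files[name].splitlines():
        m = INCLUDE_RE.match(line)
        if m:
            # Treat the included content as if it appeared at this point.
            out.append(expand_includes(m.group(1), files, _active + (name,)))
        else:
            out.append(line)
    return "\n".join(out)


files = {
    "main.ltm": '#INCLUDE "geography.ltm"\n[oslo : city = "Oslo"]',
    "geography.ltm": '[city = "City"]',
}
expanded = expand_includes("main.ltm", files)
print(expanded)
```

Because the result is a single text, the included topics end up in the same ID namespace as the including file, which is what makes merging easy to specify.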

3 Variant names

LTM already supports two special cases of variant names: sort names and display names. An example of a topic declaration that uses both might be:

[paris : city = "Paris (France)"; "paris"; "Paris"]

In this case, the two latter names are variants of a predefined kind (sort name and display name, in that order). The question is how to allow additional scopes on these two, and also to allow additional variant names with arbitrary scopes to be specified. The interaction between the scope of the variants with the scoping of the entire topic name is also an issue.

One way to do this might be to simply allow more names to be added after the display name, separated by semicolons in the same way, as shown below:

Adding variants to topic names

[2] topname ::= '=' basename variantlist? scope?
[3] variantlist ::= ';' (sortname | sortname? ';' (dispname | dispname? (';' variant)+))
[4] variant ::= STRING scope
[5] sortname ::= STRING scope
[6] dispname ::= STRING scope

An example of this might be:

[xml = "Extensible Markup Language"; ;
         ; "XML" /acronym
         ; "Extended Markup Language" /erroneous]

Issue (ltm-variant-scope-amb):

In this syntax there is a syntactical ambiguity: the scope of the topic name is not clearly distinguished from that of the final variant. We might mandate the insertion of a ';' before the topic name scope, but this would not be backwards compatible.
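To make the positional reading of the semicolon-separated list concrete, the following Python sketch labels each slot after the '=' as base name, sort name, display name, or variant. It is a simplification and not a full LTM parser: the token patterns are assumptions, strings may not contain escaped quotes, and a trailing scope is always attached to the last slot, which is only one of the two readings discussed in the issue above.

```python
import re

# Simplified tokens: STRING, ';', '/', and NAME (assumed patterns).
TOKEN_RE = re.compile(r'\s*(?:"([^"]*)"|(;)|(/)|([A-Za-z_][\w.-]*))')
LABELS = ["base name", "sort name", "display name"]


def new_slot():
    return {"value": None, "scope": [], "in_scope": False}


def parse_names(text):
    """Label the semicolon-separated names that follow '=' in a topic."""
    slots = [new_slot()]
    text = text.rstrip()
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError("unexpected input at offset %d" % pos)
        string, semi, slash, name = m.groups()
        slot = slots[-1]
        if semi:
            slots.append(new_slot())          # ';' starts the next slot
        elif slash:
            slot["in_scope"] = True           # '/' starts the slot's scope
        elif name is not None:
            if not slot["in_scope"]:
                raise ValueError("theme %r outside a scope" % name)
            slot["scope"].append(name)
        else:
            if slot["value"] is not None or slot["in_scope"]:
                raise ValueError("unexpected string %r" % string)
            slot["value"] = string
        pos = m.end()
    result = []
    for i, slot in enumerate(slots):
        if slot["value"] is None:
            continue                          # omitted sort or display name
        label = LABELS[i] if i < len(LABELS) else "variant"
        result.append((label, slot["value"], slot["scope"]))
    return result


paris = parse_names('"Paris (France)"; "paris"; "Paris"')
xml = parse_names('"Extensible Markup Language"; ; ; "XML" /acronym '
                  '; "Extended Markup Language" /erroneous')
```

Here the scope /erroneous lands on the final variant. Under the proposed grammar it could equally be read as the scope of the topic name, which is the ambiguity the issue describes.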

4 The #VERSION directive

Given the increasing number of LTM versions, and the prospect that later versions might not be backwards compatible, it might be helpful if LTM documents could declare which version they are written in. This might help implementations select the right parser for them, or even allow a special forwards-compatible mode for versions newer than the latest supported version.

The syntax would be as follows:

The #VERSION directive

[7] version ::= '#' 'VERSION' WS STRING

where the STRING would be the version number of the LTM version used.
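How an implementation might use the declared version can be sketched as follows. Everything beyond the directive itself is an assumption made for illustration: the list of supported versions, the default used when no directive is present, and the names `select_mode`, "strict", and "forwards-compatible" are hypothetical, and the search does not distinguish a directive from the same text inside a comment or string.

```python
import re

VERSION_RE = re.compile(r'#VERSION\s*"([^"]*)"')

# Hypothetical: the versions this imaginary implementation can parse.
SUPPORTED = ("1.0", "1.1", "1.2", "1.3")


def version_key(version):
    """Turn '1.3' into (1, 3) so that versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def select_mode(source, default="1.2"):
    """Return (parser_version, mode) for an LTM source text.

    Assumption: files without a #VERSION directive are treated as
    `default`, since earlier versions had no such directive.
    """
    m = VERSION_RE.search(source)
    declared = m.group(1) if m else default
    if declared in SUPPORTED:
        return declared, "strict"
    newest = max(SUPPORTED, key=version_key)
    if version_key(declared) > version_key(newest):
        # Newer than anything we know: fall back to the newest parser.
        return newest, "forwards-compatible"
    raise ValueError("unsupported LTM version: " + declared)


print(select_mode('#VERSION "1.3"\n[ltm = "LTM"]'))
print(select_mode('#VERSION "1.4"\n[ltm = "LTM"]'))
```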

5 URI prefixes

In cases where LTM files declare a number of PSIs for the topics the files usually become quite cluttered with all the long URIs that take up a lot of visual space. A typical example is the first draft XMLvoc ontology, which is nearly unreadable because of all the PSIs.

One solution to this is to allow the user to declare prefixes for the URIs using a directive, in much the same way that XML Namespaces work. Unfortunately, a further syntax extension is needed. Say the 'stdreg' prefix were declared. In that case @"stdreg:data_model" would be ambiguous in the sense that it would not be clear whether this were a new URI scheme, or a URI prefix. Requiring the quotes to be omitted when prefixes are used, however, would solve the problem.

The syntax for this might be as follows:

The #URIPREFIX directive

[8] uriprefix ::= '#' 'URIPREFIX' WS NAME WS STRING
[9] indicator ::= '@' (STRING | NAME ':' NAME)
[10] subject ::= '%' (STRING | NAME ':' NAME)

With this new syntax the beginning of the XMLvoc file referenced above could have been written as follows:

Example: xmlvoc.ltm in LTM 1.3

#URIPREFIX xtm "http://www.topicmaps.org/xtm/1.0/core.xtm#"
#URIPREFIX srg "http://psi.xml.org/stdsreg/#"

/* -------------- housekeeping topics -------------- */

[super-sub = "superclass-subclass relationship"
           = "superclass(es)" /sub
           = "subclass(es)" /super  @xtm:superclass-subclass]
[super     = "superclass"           @xtm:superclass]
[sub       = "subclass"             @xtm:subclass]
[sort      = "sort"                 @xtm:sort]


/* -------------- topic types -------------- */

[application_programming_interface = "application programming interface"
 @srg:application_programming_interface]
[application_domain = "application domain" @srg:application_domain]
[character_set = "character set"           @srg:character_set]
[character_encoding = "character encoding" @srg:character_encoding]
[data_model = "data model"                 @srg:data_model]
[document = "document"                     @srg:document]
[document_stage = "document stage"         @srg:document_stage]
[legal_entity = "legal entity"             @srg:legal_entity]
 [organization = "organization"            @srg:organization]
  [standards_body = "standards body"       @srg:standards_body]
 [person = "person"                        @srg:person]
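The effect of the prefix declarations can be illustrated by rewriting a prefixed file into the equivalent form with full URIs. The Python sketch below is only a textual approximation under stated assumptions: each #URIPREFIX directive is assumed to stand on its own line, the NAME pattern is a guess, and comments and string contents are not analysed, so an unquoted-looking reference inside a string would be rewritten too.

```python
import re

NAME = r'[A-Za-z_][\w.-]*'  # assumed approximation of the NAME token

# One #URIPREFIX directive per line (assumption).
PREFIX_RE = re.compile(
    r'^[ \t]*#URIPREFIX[ \t]+(' + NAME + r')[ \t]+"([^"]*)"[ \t]*$', re.M)
# '@' or '%' followed directly by an unquoted prefix:local reference.
REF_RE = re.compile(r'([@%])(' + NAME + r'):(' + NAME + r')')


def expand_prefixes(source):
    """Replace @prefix:local and %prefix:local with quoted full URIs."""
    prefixes = dict(PREFIX_RE.findall(source))

    def expand(match):
        sigil, prefix, local = match.groups()
        if prefix not in prefixes:
            raise ValueError("undeclared URI prefix: " + prefix)
        return '%s"%s%s"' % (sigil, prefixes[prefix], local)

    body = PREFIX_RE.sub("", source)   # drop the directives themselves
    return REF_RE.sub(expand, body)


sample = ('#URIPREFIX srg "http://psi.xml.org/stdsreg/#"\n'
          '[person = "person" @srg:person]')
print(expand_prefixes(sample).strip())
```

A quoted reference such as @"stdreg:data_model" is left untouched, since the quote after '@' marks it as a complete URI. This is the disambiguation that omitting the quotes for prefixed references is meant to achieve.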

An issue is whether one should go further and allow (NAME | indicator | subject) wherever NAME is now used to refer to a topic. This would be easier in some cases, as it would allow topics to be referred to directly without having to declare them explicitly (they might be merged in). On the other hand it would complicate the syntax and its implementations. Feedback on this issue is much wanted.

6 Reification support

There are six constructs in topic maps which may be reified: topic maps, base names, variant names, occurrences, associations, and association roles. LTM already allows the topic map to be reified, but none of the other constructs. In order to allow these to be reified, LTM must allow them to be given IDs that may be used to refer to them from the topics that reify them. The question is, how are the IDs best assigned to these constructs?

Using the '@' character followed by the ID has already been suggested, but that character has already been taken for another purpose. The same applies to '%', and the '&' character will be used in a similar way in the next tolog version, and may find its way to LTM at some point. The '~' character, however, is free, and might work.

Example: Reification

[ltm : syntax = "LTM" / acronym ~ltm-name]
[ltm-name-topic : name = "LTM" @"#ltm-name"]

invented-by(ltm-name-topic : invention, steve-pepper : inventor) ~invented
[invented-topic : association = "Invention of LTM" @"#invented"]

{ltm, specification, "http://www.ontopia.net/download/ltm.html"} ~ltmspec
[ltmspec-topic : occurrence = "The LTM specification" @"#ltmspec"]
written-by(ltmspec-topic : work, lmg : author)

As can be seen, this works, but it is awkward. One must come up with an ID for the construct to be reified as well as for the topic that reifies it, and the topic must be made to point at the reified construct. (This mirrors reification as done in XTM exactly and naively.) The ideal solution would be to use the same ID for the construct and its topic.

Example: Improved reification mechanism

[ltm : syntax = "LTM" / acronym ~ltm-name]
[ltm-name : name = "LTM"]

invented-by(ltm-name : invention, steve-pepper : inventor) ~invented
[invented : association = "Invention of LTM"]

{ltm, specification, "http://www.ontopia.net/download/ltm.html"} ~ltmspec
[ltmspec : occurrence = "The LTM specification"]
written-by(ltmspec : work, lmg : author)

The difficulty with this is that the IDs of topics map to source locators (as they must) and that in the SAM (as in XTM 1.0) reification is achieved by making the subject indicator of a topic point to the topic map object being reified. So, is there any way around this? The author has so far been unable to think of one, but ideas are welcome.
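The rule just described, and the reason shared IDs are problematic, can be sketched in Python. This is a toy model and not the SAM itself: the base URI is hypothetical, constructs and topics are plain dictionaries, and it is assumed (as one reading of the text above) that a construct's ID yields a source locator in the same way a topic's ID does.

```python
BASE = "http://example.org/ltm.ltm"  # hypothetical document URI


def resolve(base, reference):
    """Resolve a '#id' fragment reference against the document URI."""
    return base + reference if reference.startswith("#") else reference


def find_reified(topics, constructs, base):
    """Pair each topic with the construct that one of its subject
    indicators points at, by matching against source locators."""
    by_locator = {resolve(base, "#" + cid): c for cid, c in constructs.items()}
    pairs = {}
    for topic_id, indicators in topics.items():
        for indicator in indicators:
            target = by_locator.get(resolve(base, indicator))
            if target is not None:
                pairs[topic_id] = target
    return pairs


def locator_clashes(topics, constructs):
    """IDs used for both a topic and a construct, which would give two
    different objects the same source locator."""
    return sorted(set(topics) & set(constructs))


# Constructs given IDs with '~' in the examples above.
constructs = {
    "ltm-name": "base name 'LTM'",
    "invented": "association invented-by",
    "ltmspec": "occurrence specification",
}
# First form: separate topic IDs, explicit subject indicators.
explicit = {
    "ltm-name-topic": ["#ltm-name"],
    "invented-topic": ["#invented"],
    "ltmspec-topic": ["#ltmspec"],
}
# Improved form: the topics share the IDs of the constructs.
shared = {"ltm-name": [], "invented": [], "ltmspec": []}

print(find_reified(explicit, constructs, BASE))
print(locator_clashes(shared, constructs))
```

In the first form every topic finds its construct and no locators clash. In the improved form all three IDs clash, which is the difficulty for which the author has no solution yet.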

7 Deserialization specification

The deserialization specification would explain how, given an LTM document, to build a SAM model instance. It would thus fill the same role as the XTM Syntax Specification. The idea is to firm up the definition of LTM now that there are multiple implementations, to clarify the merging rules (which were deliberately vague in previous versions), and to test that the SAM works as intended for relating different topic map syntaxes to one another.