XTM: The Ontopia Comments

Comments on tmdtd-00-09-25.txt from

Kal Ahmed <kal@ontopia.net>
Lars Marius Garshol <larsga@ontopia.net>
Geir Ove Grønmo <grove@ontopia.net>
Steve Pepper <pepper@ontopia.net>

(c) Ontopia 2000

$Id: syntax-comments-2000-09.html,v 1.1 2002/02/01 15:36:18 larsga Exp $

These comments are based on the draft DTD posted by Michel Biezunski entitled tmdtd-00-09-25.txt. This was based on a submission by Hans Holger Rath and Graham Moore (TM0_3.dtd), which, although copyrighted by Empolis, seems itself to be based on an earlier submission by Kal Ahmed and Graham Moore (2nd Draft, 28/02/00).

Our comments are grouped under the following subheadings:

GENERAL
MAKING EVERYTHING A TOPIC
TOPICMAP
TOPIC
DISPNAME
OCCURS
ASSOC
ASSOCRL
FACET
FVALUE
ADDTHMS/ADDTHEMS/MERGEMAP
PUBLIC SUBJECT IDENTIFIERS
MERGING

Typographic conventions:

names of element types are shown in single quotes, thus: 'topic'
names of attributes are wrapped in hyphens (convention stolen from SRN), thus: -type-

GENERAL

Architecture or interchange format?

It needs to be decided whether XTM is to define an architecture (like XLink) or merely an interchange syntax. In our opinion we should stick to the interchange syntax, at least for the time being. This has a couple of consequences:

There is no need for -xtm:class- since the only GIs that will be used are those actually defined by XTM. Features for mapping other data into topic maps do not belong here.
The "xtm:" prefix can be removed from the DTD: it only confuses things. If an XTM namespace is deemed to be necessary, it should be declared as the default namespace on the 'topicmap' element.

Linking issues

It should be a goal that generic XLink processors be able to do something useful with topic maps even though they are not aware of the topic map semantics. Furthermore, it should be possible to address topics across topic maps. Therefore:

All references to topics (via -scope-, -types-, -type-, -href- and other attributes) should use URL syntax, not IDREF syntax. In particular, local ID addresses should be prefixed by the '#' symbol.
The declared value of all such attributes should be CDATA, and a conventional comment should specify which of them may contain a white-space separated list of URLs.
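A minimal sketch of what these two points would mean for a processor, assuming that such an attribute holds a white-space separated list of URL references and that each reference is resolved against the URI of the containing document. The function name and the example.org addresses are hypothetical.

```python
from urllib.parse import urljoin, urldefrag

def resolve_topic_refs(attr_value, base_uri):
    """Split a white-space separated list of URL references and resolve
    each one against the URI of the document containing the attribute.
    Returns (document URI, fragment) pairs."""
    resolved = []
    for ref in attr_value.split():
        absolute = urljoin(base_uri, ref)            # '#opera' becomes base_uri + '#opera'
        document, fragment = urldefrag(absolute)     # separate the document from the ID
        resolved.append((document, fragment))
    return resolved

# One local reference and one reference into another topic map (both hypothetical).
refs = resolve_topic_refs(
    "#opera http://example.org/maps/music.xtm#composer",
    "http://example.org/maps/opera.xtm",
)
```

With URL syntax, local and cross-document references are handled by the same code path, which is the point of preferring it to IDREF syntax.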

The processing implications of allowing referencing outside the document need to be spelled out, especially as regards merging.

It should be considered whether -show- and -actuate- attributes should be declared on the xlink element types 'topic', 'assoc', 'facet', and 'mergemap'. If so, the default should be #IMPLIED since they have no topic map semantics whatsoever.

-type- and -types-

In ISO 13250 only topics have a -types- attribute. All other constructs that can be typed have a -type- attribute. It has been proposed to SC34 that -types- be used everywhere, but this does not seem appropriate in XTM. The following syntax

   <assoc types="foo bar">
     <assocrl type="r1">t1</assocrl>
     <assocrl type="r2">t2</assocrl>
   </assoc>

would merely be syntactic sugar for multiple associations, which could easily be represented as:

   <assoc type="foo">
     <assocrl type="r1">t1</assocrl>
     <assocrl type="r2">t2</assocrl>
   </assoc>

   <assoc type="bar">
     <assocrl type="r1">t1</assocrl>
     <assocrl type="r2">t2</assocrl>
   </assoc>

In the same way, here is an example of an occurrence that has multiple roles (definition, etymology and pronunciation):

Op['e]ra bouffe [F. op['e]ra opera + bouffe comic, It. buffo], Opera buffa [It.], light, farcical, burlesque opera.

This could in theory be represented as follows:

    <occurs types="def etym pron" href="...">

However, it could equally well be represented using three 'occurs' elements. Allowing multiple occurrence role types raises the question of what the one representation signifies as compared to the other. It makes more sense to stick to XTM's KISS principle, which would suggest retaining -type- everywhere in the DTD except on 'topic'. (Actually we propose changing the name of the -type- attribute in some cases for clarity: see below under the individual element types in question: 'occurs', 'assocrl', 'facet', and 'fvalue'.)

Our conclusion is that only topics should have multiple types.
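To make the "syntactic sugar" argument concrete, here is a rough sketch that rewrites a multiply typed association into the equivalent single-typed associations. It assumes that -types- is a white-space separated list, and the helper name expand_types is hypothetical.

```python
import copy
import xml.etree.ElementTree as ET

def expand_types(assoc):
    """Return one single-typed copy of `assoc` per name listed in -types-."""
    expanded = []
    for type_name in assoc.get("types", "").split():
        single = copy.deepcopy(assoc)      # keeps all 'assocrl' children
        del single.attrib["types"]         # drop the plural attribute
        single.set("type", type_name)      # replace it with a single type
        expanded.append(single)
    return expanded

source = ET.fromstring(
    '<assoc types="foo bar">'
    '<assocrl type="r1">t1</assocrl>'
    '<assocrl type="r2">t2</assocrl>'
    '</assoc>'
)
result = expand_types(source)
```

Since the rewrite is purely mechanical, the plural form adds no expressive power.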

MAKING EVERYTHING A TOPIC

It is often said (sometimes by people who should know better) that "everything in a topic map is a topic". This is not true. Only types, themes and topics are topics: associations are not topics (though association types are topics); occurrences are not topics (though occurrence role types are topics); and many other things are also not topics.

The fact that types are topics is extremely useful and powerful. It is the basis for the "self-documentation" feature of topic maps, and it puts topic maps into an exclusive group of knowledge representation formalisms that allow an object to be both a type and an instance at the same time.

During the last year or so a number of voices have been raised suggesting that everything in a topic map should be a topic -- or at least that it should be possible to regard anything in a topic map as a topic. The rationale for doing this may vary from person to person, but the regularity with which the issue crops up seems to suggest a general requirement. Here are some examples of the uses to which such a feature could be put:

If topic maps could be regarded as topics it would be possible to give topic maps a name (title) and associated documentation, such as an introduction, editorial guidelines, etc.
If associations could be regarded as topics it would be possible to name them, specify occurrences of them, and create associations between them.
If occurrences could be regarded as topics they could be assigned names that could be used as labels in the user interface, thus distinguishing multiple occurrences of the same topic that all play the same role (e.g. multiple synopses of operas). This would effectively extend the topic map paradigm to cover TOCs as well as indexes, thesauri, glossaries, and cross references. It would also be possible to create associations (such as "required reading") between occurrences.

We believe that there are enough such examples to justify extending the semantics of the topic map model in a simple way in order to enable anything to be regarded as a topic. This also fits with the definition of Subject as "any thing whatsoever ... about which anything whatsoever may be asserted by any means whatsoever"! And it would mean that topic maps really can be self-documenting in absolutely every particular. Finally it allows us to have our cake and eat it: the fundamental distinction between topics and other objects is maintained, but we can still, in one sense, say that everything is a topic.

This also addresses a point made recently by Dale Hunscher on the xtm-wg list (26 Sep 2000 12:12:29 -0400):

There is a feeling when reading all the material on topic Maps that it is a great benefit that everything is a topic, and on one level that is true. On another level, though, if everything is a topic, but there are many different flavors of topics, processing of topics will rightfully ignore their topic-ness and focus on their flavors. Sometimes it is better to have different entities for different purposes, even though they may *also* have some part of their nature in common.

This appears to be the requirement underlying HHR/GDM's proposal for a 'nameservice' element type. However, as noted below, this is an ugly solution that complicates the data model considerably. A far simpler solution would be to simply allow any object to point to a topic that represents it. This is the rationale for the introduction of the -topic- attribute on the element types 'topicmap', 'occurs', 'assoc', and 'assocrl'.

(An alternative to introducing the -topic- attribute would be to use existing mechanisms and define one or more PSIs. Any object in a topic map, including the 'topicmap' element, and other elements, can be referenced as an occurrence of a topic. It would be possible to define standardized occurrence role semantics that indicated some kind of "characterized-by" relation between the "information resource" (i.e. the 'topicmap' element) and the topic containing the occurrence. However this would be much less clear semantically, and at least in one sense turns the matter on its head, so the -topic- attribute approach seems preferable.)
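A minimal sketch of how a processor might follow the proposed -topic- attribute, assuming it holds a single URL reference and that only local ('#'-prefixed) references are resolved here. The sample markup, the IDs, and the function name are all hypothetical, since the attribute is only a proposal.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the topic map and one association each point at a
# topic that reifies them.
SAMPLE = """
<topicmap topic="#opera-map">
  <topic id="opera-map"/>
  <topic id="tosca"/>
  <topic id="puccini"/>
  <topic id="tosca-by-puccini"/>
  <assoc type="#composed-by" topic="#tosca-by-puccini">
    <assocrl type="#work" href="#tosca"/>
    <assocrl type="#composer" href="#puccini"/>
  </assoc>
</topicmap>
"""

def reifying_topic(element, root):
    """Return the local 'topic' element referenced by -topic-, or None."""
    ref = element.get("topic")
    if ref is None or not ref.startswith("#"):
        return None  # no reifier, or it lives in another document
    wanted = ref[1:]
    for topic in root.iter("topic"):
        if topic.get("id") == wanted:
            return topic
    return None

root = ET.fromstring(SAMPLE)
assoc_reifier = reifying_topic(root.find("assoc"), root)
map_reifier = reifying_topic(root, root)
```

The association itself stays an 'assoc' element; names, occurrences, and further associations would be attached to the topic it points to.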

We recognize that this proposal strongly impacts the data model and would compromise ISO 13250 backwards compatibility (unless SC34 adopts the same proposal). As such it cannot be unilaterally decided by the syntax subgroup, and we submit it therefore to the consideration of all three subgroups (syntax, model, and use case).

We also recognize that such an extension has many implications that would need to be clarified before it can be accepted, for example:

can a single topic perform this kind of reification for multiple objects of the same type and, if so, what are the processing implications?
can a single topic perform this kind of reification on multiple objects of different types?
what is the relationship between the type(s) of a reifying topic and the type of the object (e.g. association) that it reifies?
what is the relationship between the scope of a reifying topic and the scope of the object(s) (e.g. association roles) that it reifies?
what are the implications when merging topics?
etc.

Such issues need to be clarified before the model can be extended and the accompanying syntactical changes effected. This may or may not be possible in time for the December 1 deadline.

TOPICMAP

'mergemap'

We propose adding a 'mergemap' element type to the content model of 'topicmap' instead of 'addthms' (see below).

'bos'

It seems to us that a 'bos' element type is unnecessary as far as topic map processing is concerned. We don't think that the bounded object set should be a property of the topic map itself, but rather of the application, or even the user session.

We believe there are two issues here:

How to know which topic maps to merge (or when to stop merging chained topic maps), and
How to determine the complete set of objects that a "processing system is responsible for knowing about and acting appropriately upon".

On the first issue, we believe that the candidate set of topic maps to merge is all topic maps containing a topic that is referenced from the hub topic map, or a merged topic map, by any of the following:

-types- (or -instance-of-) attributes on 'topic' elements
-type- (or -role-) attributes on 'occurs' elements
-type- (or -instance-of-) attributes on 'assoc' elements
-type- (or -role-) attributes on 'assocrl' elements
-href- attributes on 'assocrl' elements
-type- (or -property-) attributes on 'facet' elements
-type- (or -value-) attributes on 'fvalue' elements
-href- attributes on 'mergemap' elements
-scope- and -addthems- attributes
-topic- attributes

Note: We agree with Steve N. that not all topic maps that are referenced as occurrences should be automatically merged. Once the 'mergemap' element type is introduced, it can be used to specify any topic maps not covered by the above list that should be merged, including some which may also be referenced as occurrences. Alternatively, XTM could define a PSI with the semantics "topic map that should be merged" that could be used to indicate a special occurrence role type (or supertype).

However, whether or not all these candidate topic maps are actually merged should be left to the discretion of the application, in interaction with the user, at run-time.
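The candidate set described above could be computed along the following lines. This is a sketch only: it uses the current attribute names rather than the proposed renamings, assumes all references use URL syntax, and covers a single step (an application that chooses to merge a candidate would repeat the process on that map). The example.org addresses and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin, urldefrag

# Attributes from the list above, keyed by element type.
PER_ELEMENT = {
    "topic": ["types"],
    "occurs": ["type"],          # note: not -href-, occurrences are not merged
    "assoc": ["type"],
    "assocrl": ["type", "href"],
    "facet": ["type"],
    "fvalue": ["type"],
    "mergemap": ["href"],
}
# Attributes that count wherever they appear.
ANYWHERE = ["scope", "addthems", "topic"]

def candidate_maps(root, base_uri):
    """Return the URIs of other documents referenced from `root`."""
    candidates = set()
    for element in root.iter():
        for name in PER_ELEMENT.get(element.tag, []) + ANYWHERE:
            for ref in element.get(name, "").split():
                document, _ = urldefrag(urljoin(base_uri, ref))
                if document != base_uri:     # skip references within the hub itself
                    candidates.add(document)
    return candidates

HUB = """
<topicmap>
  <topic id="tosca" types="#opera http://example.org/genres.xtm#verismo"/>
  <occurs type="#synopsis" href="http://example.org/synopses.xtm"/>
  <mergemap href="http://example.org/composers.xtm"/>
</topicmap>
"""
found = candidate_maps(ET.fromstring(HUB), "http://example.org/hub.xtm")
```

The function only reports candidates. Whether each one is actually fetched and merged remains a decision for the application and the user.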

Once the set of relevant topic maps has been determined, the set of objects that a processing system should be "responsible for knowing about and acting appropriately upon" is simply the union of these and all resources referenced from occurrences within them. From the point of view of generalized topic map processing it is not necessary to include other objects containing references to these.

-addthems-

We propose renaming the -addthems- attribute to -scope-, for clarity. This naming could be considered slightly inaccurate, since topic maps do not themselves have scope and this attribute actually specifies themes to be added to the scope of all characteristic assignments in the topic map. However, the usage is at least consistent with that on 'topic', where exactly the same applies (except that the themes are only added to the scopes of name and occurrence characteristic assignments for that particular topic).

Alternatively, -scope- on 'topic' should be renamed to -addthems- for consistency.
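The additive behaviour described above can be sketched as a simple set union. This assumes that each attribute value is a white-space separated list of theme references; the function names and theme IDs are hypothetical.

```python
def themes(attr_value):
    """Parse a white-space separated attribute value into a set of themes."""
    return set((attr_value or "").split())

def effective_scope(assignment_scope, topic_scope=None, map_scope=None):
    """Scope of one characteristic assignment after the themes declared on
    the enclosing 'topic' and 'topicmap' have been added.

    topic_scope applies only to the name and occurrence assignments of that
    topic; for association roles, pass topic_scope=None."""
    return themes(assignment_scope) | themes(topic_scope) | themes(map_scope)

scope = effective_scope("#english", topic_scope="#music", map_scope="#ontopia")
```

In both cases the attribute contributes themes to the scope of the assignments beneath it and does not give the container itself a scope, which is why the name -scope- is slightly inaccurate.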

-topic-

We propose adding a new -topic- attribute that references a topic that reifies the topic map (see the general discussion on making everything a topic, above).

TOPIC

-supertypes-

After much soul-searching we have come to the conclusion that we should not add a -supertypes- attribute in XTM.

First of all, having a short cut for the supertype/subtype relationship has less benefit than having one for the type/instance relationship (which is what -types- amounts to), since there will almost always be far fewer of the former than the latter.

We understand Steve N.'s dissatisfaction with -types-, but we believe it would be overly purist to remove it, both because it really does fulfill a need, and because it is already in ISO 13250 and widely used. On the other hand, adding a -supertypes- attribute would be insufficiently purist! (Both attributes incur the same cost in terms of explanation, but the benefit of -supertypes- is far less.)

Having said that, there is a crying need for standardization of the semantics of the supertype/subtype relationship. Therefore the "price" for not adding -supertypes- must be that a public subject identifier is defined for the supertype/subtype relationship. And in order to give users a real possibility of using associations for the type/instance relationship (instead of the -types- attribute), a PSI should also be defined for this relationship. (We have a proposal for PSIs below.)

-types-

Since this attribute has led to so much confusion, we have considered renaming it to -instance-of-. The downside of this is that it would make the concept of "topic type" less intuitive. The Ontopia jury is still out on this one, and we submit it for the group's attention.

-identity-

It needs to be clarified whether the value of this attribute should be a locator or a string, or if it can be either. This attribute is the basis for templates, merging, interoperability and much else, so its syntax and semantics have to be clarified. Our proposal is that the content should be one or more PSIs (public subject identifiers), and that a PSI should be a URI (i.e. either a URL or a URN). (See below.)
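A rough sketch of how -identity- might be handled under this proposal. It makes two assumptions that the draft does not settle: that multiple PSIs are separated by white space, and that two topics sharing a PSI are candidates for merging. The function names and the example.org URL are hypothetical.

```python
from urllib.parse import urlparse

def parse_identity(attr_value):
    """Return the set of PSIs in an -identity- value, rejecting any item
    that is not an absolute URI (URL or URN)."""
    identifiers = set()
    for candidate in attr_value.split():
        if not urlparse(candidate).scheme:   # plain strings have no scheme
            raise ValueError(f"not an absolute URI: {candidate!r}")
        identifiers.add(candidate)
    return identifiers

def share_subject(identity_a, identity_b):
    """True if the two -identity- values have at least one PSI in common."""
    return bool(parse_identity(identity_a) & parse_identity(identity_b))

same = share_subject(
    "urn:psi:ch.iso.3166.nor http://example.org/psi/norway",
    "http://example.org/psi/norway",
)
```

Requiring a URI rather than an arbitrary string is what makes this comparison meaningful across independently authored topic maps.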

DISPNAME

The content model ANY for 'dispname' is too vague. We propose the following:

    <!ELEMENT dispname (#PCDATA) >
    <!ATTLIST dispname
              scope CDATA #IMPLIED
              href CDATA #IMPLIED >

If -href- is specified then any content is assumed to play the role of -alt- (alternative text, as in HTML's IMG elements, to be displayed if the referenced object for some reason cannot be displayed).

OCCURS

-occrl-

This was removed by HHR/GDM, presumably because 'nameservice' was intended to provide a more powerful alternative.

However, the purpose of these two is different. -occrl- is a mnemonic that characterizes the role played by the occurrence, just as -type- does (albeit in a less formal and less robust way, since it is just a string rather than a reference to a topic). Our understanding of 'nameservice' is that it is intended to provide a name by which to characterize this particular occurrence (e.g. if the resource located by the occurrence is an article, 'nameservice' might contain its title). This to us is something quite different.

Since they are different, the one should not replace the other. However, we raise the question whether -occrl- should not be removed anyway, since it really is just a brain-dead alternative to -type-. By removing it altogether, along with other "mnemonics", we will encourage the good practice of using the more powerful alternative.

-type-

It has been pointed out that -type- does not really point to a class of which this occurrence is an instance, but simply to a topic which more fully characterizes the role played by the occurrence. Therefore we propose renaming it to -role-.

-topic-

We propose adding a new -topic- attribute that references a topic that reifies the occurrence (see the general discussion on making everything a topic, above).

'nameservice'

We find the 'nameservice' proposal messy and potentially very confusing. It also has such huge implications for the data model that any proposals in this direction should come out of the data model discussion rather than the syntax discussion. However we think we understand -- and sympathise with -- the rationale, which seems to relate to the wish to make "everything" a topic. Our proposal is to use the -topic- attribute instead (see above).

ASSOC

-type-

We propose reinstating -type- instead of -types- for the reasons given above. If a decision is taken to rename -types- to -instance-of- on 'topic' (see above), the same should be done here.

-identity-

We propose to use the -topic- attribute for the purpose for which -identity- was intended here.

-topic-

We propose adding a new -topic- attribute that references a topic that reifies the association (see the general discussion on making everything a topic, above).

'nameservice'

See the comments under 'occurs' (above) and on the subject of making everything a topic (above).

ASSOCRL

-href-

If we want XLink processors to recognize and process 'assoc' elements as xlinks (and we think this is highly desirable), the 'assocrl' locators must use URL syntax rather than IDREF syntax. This means using the '#' prefix to the IDs of elements in the current document.

Both for consistency and to enable cross-document addressing, we propose that all attributes that address 'topic' elements do so using URL syntax (see above).

-anchrole-

This attribute was replaced by 'nameservice' in the HHR/GDM proposal, but this does not make sense for the same reasons as for -occrl- (see above).

MB says this is the most important attribute and should be required. We disagree. For us, -type- is more important since it performs the same function but in a more powerful way. -anchrole- should definitely not be required, and we raise the question of whether it should be left out altogether, along with other "mnemonics".

-type-

Should be renamed to -role- (as per -occrl-) and it should be kept in the singular (for any topic that plays multiple roles in the same association there should be multiple 'assocrl' elements).

-topic-

We propose adding a new -topic- attribute that references a topic that reifies the association role (see the general discussion on making everything a topic, above).

'nameservice'

See the comments under 'occurs' (above) and regarding making everything a topic (above).

FACET

-type-

Should be renamed to -property- and kept in the singular.

FVALUE

-type-

We propose renaming -type- to -value-, to make the semantics clearer, and retaining the singular.

In addition, we propose removing -facetval- and using the content of the element instead. (It has been agreed in SC34 that the declared value of -facetval- should be changed from NAME to CDATA, in order to enable the use of any string or number as the value of a facet. It is a fairly logical extension of this to use the content of the element instead of an attribute.) If specified, -value- takes precedence:

    <facet property="weight">
      <fvalue value="x23g" href="http://lightbook.com">23g</fvalue>
    </facet>
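Taking the precedence rule at face value, a processor could read a facet value as follows. This is a sketch under the proposed (not yet agreed) syntax, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

def facet_value(fvalue):
    """Return -value- when it is specified, otherwise the element content."""
    value = fvalue.get("value")
    if value is not None:
        return value                      # -value- takes precedence
    return (fvalue.text or "").strip()    # fall back to the element content

facet = ET.fromstring(
    '<facet property="weight">'
    '<fvalue value="x23g" href="http://lightbook.com">23g</fvalue>'
    '</facet>'
)
with_attribute = facet_value(facet.find("fvalue"))
content_only = facet_value(ET.fromstring("<fvalue>23g</fvalue>"))
```

Using the content removes the NAME restriction that forced values such as "x23g", since element content can hold any string or number.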

ADDTHMS/ADDTHEMS/MERGEMAP

There are interactions between the 'addthms' element type, the -addthems- attribute on the 'topicmap' element type, and the proposed 'mergemap' element type. Both 'addthms' and -addthems- provide ways of adding themes (globally or selectively) to the scope of characteristic assignments in the current and/or another topic map; and both 'addthms' and 'mergemap' result in the merging of topic maps. By "cutting the cake" differently, we believe it is possible to achieve a simpler and more elegant solution, as follows:

In order to add themes to the current topic map

globally:
reinstate -addthems- on 'topicmap' under the name -scope- (which is consistent with the "misuse" of -scope- on 'topic'), or -themes- (which is more correct, but inconsistent with 'topic')
selectively:
use -scope- on the characteristic assigner elements in question (in other words, don't provide a short cut)

In order to merge other topic maps

with global addition of themes,
with selective addition of themes, or
without the addition of themes:

rename 'addthms' to 'mergemap', and change -addthems- to -scope-, leaving it #IMPLIED.

PUBLIC SUBJECT IDENTIFIERS

We believe it is important that XTM define a base set of public subject identifiers, and we propose that these should always be URIs (i.e. either URLs or URNs; see RFC 2396 "URI syntax" and RFC 2611 "URN namespace definition mechanisms"). We recommend encouraging the use of URNs in the long term. Possible examples might be as follows:

urn:psi:ch.iso.639.no (the language Norwegian)
urn:psi:ch.iso.3166.nor (the country Norway)
urn:psi:org.topicmaps.psi.association.type-instance
urn:psi:org.topicmaps.psi.association.supertype-subtype

However this requires the specification of a URN scheme (which will take time), so URLs should be defined in the XTM spec in the meantime, e.g.

http://topicmaps.org/psi/association/type-instance
http://topicmaps.org/psi/association/type-instance/type
http://topicmaps.org/psi/association/type-instance/instance
http://topicmaps.org/psi/association/supertype-subtype
http://topicmaps.org/psi/association/supertype-subtype/supertype
http://topicmaps.org/psi/association/supertype-subtype/subtype

At these URLs there should be real resources that are the public subject descriptors.

In addition, PSIs should be defined for all basic TM objects.

MERGING

Some brief comments on merging:

XTM should make absolutely clear what happens when two topics are merged (especially with reference to -id- and -identity-).
It also needs to be made clear that any reference to a topic in another topic map automatically leads to that topic map being merged in (although applications may choose to do the merging lazily).
In addition, the whole process of merging two topic maps needs to be clarified.