Towards a General Theory of Scope

Steve Pepper <pepper@ontopia.net>
Geir Ove Grønmo <grove@ontopia.net>

$Id: scope.htm,v 1.5 2002/10/14 19:08:11 pepper Exp $

This paper is concerned with the issue of scope in topic maps.

Topic maps are a form of knowledge representation suitable for solving a number of complex problems in the area of information management, ranging from findability (navigation and querying) to knowledge management and enterprise application integration (EAI).

The topic map paradigm has its roots in efforts to understand the essential semantics of back-of-book indexes in order that they might be captured in a form suitable for computer processing. Once understood, the model of a back-of-book index was generalised in order to cover the needs of digital information, and extended to encompass glossaries and thesauri, as well as indexes. The resulting core model, of typed topics, associations, and occurrences, has many similarities with the semantic networks developed by the artificial intelligence community for representing knowledge structures. (For a more detailed introduction to the topic map model, see [Pepp00].)

One key requirement of topic maps from the earliest days was to be able to merge indexes from disparate origins. This requirement accounts for two further concepts that greatly enhance the power of topic maps: subject identity and scope. This paper concentrates on scope, but also includes a brief discussion of the feature known as the topic naming constraint, to which it is closely related. It is based on the authors' experience in creating topic maps (in particular, the Italian Opera Topic Map [Pepp01]), and in implementing processing systems for topic maps (in particular, the Ontopia Topic Map Engine and Navigator [Onto01]).

1. What is scope?

1.1 Topic map basics

The organising principle of a topic map is the topic. A topic is essentially an object within the computer that represents (or reifies) some subject of discourse. Topics have subject identity and characteristics. Subject identity is both the metaphysical relationship between a topic (inside the map) and the subject (outside it) that is reified by the topic, and also the means by which the author of a topic map asserts, as unambiguously as possible, what that subject is. Subject identity is the primary and preferred (but not the only) basis for merging topics, and hence also topic maps. If two topics have the same subject identity, then by definition they "represent" the same subject, and must be merged.

In addition to subject identity, topics usually have characteristics. There are three kinds of topic characteristic: names, occurrences, and roles played in associations with other topics.

Names, as the name implies, are the handles by which humans know the subjects, and by which they look them up, for example in a back-of-book index. Sometimes a subject has multiple names, and the topic map model allows for this. Such synonyms are catered for by "see" references in an index (e.g., "Holland, see Netherlands").

Occurrences are information resources that are pertinent to a subject. Again, a topic can have multiple occurrences of different types. In an index, occurrences are what you arrive at if you follow the page number references in the index entry.

Associations express relationships between subjects, e.g. the "takes place in" relationship between the Extreme Markup conference and the city of Montréal. In this relationship, the conference plays the role of "event" and Montréal the role of "location". A topic can play roles in any number of associations. Back-of-book indexes support primitive (i.e., untyped) associations in the form of "see also" references.

1.2 The purpose of scope

Names, occurrences, and roles played in associations are thus the kinds of characteristics that can be assigned to a topic.

Any time we make an assignment of such a characteristic to a topic, we are essentially making an assertion about that topic. However, not all assertions are universally valid:

a name (e.g., "St. Petersburg") may be applicable in some contexts (pre-1914 and post-1991), but not in others;
an occurrence might be pertinent in some situations (e.g., when the user has clearance for top secret documents, or is an expert in the field), but not in others;
an association might state an opinion (e.g., that Stalin was the hero of the Russian revolution) that is not shared by others.

The purpose of scope, simply stated, is to allow the topic map author to express the limits within which such assertions (or characteristic assignments) have validity.

One useful and potentially very powerful application of scope is to permit the capture of different "Weltanschauungen", or world views, of the subject. This is extremely important when merging topic maps, since it permits knowledge of which assertions came from which source to be retained: The individual names, occurrences, and associations can be scoped in such a way as to indicate where they originated. However, this is not the only application of scope, as we shall see.

From the preceding discussion it is obvious, at an intuitive level, that scope is intimately related to the notion of context, which is a concept of major importance in many fields, ranging from philosophy and cognitive psychology to linguistics and artificial intelligence. However, despite its importance, satisfactory definitions are hard to come by and, as [Sowa00] points out, "the word context has been used with a variety of conflicting meanings." Sowa cites the ambiguity of the English word as one source of the confusion and describes the two major senses of the word commonly found in dictionaries:

(basic meaning) A section of linguistic text or discourse that surrounds some word or phrase of interest.
(derived meaning) A nonlinguistic situation, environment, domain, setting, background, or milieu that includes some entity, subject, or topic of interest.

We will assume the second of these as a sufficient working definition for our purposes.

This paper starts by examining scope as defined in ISO 13250 and XTM 1.0 and presents examples of ways in which scope is used, explaining concepts such as "the unconstrained scope" and the use of scope to establish topic name spaces ("the topic naming constraint").

It then goes on to define and advocate a distinction (within the realm of topic mapping) between the terms "scope" and "context", and show how these relate to each other. It argues that scope actually covers a number of related but quite distinct phenomena and that to achieve its goals it requires some form of structuring.

The final sections discuss different forms of processing requirement on scope, and describe some approaches that have already proved fruitful and that could pave the way towards a "general theory of scope".

2. Scope in the topic map context

This section describes scope as defined in the two topic map specifications, ISO 13250 [ISO00] and XTM 1.0 [TMOrg01]. It presents examples of ways in which scope may be used and explains the concepts of "the unconstrained scope" and "the topic naming constraint" (and their attendant problems).

2.1 Definitions of scope and related concepts

The topic map paradigm was first standardised by the ISO in January 2000 after nearly ten years of development [Pepp99]. It is appropriate, therefore, to begin with the ISO definition of scope:

scope

The extent of the validity of a topic characteristic assignment (see the definition of "topic characteristic assignment"): the context in which a name or an occurrence is assigned to a given topic, and the context in which topics are related through associations. This International Standard does not require that scopes be specified explicitly. If the scope of a topic characteristic assignment is not explicitly specified via one or more scope attributes, the scope within which the topic characteristic applies to the topic includes all the topics in the entire topic map; this special scope is called "the unconstrained scope". If a scope is specified, the specification consists of a set of topics, which, in the context of their role as members of such a set, are called "themes". Each theme contributes to the extent of the scope that the themes collectively define; a given scope is the union of the subjects of the set of themes used to specify that scope.

NOTE 3 If it is desired to specify a scope which is the intersection (rather than the union) of two topics, this can be accomplished by creating a topic whose subject is that intersection, and then by using that topic as a theme.

ISO 13250 further defines "theme" and "unconstrained scope" as follows:

theme

A member of the set of topics comprising a scope within which a topic characteristic assignment is valid. See also the definitions of 'scope' and 'topic'.

unconstrained scope

The scope comprised of all of the topics in a topic map. When no applicable scope attributes are explicitly specified as governing a topic characteristic assignment, the scope within which the topic characteristic assignment is made is the unconstrained scope.

NOTE 10 In other words, the unconstrained scope is the default scope. Thus, for example, in a given topic map, if no scope attributes are explicitly specified for the name characteristics of any topics, any two topic links that have any of the same names will be merged, due to the effect of the topic naming constraint.

The key features in these definitions are as follows:

scope delimits the validity of a topic characteristic assignment;
all such assignments have scope;
when no scope is specified explicitly, the scope is deemed to be "unconstrained", and the characteristic assignment has universal validity;
the unconstrained scope consists of the set of all topics in the topic map;
scope is defined in terms of a set of topics, in this context known as "themes";
the more themes that define a scope, the greater its extent;
scope establishes a name space in which no two topics may have the same name without being merged.

The XML Topic Maps specification (XTM) attempted to explain the same concepts in terms that would be more readily understandable in the Web community. It provides the following definitions:

scope

The extent of the validity of a topic characteristic assignment. The context in which a name or an occurrence is assigned to a given topic, and the context in which topics are related through associations.

The set of topics specified via a <scope> element.

See also unconstrained scope.

This specification places no constraints on how applications interpret scope.

unconstrained scope

The absence of a specified scope in the assignment of a topic characteristic.

topic naming constraint

The constraint, imposed by the topic map paradigm, that any topics having the same base name in the same scope implicitly refer to the same subject and therefore should be merged.

These definitions are paraphrased in the "Concepts" section of XTM 1.0 in the following terms:

2.2.1.6 Scope

Scope specifies the extent of the validity of a topic characteristic assignment. It establishes the context in which a name or an occurrence is assigned to a given topic, and the context in which topics are related through associations. Every characteristic has a scope, which may be specified either explicitly, as a set of topics, or implicitly, in which case it is known as the unconstrained scope. Assignments made in the unconstrained scope are always valid.

Scope is considered to establish a namespace for the base names of topics. This leads to the constraint, imposed by the topic map paradigm, called the topic naming constraint, that any topics having the same base name in the same scope implicitly refer to the same subject and therefore should be merged. With the exception of this constraint, the interpretation of a characteristic's scope and its effect on processing is left to the application and is in no way constrained by this specification.

2.2 Scope in 13250 and XTM

Irrespective of whether one believes that the explication of scope in XTM is more readily understandable than that in 13250, it is clear that it is essentially the same concept. XTM chose not to use the term "theme" on the grounds that it causes unnecessary confusion since a theme is in fact a topic. XTM has one usage of the term "scoping topic" (in section "3.3.1 <scope> Element"), which has been proposed as an alternative to "theme". However, it is the authors' opinion, having worked extensively with the concept of scope, that the term "theme" can sometimes be useful for disambiguation, and we will use it often in this paper.

There is another subtle difference between 13250 and XTM relating to the unconstrained scope, which 13250 declares to be "comprised of all of the topics" in the topic map, and which XTM carefully avoids characterising in any way other than as the absence of an explicitly specified scope. We shall return to this point later.

One further difference between 13250 and XTM deserves to be mentioned. It concerns the interchange syntaxes defined in the two specifications. In 13250, scope is specified via a "scope" attribute, whereas XTM uses an element type, also called "scope". This is merely a syntactical issue and has no special relevance. However, there are differences as to where scope may be specified:

In 13250, "scope" attributes may appear on <topic>, <topname>, <basename>, <sortname>, <dispname>, <occurs>, and <assoc> elements.
In XTM, "scope" elements may appear as subelements of <basename>, <occurrence>, and <association> elements only.

Furthermore, in 13250, "added themes" may be specified on <topicmap> elements via "addthems" attributes. Finally, sort names and display names are generalised in XTM as "variant names" that have "parameters" consisting of non-empty sets of topics.

Despite these apparent differences, it remains the case that in both 13250 and XTM, scope only applies to topic characteristics (i.e., names, occurrences, and roles played in associations). The "scope" attributes on <topic> and <topname> elements (and the "addthems" attribute on <topicmap> elements) are merely syntactic conveniences for specifying themes that contribute to the scope of all characteristic assignments contained within the elements that have them. Thus, themes specified via a "scope" attribute on a <topname> element are added to the scopes of the base names, sort names, and display names that it contains; themes specified via a "scope" attribute on a <topic> element are added to the scope of the base names, sort names, display names, and occurrences that it contains; and (added) themes specified via an "addthems" attribute on a <topicmap> element are added to the scope of every base name, sort name, display name, occurrence, and association in the whole topic map.
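The way themes accumulate from the enclosing elements can be sketched in a few lines of Python. This is a minimal illustration rather than the behaviour of any particular processor: the class and function names are hypothetical, only names are modelled (occurrences would be handled analogously), and a scope is represented simply as a set of theme identifiers.

```python
from dataclasses import dataclass, field

@dataclass
class TopName:
    basenames: list                      # base name strings inside one <topname>
    scope: frozenset = frozenset()       # themes from the "scope" attribute on <topname>

@dataclass
class Topic:
    id: str
    topnames: list = field(default_factory=list)
    scope: frozenset = frozenset()       # themes from the "scope" attribute on <topic>

def effective_name_scopes(topics, addthems=frozenset()):
    """Yield (topic id, base name, effective scope) triples.

    The effective scope of a base name is assumed to be the union of the
    themes given on <topicmap> (addthems), on <topic>, and on <topname>.
    """
    for topic in topics:
        for topname in topic.topnames:
            themes = addthems | topic.scope | topname.scope
            for basename in topname.basenames:
                yield topic.id, basename, themes

# Hypothetical example data; the theme identifiers are invented for illustration.
puccini = Topic(
    "puccini",
    [TopName(["Puccini"], frozenset({"italian"}))],
    frozenset({"biography"}),
)
scoped_names = list(effective_name_scopes([puccini], addthems=frozenset({"opera-map"})))
```

Note that the topic itself never acquires a scope in this sketch; only the name assignments do, which is the point made in the text.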

Failure to understand this can lead to misconceptions such as that "topics have scope". Topics do not have scope; topic elements may have scope attributes (in 13250), but only topic characteristics (or more precisely, assignments of topic characteristics) have scope.

2.3 Some typical uses of scope

Having presented the underlying concepts of scope, we now turn to some concrete examples that illustrate the application of scope in practice. We do so by looking at each kind of topic characteristic in turn.

2.3.1 Names

Any subject can potentially have many names, and in real life, most do. To the extent that a topic map author wishes to use multiple names, scope provides the means for indicating when each name is appropriate. Themes used for scoping topic names can often be grouped into classes, some of which might be:

natural languages: e.g., English ("country"), French ("pays"), Norwegian ("land")
controlled vocabularies: e.g., ISO 3166 alpha 2 ("NO"), ISO 3166 alpha 3 ("NOR"), ISO 3166 numeric ("578")
usage: e.g., nickname, colloquial name, nom de plume, real name etc. ("The Big Apple", "The Eternal City", "The Land of the Midnight Sun", "George Orwell", "Eric Blair", "Lev Bronstein")
historical period: e.g., 1703-1914 ("St. Petersburg"), 1914-1924 ("Petrograd"), 1924-1991 ("Leningrad"), 1991-present ("St. Petersburg")

One interesting question that this list of examples raises is whether there would be any use for scope (on names) if the topic map paradigm had a construct for typed names. We think that the answer is yes, in that historical period, and perhaps also natural language, would be inconvenient to model using type. However, controlled vocabularies and usages such as nom de plume could just as easily, and probably more correctly, be modelled using type.

Another, more application-specific, use of scope on names is to distinguish between multiple names (or labels) for association types. Topics that represent classes of relationships, i.e., association types, can be named (in many languages) according to a number of conventions:

by a noun that expresses the nature of the relationship (e.g., "first performance")
by a compound created from the names of the roles involved (e.g., "teacher/pupil")
by a verb that expresses the nature of the relationship (e.g., "born in")

The last of these provides the opportunity for an interesting application of scope. The use of a verb implies the use of a subject-verb-object construction (at least for binary associations), and this in turn implies a direction. However, in the topic map paradigm, there is no notion of directionality in associations: The relationship described by the statement "Puccini was born in Lucca" can equally well be described by the statement "Lucca was the birthplace of Puccini". Whether one describes the relationship as "born in" or "birthplace of" really depends on the vantage point from which one is viewing the relationship, i.e., whether it is viewed from the vantage point of the topic playing the role of person or from the vantage point of the topic playing the role of place.

That being the case, it makes sense to give the topic that types the association multiple names, including names scoped by the roles person and place, as follows:

<topic id="birthplace">
  <topname>
    <basename>birthplace</basename>
  </topname>
  <topname scope="person">
    <basename>born in</basename>
  </topname>
  <topname scope="place">
    <basename>birthplace of</basename>
  </topname>
</topic>

Given this topic, used to type associations, an application can choose to label associations of this type differently depending on the context. Ontopia's Topic Map Navigator, which has a user interface built around the notion of "the current topic", uses this facility to label the relationship as "born in" when the current topic plays the role of person, and "birthplace of" when the current topic plays the role of place. This can be seen as an application-specific convention for using scope. However, in this particular case, the convention does seem intuitive and useful enough to be worthy of general adoption. We can therefore extend our list of classes of usage as follows:
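The labelling convention just described can be illustrated with a short Python sketch. It is not the Navigator's actual implementation; the data layout and the function name `label_for` are assumptions made for the example, with the empty set standing in for the unconstrained scope.

```python
# Hypothetical in-memory form of the "birthplace" topic shown earlier:
# each name is a (scope, string) pair, and the empty scope stands for
# a name with no explicitly specified scope.
BIRTHPLACE_NAMES = [
    (frozenset(), "birthplace"),
    (frozenset({"person"}), "born in"),
    (frozenset({"place"}), "birthplace of"),
]

def label_for(names, current_role):
    """Pick the name scoped by the role that the current topic plays.

    Falls back to the first name without an explicit scope (or None)
    when no name is scoped by that role.
    """
    fallback = None
    for scope, text in names:
        if current_role in scope:
            return text
        if not scope and fallback is None:
            fallback = text
    return fallback
```

Viewed from a topic playing the role of person, `label_for(BIRTHPLACE_NAMES, "person")` selects "born in"; from a topic playing the role of place it selects "birthplace of"; for any other role the unscoped name "birthplace" is used.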

role-based naming of relationships: e.g., "born in" (person), "birthplace of" (place); "teacher of" (teacher), "pupil of" (pupil)

A further category of scope usages on base names is due to the topic naming constraint and is explained in section 2.4.

2.3.2 Variant names

As mentioned above, XTM has generalised the concepts of sort names and display names into those of variant names. A variant name is

an alternative form of a base name, that is optimized for a particular computational purpose, such as sorting or display. It may be any kind of a resource, including a string. An application chooses among variant names by evaluating their parameters...

Parameters are information, in the form of a set of topics, that expresses the appropriate processing context for a variant name. Having selected a particular topic name, an application may choose to examine the parameters of its variants (if any) in order to select the most suitable form of that name.

In order to maintain compatibility with 13250, XTM also defined published subject indicators for the concepts of "suitability for sorting" and "suitability for display". Other possible uses of variants could be to express alternative word forms (such as irregular plurals) or alternative search forms for a given name (e.g., "Trotski" and "Trotskij" for "Trotsky").

The XTM interchange syntax allows variants to be nested, but for processing they are "flattened" and any parameters in outer layers are inherited inwards. Since parameters are topics, variants can be considered to be "scoped" with respect to their base names with parameters performing exactly the same function as themes. We will not discuss variants in any further depth in this paper since the same basic considerations apply as for the scoping of topic characteristics.
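The flattening of nested variants can be sketched as follows. This is a minimal illustration under stated assumptions: variants are represented as plain dictionaries with hypothetical keys ("parameters", "name", "variants"), and the parameter identifiers in the example are invented.

```python
def flatten_variants(variants, inherited=frozenset()):
    """Return a flat list of (variant name, parameters) pairs.

    Each variant's parameters are the union of its own parameters and
    those of every enclosing variant, i.e. parameters are inherited inwards.
    """
    flat = []
    for variant in variants:
        params = inherited | frozenset(variant.get("parameters", ()))
        if "name" in variant:            # a variant may exist only to group nested variants
            flat.append((variant["name"], params))
        flat.extend(flatten_variants(variant.get("variants", ()), params))
    return flat

# Hypothetical nested variants of the base name "Trotsky".
nested = [
    {
        "parameters": {"search"},
        "name": "Trotski",
        "variants": [
            {"parameters": {"norwegian"}, "name": "Trotskij"},
        ],
    }
]
flat = flatten_variants(nested)
```

After flattening, the inner variant "Trotskij" carries both its own parameter and the one inherited from the outer variant.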

2.3.3 Occurrences

At the time of writing, the topic map community seems to have had rather less experience with scope applied to occurrences and association roles than it has with scope applied to base names. In the case of occurrences, it seems that the following axes of scope are likely to be among the most commonly used:

access level: e.g., security (unclassified, classified, top secret, etc.); user level (beginner, intermediate, expert; 1st grade, 5th grade, 9th grade; etc.)
domain: e.g., subject area (history, biography, culture, politics); historical period (ancient, classical, medieval, modern; past, present)
natural language: e.g., English, French, Norwegian
resource location: e.g., offline/online
viewpoint: i.e., which authority asserts the pertinence of the resource to the topic

In all of these cases (except the last), and perhaps in particular in the case of resource location and natural language, the axis to which the theme belongs and the theme itself constitute a kind of property-value pair (e.g., "language = French"). This raises the question whether that property properly belongs with the occurrence or with the resource. Given that a resource can be an occurrence of more than one topic, there is a significant difference between the two. For example, while it is conceivable (although not perhaps very likely) that a certain resource might be regarded as "intermediate level" when viewed as an occurrence of one topic, and "beginner level" when viewed as an occurrence of another, it is unlikely ever to be the case that one and the same resource has validity in the scope "English" when viewed as an occurrence of one topic and validity in the scope "French" when viewed as an occurrence of another.

In cases like this, should scope be used at all? The argument can be made that a property like language should rather be expressed using resource metadata such as facets (in 13250) or RDF. On the other hand, within a controlled environment, and if there is reasonable certainty that individual resources will seldom (if ever) be occurrences of multiple topics, the use of scope to express resource properties might be deemed acceptable.

The Italian Opera Topic Map originally used scope on occurrences to express language and resource location properties. The current version (at least, at the time of writing) takes the view that these are more properly expressed using facets (which is possible since the IOTM is maintained using the 13250 syntax). However, this information is represented via scope when the topic map is converted to XTM syntax, since the representation of 13250 facets in XTM has not yet been formally articulated. (The process involves reifying the resource as a topic and representing the property values as either occurrence or association characteristics, but the details have still to be spelled out.)

2.3.4 Association roles

The ways in which scope can be applied to association roles are similar to the ways in which it can be applied to occurrences. The main areas of application would seem once again to be access level, domain, and viewpoint.

The use of natural languages to scope associations would seem to be restricted to topic map applications whose subject matter is language-related; e.g., the topic "pays" of type "word" could be an instance of the class "noun" in the scope "French" and an instance of the class "verb" in the scope "English".

With occurrences and associations there is sometimes an interaction between the class to which the characteristic belongs (its type) and the theme by which it is scoped. For example, in the case of scoping by subject domain, it would be inconsistent to scope some associations of type "born in" by the theme "biography" but not other associations of the same type. In this case the theme "biography" is more like a property of the association type than a property of the association itself. (The ability to specify scope rules through some kind of topic map schema would help prevent inconsistencies in this kind of situation.)

Scoping by historical period, on the other hand, is more often a property of the individual association than the association type, as the following example (furnished by Lars Marius Garshol) illustrates:

/past written-in( [vietnamese], [chinese-script] )
/present written-in( [vietnamese], [latin-script] )

(The notation shown above is the Linear Topic Map notation [Gars01]. The example shows two associations of type "written-in" scoped by the themes "past" and "present" respectively.)

2.4 Scoping due to the topic naming constraint

A full treatment of the topic naming constraint (TNC) is beyond the scope of this paper and the following discussion will confine itself to the impact of the TNC on our understanding of the topic map concept of scope.

The TNC states that no two subjects can have the same base name in the same scope; if a topic map interchange document has two <topic> elements with the same name in the same scope, they must be merged during processing. As a result of this constraint, a scope can be considered to establish a topic name space within which all base names are unique.
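The effect of the constraint on processing can be sketched in Python. The sketch below is one possible reading, not a normative algorithm: it assumes that "the same scope" means an identical set of themes, represents the unconstrained scope as the empty set, and uses hypothetical topic identifiers. It only computes which topics would have to be merged; it does not perform the merge itself.

```python
def merge_by_tnc(topics):
    """Group topics that the topic naming constraint forces together.

    topics: dict mapping topic id -> set of (base name, scope) pairs,
            where scope is a frozenset of theme identifiers.
    Returns a list of sets of topic ids; each set is one merged topic.
    """
    parent = {tid: tid for tid in topics}

    def find(tid):
        # Follow parent links to the representative of tid's group.
        while parent[tid] != tid:
            parent[tid] = parent[parent[tid]]
            tid = parent[tid]
        return tid

    first_owner = {}                      # (base name, scope) -> first topic seen with it
    for tid, names in topics.items():
        for name in names:
            if name in first_owner:
                parent[find(tid)] = find(first_owner[name])
            else:
                first_owner[name] = tid

    groups = {}
    for tid in topics:
        groups.setdefault(find(tid), set()).add(tid)
    return list(groups.values())

# Hypothetical example: two "Macbeth" topics named in the unconstrained
# scope, and two "Iris" topics whose names are in different scopes.
example = {
    "macbeth1": {("Macbeth", frozenset())},
    "macbeth2": {("Macbeth", frozenset())},
    "iris-opera": {("Iris", frozenset({"opera"}))},
    "iris-character": {("Iris", frozenset({"character"}))},
}
merged = merge_by_tnc(example)
```

With these inputs the two "Macbeth" topics fall into one group, while the two "Iris" topics remain distinct because their names lie in different name spaces.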

2.4.1 Examples of topic name conflicts

Obviously, in the real world, the same name may refer to more than one subject. A typical example would be the name "Paris", which refers (perhaps among other things) to:

the capital of France,
a city in Texas,
a Trojan prince in Greek mythology,
a character in Shakespeare's Romeo and Juliet, and
a genus of temperate woodland plants.

Other examples, taken from the Italian Opera Topic Map, are:

Macbeth (an opera by Verdi, and the play by Shakespeare on which it is based)
Iris (an opera by Mascagni, and also its principal character)
Lucca (a city in Tuscany and the name of a music publisher)
La Bohème (two operas, by Puccini and Leoncavallo)
Beppe (characters in the operas Pagliacci and L'amico Fritz)

In order to avoid having such disparate subjects merged into single topics, a topic map author is forced to either (1) qualify names in such a way that they are unique; (2) scope the names; or (3) both of the above.

There are as yet no clearly established methods of best practice for choosing among these approaches. (But if such conventions could be established, they would greatly enhance the mergeability of topic maps.) To a certain extent, the choice will depend on the application, and one's view of the unconstrained scope (about which, more below). For example, if one believes (as some do) that it is usually good practice to ensure that every topic has a name in the unconstrained scope, it becomes necessary to use qualifiers in order to achieve uniqueness:

<topic id="macbeth1" types="play">
  <topname>
    <basename>Macbeth (play)</basename>
  </topname>
</topic>

<topic id="macbeth2" types="opera">
  <topname>
    <basename>Macbeth (opera)</basename>
  </topname>
</topic>

If, on the other hand, there is no requirement for every topic to have a name in the unconstrained scope, the use of themes to scope base names will solve the problem on its own:

<topic id="macbeth1" types="play">
  <topname scope="play">
    <basename>Macbeth</basename>
  </topname>
</topic>

<topic id="macbeth2" types="opera">
  <topname scope="opera">
    <basename>Macbeth</basename>
  </topname>
</topic>

Whichever approach is chosen, a lot of the same considerations will apply, and there will often be a close correlation between the qualifiers and the themes, as the preceding examples show ("play" and "opera" in both cases above). In fact, one might want to use both approaches at the same time, as follows:

<topic id="macbeth2" types="opera">
  <topname>
    <basename>Macbeth (opera)</basename>
  </topname>
  <topname scope="opera">
    <basename>Macbeth</basename>
  </topname>
</topic>

2.4.2 Living with the topic naming constraint

One useful way of approaching the question of how best to qualify or scope base names is to ask how homonyms (identical names) are disambiguated in other forms of human communication, for example in everyday conversation. It turns out that this is almost invariably done by reference to another, associated topic:

Macbeth: "Do you mean the play or the opera?". Or alternatively, "Do you mean Verdi's Macbeth or Shakespeare's?"
Iris: "Iris the opera or Iris the character?"
Lucca: "The city Lucca or the publisher?"
La Bohème: "Puccini's or Leoncavallo's?"
Beppe: "The one in Pagliacci or the one in L'amico Fritz?"
Paris: (assuming that some kind of geographical context has already been established) "Do you mean Paris, France or Paris, Texas?"; (assuming a literary context) "The Paris of The Iliad or the Paris of Romeo and Juliet?"; (assuming a broader cultural context) "The Paris of Greek mythology or the Paris of Shakespearean drama?"

As these examples indicate, disambiguation is achieved either by reference to the class to which the topic belongs (play or opera, opera or character, city or publisher), or by reference to some other topic with which it is associated by some kind of defining relationship.

It seems that humans most readily disambiguate by reference to a classification scheme based on a class hierarchy. When that fails (because the subjects in question belong to the same or a similar class), the fallback is either to classification by subject domain (geography, literature, culture), or some property that is fundamental to all instances of the class in question (in the case of operas, the "composed by" relationship; in the case of cities, the "located in" relationship; in the case of characters, the "appears in" relationship; etc.).

These insights can provide pointers as to how one might go about using scope to disambiguate base names in an orderly and intuitive manner. For example, one might formulate the following rules:

like-named subjects belonging to different classes are scoped by their classes (provided those classes are sufficiently different; "city" and "town", for example, might be deemed to be insufficiently different for the purpose of disambiguation)
like-named subjects belonging to the same class are scoped by the topics with which they are associated by that class's defining relationship
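The two rules above can be expressed as a small Python function. This is only a sketch under simplifying assumptions: topics are plain dictionaries with hypothetical keys, each class is assumed to have exactly one defining relationship, and the proviso that classes be "sufficiently different" is not modelled.

```python
# Hypothetical mapping from a class to its defining relationship.
DEFINING_RELATION = {
    "opera": "composed by",
    "city": "located in",
    "character": "appears in",
}

def disambiguating_theme(topic, namesake):
    """Choose the theme that scopes `topic`'s base name against a like-named topic.

    Rule 1: if the two topics belong to different classes, use the class.
    Rule 2: otherwise, use the topic reached via the class's defining relationship.
    """
    if topic["class"] != namesake["class"]:
        return topic["class"]
    return topic["related"][DEFINING_RELATION[topic["class"]]]

macbeth_play = {"class": "play", "related": {}}
macbeth_opera = {"class": "opera", "related": {"composed by": "Verdi"}}
boheme_puccini = {"class": "opera", "related": {"composed by": "Puccini"}}
boheme_leoncavallo = {"class": "opera", "related": {"composed by": "Leoncavallo"}}
```

For the opera "Macbeth" measured against the play, the function returns the class "opera"; for Leoncavallo's "La Bohème" measured against Puccini's, it returns the composer "Leoncavallo".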

This is the approach that has been used in the Italian Opera Topic Map, with the result that the following topics are used (in most cases once only) as themes:

poem, novel, play, publisher, character (classes of topics used to disambiguate topics of those types)
Le Villi, Pagliacci, L'amico Fritz (operas used to disambiguate topics of type "character"); Boito, Mascagni, and Leoncavallo (composers used to disambiguate topics of type "opera")

These themes are, in a sense, incidental, in that their only purpose is to disambiguate in very specific situations, rather than to define broad scopes that have applicability throughout the map. (We shall return to this point later.)

2.5 The unconstrained scope and its attendant problems

The unconstrained scope is defined in 13250 as "the scope comprised of all of the topics in a topic map". It is the default scope for any characteristics that are not assigned within an explicitly specified scope. Characteristics in the unconstrained scope are always valid.

After publication of 13250, doubts were raised as to whether it is correct to regard the unconstrained scope as the scope comprised of all topics in the map, and for that reason XTM 1.0 avoided saying so explicitly. Topicmaps.net's Processing Model for XTM 1.0, published by two of the editors of 13250 but currently without formal status, explicitly states that the unconstrained scope is "comprised of the null set of topics" [TMnet01].

Despite these differences, there is general agreement that characteristics in the unconstrained scope are always valid, and that is sufficient for the purpose of this paper.

One question that is raised by the existence of the unconstrained scope has already been alluded to: Whether it is good practice to always give topics names in the unconstrained scope or, on the contrary, to avoid ever giving topics names in the unconstrained scope. The arguments for and against are as follows:

Topics that do not have a name in the unconstrained scope will tend to "disappear" in certain constrained user contexts. While this may be the desired effect, it could also have unexpected side-effects. For example, how should an application handle a situation in which a topic has an occurrence or association characteristic in a certain scope, but no name in either that scope or the unconstrained scope? If a policy of allowing topics to have no name in the unconstrained scope is followed, it would seem to be advisable to at least ensure that every topic has a name in every scope in which it also has characteristics. This could be very resource-consuming.

If, on the other hand, there is no particular context within which the application should apply scope, what should the policy be with respect to topics that only have names within specific scopes? Presumably they should not "disappear" (since the user context is not constrained), but in that case, how should the application choose which name to use? Some situations might permit the use of all the names, but others will require the selection of a single name.

If the avoidance of names in the unconstrained scope can be problematic, so too (thanks to the topic naming constraint) is the use of them. First of all, within a single topic map, the more names in the unconstrained scope, the greater the likelihood that the author will be forced to qualify those names in more or less intuitive ways. Secondly, the greater the use of names in the unconstrained scope, the greater the danger that undesired merging of topics will occur when two topic maps are merged.

There is no easy answer to this dilemma, but it is in the nature of information management, especially when the goal is no less than global knowledge federation, that some problems are simply hard. At least the topic map paradigm forces awareness of the problem and provides a starting point for tackling it.

2.6 Scope as a set of themes

According to the 13250 definition, scope is "the union [our emphasis] of the subjects of the set of themes used to specify that scope". NOTE 3 underlines this by making clear that in order to express the intersection of two topics, a new topic must be created. Thus a scope constituted by the themes "history" and "economics" covers the sum total of both of those subject domains. To describe their intersection (i.e., a single domain that has both a history component and an economics component), a new topic must be created (e.g., "economic history" or "history of economics").

As a result, the act of adding a theme to a scope usually has the effect of broadening or extending the scope. However, if the unconstrained scope is regarded as "the scope comprised of all of the topics in a topic map", then this is not always the case: Adding a theme to the scope of a characteristic assignment that is valid in the unconstrained scope has the effect of drastically narrowing the scope. (It is not clear how proponents of the view of the unconstrained scope as being the "null set of topics" regard the same operation.)

2.7 Scope in perspective

This section has presented the core ideas of scope and related concepts as defined in the topic map paradigm. From the discussion and the examples provided it is clear that the concept of scope covers a multitude of sins. There are three or four quite different notions -- effectivity, relevance, viewpoint, and name spacing -- all trying to work using the same mechanism. The question is, how can all these things be usefully applied within an application? The next section attempts to provide a basis for answering this question by examining the relationship between scope and context.

3. Putting context into topic maps

This section introduces a distinction, particular to the realm of topic maps, between "scope" and "context", discusses various aspects of the latter in the context of topic maps, and shows how the two concepts can be made to relate to each other.

Scopes, as defined and discussed in the previous section, are clearly a constituent or property of topic maps themselves. They can be regarded as static in the sense that they do not change unless the topic map itself changes, i.e., for any given assignment of a characteristic to a topic the scope is fixed by the author of the topic map.

Context, on the other hand, exists outside the topic map. It is the combined "situation, environment, domain, setting, background, or milieu" (to use our working definition) that surrounds a user's interaction with a topic map. As such it is dynamic, varying from user to user, from session to session, and even with the mood of the user.

Context in this sense can be broken down into a number of different parameters, all of which are extrinsic to the topic map (although sometimes intimately related to it). We have identified the following types of parameter:

user profile
user preferences
session history
application context

User profile is about who the user is, both objectively and from the viewpoint of the application. It covers attributes such as age, gender, skill set, and language proficiency, that are more or less objective facts about the user, and also attributes such as subscription type and security clearance level, that relate to the user's relationship to the application. Such attributes may be (and usually are) consciously supplied by the user, but they may also be inferred by the application on the basis of the user's past and present interactions.

User preferences are about what the user wants, i.e., his or her immediate (subjective) goals or interests. These usually relate to a specific session of interaction with the system. Examples might be that the current interest is biographical, rather than geographical or musical, or that the preferred language is one other than the default specified by the user profile.

Session history is about what the user has done, i.e., the history of the current session: which interactions have been made, which navigational paths have been chosen, which topics have already been visited, etc. It is based on the user's actions but is under the interpretation and control of the application.

Application context is about the application itself, its design, architecture, and user interface, and the ways in which these can encourage the interpretation of scope in certain ways rather than others. An example of application context is the behaviour, described in section 2.3.1 above, of the Ontopia Topic Map Navigator in selecting names for association types whenever there is a "current topic", based on the role played by that topic in the association.

Despite the fact that all of these parameters exist outside the topic map, there is no reason why they could not be expressed in terms of topics -- indeed, in terms of a similar set of themes to those used to express scope within the topic map -- and this is the approach we advise. Its advantage is that it enables applications to determine how to use the guidance of scopes by evaluating two sets of topics against each other.
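As a minimal sketch of this idea, the four kinds of context parameter can each contribute a set of themes, which are then combined into a single context. The class name, the method name, the use of strings in place of topics, and the choice of a plain union are all assumptions made for illustration; no particular combination rule is implied by the discussion above.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch: class and method names are illustrative only,
// and themes (topics) are stood in for by plain strings.
class ContextThemes {
  /** Combines the themes contributed by each kind of context parameter. */
  static Set<String> combine(Set<String> userProfile,
                             Set<String> userPreferences,
                             Set<String> sessionHistory,
                             Set<String> applicationContext) {
    // Assumption: a plain union of the four theme sets.
    Set<String> context = new LinkedHashSet<>();
    context.addAll(userProfile);
    context.addAll(userPreferences);
    context.addAll(sessionHistory);
    context.addAll(applicationContext);
    return context;
  }
}
```

The resulting set of themes is of the same kind as a scope, so the two can be evaluated against each other directly.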

The next section takes this approach as its starting point and discusses some of the processing issues relating to scope.

4. Some approaches for processing scope

The discussion of the uses of scope in sections 2.3 and 2.4 was entirely from the point of view of an author wishing to express his or her intention as precisely as possible. But what is the "business problem" -- from the perspective of the user of the topic map -- that scope can help solve?

The short answer is, noise. The prime use of scope for a user is to eliminate the unnecessary, the uninteresting, or the irrelevant, in order to end up with a useful and manageable subset of the information contained within the map, such that it can be presented in a manner that is maximally appropriate for the user. The essential operations that need to be performed on the basis of scope are:

filtering, defined as the removal of irrelevant topic characteristics,
ranking, defined as the ordering of characteristics according to their relevance, and
selecting, defined as choosing the most relevant characteristic in a given situation.

The results of filtering a topic map by scope should be a new topic map whose characteristics are a subset of the original topic map's characteristics. (A filtered topic map can therefore be regarded as the equivalent of a "view table" in a relational database.)

Filtering can be performed on a whole topic map at once; ranking, on the other hand, only makes sense among narrowly restricted sets of characteristics, for example among the occurrences of a certain topic. And selection is really only a special case of ranking: To select the most relevant characteristic from a set one simply ranks them and then chooses whatever achieves the highest ranking.

While filtering, ranking, and selecting can be applied to all three kinds of characteristics (names, occurrences, and roles played in associations), there seems to be a tendency for names to have slightly different processing requirements. For occurrences and associations, the most common requirement is to filter first and then rank what is left. For names, the most common requirement is to rank first, and then select.

4.1 Filtering by scope

Since both scope and context are expressed in terms of sets of topics, or themes, our first approach was to consider the evaluation of scope through operations on sets. We deliberately excluded both the unconstrained scope and what might be called the "unspecified context" from consideration. The reason for doing so is not the uncertainty surrounding whether the unconstrained scope (and perhaps also the "unspecified context") consists of the null set or the set of all topics, but the fact that both are known to have special status. If a characteristic is in the unconstrained scope it is always valid, and should therefore never be filtered out. If the context is unspecified, there are no criteria by which to determine filtering and therefore no filtering should take place.

Given two (non-empty) sets of themes, A and B, there are a number of possibilities that are worth investigating:

A is a superset of B
A is a subset of B
(A is a superset of B) AND (A is a subset of B)
(A is a superset of B) OR (A is a subset of B)
The intersection of A and B is non-empty

If we let A represent scope and B represent context, these possibilities can be further characterised as follows:

1. scope is a superset of context: For a characteristic to be retained during filtering, all of the context themes must be present in the characteristic's scope. For example, if a user has specified an interest in the themes "French" and "beginner", then only characteristics that are scoped by both of those themes (and possibly others as well) should be retained; characteristics that are only scoped by "French" (or "beginner") will be removed. (This is in some ways the equivalent to the use of AND between keywords in a search engine.)
2. scope is a subset of context: For a characteristic to be retained during filtering, all of its scope themes must be present in the context. For example, a characteristic scoped by "French" and "beginner" will only be retained if the user specifies an interest in both of these themes; it will be removed if the user only specifies an interest in one of them.
3. scope is both a superset AND a subset of context: For a characteristic to be retained during filtering, the set of themes that contribute to its scope must be identical to those that constitute the context. A characteristic scoped by "French" and "beginner" will only be retained if the context is defined by exactly those two themes, no more and no less.
4. scope is either a superset OR a subset of context: For a characteristic to be retained during filtering, either all the scope themes must be present in the context, or all the context themes must be present in the scope. The "French/beginner" characteristic will be retained if one or the other of those themes is present in the context (provided the context does not include additional themes), or if both those themes (and possibly others) are present in the context.
5. scope and context have a non-empty intersection: For a characteristic to be retained during filtering, its scope must have at least one theme in common with the context. (The "French/beginner" characteristic will be retained if either or both of those themes are present in the context.)
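The five tests can be written down as simple set predicates. The following is a sketch only: the class and method names are hypothetical, and themes are represented by strings rather than topics.

```java
import java.util.Collections;
import java.util.Set;

// Hypothetical sketch of the five candidate tests; themes are plain strings.
class ScopeTests {
  // 1. scope is a superset of context: every context theme is in the scope
  static boolean superset(Set<String> scope, Set<String> context) {
    return scope.containsAll(context);
  }

  // 2. scope is a subset of context: every scope theme is in the context
  static boolean subset(Set<String> scope, Set<String> context) {
    return context.containsAll(scope);
  }

  // 3. superset AND subset: scope and context are identical
  static boolean both(Set<String> scope, Set<String> context) {
    return superset(scope, context) && subset(scope, context);
  }

  // 4. superset OR subset
  static boolean either(Set<String> scope, Set<String> context) {
    return superset(scope, context) || subset(scope, context);
  }

  // 5. non-empty intersection: at least one theme in common
  static boolean intersects(Set<String> scope, Set<String> context) {
    return !Collections.disjoint(scope, context);
  }
}
```

For the "French/beginner" characteristic and a context of just "French", tests 1, 4, and 5 retain the characteristic while tests 2 and 3 remove it.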

In order to investigate the usefulness of the evaluation methods described above, we use the following simple example of a topic with multiple occurrences in different scopes (here we use 13250 syntax for brevity):

<topicmap>

  <!-- topics for A, B, and C omitted -->

  <topic id="foo">
    <occurs scope="A">...</occurs>
    <occurs scope="B">...</occurs>
    <occurs scope="A B">...</occurs>
    <occurs scope="A B C">...</occurs>
  </topic>

</topicmap>

In set notation, this topic map has the following scopes: {A}, {B}, {A, B}, and {A, B, C}.

The map is filtered using methods 1-5, above, and the scopes are evaluated in six different contexts: {A}, {B}, {C}, {A, B}, {A, C}, and {A, B, C}.

The results for the first filtration method (1. superset: all the themes that constitute the context must contribute to the characteristic's scope) are as follows:

Context = {A}:

<occurs scope="A">...</occurs>
<occurs scope="A B">...</occurs>
<occurs scope="A B C">...</occurs>

Context = {B}:

<occurs scope="B">...</occurs>
<occurs scope="A B">...</occurs>
<occurs scope="A B C">...</occurs>

Context = {C}:

<occurs scope="A B C">...</occurs>

Context = {A, B}:

<occurs scope="A B">...</occurs>
<occurs scope="A B C">...</occurs>

Context = {A, C}:

<occurs scope="A B C">...</occurs>

Context = {A, B, C}:

<occurs scope="A B C">...</occurs>

For ease of comparison, rather than present the results of all five algorithms for each of the four scopes in each of the six contexts in the form shown above, they are summarised in Table 1. Each cell in the body of the table contains the same four scopes at the intersection of an algorithm and a context. Scopes shown in bold are considered by the algorithm in whose column they appear to be "in scope" with respect to the context in whose row they appear.
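The contents of such a grid follow mechanically from the definitions of the five methods, so they can be recomputed with a short program. The sketch below is illustrative rather than the code used to produce Table 1; the class name FilterGrid and its methods are hypothetical, and themes are represented by strings.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: recomputes which scopes each method retains per context.
class FilterGrid {
  static final List<Set<String>> SCOPES = List.of(
      Set.of("A"), Set.of("B"), Set.of("A", "B"), Set.of("A", "B", "C"));
  static final List<Set<String>> CONTEXTS = List.of(
      Set.of("A"), Set.of("B"), Set.of("C"),
      Set.of("A", "B"), Set.of("A", "C"), Set.of("A", "B", "C"));

  /** True if the given method (1-5) retains a characteristic with this scope. */
  static boolean keeps(int method, Set<String> scope, Set<String> context) {
    boolean sup = scope.containsAll(context);   // scope is a superset of context
    boolean sub = context.containsAll(scope);   // scope is a subset of context
    switch (method) {
      case 1: return sup;
      case 2: return sub;
      case 3: return sup && sub;
      case 4: return sup || sub;
      case 5: return !Collections.disjoint(scope, context);
      default: throw new IllegalArgumentException("method must be 1-5");
    }
  }

  /** The scopes retained by a method in a context, in the order listed above. */
  static List<Set<String>> retained(int method, Set<String> context) {
    List<Set<String>> kept = new ArrayList<>();
    for (Set<String> scope : SCOPES) {
      // TreeSet only so that themes print in a stable order
      if (keeps(method, scope, context)) kept.add(new TreeSet<>(scope));
    }
    return kept;
  }

  public static void main(String[] args) {
    for (Set<String> context : CONTEXTS) {
      for (int method = 1; method <= 5; method++) {
        System.out.println("context " + new TreeSet<>(context)
            + ", method " + method + ": " + retained(method, context));
      }
    }
  }
}
```

For method 1 in the context {A, C}, for instance, only the scope {A, B, C} is retained, which agrees with the listing given above.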

What conclusions can be gleaned from these results?

Looking at the first column (describing the "superset" algorithm) we see that the more themes that constitute the context, the more characteristics get filtered out. Is this an intuitive result? No. The fact that a user adds a new theme to his or her preferences surely extends the domain of interest?

Then what about the second column (describing the "subset" algorithm)? Here we see that for any given context, the more themes that contribute to a characteristic's scope, the more likely the characteristic is to be filtered out. Is this an intuitive result? Again, no. The definition of scope states that additional themes extend the validity of the characteristic.

The AND algorithm (both superset and subset) shown in the third column only returns, as might be expected, characteristics whose scope is identical to the set of themes comprising the context. This might be useful when a "closest match" (i.e., selection) is required, but would seem to be unnecessarily restrictive in the general case. For example, in context {C}, is not a characteristic whose scope is {A, B, C} to be considered valid? Similarly, if the context is {A, C}, would it always make sense to filter out a characteristic in the scope {A, B, C}?

The OR algorithm (either superset or subset), column 4, is much less restrictive. A characteristic is considered to be "in scope" if its scope has one or more themes in common with the context, unless both scope and context contain a theme not present in the other. This latter proviso seems to be somewhat arbitrary. If the context consists of "N" themes, and the characteristic is scoped by all but one of them, is it not likely that the user would at least sometimes be interested in that characteristic?

The final algorithm, which tests for a non-empty intersection of the two sets, produces results very similar to the OR algorithm. The only difference is that the characteristic in the scope {A, B} is not removed in the context {A, C}. However, the similarity between the two sets of results is possibly deceptive, being due to the close correspondence between the set of scopes and the set of contexts being evaluated against each other. (Four of the six contexts are identical to the four scopes, and the other two, while not identical, at least consist of themes that are already present as themes in the topic map.) It is to be expected that the less overlap there is between the contexts and the scopes, the more algorithm 5 would tend to produce different (and less restrictive) results than algorithm 4.

Our first conclusion from this work has been that none of the four algorithms based on the idea of supersets and subsets produces consistently intuitive and useful results. At first we found this surprising but, on reflection, it seems to confirm a gut intuition that scope as a flat set of themes is insufficiently expressive. It is probably also related to the fact that adding a single theme to the scope produces different effects depending on whether the initial state is the unconstrained scope or some specific scope (see section 2.6, above).

It seems that scope may need structure in order to be truly useful. (We return to the idea of structured scope later in this section.) In the absence of any agreed upon way of structuring scope, the algorithm that provides the most consistently intuitive and useful results for filtering is the one that looks for a non-empty intersection between scope and context, and this is the one that is currently implemented in the Ontopia Navigator.
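A filter along these lines, including the two special cases set aside at the start of this section, might be sketched as follows. This is not the Ontopia Navigator code: the class and method names are hypothetical, themes are represented by strings, and representing both the unconstrained scope and the unspecified context as an empty set is an implementation convenience assumed here, not a position on how the unconstrained scope should be defined.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of filtering by non-empty intersection.
class IntersectionFilter {
  /** Returns the scopes (one per characteristic) that survive filtering. */
  static List<Set<String>> filter(List<Set<String>> scopes, Set<String> context) {
    List<Set<String>> kept = new ArrayList<>();
    for (Set<String> scope : scopes) {
      boolean unconstrained = scope.isEmpty();   // assumption: empty set = unconstrained scope
      boolean unspecified = context.isEmpty();   // assumption: empty set = unspecified context
      // Unconstrained characteristics are always valid; with no context
      // there are no criteria for filtering, so everything is kept.
      if (unconstrained || unspecified || !Collections.disjoint(scope, context)) {
        kept.add(scope);
      }
    }
    return kept;
  }
}
```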

4.2 Ranking by scope

The other commonly required operation on topic characteristics that relates to scope is that of ranking. Some characteristics (e.g., occurrences) may be more relevant for a certain user than others. This does not necessarily mean that those other characteristics are completely irrelevant, merely that they are of less interest. They should not be filtered out, but they should be given less prominence; in a list of occurrences, they should appear towards the end, whereas the most relevant ones should appear towards the top.

Selection (of the single most relevant or appropriate characteristic from a set) is, as we have seen, a specialisation of ranking, in that ranking must be performed first in order to know which characteristic is most relevant. This kind of processing is most common in the case of names, since in many situations an application needs to find the single, most appropriate, label for a topic. But it can also be applied to other kinds of characteristic (e.g. selecting the most pertinent occurrence of type "definition").

Selection could be performed using the AND algorithm described above and the results would be as desired whenever there was a characteristic whose scope was comprised of exactly the same set of topics as the context. However, it would fail whenever no characteristic fulfilled that requirement, and it cannot handle the more general problem of ranking.

Given that there is as yet no well-defined way of structuring scope, Ontopia has implemented a ranking algorithm based on a flat set of themes. The heart of the algorithm is a simple comparator that takes as parameters two scoped objects (A and B) and a context, and returns -1, 1, or 0, depending on whether A is more relevant than B (and should therefore sort first), less relevant, or equally relevant:

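// Fragment of the comparator's compare method. obj1 and obj2 are the two
// scoped objects (A and B), and scope1 and scope2 are their scopes; the
// variable named scope is taken here to hold the themes of the context.

// Count number of matching themes: +1 for each context theme found in
// scope1, -1 for each context theme found in scope2
int matches = 0;
Iterator iter = scope.iterator();
while (iter.hasNext()) {
  TopicIF theme = (TopicIF)iter.next();
  if (scope1.contains(theme)) matches = matches + 1;
  if (scope2.contains(theme)) matches = matches - 1;
}

// Rank by matched themes (a negative result sorts obj1 before obj2)
if (matches > 0)
  return -1;
else if (matches < 0)
  return 1;

// Rank by lesser scope: on a tie, the object with fewer themes sorts first
if ((scope1.isEmpty() && !scope2.isEmpty()) || (scope1.size() < scope2.size()))
  return -1;
else if ((scope2.isEmpty() && !scope1.isEmpty()) || (scope2.size() < scope1.size()))
  return 1;

// Still equally ranked and no subcomparator: report a draw
if (subcomparator == null) return 0;

// Use subcomparator when equally ranked
return subcomparator.compare(obj1, obj2);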

The basic criterion for deciding what is most relevant is the size of the intersection (in terms of the number of themes) between the characteristic's scope and the context: The more themes there are in common, the more relevant the characteristic is considered to be. If both characteristics have the same number of themes in common with the context, then the characteristic that is scoped by the fewest themes is considered most relevant.

(If the result is still a draw, further evaluation can be handed off to a subcomparator, if one exists. However, that subcomparator will only be able to adjudicate on the basis of some knowledge of the semantics of the themes and their relative importance to the user. Of necessity this will be application and domain specific since there is, as yet, no way of expressing such information in a standardised manner.)

By calling the ScopedIFComparator method repeatedly, higher level routines can easily build a list of characteristics ordered by their relevance. This list can then be presented in its entirety (or truncated at some defined cut-off point), or else the single most relevant characteristic can be selected from the top of the list.
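How ranking and selection can be layered on such a comparator is sketched below. This is an independent illustration, not the Ontopia code: the class and method names are hypothetical, themes are represented by strings, each characteristic is reduced to its scope, and no subcomparator is applied to draws.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of ranking and selecting by scope.
class ScopeRanking {
  /** Number of context themes that also appear in the scope. */
  static int matches(Set<String> scope, Set<String> context) {
    int n = 0;
    for (String theme : context) {
      if (scope.contains(theme)) n++;
    }
    return n;
  }

  /** Most relevant first: more matching themes, then fewer themes overall. */
  static Comparator<Set<String>> byRelevance(Set<String> context) {
    return Comparator
        .comparingInt((Set<String> scope) -> -matches(scope, context))
        .thenComparingInt(Set::size);
  }

  /** Ranking: a new list ordered by relevance; draws keep their input order. */
  static List<Set<String>> rank(List<Set<String>> scopes, Set<String> context) {
    List<Set<String>> ranked = new ArrayList<>(scopes);
    ranked.sort(byRelevance(context));
    return ranked;
  }

  /** Selection: the top of the ranked list (assumes at least one scope). */
  static Set<String> select(List<Set<String>> scopes, Set<String> context) {
    return rank(scopes, context).get(0);
  }
}
```

In the context {A, B}, for example, the scope {A, B} ranks ahead of {A, B, C}: both match two themes, but the former is scoped by fewer themes.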

4.3 Structured scope

Space does not permit a more detailed examination of the underlying reasons why the expression of scope as a flat set of themes is less than satisfactory. However, the preceding sections seem to suggest that this is the case and that one approach to solving this problem might be to introduce some notion of structure in scope. This section presents some ideas that might be relevant in that connection.

4.3.1 "Principal" and "incidental" themes

There would appear to be a rather fundamental difference between themes that are used consciously by a topic map author to "segment" a topic map into more manageable parts, and those themes that an author is "obliged" to use in order to avoid the topic naming constraint, or for some other application-related purpose:

The topics "Leoncavallo" and "Puccini", used once only as themes to disambiguate the name "La Bohème", are clearly not in the same league as themes such as "French", "biography", or "beginner", that are used consistently and repeatedly throughout the map.
Neither are the themes "person" and "place" used to scope base names for the association typing topic "birthplace" described in section 2.3.1, above.

Such themes can be viewed as in a sense "incidental" and should not appear in a user interface designed to allow a user to specify his or her preferences in terms of the themes that are used within the topic map. Rules for their exclusion could be formulated as follows:

Rule 1: IF topic "A" is a theme that scopes a base name of topic "B" AND ("B" is associated with "A" OR "B" is an instance of "A") THEN exclude theme "A"
Rule 2: IF topic "A" is a theme that scopes a base name of topic "B" AND "B" is an association type AND "A" is the role type in at least one association of type "B" THEN exclude theme "A"
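The two rules can be expressed as predicates over a description of how a theme is used. The sketch below is one possible reading: the NameScoping class and its fields are hypothetical, topics are represented by strings, and the facts about topic "B" are assumed to have been looked up in the topic map beforehand.

```java
import java.util.Set;

// Hypothetical sketch of the two exclusion rules.
class IncidentalThemes {
  /** One use of theme "A" to scope a base name of topic "B". */
  static final class NameScoping {
    final String theme;                              // topic "A"
    final Set<String> topicsAssociatedWithB;         // topics "B" is associated with
    final Set<String> classesOfB;                    // classes "B" is an instance of
    final boolean bIsAssociationType;                // is "B" an association type?
    final Set<String> roleTypesInAssociationsOfTypeB;

    NameScoping(String theme, Set<String> topicsAssociatedWithB, Set<String> classesOfB,
                boolean bIsAssociationType, Set<String> roleTypesInAssociationsOfTypeB) {
      this.theme = theme;
      this.topicsAssociatedWithB = topicsAssociatedWithB;
      this.classesOfB = classesOfB;
      this.bIsAssociationType = bIsAssociationType;
      this.roleTypesInAssociationsOfTypeB = roleTypesInAssociationsOfTypeB;
    }
  }

  // Rule 1: "B" is associated with "A" OR "B" is an instance of "A"
  static boolean rule1(NameScoping use) {
    return use.topicsAssociatedWithB.contains(use.theme)
        || use.classesOfB.contains(use.theme);
  }

  // Rule 2: "B" is an association type AND "A" is a role type in it
  static boolean rule2(NameScoping use) {
    return use.bIsAssociationType
        && use.roleTypesInAssociationsOfTypeB.contains(use.theme);
  }

  /** A theme is treated as incidental if either rule applies to this use. */
  static boolean isIncidental(NameScoping use) {
    return rule1(use) || rule2(use);
  }
}
```

Under this reading, "Puccini" scoping the name of an opera associated with Puccini falls under Rule 1, and "person" scoping a name of the association type "birthplace" falls under Rule 2, whereas a theme such as "French" triggers neither rule.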

4.3.2 Scoping by type of characteristic

From the discussion in sections 2.3 and 2.4, above, it is clear that scope is used differently for different kinds of characteristics. For example, scoping by natural language is relevant for names and occurrences, but not (usually) associations. At the start of this section we noted that scope processing differs according to the kind of characteristic. For instance, names are less frequently filtered, while associations and occurrences are seldom selected.

One conclusion that could be drawn from this is that it may be worthwhile to maintain separate contexts for names, occurrences, and associations. Another is that this could be a useful way of subdividing themes, for example in a user interface for indicating preferred themes.

4.3.3 Axes of scope

From a user's point of view at least, themes often fall into categories that can be thought of as representing different axes of scope. Typical axes might be:

natural language
type of name
historical period
subject domain
location
viewpoint
access level

Utilising this fact will at the very least result in better interfaces through which users can indicate the themes they are interested in. It may also turn out that such axes may be used to convey additional information that can make the evaluation of scope by an application more robust. One way in which this can work is by using the classes to which scoping topics belong as the axes of scope. This requires topic map authors to be consistent in specifying class structure, but this is in general a good design principle to apply when creating a topic map.
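Using the class of each scoping topic as its axis amounts to a simple grouping operation. The following sketch assumes, for illustration only, that each theme has exactly one class and that both themes and classes are represented by strings; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: derive axes of scope from the classes of the themes.
class ScopeAxes {
  /** Groups themes by class; axes and themes come out in alphabetical order. */
  static Map<String, List<String>> groupByAxis(Map<String, String> classOfTheme) {
    Map<String, List<String>> axes = new TreeMap<>();
    // Iterate over a sorted copy so each axis lists its themes alphabetically
    for (Map.Entry<String, String> entry : new TreeMap<>(classOfTheme).entrySet()) {
      String theme = entry.getKey();
      String axis = entry.getValue();
      axes.computeIfAbsent(axis, k -> new ArrayList<>()).add(theme);
    }
    return axes;
  }
}
```

A grouping of this kind is only as good as the class structure behind it, which is why consistency on the part of the author matters.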

This leads to the more general point that a topic map author should have a conscious and planned approach to the use of scope, in order to make it consistent, and thus more useful from the user's point of view. Ideally, scope should be part of the schema of the topic map, so that guidelines for the use of scope within the application are clearly documented and adherence to them can be verified.

4.3.4 Weighted themes

Going one step further, one could also envisage the specification of different weights to different axes of scope (e.g., indicating that language is more important than viewpoint). This could be extended to weightings within a single axis (e.g., ranking languages in order of preference) and perhaps complemented by the ability to specify a cut-off point (e.g., any theme with a weighting less than 0.3 is of no interest).
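One way such weights might be combined is sketched below. Everything in it is an assumption made for illustration: the class and method names, the representation of themes and axes as strings, the choice of multiplying the axis weight by the weight within the axis, and the summing of the surviving weights into a single score.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of weighted themes with a cut-off.
class WeightedThemes {
  /**
   * Scores a scope as the sum of the effective weights of its themes, where
   * effective weight = axis weight * weight within the axis (an assumption).
   * Themes whose effective weight is below the cut-off contribute nothing.
   */
  static double score(Set<String> scope,
                      Map<String, String> axisOfTheme,
                      Map<String, Double> axisWeight,
                      Map<String, Double> weightInAxis,
                      double cutOff) {
    double total = 0.0;
    for (String theme : scope) {
      String axis = axisOfTheme.get(theme);
      Double within = weightInAxis.get(theme);
      Double ofAxis = (axis == null) ? null : axisWeight.get(axis);
      if (within == null || ofAxis == null) continue;  // unweighted theme: ignored
      double effective = ofAxis * within;
      if (effective >= cutOff) total += effective;     // below cut-off: of no interest
    }
    return total;
  }
}
```

Scores of this kind could then serve as the basis for ranking in place of a simple count of matching themes.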

4.3.5 Applying structure

It remains to be seen how effective such scope structuring principles will be for the processing of scope, but even at the simple level of providing a user interface for specifying themes of interest, the results are encouraging.

Consider as an example the Italian Opera Topic Map which uses the following themes (all 64 of them):

Arizona Opera; biography; Boito, Arrigo; Centro studi Giacomo Puccini; character; CIA; city; composer; containee; container; English; French; full name; geography; I Vespri siciliani; ISO 639-2:1996 Alpha-3 language codes (bibliographic); ISO 639-2:1996 Alpha-3 language codes (terminological); ISO 639:1988 language codes; Italian; Italian; L'amico Fritz; Land of Verdi; Le Villi; Leoncavallo, Ruggero; librettist; literature; Lyle K. Neff; Mark D. Lew; Mascagni, Pietro; Mascagni.org; method; music; Naxos; nom de guerre; nom de plume; Norwegian; novel; offline; online; ontology; Ontopus; opera; Opera Glass; Opera News; Opera Web; Opera-l synopsis project; OperaResource; Pagliacci; perpetrator; place; play; poem; publisher; Puccini, Giacomo; pupil; source; Store Norske Leksikon; style; subclass; superclass; teacher; theatre; work; writer

Without the application of structuring principles, a user being asked to specify which themes he or she was interested in would be confronted with a long and chaotic, even incoherent, list and the task would be almost impossible.

By contrast, the following table (based on the interface and functionality of the Ontopia Topic Map Navigator) shows the results of filtering out incidental themes, and then grouping the remainder first by type of characteristic, and then by axis (or category, as it is called here):

Name Context
Select the name themes you are interested in.

category: code type
	ISO 639-2:1996 Alpha-3 language codes (bibliographic)
	ISO 639-2:1996 Alpha-3 language codes (terminological)
	ISO 639:1988 language codes
category: language
	English
	Italian
category: name type
	full name
	nom de guerre
	nom de plume

Association Context
Select the association themes you are interested in.

category: subject domain
	biography
	geography
	literature
	music
	ontology

Occurrence Context
Select the occurrence themes you are interested in.

category: language
	Italian
	Norwegian
category: location
	offline
	online
category: publisher
	Arizona Opera
	Centro studi Giacomo Puccini
	CIA
	Land of Verdi
	Lyle K. Neff
	Mark D. Lew
	Mascagni.org
	Naxos
	Ontopus
	Opera Glass
	Opera News
	Opera Web
	Opera-l synopsis project
	OperaResource
	Store Norske Leksikon

This illustration demonstrates quite effectively the usefulness of structured scope. In terms of actual processing of scope the use of structured scope may prove even more useful. However, more work needs to be done before this can be demonstrated.

5. Towards a general theory of scope

This paper has provided background for understanding the nature of scope in topic maps. It has presented both the basic and the related concepts and provided examples of how scope might be used by topic map authors. It has also discussed the issue of processing scope and how an application might use the guidance offered by scope in order to enhance the experience of the user.

Our goal was not only to demonstrate the power of scope, but also to expose some of its complexity. We have posed a number of questions and presented a lot of ideas, but provided few clear answers.

This is because there are no easy answers. A lot of work remains to be done on understanding the interactions between scope and context, on defining conventions for the application of scope, and on devising approaches for processing it. In short, there is a need for a general theory of scope. We hope this paper can serve to provoke further research and experimentation towards that goal.

Acknowledgements

Thanks to Ann Wrightson and Lars Marius Garshol for many useful comments on earlier drafts of this paper.

References

[ISO00] ISO/IEC 13250:2000 Topic Maps, ISO, Geneva, available online at http://www.y12.doe.gov/sgml/sc34/document/0129.pdf.

[Gars01] Garshol, Lars Marius (2001) The Linear Topic Map Notation, Ontopia, Oslo, available online at http://www.ontopia.net/download/ltm.html.

[Onto01] Ontopia (2001) Ontopia Software Products, Ontopia, Oslo, http://www.ontopia.net/solutions/products.html.

[Pepp99] Pepper, Steve (1999) "Navigating haystacks and discovering needles," Markup Languages, Vol 1, No 4, MIT Press.

[Pepp00] Pepper, Steve (2000) The TAO of Topic Maps, Proceedings of XML Europe 2000, GCA, available online at http://www.ontopia.net/topicmaps/materials/tao.pdf.

[Pepp01] Pepper, Steve (2001) The Italian Opera Topic Map, Ontopia, Oslo, available online at http://www.ontopia.net/topicmaps/examples/opera/opera.iso.

[Sowa00] Sowa, John (2000) Knowledge Representation: Logical, Philosophical, and Computational Foundations, Brooks/Cole, Pacific Grove, portions available online at http://www.bestweb.net/~sowa/ontology/contexts.htm.

[TMnet01] Topicmaps.net (2001) Processing Model for XTM 1.0, version 1.0.1, Topicmaps.net, available online at http://www.topicmaps.net/pmtm4.htm.

[TMOrg01] TopicMaps.Org (2001) XML Topic Maps (XTM) 1.0 Specification, TopicMaps.Org, available online at http://www.topicmaps.org/xtm/1.0.