#topicmaps@irc.freenode.net log for 2003-02-14

This log is automatically generated by an IRC bot from the traffic on the #topicmaps IRC channel on the irc.freenode.net IRC server. This file has the traffic for 2003-02-14. If you have questions regarding this log, please contact larsga@ontopia.net.

03:05:18 GabeW GabeW has quit None ("Client Exiting")
09:04:05 larsbot larsbot has quit None (Read error: 104 (Connection reset by peer))
09:04:05 botlars botlars has joined #topicmaps
09:36:45 botlars botlars has quit None ("[x]chat")
09:59:15 larsbot larsbot has joined #topicmaps
10:22:56 gra gra has joined #topicmaps
10:23:10 larsbot hi there
10:23:16 gra morning
10:23:21 gra we have a problem?
10:23:30 larsbot nah, not really
10:23:59 larsbot it's this thing:
10:24:12 larsbot tmbot: show: merge-srcloc-vs-subjid
10:25:04 gra what did we decide before
10:25:45 gra oh yes
10:25:58 gra whats wrong with this?
10:30:40 larsbot look at the sc34wg3 occurrence
10:30:43 larsbot it explains the problem
10:34:03 arnarl arnarl has joined #topicmaps
10:34:08 arnarl mornin
10:34:14 larsbot morning :)
10:36:34 abcoates abcoates has joined #topicmaps
10:36:55 gra ok
10:36:57 larsbot morning, tony
10:37:00 abcoates Gidday!
10:39:04 larsbot gra: do you feel you understand the issue?
10:39:08 gra no
10:39:14 larsbot ok :-)
10:39:18 gra the sc34 doesnt help me
10:39:26 larsbot hmmmmm
10:40:03 gra fill in the blank....
10:40:28 gra if we merge two topics because of src loc/subj ind match and keep that value as a srcloc thats bad because [blank]
10:40:50 larsbot in some cases you'll find people referring to subject indicators using <topicRef/>
10:41:05 larsbot a typical example is when people refer to the stuff in core.xtm
10:41:25 pepper pepper has joined #topicmaps
10:41:26 larsbot if they do that the PSI URIs become source locators (with the resolution we chose in Baltimore)
10:41:54 gra hmmm
10:42:06 larsbot which has the result that when you check if something is the sort name topic, for example, by looking at its subject identifiers
10:42:12 larsbot you'll find that it's not (but in fact it is)
10:42:32 larsbot and it's even worse, actually: referring to core.xtm on the web using <mergeMap/> will have the same effect
10:42:45 larsbot reading <topic id="sort">...</topic> has the effect of assigning a source locator
10:42:52 larsbot which causes a merge, and blahblahblah
10:43:14 gra ok, i see the problem
10:43:35 pepper my feeling is that subject identifiers are just too central to be thrown away
10:43:49 larsbot the sc34 mail lists three possible resolutions
10:43:55 pepper if we are going to throw something away, let it be the source locator, but why not just keep both?
10:43:56 larsbot mr. pepper proposes a fourth
10:44:14 gra yep - but i thought we werent throwing them away - but that the string would also be the srcloc on the topic
10:44:24 gra as well as the subj ind
10:44:32 larsbot that's not allowed by the current rules
10:44:38 gra oh
10:45:07 larsbot see the second SAM constraint at http://www.isotopicmaps.org/sam/sam-model/#d0e740
10:45:18 gra steve are you proposing we keep both?
10:45:37 pepper es
10:45:38 pepper yes
10:46:07 pepper as i said, we *can't* get rid of the SI - it's just too important to the workings of the topic map...
10:46:11 gra lars theres no contradiction there? the topic of course reifies itself
10:46:27 pepper if a user has said "this is my subject indicator" we should never remove that statement
10:46:52 gra i agree
10:46:54 pepper we *could* get rid of the source locator, but I
10:47:05 pepper I'm wary about doing that without very good reason...
10:47:30 gra lars: i dont see the issue with keeping both
10:47:50 pepper after deserializing from, say, XTM, we'd have a ton of topics, all of which have source locators, *except* one or two that happen to have been subject to this kind of merging
10:47:52 gra processing wise you need to be careful not to keep merging the same topic with itself
10:48:10 gra but thats an engine issue
10:48:45 larsbot we have two namespaces as it is now: that of source locators and that of subject identifiers
10:49:03 larsbot they are separate, but for topics they overlap
10:49:47 pepper is that a problem? seems to me it's in the nature of things. it's what makes a topic special, if you like
10:50:49 pepper are we agreed that we cannot and should not throw away the subject identifier?
10:51:01 larsbot I haven't made up my mind on that
10:51:22 larsbot to resume: they are separate, but for topics they overlap
10:51:39 larsbot now we are also going to say that a topic may have the same value in both namespaces at the same time
10:51:48 pepper yes
10:52:18 larsbot as graham says, it means implementors, when allowing this, have to check for other topics having the same source locator and subject identifier
10:52:29 larsbot then, if they find one, they have to make sure it's not the *same* topic
10:52:43 larsbot because if it is, it's ok
10:53:39 pepper depending on your algorithm, you may already have to make sure you don't merge a topic with itself
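The check larsbot and pepper describe (find any topic already holding the locator, then make sure it is not the *same* topic before merging) can be sketched roughly as follows. This is only an illustration of the "keep both" option under discussion, not how any actual engine does it; the `Topic` and `TopicMap` classes, the `add_locator` method and the `"srcloc"`/`"subjid"` labels are all hypothetical names invented for the sketch.

```python
class Topic:
    def __init__(self, name):
        self.name = name
        self.source_locators = set()
        self.subject_identifiers = set()


class TopicMap:
    def __init__(self):
        self.topics = []
        # One index for both namespaces: locator string -> owning Topic.
        self.by_locator = {}

    def add_topic(self, name):
        topic = Topic(name)
        self.topics.append(topic)
        return topic

    def add_locator(self, topic, uri, kind):
        """Attach uri as 'srcloc' or 'subjid'; merge only if another topic owns it."""
        owner = self.by_locator.get(uri)
        if owner is not None and owner is not topic:
            # A different topic already has this locator: merge the two.
            topic = self._merge(owner, topic)
        # If owner is the same topic, nothing to merge: both values are kept.
        if kind == "srcloc":
            topic.source_locators.add(uri)
        else:
            topic.subject_identifiers.add(uri)
        self.by_locator[uri] = topic
        return topic

    def _merge(self, keep, drop):
        keep.source_locators |= drop.source_locators
        keep.subject_identifiers |= drop.subject_identifiers
        for uri in drop.source_locators | drop.subject_identifiers:
            self.by_locator[uri] = keep
        self.topics.remove(drop)
        return keep
```

With this shape, adding the same URI to one topic first as a subject identifier and then as a source locator leaves a single topic holding both, while adding it to a second topic triggers a merge.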
10:53:51 larsbot it seems conceptually messy to me to allow this
10:53:58 larsbot I don't see the problem with losing the source locator
10:54:29 larsbot you'll have topics that have no source locators in several other common cases as well
10:54:50 pepper such as?
10:55:05 larsbot in LTM if you use sort names or display names you get topics without source locators for those
10:55:25 larsbot in XTM, if you have a <subjectIndicatorRef/> with no corresponding <topic> you'll get a topic with no source locator
10:55:36 pepper i guess what it boils down to (for me) is this:
10:55:38 larsbot if you generate a topic map using the API you'll also get topics with no source locators
10:56:39 larsbot I do think we're focusing on this issue from the wrong angle
10:56:41 pepper If we are absolutely certain that we will never need the source locator (i.e. that having the subject identifier is enough), then it can be thrown away. If there is any doubt at all, it should be retained.
10:57:07 larsbot I'm not sure the most important thing is which locators are stored where
10:57:25 larsbot I think the key issue is really what the right way to look up topics by locators is
10:57:52 larsbot let's say that you refer to the sort name topic using only a <topicRef/>
10:58:05 larsbot you use the right URI, but there's no <subjectIndicatorRef/>
10:58:16 larsbot should applications then recognize that as the sort name topic?
10:58:23 larsbot *that*, to me, is the real issue
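The lookup question larsbot raises can be made concrete with a small sketch. It assumes a topic that was only ever referred to through a `<topicRef/>`, so its URI ended up as a source locator and never as a subject identifier. The dict layout, the two function names and the `example.org` URI are assumptions made for illustration only.

```python
def find_strict(topics, uri):
    """Match on subject identifiers only."""
    for topic in topics:
        if uri in topic["subject_identifiers"]:
            return topic
    return None


def find_lenient(topics, uri):
    """Match on subject identifiers or source locators."""
    for topic in topics:
        if uri in topic["subject_identifiers"] or uri in topic["source_locators"]:
            return topic
    return None


# Hypothetical URI standing in for the sort name PSI.
SORT = "http://example.org/core.xtm#sort"

# The topic as it might look after being referenced only via <topicRef/>.
topics = [
    {"name": "sort", "subject_identifiers": set(), "source_locators": {SORT}},
]

strict_hit = find_strict(topics, SORT)    # None: the application misses it
lenient_hit = find_lenient(topics, SORT)  # finds the topic
```

Which of the two behaviours an application should get is exactly the open issue in the discussion; the sketch takes no position on it.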
10:59:39 pepper well, we have always said that a topicRef is just a special kind of subjectIndicatorRef...
10:59:57 larsbot exactly
11:00:14 pepper if that's the case, then when encountering a topicRef the application should do everything it would do with a subjectIndicatorRef...and (maybe) then some
11:00:32 larsbot let's separate application and implementation, please
11:00:36 larsbot implementation == topic map engine
11:00:46 larsbot application == something that does something useful with topic maps, using an engine
11:01:03 pepper s/application/implementation/
11:01:30 abcoates In any case, surely the engine, on encountering a topicRef, should just check it refers to a topic, and then treat it as a subjectIndicatorRef, nothing more.
11:01:44 larsbot maybe
11:01:49 abcoates :-)
11:01:54 abcoates I knew somebody would say that.
11:02:03 larsbot but that would mean losing the distinction between <topicRef>s and <subjectIndicatorRef>s
11:02:32 larsbot which again would mean that when roundtripping TMs you are going to accumulate an ever-increasing number of subject indicators
11:02:44 abcoates Actually, I thought the only other distinction was that topicRefs tend to force merges, whereas subjectIndicatorRefs don't.
11:02:49 larsbot and you also won't be able to tell which are real subject indicators and which are <topic> elements
11:02:58 larsbot abcoates: they both force merges
11:03:22 larsbot see http://www.isotopicmaps.org/sam/sam-model/#source-locator
11:03:53 larsbot let's assume that we do remove the srcloc/subjectid distinction
11:04:00 larsbot for topics it's easy
11:04:10 larsbot but what about base names? do they have subject identifiers?
11:04:15 larsbot that seems kind of backwards to me
11:04:35 larsbot and reification, in that case, really becomes a kind of merging of the base name with the reifying topic
11:04:45 larsbot (because they have the same subject)
11:04:58 larsbot that makes sense, but is decidedly an odd way to look at it
11:05:56 larsbot I'm not happy with the concept of source locators, but I can't really think of a better way to do this, either
11:07:19 larsbot thoughts, anyone?
11:08:19 pepper Not sure I understand the implications of this
11:08:46 larsbot I said it was subtle :)
11:08:58 pepper Are you suggesting that there might be an alternative to source locators?
11:09:17 larsbot I'm saying I'm not entirely happy with the concept
11:09:34 abcoates Actually, I found the source locators to be more confusing than helpful when I was having a look at the GooseWorks stuff.
11:09:48 larsbot there are of course alternatives, but I haven't found one I like
11:09:58 larsbot abcoates: lots of people do seem to find them confusing
11:10:21 abcoates To me they are a historical record of where the data came from, which is only of pedagogical interest. I mean, from a database perspective, you store the data and relationships, not a list of where each datum came from.
11:10:41 larsbot the trouble is that to get merging etc to work you need this information
11:10:51 pepper Isn't the "historical record" of paramount importance for merging?
11:11:10 larsbot exactly, and also for reification, though reification *could* be recorded directly
11:11:12 abcoates Engines should only need it temporarily for merging.
11:11:22 abcoates I don't like the idea that it is long-term information.
11:11:35 larsbot at the moment it's kind of in-between
11:11:45 larsbot if you export the TM back out to XTM the source locators are lost
11:11:50 abcoates Why should I *have* to assign a URL to each XTM file just so I can merge it into my engine.
11:12:12 larsbot because otherwise you won't be able to keep the IDs from different documents apart
11:12:36 larsbot and also because these URIs are useful for referring to topics later on
11:12:42 abcoates You need to preserve PSIs & other SIs, but not the details of every topic, I hope.
11:12:49 larsbot the tolog syntax uses them to let you write human-readable queries, for example
11:13:14 abcoates I'm just concerned about having to create too many artificial URIs for XTM files.
11:13:31 larsbot do you mean artificial URIs for the XTM *files* or for the topics therein?
11:13:59 abcoates If you need a subject locator for each XTM topic, you really need a URL for each XTM file, no?
11:14:08 larsbot correct
11:14:22 larsbot the OKS doesn't allow you to load an XTM document without assigning a URI
11:14:35 larsbot I think TM4J does allow it, but then uses a special base URI to create URIs
11:14:46 abcoates I think that adds a burden of URI management that users shouldn't need and won't want.
11:15:06 larsbot I'm not sure that burden is so onerous
11:15:17 larsbot usually the XTM files come from files or from the web, and then they have URIs
11:15:28 abcoates Onerous enough to put people off if they aren't already part of the converted.
11:15:48 larsbot really? but in what situation will you have a URI-less XTM document?
11:16:39 abcoates Suppose I have a standard PSI set. Not that I've followed the latest PubSubj stuff, but I didn't think that PSIs *must* point to a topic in a TM.
11:17:13 larsbot actually, the recommendation is that they shouldn't
11:17:20 larsbot they should point to something human-readable
11:17:28 abcoates So, I could have an XTM file filled with topic, each of which has a PSI URL. I don't *need* a URL for the topic map in order to have the topics well identified, so I shouldn't have to create one.
11:17:45 abcoates (with topic -> with topics)
11:18:05 larsbot ok, but in what cases would you have to create one? that is, when wouldn't it already *have* a URI?
11:18:53 larsbot if it's in a file, it already has a URI
11:19:00 larsbot if it's on the web, it already has a URI
11:19:01 abcoates If it is just on a file system, it would have a "file:" URI, and these are never good to use, particularly for processes like merging. Also, if I receive the XTM file from an XML messaging queue, there is no reason why it should have a URL.
11:19:03 gra i havent liked the srclocs for a while, but like lars at the moment they do a job
11:19:22 gra my original position on the srclocs was that they really arent part of the model
11:19:32 gra and are there only to help compute the reified property
11:19:47 gra if the reified property is not computed and a direct property
11:20:05 gra then its up to the engine to maintain srclocs while it brings in the map
11:20:39 abcoates Sure, but you don't need a URL for the map for that.
11:20:55 gra another reason to hang on to them, which is app specific, is so that subsequent imports can reference topics already imported by src loc
11:20:56 larsbot *and* you have to bring in all maps at once. if you wait you'll lose sourcelocs, and later merges may fail
11:21:15 larsbot abcoates: formally the XTM deserialization spec is based on the XML Infoset
11:21:22 larsbot the XML Infoset requires a base URI: http://www.w3.org/TR/xml-infoset/#infoitem.document
11:21:23 gra i think thats the issue - do we let later merges fail?
11:21:34 larsbot that's certainly a major part of it
11:21:48 abcoates You only need a base URI if you use relative paths.
11:21:48 larsbot another is: do we want these srclocs to be usable as topic identifiers?
11:22:00 larsbot abcoates: parsers only require it then, I agree
11:22:24 larsbot abcoates: note that the document does *not* say it can have no values, but it does for other properties
11:22:29 larsbot the implication is that the property is required
11:22:43 gra in some ways they have to be identifiers in order to compute the reified prop
11:23:00 larsbot if we want it to be computed, yes
11:23:28 larsbot but let's say you've loaded the TM and you want to refer to a topic that has no subject identifier
11:23:40 larsbot how do you do that in a way that is not implementation-specific?
11:23:59 larsbot this is something people want to do all the time
11:24:16 larsbot every time they write a query on a TM where not all important topics have subject indicators, for example
11:24:17 abcoates Well, the only other way is to search on a natural key.
11:24:29 larsbot most topics don't have one, in my experience
11:24:32 abcoates The basename, or any other name, or an occurrence value.
11:24:55 abcoates However, I believe that some topics won't be reachable, and I don't have a problem with that.
11:25:07 gra i think the idea of an internal topic identifier is very useful
11:25:14 gra and perhaps we should make that part of the model
11:25:28 gra internally, every topic has a SINGLE assigned system id
11:25:28 abcoates Beware, it can hurt you too.
11:25:42 abcoates DB guys can tell you what pain there is in maintaining such identifiers.
11:25:51 abcoates Sometimes it is better just to use natural keys.
11:25:54 abcoates More robust.
11:26:10 gra its our system, we control the creation of all topics etc
11:26:15 larsbot well, I do think it should be allowed for applications to remove the source locators if they don't want them
11:26:15 gra topics are about identity
11:27:06 abcoates What does that mean?
11:27:25 gra we have concepts in terms of subj ind, res ref for knowing about when things are the same
11:27:47 gra what we are talking about now is having some system id for these topics to support addressing a topic
11:27:56 gra unambiguously when it is inside a tm engine
11:28:13 gra i dont really see what can hurt us?
11:28:19 abcoates I expect many engines will have something like that, but I think it should only be an implementation issue.
11:28:36 gra no i dont - i think its a standardisation issue
11:28:51 gra in order to allow distributed p2p, server client tm engine to integrate
11:28:54 abcoates ID management becomes an issue when you have parallel operations occurring on a data store.
11:29:01 gra we need to standardise many things,
11:29:26 gra one of those things is the property that contains/has the single unique system id for the thing you want to link to
11:29:29 gra reference etc
11:30:12 abcoates You have to worry about IDs going stale, and things like that.
11:30:23 larsbot what do you mean by "stale"?
11:30:35 gra OODB has been around for a long time now and i dont think they suffer from not being able to have parallel ops on their objects
11:30:50 abcoates User 1 requests an ID for a topic. User 2 deletes the topic. User 1 makes a request using the ID, and there is no matching topic.
11:31:40 gra yes, this is called concurrent access but has nothing to do with assigning ids to topics
11:31:47 gra if the id is the name or natural key
11:31:53 gra the topic can still have been deleted
11:32:10 larsbot true
11:32:19 abcoates Sure, but that tends to be easier to manage, and to understand.
11:32:32 gra i dont see why
11:32:38 abcoates Tracking opaque IDs can really kill your productivity.
11:32:54 gra opaque?
11:33:17 abcoates If you just assign an ID, it will have no relationship to what it describes, so it will be opaque.
11:33:30 abcoates 0124536374 is an opaque ID for a currency.
11:33:41 abcoates The code "GBP" is a natural identifier.
11:33:59 larsbot source locators are more natural identifiers than opaque ones, actually
11:34:07 gra when you say no relationship - you mean there are no properties of the object you can derive from the id
11:34:22 larsbot or tend to be, I should say
11:34:31 abcoates Yes, if you only have the ID, and something goes wrong in the system, you have no idea what was being referred to.
11:34:58 gra what kinds of things?
11:35:10 gra i mean do you encode EVERY thing about an object into its id?
11:35:21 gra no, i dont think so
11:35:28 abcoates I'm not saying that you do.
11:35:38 abcoates For example ...
11:35:57 abcoates You could refer to me as prol #45677493676, and look me up that way.
11:36:28 abcoates Or, you could look for surname=Coates, firstname=Anthony. This is the natural key approach, using a composite key, and it is more robust.
11:36:57 gra thats lookup based on query or properties of an object
11:37:07 larsbot abcoates: I agree it is, but I think there's an angle on this that you've missed
11:37:16 abcoates Shoot.
11:37:28 larsbot when you want to do a query on an RDBMS that's easy, because tables and columns have defined names that you can use
11:37:49 larsbot in a topic map there's no distinction between topic types and ordinary instance topics
11:38:11 larsbot this means that to do the query "find all instances of person" you have to find the "person" topic first
11:38:13 abcoates If you are telling me that DBs are much easier to query than TMs, we might as well give up here and now.
11:38:32 larsbot they're different; that's all
11:38:45 larsbot so the question is: how do you find "person"?
11:38:57 abcoates There will always have to be certain topics with PSIs (or at least SIs) that you can use comparably to column names.
11:39:01 gra a topicmap system conceptually is the same as a object based system with a runtime model, object system are based on the fact that
11:39:02 larsbot if it has a subject identifier it's easy, although slightly awkward (you have to deal with long URIs)
11:39:22 gra every object is uniquely identifiable by something other than the values or the properties of that object
11:39:29 larsbot yep
11:39:33 gra you can still query on those properties
11:39:45 larsbot if it has a source locator it is even easier: you can use that to find it, relative to the base URI of the TM
11:39:50 gra but you know you can always rely on one property of every object being there
11:40:03 gra and thats its unique id within that system
11:40:08 larsbot using that approach in, say, tolog, you can write "instance-of($A, person)?" and be done
11:40:40 larsbot for users to have to assign full subject indicators to all their topics before they can start querying their TM is not a very friendly approach
11:40:51 larsbot managing the URIs of each individual document is far easier
11:40:54 abcoates I think you will have trouble getting TM systems to scale properly if you think you can quickly enough assign a unique ID to everything.
11:41:04 larsbot how so?
11:41:23 gra i think we've done it, and i think ontopia have done it
11:41:47 abcoates It's expensive to track which have been used and which have not. It puts you behind the DB guys, who have tried and rejected that approach (assuming they aren't Access users)
11:42:07 gra there are far more challenging issues than assigning unique ids
11:42:22 larsbot huh? that's actually the approach recommended by all object-relational mapping guides I've ever seen
11:42:25 abcoates I worked for Reuters, and I can tell you that at that scale, IDs are a challenging issue.
11:42:42 larsbot I have no difficulty accepting that
11:42:46 gra let me give you a scenario tony...
11:42:54 larsbot but we need to make this work at both the low end *and* the high end
11:42:56 abcoates Lars, the "Scott Ambler" approach of assigning IDs to everything is very popular, but only because application developers know so little about databases.
11:43:17 larsbot so what's the problem?
11:44:19 gra i have an entity of type person who has a name and an age say and they have a ref to a company for whom they work
11:44:29 gra i would model that by having an id on the company
11:44:43 abcoates Hard to maintain, and tend to reduce performance. In real DB systems, you only have per-table IDs, and lots of tables. This improves the performance. I'll be honest, I'm not the world's top DBA, but I spent a lot of time talking to guys who are, and I feel convinced on this.
11:44:45 gra such that if the company was still the same company but changed its name my system would still work
11:45:01 pepper yes
11:45:05 pepper yes
11:45:16 gra using the natural key approach i would reference
11:45:16 pepper yes
11:45:25 gra the string name of the company
11:45:34 pepper yes
11:45:35 pepper yes
11:45:40 gra and if the name changes but the company is the same my system is shot
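gra's scenario can be illustrated with a short sketch: one person refers to their employer by an assigned id, another by the company's name (the natural key), and then the company is renamed. All names and values below (`companies_by_id`, `employer_id`, "Acme" and so on) are invented for the illustration and do not come from any real system.

```python
# Company records keyed by an assigned, meaning-free id.
companies_by_id = {1: {"name": "Acme"}}

# One person refers to the employer by id, the other by name.
person_by_id_ref = {"name": "Ann", "age": 30, "employer_id": 1}
person_by_name_ref = {"name": "Bob", "age": 41, "employer_name": "Acme"}


def employer_via_id(person):
    """Follow the reference through the assigned id."""
    return companies_by_id.get(person["employer_id"])


def employer_via_name(person):
    """Follow the reference through the company name (natural key)."""
    for company in companies_by_id.values():
        if company["name"] == person["employer_name"]:
            return company
    return None


# Same company, new name.
companies_by_id[1]["name"] = "Acme Global"

still_found = employer_via_id(person_by_id_ref)    # the renamed company
now_missing = employer_via_name(person_by_name_ref)  # None: reference is broken
```

This shows only gra's side of the argument; abcoates's objections about the cost of assigning and managing such ids at scale are a separate matter that the sketch does not address.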
11:45:42 pepper pepper has quit None ()
11:45:54 abcoates One down.
11:46:41 pepper pepper has joined #topicmaps
11:46:47 pepper gra: yes
11:46:54 pepper (sorry - lost my connection)
11:47:09 larsbot pepper: you did say "yes" 6 times before doing so, though
11:47:26 pepper OK - something hung itself up here :-)
11:47:37 larsbot yep. I was about to kick you off the channel :-)
11:47:39 pepper what I actually meant was "yes"
11:48:06 larsbot abcoates: "only per-table IDs". that doesn't sound so different to me, but maybe I'm missing something
11:48:14 larsbot the world's top DBA I'm definitely not :)
11:48:20 pepper gra: do you see a problem with keeping both?
11:48:33 abcoates Table IDs tend to be smaller, but more to the point, they are decoupled from each other.
11:48:49 gra ??
11:48:58 abcoates It is much quicker to assign a new ID in a table than to assign a new global ID which is unique across all tables and databases.
11:49:12 pepper gra: ?? to what - my question?
11:49:20 gra pepper: no i dont see a problem
11:49:34 pepper then i propose the following:
11:49:42 pepper (1) subj ident *must* be retained
11:49:52 pepper (2) source locator *may* be retained
11:50:02 pepper (3) it is (obviously) not an error to have both
11:50:03 gra tony, that is just part of the engineering - but at the conceptual level every obj has a unique id
11:50:16 gra <db><table><localid>
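gra's `<db><table><localid>` remark suggests that cheap per-table counters and a conceptually unique id need not conflict. A minimal sketch of that composition, with the `GlobalId` and `Database` classes and the `next_id` method all being hypothetical names chosen here:

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class GlobalId:
    """Conceptually unique id composed of db, table and per-table local id."""
    db: str
    table: str
    local_id: int


class Database:
    def __init__(self, name):
        self.name = name
        # Each table gets its own independent counter starting at 1.
        self._counters = defaultdict(lambda: count(1))

    def next_id(self, table):
        # Only the per-table counter is touched; no global coordination.
        return GlobalId(self.name, table, next(self._counters[table]))
```

Two tables can hand out the same local number, yet the composed ids remain distinct because the table name is part of the id. Whether such an id should be treated as persistent is the point abcoates goes on to dispute.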
11:50:40 larsbot pepper: in that case (2) must be changed to "must" rather than "may"
11:50:50 pepper why?
11:51:06 larsbot what's the point of a "may" rule?
11:51:17 larsbot in any case, this is only half the issue, so we're not through yet
11:51:34 larsbot and I would like a rethink of the whole source locators thing before settling this
11:51:39 pepper may allows implementations that need to keep a record of the source hanging around to do so, without imposing that overhead on others
11:52:19 larsbot so you mean that (2) should apply in general, not just in this specific case?
11:52:32 pepper rethink: then we will have to put the whole thing off until London
11:52:39 larsbot not necessarily
11:53:00 gra i agree with lars i'd rather have all the issues regarding identity sorted and understood rather than a fix/soln to the current issue
11:53:09 pepper (2) apply in general: no. only when subj ident makes src loc redundant for the purpose of merging
11:53:10 abcoates I should note too that Reuters always loses money if people treat table IDs as persistent IDs. They aren't, and making them so makes data maintenance difficult if not impossible.
11:53:51 pepper maybe we should use the RM as an "analytical tool" to help us get to the bottom of this?
11:55:04 larsbot abcoates: I find this ID thing interesting, but I'm not sure it is relevant to source locators
11:55:32 abcoates It is if people expect to use them to refer to particular topics in an engine.
11:57:55 gra heres my proposal, ditch src locs from the model, add a new property on topicmapobject called id, and reified becomes real rather than computed
11:58:12 pepper * pepper has to leave for a while...
11:58:14 pepper pepper has quit None ()
11:58:24 gra i'm not sure its a great proposal, but its an option
11:58:45 abcoates The problem isn't so much the ID, but the persistence.
11:58:47 larsbot I think that proposal is equivalent to the old, seen from abcoates's perspective
11:58:55 abcoates Persistent IDs are the problem.
11:59:05 abcoates If you allow the ID to change over time, it is OK.
11:59:44 larsbot it sounds like this has more to do with how we recommend that people use these URIs than with the structure of the data model
12:00:21 gra maybe, although specifically designating one property in the data model to be the system id
12:00:25 gra for that object
12:00:40 abcoates Yes, but is it persistent.
12:00:41 gra is more than just a usage point
12:00:45 gra yes
12:00:52 gra object systems seem to work perfectly well with persistent ids why cant topicmaps
12:00:59 abcoates I think you will have a management problem there. Reuters always did.
12:01:06 abcoates Object systems are small, databases are big.
12:01:12 abcoates This is the scaling issue.
12:02:02 abcoates From a scaling perspective, the solution is at the API level rather than the model level, that when you request a system ID, you get one, but you also get an expiry date/time after which you need to request a new system ID.
12:02:33 abcoates Add that to the SAM API by all means.
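abcoates's suggestion of system IDs that come with an expiry time could look roughly like the sketch below. It is an assumption-laden illustration, not part of any SAM API: the `IdLeaseRegistry` class, its `request_id` and `resolve` methods, and the convention of passing the current time in as a number of seconds are all made up for the example.

```python
import itertools


class IdLeaseRegistry:
    """Hands out system ids that are only valid until an expiry time."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._counter = itertools.count(1)
        self._leases = {}  # system id -> (topic key, expiry time)

    def request_id(self, topic_key, now):
        """Return (system_id, expires_at) for the given topic."""
        system_id = next(self._counter)
        expires_at = now + self.ttl
        self._leases[system_id] = (topic_key, expires_at)
        return system_id, expires_at

    def resolve(self, system_id, now):
        """Return the topic key, or None if the id is unknown or has expired."""
        lease = self._leases.get(system_id)
        if lease is None:
            return None
        topic_key, expires_at = lease
        if now >= expires_at:
            # Expired: forget the lease so the caller must request a new id.
            del self._leases[system_id]
            return None
        return topic_key
```

A caller holding an expired id has to locate the topic again by some other means (a subject identifier or a query on values) and request a fresh id, which is the design consequence abcoates points out later in the discussion.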
12:02:59 gra please explain to me how the identity for the object with name 'graham moore' id #85383853 will change over time
12:03:46 abcoates You may run out of numbers, find that 90% of the assigned numbers are no longer needed as the recipients are dead, so you want to reassign the numbers.
12:04:06 abcoates ID churn can become a serious issue.
12:04:09 gra ok - lets say we use guids
12:04:26 gra is the issue here just the size of these things?
12:04:35 abcoates You can, but they are slow to create, which is why DBAs avoid them. They are a theoretical solution, but impact performance.
12:04:47 abcoates (i.e. GUIDs are slow to create)
12:04:51 gra sure
12:05:14 gra but there are engineering solns to this
12:05:29 abcoates Well, Oracle hasn't got that solution yet.
12:05:29 gra such as create them in advance etc
12:05:32 abcoates If you have, great.
12:05:50 abcoates However, DB performance has to be the target, to my mind.
12:08:09 larsbot abcoates: the bit I don't understand is why they can't be persistent (and why this is not a usage issue)
12:09:12 abcoates If it were easy and didn't have serious performance overheads, DBAs would do it all the time, because it is conceptually the most obvious thing to do.
12:09:45 abcoates They don't do it, because it doesn't work in systems with lots of fast changes.
12:09:53 abcoates In a system with few changes, it isn't a problem.
12:10:06 gra fast changes to what though?
12:10:25 gra the values of the properties of the objects?
12:10:26 abcoates The thing is, I don't like to ignore what DBAs do now, because they've tried a lot of things, and they know which ones do and do not work.
12:10:31 mariyo mariyo has joined #topicmaps
12:10:49 abcoates No, not changes to simple property values.
12:11:01 abcoates Changes to objects, or effectively rows in database tables.
12:11:30 abcoates In a transaction processing system, these are created and deleted all of the time.
12:12:09 abcoates If you wanted to do a topic map containing real-time financial data (and why not, after all), you would have continual creation of topics.
12:12:58 gra how are the values in a row not simple properties of that object?
12:13:39 larsbot evening, mariyo :-)
12:13:46 larsbot * larsbot goes afk for 10 minutes
12:13:52 gra hi mariyo
12:14:01 mariyo good day. trying to catch up on this conversation :)
12:14:01 abcoates The values are, but you need to create/delete rows, not just change values.
12:14:12 mariyo hi gra and abcoates.
12:14:19 abcoates Hi.
12:14:49 gra ok, so in this case you'd say that the creation of each row and a unique id for each row is too heavy an op
12:15:31 mariyo abcoates: why do you say that?
12:15:40 abcoates DBA experience.
12:15:44 abcoates (not mine)
12:15:58 abcoates The smaller the index range, the smaller the problem.
12:16:03 gra i thought that was what you were implying
12:16:08 abcoates Hence the use of separate tables wherever possible.
12:16:25 gra i.e. if the function nextID(); was always lightning fast everything would have ids
12:16:35 abcoates Until you reach the maximum integer.
12:16:42 abcoates And that *can* happen in real systems.
12:16:54 gra ok, lets assume that it is a guid
12:16:56 abcoates Enterprise systems can't just use the Access "autonumber" approach to IDs.
12:17:05 gra indeed
12:17:21 abcoates You can always theoretically assume a GUID. They simply are slow to create, hence DBAs avoid them as much as possible.
12:17:32 abcoates I suggested such a thing once, and was told not to even think about it.
12:17:50 gra thats why I said if the function nextID() was lightning fast, even on guids
12:17:56 gra then things would have ids
12:18:09 abcoates Yes, if you could do things better than Oracle do, it might work. But that is a big call.
12:18:25 gra agreed
12:19:08 abcoates Really, if you just have the SAM API so that system IDs come with an expiry date/time, it will help long term management.
12:19:22 abcoates That's all you need.
12:20:04 abcoates Of course, it forces you to have an alternative way of specifying which topic you want, but I think that can only lead to better design elsewhere.
12:23:07 gra out of interest i just checked with k42 dev team and we can store ~8 trillion uniquely id'd objects, where each id requires 128bits in the index, and a topicIDRef is 1-8 bytes
12:24:34 abcoates Yes, that sounds like a lot, but you run out after 1 million create/delete cycles on each of 8 million objects, and that can happen.
12:24:35 gra now ok - we arent oracle, and maybe we havent tested with the volume of data at reuters but i'd be impressed to see that tm run out of unique ids
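The back-of-envelope figure abcoates gives (1 million create/delete cycles on each of 8 million objects) can be checked with a few lines of arithmetic. The sketch assumes "trillion" means 10^12 and that an id is never reused once its object is deleted; the variable names are illustrative.

```python
id_space = 8 * 10**12          # ~8 trillion ids, per gra's figure
objects = 8 * 10**6            # 8 million live objects
cycles_per_object = 10**6      # 1 million create/delete cycles each

# Every create consumes a fresh id when ids are never recycled.
ids_consumed = objects * cycles_per_object

exhausted = ids_consumed >= id_space
```

Under those assumptions the two numbers are equal, so the id space is used up exactly at that point. Recycling ids avoids the exhaustion, at the price of the ids no longer being persistent.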
12:25:24 mariyo * mariyo looking over the logs.
12:25:47 gra ok - we can recycle ids
12:26:03 mariyo abcoates: you said this, From a scaling perspective, the solution is at the API level rather than the model level, that when you request a system ID, you get one, but you also get an expiry date/time after which you need to request a new system ID.
12:26:30 mariyo this would be the way you would handle this, but you don't want this specified in a standard, do you?
12:27:18 abcoates What I didn't want specified in the standard is that you can request a persistent system ID for any topic.
12:27:38 abcoates And if you recycle IDs, they are no longer persistent.
12:27:42 mariyo ok, got you. I agree with this.
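
The "system ID plus expiry date/time" idea discussed above could look roughly like the following. This is a minimal sketch, not a proposed SAM API: TopicStore, SystemIdLease, request_system_id and resolve are hypothetical names, and the example.org URI stands in for whatever alternative way of specifying the topic the engine offers.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class SystemIdLease:
    """A system ID together with the time after which it is invalid."""
    system_id: int
    expires_at: datetime


class TopicStore:
    """Hypothetical engine that only hands out system IDs with an expiry."""

    def __init__(self, lease_lifetime: timedelta) -> None:
        self._lease_lifetime = lease_lifetime
        self._next_id = 1
        # system_id -> (topic key, expiry time)
        self._leases: Dict[int, Tuple[str, datetime]] = {}

    def request_system_id(self, topic_key: str, now: datetime) -> SystemIdLease:
        """Issue a fresh ID for the topic named by topic_key."""
        system_id = self._next_id
        self._next_id += 1
        expires_at = now + self._lease_lifetime
        self._leases[system_id] = (topic_key, expires_at)
        return SystemIdLease(system_id, expires_at)

    def resolve(self, system_id: int, now: datetime) -> Optional[str]:
        """Return the topic key, or None if the ID is unknown or expired."""
        entry = self._leases.get(system_id)
        if entry is None:
            return None
        topic_key, expires_at = entry
        if now >= expires_at:
            del self._leases[system_id]  # expired IDs are dropped
            return None
        return topic_key


store = TopicStore(lease_lifetime=timedelta(days=1))
issued = datetime(2003, 2, 14, 12, 0)
lease = store.request_system_id("http://example.org/psi/sort-name", issued)

print(store.resolve(lease.system_id, issued + timedelta(hours=1)))  # still valid
print(store.resolve(lease.system_id, issued + timedelta(days=2)))   # None
```

The lease lifetime is a constructor argument, so one deployment could use a day and another a century without changing the interface; callers must be prepared to request a new ID once the old one stops resolving.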
12:32:15 mariyo (1) subj ident *must* be retained
12:32:28 mariyo (2) source locator *may* be retained
12:32:53 mariyo this is what we have agreed to so far?
12:33:43 mariyo gra: you want (2) to be *must*?
12:33:59 abcoates Well, (2) is what we have been discussing. I don't know if you can call it agreed or not. Also (3) is whether the TM model should include a system ID for each topic, and if so, should it be persistent.
12:36:55 mariyo * mariyo goes back to read logs again.
12:39:03 mariyo (3) seems to be implementation specific, but i guess you have been discussing whether it should be or not.
12:39:55 abcoates Well, it was seen as an alternative to source locators.
12:40:10 abcoates The issue is how you specify a particular topic in an engine.
12:40:27 abcoates Do you always have some kind of persistent ID, or do you have to locate it by some query on values.
12:40:49 abcoates I favour the query approach as being more robust and having fewer key management issues.
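
The contrast between holding a persistent ID and locating a topic by a query on its values can be illustrated as below. This is a sketch under stated assumptions: the Topic class, find_by_subject_identifier and the example.org URI are invented for illustration, and a list position stands in for an engine-assigned internal ID.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Topic:
    name: str
    subject_identifiers: Set[str] = field(default_factory=set)


def find_by_subject_identifier(topics: List[Topic], uri: str) -> Optional[Topic]:
    """Locate a topic by one of its values instead of by an internal ID."""
    for topic in topics:
        if uri in topic.subject_identifiers:
            return topic
    return None


SORT_NAME = "http://example.org/psi/sort-name"  # hypothetical identifier
topics = [Topic("scratch"), Topic("sort name", {SORT_NAME})]

# An application remembers the internal position as if it were persistent.
internal_id = 1

# The engine deletes a topic and compacts its storage; positions shift.
del topics[0]

stale = internal_id >= len(topics)   # the remembered ID no longer points anywhere
found = find_by_subject_identifier(topics, SORT_NAME)

print(stale)        # True
print(found.name)   # sort name
```

The remembered position breaks as soon as storage is reorganised, while the lookup by value keeps working; the cost is a search (or an index) on every lookup.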
12:41:24 mariyo ok, that's what I thought you were getting at, so if you use these as persistent ids you don't want to throw out the source locators then.
12:42:01 abcoates Well, we were discussing, in part, whether system IDs were a suitable replacement for locators.
12:42:10 abcoates My argument was that both had similar problems.
12:42:41 mariyo so really no solution yet :(
12:43:24 abcoates Well, no complete agreement yet (not that I noticed).
12:43:54 abcoates As I said, I think system IDs can work, as long as they expire after a period.
12:43:58 abcoates Forever is a *very* long time.
12:44:58 abcoates That way, Gra can allow his IDs to live for 100 years if he wants. If I were running a system, I would be more likely to expire IDs after a day.
12:47:31 mariyo why do you need these daily expirations for financial data?
12:50:08 abcoates Well, it's partly a management thing.
12:50:28 abcoates The Reuters databases are *huge*. It publishes something like a terabyte of financial data each day.
12:51:35 abcoates If you need to make changes to the database structure, to improve performance or implement new features, you need to know how long old structures need to be maintained in parallel with new structures for changeover.
12:51:59 abcoates The longer IDs persist, the longer you are forced to leave old structures in place, and update them in parallel with new structures.
12:52:31 abcoates For many applications, asking them to get a new ID once per day is not onerous. For the rest of the day, access is fast.
12:53:42 abcoates If users believe IDs are persistent, they code up fragile applications that will fail if the IDs ever need to be changed.
12:55:33 mariyo I get your point. it really all depends on the scale then. I am not working at this scale, and I am in a very different context.
12:55:54 abcoates Sure, almost any approach will work if the problem is small enough.
12:56:20 abcoates I just don't want the TM spec to be limited because TMs now are smaller than enterprise databases.
12:58:18 mariyo I am just thinking about the various systems I work with, how ids are assigned now and what that would mean in the future in the context of TMs. that's why I am interested in this discussion. it is very close to home.
12:58:29 abcoates OK.
12:59:00 abcoates It turned out to be a vastly more complicated topic with Reuters than I would have expected, coming from my naive Java development background.
13:10:12 abcoates I have to go now. Bye!
13:10:14 abcoates abcoates has left #topicmaps ()
13:20:46 mariyo * mariyo away for a while.
13:57:31 mariyo well, have a good day. bye!
13:58:14 mariyo mariyo has quit None ("time to call it a night. bye everyone and see you tomorrow!")
14:37:41 larsbot * larsbot got tied up with lots of discussions
14:37:41 larsbot sorry
14:38:18 larsbot gra: I'm collecting lots of SAM comments now. will consolidate and send to you
14:38:32 larsbot we also need to identify XTM issues
14:41:43 gra gra has quit None (Read error: 54 (Connection reset by peer))
17:30:25 larsbot * larsbot -> home
17:30:28 larsbot larsbot has quit None ("[x]chat")
18:43:10 larsbot larsbot has joined #topicmaps
19:37:52 SeeTemp SeeTemp has quit None ("Client Exiting")
21:00:33 larsbot hmmmm. xchat 2.0 is out
22:27:08 larsbot larsbot has quit None (leguin.freenode.net irc.freenode.net)
22:27:08 xover xover has quit None (leguin.freenode.net irc.freenode.net)
22:27:09 em em has quit None (leguin.freenode.net irc.freenode.net)
22:27:10 grove grove has quit None (leguin.freenode.net irc.freenode.net)
22:27:10 arnarl arnarl has quit None (leguin.freenode.net irc.freenode.net)
22:28:36 larsbot larsbot has joined #topicmaps
22:28:36 xover xover has joined #topicmaps
22:28:36 em em has joined #topicmaps
22:28:37 grove grove has joined #topicmaps
22:29:00 em em has quit None (leguin.freenode.net irc.freenode.net)
22:29:00 xover xover has quit None (leguin.freenode.net irc.freenode.net)
22:29:00 larsbot larsbot has quit None (leguin.freenode.net irc.freenode.net)
22:29:01 grove grove has quit None (leguin.freenode.net irc.freenode.net)
22:29:09 arnarl arnarl has joined #topicmaps
22:29:33 larsbot larsbot has joined #topicmaps
22:29:33 xover xover has joined #topicmaps
22:29:34 em em has joined #topicmaps
22:29:35 grove grove has joined #topicmaps
22:30:14 GabeW GabeW has joined #topicmaps
23:17:41 larsbot hi there, GabeW
23:18:08 GabeW hi there larsbot
23:19:22 larsbot is this the latest version of the XRI motivations/model doc: http://lists.oasis-open.org/archives/xri/200301/msg00045.html?
23:31:15 larsbot * larsbot ploughing into document in the hope that it may be the latest one
23:32:14 GabeW yeah
23:32:16 GabeW it is
23:32:19 GabeW its the only one
23:32:25 GabeW thx much
23:32:31 GabeW i haven't gotten much feedback at all on it
23:33:08 larsbot you'll get some now :)
23:35:09 GabeW uh oh
23:36:57 GabeW one request: be nice to me ;-)
23:37:23 larsbot I'll keep the tone nice, don't worry :)
23:37:53 larsbot to communicate with me it will need more work, though
23:38:02 GabeW ok, thats fine
23:38:07 GabeW in fact thats very good to hera
23:38:08 GabeW hear
23:38:49 larsbot the first section is good, but after that it gets very confusing
23:38:57 GabeW hehehh
23:39:07 larsbot there's too much about directories in general, and too little about XRI's intentions towards them
23:39:11 GabeW hmm
23:39:12 larsbot I think shortening the document is a good idea
23:39:45 GabeW this is literally the only substantive feedback I've gotten so far, so it counts a lot
23:39:51 GabeW I'm trying to cover a lot of ground in it
23:40:01 GabeW not all of it (maybe not most of it) in response to your concerns
23:40:25 larsbot I think the document needs what I think of as "combing"
23:40:30 GabeW heh
23:40:41 larsbot that is, pulling the important stuff towards the front, and pushing the less important towards the back
23:40:45 GabeW right
23:40:47 GabeW i'm all for that
23:41:07 GabeW in fact, i'm happy to hear that because thats what my gut feeling was
23:41:13 GabeW it definitely rambles
23:41:54 larsbot trying to give some constructive advice in my reply
23:42:15 GabeW well, anything beyond "it sucks" I can usually take constructively
23:42:46 GabeW this is a very rough first draft, so focusing on higher level issues (rather than grammar, etc) would be most helpful - substantive stuff, you might say
23:44:11 larsbot I'm trying :)
23:46:21 larsbot GabeW: is the document trying to tell us what the XRI TC is here to accomplish?
23:46:46 GabeW well, ok, so you caught me a little - it started out as a "model" document - the way we see the world
23:46:50 GabeW and sorta migrated in purpose
23:47:00 GabeW why do you ask?
23:47:03 larsbot right. in that context it makes more sense
23:47:12 larsbot it just doesn't seem very concerned with talking specifically about the XRI TC
23:47:16 GabeW ah
23:47:18 GabeW got it
23:47:28 larsbot it seems more like a "General notes on the tech context in which the TC works" kind of thing
23:47:32 GabeW yeah
23:47:41 GabeW so in that sense, it doesn't address your concerns as directly
23:47:55 larsbot aha. is it then inappropriate for me to criticise it for failing to do that?
23:48:29 GabeW uh, well, its probably appropriate to make that criticism of the effort in general, and i can note that this document doesn't help in that direction...
23:48:36 GabeW how about that for diplomacy-speak?
23:48:48 GabeW * GabeW has been boning up on that with all the UN machinations going on recently
23:49:27 larsbot we probably shouldn't talk about real-life politics here (keep the temperature down :)
23:49:59 larsbot yeah, but I'm accumulating notes now on how it fails to tell me about the TC
23:50:06 larsbot I'm wondering if there's any point in sending that in
23:50:18 larsbot I mean, if you're not trying to do that...
23:51:00 GabeW hmm
23:51:59 GabeW i'd like to hear all of your thoughts, but if it'd save you time, I can stipulate that the document fails miserably at the task of telling you what the TC is planning on doing in any detail
23:52:20 larsbot I don't mind it failing to do that, since I can then at least tell you *how* it fails
23:52:29 larsbot what's more troublesome is if it wasn't meant to do that at all
23:52:52 larsbot if I tell you how it fails to do something it was never meant to do, I'm probably wasting time for everybody concerned
23:52:55 larsbot that's why I'm asking
23:53:21 GabeW i don't think I intended it while I was writing it
23:53:27 GabeW so ignore that facet
23:53:38 GabeW but clearly there is still a gap there
23:53:46 GabeW whether in this document or elsewhere
23:54:03 GabeW i'm worried that you don't have other feedback..
23:54:35 larsbot well, I have no other feedback because before I know what you are trying to do there's really nothing I can say
23:54:43 GabeW hmm
23:54:56 larsbot at this stage I don't care why you are trying to do X or how you are trying to do X
23:55:00 larsbot I want to know what X is :-)
23:55:08 GabeW ok
23:55:09 larsbot *then* we can get to why and how
23:55:13 GabeW well
23:56:33 GabeW we apparently have the challenging task of understanding what's not obvious to others, because I *think* what we are doing is very straightforward - clearly we (participants in the XRI TC) have some unstated assumptions here that I'm trying to uncover so that folks like you understand what we are trying to do.
23:56:59 larsbot it seems that way, yes
23:57:00 GabeW you asked what we were going to deliver...
23:57:09 larsbot should I send this directly to you first so you can look at it?
23:57:14 GabeW sure...
23:57:57 GabeW i was hoping by writing this document that I'd help to describe our thinking so that others could see some discrepancies or disconnects and by that process we'd begin to uncover the hidden assumptions
23:58:09 larsbot maybe it will work
23:58:19 GabeW didn't work with you *yet* ;-)
23:58:20 larsbot I sent a rough comments draft to you now so you can look it over
23:58:27 larsbot then we'll see :)
23:58:29 GabeW thanks a million!
23:58:50 GabeW I really do want to answer your questions - I'm not trying to be obtuse - I'm actually not really good at being obtuse, even when its called for...
23:59:37 larsbot :)
23:59:38 GabeW hey, your comments are really useful!
23:59:46 larsbot ah, excellent!
23:59:52 larsbot I was wondering if I was way off the page or what
23:59:52 GabeW at least the half of them I've read so far