This log is automatically generated by an IRC bot from the traffic on the #topicmaps IRC channel on the irc.freenode.net IRC server. This file has the traffic for 2003-02-14. If you have questions regarding this log, please contact larsga@ontopia.net.
| 03:05:18 | GabeW | GabeW has quit None ("Client Exiting") |
| 09:04:05 | larsbot | larsbot has quit None (Read error: 104 (Connection reset by peer)) |
| 09:04:05 | botlars | botlars has joined #topicmaps |
| 09:36:45 | botlars | botlars has quit None ("[x]chat") |
| 09:59:15 | larsbot | larsbot has joined #topicmaps |
| 10:22:56 | gra | gra has joined #topicmaps |
| 10:23:10 | larsbot | hi there |
| 10:23:16 | gra | morning |
| 10:23:21 | gra | we have a problem? |
| 10:23:30 | larsbot | nah, not really |
| 10:23:59 | larsbot | it's this thing: |
| 10:24:12 | larsbot | tmbot: show: merge-srcloc-vs-subjid |
| 10:25:04 | gra | what did we decide before |
| 10:25:45 | gra | oh yes |
| 10:25:58 | gra | what's wrong with this? |
| 10:30:40 | larsbot | look at the sc34wg3 occurrence |
| 10:30:43 | larsbot | it explains the problem |
| 10:34:03 | arnarl | arnarl has joined #topicmaps |
| 10:34:08 | arnarl | mornin |
| 10:34:14 | larsbot | morning :) |
| 10:36:34 | abcoates | abcoates has joined #topicmaps |
| 10:36:55 | gra | ok |
| 10:36:57 | larsbot | morning, tony |
| 10:37:00 | abcoates | Gidday! |
| 10:39:04 | larsbot | gra: do you feel you understand the issue? |
| 10:39:08 | gra | no |
| 10:39:14 | larsbot | ok :-) |
| 10:39:18 | gra | the sc34 doesn't help me |
| 10:39:26 | larsbot | hmmmmm |
| 10:40:03 | gra | fill in the blank.... |
| 10:40:28 | gra | if we merge two topics because of a src loc/subj ind match and keep that value as a srcloc, that's bad because [blank] |
| 10:40:50 | larsbot | in some cases you'll find people referring to subject indicators using <topicRef/> |
| 10:41:05 | larsbot | a typical example is when people refer to the stuff in core.xtm |
| 10:41:25 | pepper | pepper has joined #topicmaps |
| 10:41:26 | larsbot | if they do that the PSI URIs become source locators (with the resolution we chose in Baltimore) |
| 10:41:54 | gra | hmmm |
| 10:42:06 | larsbot | which has the result that when you check if something is the sort name topic, for example, by looking at its subject identifiers |
| 10:42:12 | larsbot | you'll find that it's not (but in fact it is) |
| 10:42:32 | larsbot | and it's even worse, actually: referring to core.xtm on the web using <mergeMap/> will have the same effect |
| 10:42:45 | larsbot | reading <topic id="sort">...</topic> has the effect of assigning a source locator |
| 10:42:52 | larsbot | which causes a merge, and blahblahblah |
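To make the failure concrete, here is a deliberately simplified sketch. The `Topic` class is a hypothetical two-field model, not the SAM or any engine's API, and the core.xtm URI serves only as an example value. When the PSI URI reaches the engine through `<topicRef/>` (or by reading core.xtm itself), it lands among the source locators, so a check that inspects only subject identifiers misses the topic:

```python
from dataclasses import dataclass, field

# Example PSI URI for the sort name topic (illustrative value).
SORT_PSI = "http://www.topicmaps.org/xtm/1.0/core.xtm#sort"

@dataclass
class Topic:
    # Hypothetical model holding only the two locator "namespaces" discussed here.
    source_locators: set = field(default_factory=set)
    subject_identifiers: set = field(default_factory=set)

def is_sort_topic(topic: Topic) -> bool:
    # A check that, like many applications, looks only at subject identifiers.
    return SORT_PSI in topic.subject_identifiers

# Referenced via <subjectIndicatorRef/>: the URI becomes a subject identifier.
via_si_ref = Topic(subject_identifiers={SORT_PSI})
# Referenced via <topicRef/>: the same URI becomes a source locator instead.
via_topic_ref = Topic(source_locators={SORT_PSI})

print(is_sort_topic(via_si_ref))     # True
print(is_sort_topic(via_topic_ref))  # False, although it is the same subject
```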
| 10:43:14 | gra | ok, i see the problem |
| 10:43:35 | pepper | my feeling is that subject identifiers are just too central to be thrown away |
| 10:43:49 | larsbot | the sc34 mail lists three possible resolutions |
| 10:43:55 | pepper | if we are going to throw something away, let it be the source locator, but why not just keep both? |
| 10:43:56 | larsbot | mr. pepper proposes a fourth |
| 10:44:14 | gra | yep - but i thought we weren't throwing them away - but that the string would also be the srcloc on the topic |
| 10:44:24 | gra | as well as the subj ind |
| 10:44:32 | larsbot | that's not allowed by the current rules |
| 10:44:38 | gra | oh |
| 10:45:07 | larsbot | see the second SAM constraint at http://www.isotopicmaps.org/sam/sam-model/#d0e740 |
| 10:45:18 | gra | steve are you proposing we keep both? |
| 10:45:37 | pepper | es |
| 10:45:38 | pepper | yes |
| 10:46:07 | pepper | as i said, we *can't* get rid of the SI - it's just too important to the workings of the topic map... |
| 10:46:11 | gra | lars: there's no contradiction there? the topic of course reifies itself |
| 10:46:27 | pepper | if a user has said "this is my subject indicator" we should never remove that statement |
| 10:46:52 | gra | i agree |
| 10:46:54 | pepper | we *could* get rid of the source locator, but I |
| 10:47:05 | pepper | I'm wary about doing that without very good reason... |
| 10:47:30 | gra | lars: i don't see the issue with keeping both |
| 10:47:50 | pepper | after deserializing from, say, XTM, we'd have a ton of topics, all of which have source locators, *except* one or two that happen to have been subject to this kind of merging |
| 10:47:52 | gra | processing-wise you need to be careful not to keep merging the same topic with itself |
| 10:48:10 | gra | but that's an engine issue |
| 10:48:45 | larsbot | we have two namespaces as it is now: that of source locators and that of subject identifiers |
| 10:49:03 | larsbot | they are separate, but for topics they overlap |
| 10:49:47 | pepper | is that a problem? seems to me it's in the nature of things. it's what makes a topic special, if you like |
| 10:50:49 | pepper | are we agreed that we cannot and should not throw away the subject identifier? |
| 10:51:01 | larsbot | I haven't made up my mind on that |
| 10:51:22 | larsbot | to recap: they are separate, but for topics they overlap |
| 10:51:39 | larsbot | now we are also going to say that a topic may have the same value in both namespaces at the same time |
| 10:51:48 | pepper | yes |
| 10:52:18 | larsbot | as graham says, it means implementors, when allowing this, have to check for other topics having the same source locator and subject identifier |
| 10:52:29 | larsbot | then, if they find one, they have to make sure it's not the *same* topic |
| 10:52:43 | larsbot | because if it is, it's ok |
| 10:53:39 | pepper | depending on your algorithm, you may already have to make sure you don't merge a topic with itself |
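A rough sketch of the check larsbot and pepper describe, assuming a simplified merge rule in which a shared value in either namespace is grounds for merging (subject locators are ignored). `Topic` and `merge_candidates` are hypothetical names. The point is the identity test: it is what lets a topic carry the same URI as both a source locator and a subject identifier without being merged with itself:

```python
from dataclasses import dataclass, field

@dataclass(eq=False)  # compare topics by identity, not by field values
class Topic:
    source_locators: set = field(default_factory=set)
    subject_identifiers: set = field(default_factory=set)

    def locators(self) -> set:
        # Treat both namespaces together when looking for merge partners.
        return self.source_locators | self.subject_identifiers

def merge_candidates(topic: Topic, all_topics: list) -> list:
    """Topics sharing at least one locator with `topic`, excluding `topic` itself."""
    return [
        other
        for other in all_topics
        # Skip the topic itself: it always shares its own locators.
        if other is not topic and topic.locators() & other.locators()
    ]

psi = "http://example.org/psi/sort"  # hypothetical URI
both = Topic(source_locators={psi}, subject_identifiers={psi})
other = Topic(subject_identifiers={psi})
print(len(merge_candidates(both, [both, other])))  # 1: only `other`
```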
| 10:53:51 | larsbot | it seems conceptually messy to me to allow this |
| 10:53:58 | larsbot | I don't see the problem with losing the source locator |
| 10:54:29 | larsbot | you'll have topics that have no source locators in several other common cases as well |
| 10:54:50 | pepper | such as? |
| 10:55:05 | larsbot | in LTM if you use sort names or display names you get topics without source locators for those |
| 10:55:25 | larsbot | in XTM, if you have a <subjectIndicatorRef/> with no corresponding <topic> you'll get a topic with no source locator |
| 10:55:36 | pepper | i guess what it boils down to (for me) is this: |
| 10:55:38 | larsbot | if you generate a topic map using the API you'll also get topics with no source locators |
| 10:56:39 | larsbot | I do think we're focusing on this issue from the wrong angle |
| 10:56:41 | pepper | If we are absolutely certain that we will never need the source locator (i.e. that having the subject identifier is enough), then it can be thrown away. If there is any doubt at all, it should be retained. |
| 10:57:07 | larsbot | I'm not sure the most important thing is which locators are stored where |
| 10:57:25 | larsbot | I think the key issue is really what the right way to look up topics by locators is |
| 10:57:52 | larsbot | let's say that you refer to the sort name topic using only a <topicRef/> |
| 10:58:05 | larsbot | you use the right URI, but there's no <subjectIndicatorRef/> |
| 10:58:16 | larsbot | should applications then recognize that as the sort name topic? |
| 10:58:23 | larsbot | *that*, to me, is the real issue |
| 10:59:39 | pepper | well, we have always said that a topicRef is just a special kind of subjectIndicatorRef... |
| 10:59:57 | larsbot | exactly |
| 11:00:14 | pepper | if that's the case, then when encountering a topicRef the application should do everything it would do with a subjectIndicatorRef...and (maybe) then some |
| 11:00:32 | larsbot | let's separate application and implementation, please |
| 11:00:36 | larsbot | implementation == topic map engine |
| 11:00:46 | larsbot | application == something that does something useful with topic maps, using an engine |
| 11:01:03 | pepper | s/application/implementation/ |
| 11:01:30 | abcoates | In any case, surely the engine, on encountering a topicRef, should just check it refers to a topic, and then treat it as a subjectIndicatorRef, nothing more. |
| 11:01:44 | larsbot | maybe |
| 11:01:49 | abcoates | :-) |
| 11:01:54 | abcoates | I knew somebody would say that. |
| 11:02:03 | larsbot | but that would mean losing the distinction between <topicRef>s and <subjectIndicatorRef>s |
| 11:02:32 | larsbot | which again would mean that when roundtripping TMs you are going to accumulate an ever-increasing number of subject indicators |
| 11:02:44 | abcoates | Actually, I thought the only other distinction was that topicRefs tend to force merges, whereas subjectIndicatorRefs don't. |
| 11:02:49 | larsbot | and you also won't be able to tell which are real subject indicators and which are <topic> elements |
| 11:02:58 | larsbot | abcoates: they both force merges |
| 11:03:22 | larsbot | see http://www.isotopicmaps.org/sam/sam-model/#source-locator |
| 11:03:53 | larsbot | let's assume that we do remove the srcloc/subjectid distinction |
| 11:04:00 | larsbot | for topics it's easy |
| 11:04:10 | larsbot | but what about base names? do they have subject identifiers? |
| 11:04:15 | larsbot | that seems kind of backwards to me |
| 11:04:35 | larsbot | and reification, in that case, really becomes a kind of merging of the base name with the reifying topic |
| 11:04:45 | larsbot | (because they have the same subject) |
| 11:04:58 | larsbot | that makes sense, but is decidedly an odd way to look at it |
| 11:05:56 | larsbot | I'm not happy with the concept of source locators, but I can't really think of a better way to do this, either |
| 11:07:19 | larsbot | thoughts, anyone? |
| 11:08:19 | pepper | Not sure I understand the implications of this |
| 11:08:46 | larsbot | I said it was subtle :) |
| 11:08:58 | pepper | Are you suggesting that there might be an alternative to source locators? |
| 11:09:17 | larsbot | I'm saying I'm not entirely happy with the concept |
| 11:09:34 | abcoates | Actually, I found the source locators to be more confusing than helpful when I was having a look at the GooseWorks stuff. |
| 11:09:48 | larsbot | there are of course alternatives, but I haven't found one I like |
| 11:09:58 | larsbot | abcoates: lots of people do seem to find them confusing |
| 11:10:21 | abcoates | To me they are a historical record of where the data came from, which is only of pedagogical interest. I mean, from a database perspective, you store the data and relationships, not a list of where each datum came from. |
| 11:10:41 | larsbot | the trouble is that to get merging etc to work you need this information |
| 11:10:51 | pepper | Isn't the "historical record" of paramount importance for merging? |
| 11:11:10 | larsbot | exactly, and also for reification, though reification *could* be recorded directly |
| 11:11:12 | abcoates | Engines should only need it temporarily for merging. |
| 11:11:22 | abcoates | I don't like the idea that it is long-term information. |
| 11:11:35 | larsbot | at the moment it's kind of in-between |
| 11:11:45 | larsbot | if you export the TM back out to XTM the source locators are lost |
| 11:11:50 | abcoates | Why should I *have* to assign a URL to each XTM file just so I can merge it into my engine? |
| 11:12:12 | larsbot | because otherwise you won't be able to keep the IDs from different documents apart |
| 11:12:36 | larsbot | and also because these URIs are useful for referring to topics later on |
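A minimal sketch of the point about keeping IDs apart, assuming source locators are formed by resolving `#id` against the document's base URI (the document URIs below are hypothetical). Two documents that both contain `id="sort"` only yield distinct source locators because the documents themselves have distinct URIs:

```python
from urllib.parse import urljoin

def source_locator(document_uri: str, element_id: str) -> str:
    # Resolve "#id" against the document URI, used here as the base URI.
    return urljoin(document_uri, "#" + element_id)

# The same id="sort" in two different documents.
a = source_locator("http://example.org/maps/a.xtm", "sort")
b = source_locator("http://example.org/maps/b.xtm", "sort")
print(a)  # http://example.org/maps/a.xtm#sort
print(b)  # http://example.org/maps/b.xtm#sort
```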
| 11:12:42 | abcoates | You need to preserve PSIs & other SIs, but not the details of every topic, I hope. |
| 11:12:49 | larsbot | the tolog syntax uses them to let you write human-readable queries, for example |
| 11:13:14 | abcoates | I'm just concerned about having to create too many artificial URIs for XTM files. |
| 11:13:31 | larsbot | do you mean artificial URIs for the XTM *files* or for the topics therein? |
| 11:13:59 | abcoates | If you need a source locator for each XTM topic, you really need a URL for each XTM file, no? |
| 11:14:08 | larsbot | correct |
| 11:14:22 | larsbot | the OKS doesn't allow you to load an XTM document without assigning a URI |
| 11:14:35 | larsbot | I think TM4J does allow it, but then uses a special base URI to create URIs |
| 11:14:46 | abcoates | I think that adds a burden of URI management that users shouldn't need and won't want. |
| 11:15:06 | larsbot | I'm not sure that burden is so onerous |
| 11:15:17 | larsbot | usually the XTM files come from files or from the web, and then they have URIs |
| 11:15:28 | abcoates | Onerous enough to put people off if they aren't already among the converted. |
| 11:15:48 | larsbot | really? but in what situation will you have a URI-less XTM document? |
| 11:16:39 | abcoates | Suppose I have a standard PSI set. Not that I've followed the latest PubSubj stuff, but I didn't think that PSIs *must* point to a topic in a TM. |
| 11:17:13 | larsbot | actually, the recommendation is that they shouldn't |
| 11:17:20 | larsbot | they should point to something human-readable |
| 11:17:28 | abcoates | So, I could have an XTM file filled with topic, each of which has a PSI URL. I don't *need* a URL for the topic map in order to have the topics well identified, so I shouldn't have to create one. |
| 11:17:45 | abcoates | (with topic -> with topics) |
| 11:18:05 | larsbot | ok, but in what cases would you have to create one? that is, when wouldn't it already *have* a URI? |
| 11:18:53 | larsbot | if it's in a file, it already has a URI |
| 11:19:00 | larsbot | if it's on the web, it already has a URI |
| 11:19:01 | abcoates | If it is just on a file system, it would have a "file:" URI, and these are never good to use, particularly for processes like merging. Also, if I receive the XTM file from an XML messaging queue, there is no reason why it should have a URL. |
| 11:19:03 | gra | i haven't liked the srclocs for a while, but like lars, at the moment they do a job |
| 11:19:22 | gra | my original position on the srclocs was that they really aren't part of the model |
| 11:19:32 | gra | and are there only to help compute the reified property |
| 11:19:47 | gra | if the reified property is not computed but is a direct property |
| 11:20:05 | gra | then it's up to the engine to maintain srclocs while it brings in the map |
| 11:20:39 | abcoates | Sure, but you don't need a URL for the map for that. |
| 11:20:55 | gra | another reason to hang on to them, which is app specific, is so that subsequent imports can reference topics already imported by src loc |
| 11:20:56 | larsbot | *and* you have to bring in all maps at once. if you wait you'll lose sourcelocs, and later merges may fail |
| 11:21:15 | larsbot | abcoates: formally the XTM deserialization spec is based on the XML Infoset |
| 11:21:22 | larsbot | the XML Infoset requires a base URI: http://www.w3.org/TR/xml-infoset/#infoitem.document |
| 11:21:23 | gra | i think that's the issue - do we let later merges fail? |
| 11:21:34 | larsbot | that's certainly a major part of it |
| 11:21:48 | abcoates | You only need a base URI if you use relative paths. |
| 11:21:48 | larsbot | another is: do we want these srclocs to be usable as topic identifiers? |
| 11:22:00 | larsbot | abcoates: parsers only require it then, I agree |
| 11:22:24 | larsbot | abcoates: note that the document does *not* say it can have no values, but it does for other properties |
| 11:22:29 | larsbot | the implication is that the property is required |
| 11:22:43 | gra | in some ways they have to be identifiers in order to compute the reified prop |
| 11:23:00 | larsbot | if we want it to be computed, yes |
| 11:23:28 | larsbot | but let's say you've loaded the TM and you want to refer to a topic that has no subject identifier |
| 11:23:40 | larsbot | how do you do that in a way that is not implementation-specific? |
| 11:23:59 | larsbot | this is something people want to do all the time |
| 11:24:16 | larsbot | every time they write a query on a TM where not all important topics have subject indicators, for example |
| 11:24:17 | abcoates | Well, the only other way is to search on a natural key. |
| 11:24:29 | larsbot | most topics don't have one, in my experience |
| 11:24:32 | abcoates | The basename, or any other name, or an occurrence value. |
| 11:24:55 | abcoates | However, I believe that some topics won't be reachable, and I don't have a problem with that. |
| 11:25:07 | gra | i think the idea of an internal topic identifier is very useful |
| 11:25:14 | gra | and perhaps we should make that part of the model |
| 11:25:28 | gra | internally, every topic has a SINGLE assigned system id |
| 11:25:28 | abcoates | Beware, it can hurt you too. |
| 11:25:42 | abcoates | DB guys can tell you what pain there is in maintaining such identifiers. |
| 11:25:51 | abcoates | Sometimes it is better just to use natural keys. |
| 11:25:54 | abcoates | More robust. |
| 11:26:10 | gra | it's our system, we control the creation of all topics etc |
| 11:26:15 | larsbot | well, I do think it should be allowed for applications to remove the source locators if they don't want them |
| 11:26:15 | gra | topics are about identity |
| 11:27:06 | abcoates | What does that mean? |
| 11:27:25 | gra | we have concepts in terms of subj ind, res ref for knowing about when things are the same |
| 11:27:47 | gra | what we are talking about now is having some system id for these topics to support addressing a topic |
| 11:27:56 | gra | unambiguously when it is inside a tm engine |
| 11:28:13 | gra | i don't really see what can hurt us? |
| 11:28:19 | abcoates | I expect many engines will have something like that, but I think it should only be an implementation issue. |
| 11:28:36 | gra | no, i don't - i think it's a standardisation issue |
| 11:28:51 | gra | in order to allow distributed p2p and client-server tm engines to integrate |
| 11:28:54 | abcoates | ID management becomes an issue when you have parallel operations occurring on a data store. |
| 11:29:01 | gra | we need to standardise many things, |
| 11:29:26 | gra | one of those things is the property that contains/has the single unique system id for the thing you want to link to |
| 11:29:29 | gra | reference etc |
| 11:30:12 | abcoates | You have to worry about IDs going stale, and things like that. |
| 11:30:23 | larsbot | what do you mean by "stale"? |
| 11:30:35 | gra | OODBs have been around for a long time now and i don't think they suffer from not being able to have parallel ops on their objects |
| 11:30:50 | abcoates | User 1 requests an ID for a topic. User 2 deletes the topic. User 1 makes a request using the ID, and there is no matching topic. |
| 11:31:40 | gra | yes, this is called concurrent access but has nothing to do with assigning ids to topics |
| 11:31:47 | gra | if the id is the name or natural key |
| 11:31:53 | gra | the topic can still have been deleted |
| 11:32:10 | larsbot | true |
| 11:32:19 | abcoates | Sure, but that tends to be easier to manage, and to understand. |
| 11:32:32 | gra | i don't see why |
| 11:32:38 | abcoates | Tracking opaque IDs can really kill your productivity. |
| 11:32:54 | gra | opaque? |
| 11:33:17 | abcoates | If you just assign an ID, it will have no relationship to what it describes, so it will be opaque. |
| 11:33:30 | abcoates | 0124536374 is an opaque ID for a currency. |
| 11:33:41 | abcoates | The code "GBP" is a natural identifier. |
| 11:33:59 | larsbot | source locators are more natural identifiers than opaque ones, actually |
| 11:34:07 | gra | when you say no relationship - you mean there are no properties of the object you can derive from the id |
| 11:34:22 | larsbot | or tend to be, I should say |
| 11:34:31 | abcoates | Yes, if you only have the ID, and something goes wrong in the system, you have no idea what was being referred to. |
| 11:34:58 | gra | what kinds of things? |
| 11:35:10 | gra | i mean do you encode EVERY thing about an object into its id? |
| 11:35:21 | gra | no, i don't think so |
| 11:35:28 | abcoates | I'm not saying that you do. |
| 11:35:38 | abcoates | For example ... |
| 11:35:57 | abcoates | You could refer to me as prol #45677493676, and look me up that way. |
| 11:36:28 | abcoates | Or, you could look for surname=Coates, firstname=Anthony. This is the natural key approach, using a composite key, and it is more robust. |
| 11:36:57 | gra | that's lookup based on a query or on properties of an object |
| 11:37:07 | larsbot | abcoates: I agree it is, but I think there's an angle on this that you've missed |
| 11:37:16 | abcoates | Shoot. |
| 11:37:28 | larsbot | when you want to do a query on an RDBMS that's easy, because tables and columns have defined names that you can use |
| 11:37:49 | larsbot | in a topic map there's no distinction between topic types and ordinary instance topics |
| 11:38:11 | larsbot | this means that to do the query "find all instances of person" you have to find the "person" topic first |
| 11:38:13 | abcoates | If you are telling me that DBs are much easier to query than TMs, we might as well give up here and now. |
| 11:38:32 | larsbot | they're different; that's all |
| 11:38:45 | larsbot | so the question is: how do you find "person"? |
| 11:38:57 | abcoates | There will always have to be certain topics with PSIs (or at least SIs) that you can use comparably to column names. |
| 11:39:01 | gra | a topicmap system is conceptually the same as an object-based system with a runtime model; object systems are based on the fact that |
| 11:39:02 | larsbot | if it has a subject identifier it's easy, although slightly awkward (you have to deal with long URIs) |
| 11:39:22 | gra | every object is uniquely identifiable by something other than the values or the properties of that object |
| 11:39:29 | larsbot | yep |
| 11:39:33 | gra | you can still query on those properties |
| 11:39:45 | larsbot | if it has a source locator it is even easier: you can use that to find it, relative to the base URI of the TM |
| 11:39:50 | gra | but you know you can always rely on one property of every object being there |
| 11:40:03 | gra | and that's its unique id within that system |
| 11:40:08 | larsbot | using that approach, in, say, tolog you can write "instance-of($A, person)?" and be done |
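A toy illustration of the lookup larsbot describes, assuming a hypothetical index from source locators to topics; this is not how tolog is actually implemented. The bare name `person` is resolved against the topic map's base URI, and the resulting topic is then used to find its instances:

```python
from urllib.parse import urljoin

BASE = "http://example.org/people.xtm"  # hypothetical base URI of the loaded TM

# Hypothetical index from source locator to topic, plus instance-of pairs.
topics_by_srcloc = {
    BASE + "#person": "person",
    BASE + "#lars": "lars",
    BASE + "#graham": "graham",
}
instance_of = [("lars", "person"), ("graham", "person")]

def lookup(local_id):
    # Resolve a bare ID such as "person" against the topic map's base URI.
    return topics_by_srcloc.get(urljoin(BASE, "#" + local_id))

def instances(type_id):
    # Roughly what instance-of($A, person)? asks for.
    type_topic = lookup(type_id)
    return sorted(inst for inst, typ in instance_of if typ == type_topic)

print(instances("person"))  # ['graham', 'lars']
```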
| 11:40:40 | larsbot | for users to have to assign full subject indicators to all their topics before they can start querying their TM is not a very friendly approach |
| 11:40:51 | larsbot | managing the URIs of each individual document is far easier |
| 11:40:54 | abcoates | I think you will have trouble getting TM systems to scale properly if you think you can quickly enough assign a unique ID to everything. |
| 11:41:04 | larsbot | how so? |
| 11:41:23 | gra | i think we've done it, and i think ontopia have done it |
| 11:41:47 | abcoates | It's expensive to track which have been used and which have not. It puts you behind the DB guys, who have tried and rejected that approach (assuming they aren't Access users) |
| 11:42:07 | gra | there are far more challenging issues than assigning unique ids |
| 11:42:22 | larsbot | huh? that's actually the approach recommended by all object-relational mapping guides I've ever seen |
| 11:42:25 | abcoates | I worked for Reuters, and I can tell you that at that scale, IDs are a challenging issue. |
| 11:42:42 | larsbot | I have no difficulty accepting that |
| 11:42:46 | gra | let me give you a scenario tony... |
| 11:42:54 | larsbot | but we need to make this work at both the low end *and* the high end |
| 11:42:56 | abcoates | Lars, the "Scott Ambler" approach of assigning IDs to everything is very popular, but only because application developers know so little about databases. |
| 11:43:17 | larsbot | so what's the problem? |
| 11:44:19 | gra | i have an entity of type person who has a name and an age, say, and they have a ref to a company for whom they work |
| 11:44:29 | gra | i would model that by having an id on the company |
| 11:44:43 | abcoates | Hard to maintain, and tend to reduce performance. In real DB systems, you only have per-table IDs, and lots of tables. This improves the performance. I'll be honest, I'm not the world's top DBA, but I spent a lot of time talking to guys who are, and I feel convinced on this. |
| 11:44:45 | gra | such that if the company was still the same company but changed its name my system would still work |
| 11:45:01 | pepper | yes |
| 11:45:05 | pepper | yes |
| 11:45:16 | gra | using the natural key approach i would reference |
| 11:45:16 | pepper | yes |
| 11:45:25 | gra | the string name of the company |
| 11:45:34 | pepper | yes |
| 11:45:35 | pepper | yes |
| 11:45:40 | gra | and if the name changes but the company is the same my system is shot |
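gra's scenario can be sketched with plain dictionaries (all names and values here are hypothetical). The person record holds both a surrogate reference and a natural-key reference to the employer; after the rename only the surrogate one still resolves. The sketch says nothing about the cost of generating such IDs at scale, which is abcoates's objection:

```python
import itertools

_next_id = itertools.count(1)  # hypothetical surrogate-ID generator
companies = {}                 # surrogate ID -> company record

def add_company(name):
    cid = next(_next_id)
    companies[cid] = {"name": name}
    return cid

acme = add_company("Acme Ltd")
# Two ways for a person record to point at their employer.
person = {"name": "Graham", "employer_id": acme, "employer_name": "Acme Ltd"}

# The company is renamed but remains the same company.
companies[acme]["name"] = "Acme Holdings"

by_id = companies.get(person["employer_id"])
by_name = next(
    (c for c in companies.values() if c["name"] == person["employer_name"]), None
)
print(by_id)    # {'name': 'Acme Holdings'}
print(by_name)  # None: the natural-key reference no longer resolves
```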
| 11:45:42 | pepper | pepper has quit None () |
| 11:45:54 | abcoates | One down. |
| 11:46:41 | pepper | pepper has joined #topicmaps |
| 11:46:47 | pepper | gra: yes |
| 11:46:54 | pepper | (sorry - lost my connection) |
| 11:47:09 | larsbot | pepper: you did say "yes" 6 times before doing so, though |
| 11:47:26 | pepper | OK - something hung itself up here :-) |
| 11:47:37 | larsbot | yep. I was about to kick you off the channel :-) |
| 11:47:39 | pepper | what I actually meant was "yes" |
| 11:48:06 | larsbot | abcoates: "only per-table IDs". that doesn't sound so different to me, but maybe I'm missing something |
| 11:48:14 | larsbot | the world's top DBA I'm definitely not :) |
| 11:48:20 | pepper | gra: do you see a problem with keeping both? |
| 11:48:33 | abcoates | Table IDs tend to be smaller, but more to the point, they are decoupled from each other. |
| 11:48:49 | gra | ?? |
| 11:48:58 | abcoates | It is much quicker to assign a new ID in a table than to assign a new global ID which is unique across all tables and databases. |
| 11:49:12 | pepper | gra: ?? to what - my question? |
| 11:49:20 | gra | pepper: no i dont see a problem |
| 11:49:34 | pepper | then i propose the following: |
| 11:49:42 | pepper | (1) subj ident *must* be retained |
| 11:49:52 | pepper | (2) source locator *may* be retained |
| 11:50:02 | pepper | (3) it is (obviously) not an error to have both |
| 11:50:03 | gra | tony, that is just part of the engineering - but at the conceptual level every obj has a unique id |
| 11:50:16 | gra | <db><table><localid> |
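One way to read gra's `<db><table><localid>` line, sketched under assumptions (the `IdAllocator` class is hypothetical): each table keeps its own small counter, as abcoates recommends, and a globally unique identifier is simply the composite of database, table and local ID. Persistence and concurrent allocation, the hard parts abcoates points to, are deliberately left out:

```python
import itertools
from collections import defaultdict

class IdAllocator:
    """Hypothetical allocator: per-table counters, globally unique composite IDs."""

    def __init__(self, db_name):
        self.db_name = db_name
        # Each table gets its own independent counter starting at 1.
        self._counters = defaultdict(lambda: itertools.count(1))

    def next_id(self, table):
        local = next(self._counters[table])
        return (self.db_name, table, local)

alloc = IdAllocator("tm1")
print(alloc.next_id("topic"))  # ('tm1', 'topic', 1)
print(alloc.next_id("topic"))  # ('tm1', 'topic', 2)
print(alloc.next_id("assoc"))  # ('tm1', 'assoc', 1)
```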
| 11:50:40 | larsbot | pepper: in that case (2) must be changed to "must" rather than "may" |
| 11:50:50 | pepper | why? |
| 11:51:06 | larsbot | what's the point of a "may" rule? |
| 11:51:17 | larsbot | in any case, this is only half the issue, so we're not through yet |
| 11:51:34 | larsbot | and I would like a rethink of the whole source locators thing before settling this |
| 11:51:39 | pepper | "may" allows implementations that need to keep a record of the source hanging around to do so, without imposing that overhead on others |
| 11:52:19 | larsbot | so you mean that (2) should apply in general, not just in this specific case? |
| 11:52:32 | pepper | rethink: then we will have to put the whole thing off until London |
| 11:52:39 | larsbot | not necessarily |
| 11:53:00 | gra | i agree with lars: i'd rather have all the issues regarding identity sorted and understood than a fix/soln for the current issue |
| 11:53:09 | pepper | (2) apply in general: no. only when subj ident makes src loc redundant for the purpose of merging |
| 11:53:10 | abcoates | I should note too that Reuters always loses money if people treat table IDs as persistent IDs. They aren't, and making them so makes data maintenance difficult if not impossible. |
| 11:53:51 | pepper | maybe we should use the RM as an "analytical tool" to help us get to the bottom of this? |
| 11:55:04 | larsbot | abcoates: I find this ID thing interesting, but I'm not sure it is relevant to source locators |
| 11:55:32 | abcoates | It is if people expect to use them to refer to particular topics in an engine. |
| 11:57:55 | gra | here's my proposal: ditch src locs from the model, add a new property on topicmapobject called id, and reified becomes real rather than computed |
| 11:58:12 | pepper | * pepper has to leave for a while... |
| 11:58:14 | pepper | pepper has quit None () |
| 11:58:24 | gra | i'm not sure it's a great proposal, but it's an option |
| 11:58:45 | abcoates | The problem isn't so much the ID, but the persistence. |
| 11:58:47 | larsbot | I think that proposal is equivalent to the old, seen from abcoates's perspective |
| 11:58:55 | abcoates | Persistent IDs are the problem. |
| 11:59:05 | abcoates | If you allow the ID to change over time, it is OK. |
| 11:59:44 | larsbot | it sounds like this has more to do with how we recommend that people use these URIs than with the structure of the data model |
| 12:00:21 | gra | maybe, although specifically designating one property in the data model to be the system id |
| 12:00:25 | gra | for that object |
| 12:00:40 | abcoates | Yes, but is it persistent? |
| 12:00:41 | gra | is more than just a usage point |
| 12:00:45 | gra | yes |
| 12:00:52 | gra | object systems seem to work perfectly well with persistent ids, why can't topic maps? |
| 12:00:59 | abcoates | I think you will have a management problem there. Reuters always did. |
| 12:01:06 | abcoates | Object systems are small, databases are big. |
| 12:01:12 | abcoates | This is the scaling issue. |
| 12:02:02 | abcoates | From a scaling perspective, the solution is at the API level rather than the model level, that when you request a system ID, you get one, but you also get an expiry date/time after which you need to request a new system ID. |
| 12:02:33 | abcoates | Add that to the SAM API by all means. |
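abcoates's proposal might look roughly like this at the API level. Everything here (`Engine`, `IdLease`, `request_id`, `resolve`) is a hypothetical sketch, not part of the SAM or any existing engine: the caller locates a topic by some other means (here its name), receives a system ID together with an expiry time, and must re-request the ID once it expires. A lease that is still valid can nevertheless go stale, as in abcoates's deleted-topic example, and `resolve` reports that case as `None`:

```python
import itertools
import time
from dataclasses import dataclass

@dataclass(frozen=True)
class IdLease:
    system_id: int
    expires_at: float  # clock value after which the ID must be re-requested

class Engine:
    """Hypothetical engine API: system IDs are only handed out with an expiry time."""

    def __init__(self, ttl_seconds: float = 60.0, clock=time.time):
        self._ttl = ttl_seconds
        self._clock = clock
        self._ids = itertools.count(1)
        self._topics = {}   # internal system ID -> topic name
        self._by_name = {}  # natural key -> internal system ID

    def add_topic(self, name: str) -> None:
        sid = next(self._ids)
        self._topics[sid] = name
        self._by_name[name] = sid

    def remove_topic(self, name: str) -> None:
        del self._topics[self._by_name.pop(name)]

    def request_id(self, name: str) -> IdLease:
        # The caller finds the topic some other way (here: by name) and gets a lease.
        return IdLease(self._by_name[name], self._clock() + self._ttl)

    def resolve(self, lease: IdLease):
        if self._clock() >= lease.expires_at:
            raise LookupError("system ID expired; request a new one")
        return self._topics.get(lease.system_id)  # None if the topic was deleted

now = [0.0]  # fake clock so the example is deterministic
engine = Engine(ttl_seconds=30.0, clock=lambda: now[0])
engine.add_topic("GBP")
lease = engine.request_id("GBP")
print(engine.resolve(lease))  # GBP
now[0] = 31.0
try:
    engine.resolve(lease)
except LookupError as err:
    print(err)  # system ID expired; request a new one
```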
| 12:02:59 | gra | please explain to me how the identity for the object with name 'graham moore' id #85383853 will change over time |
| 12:03:46 | abcoates | You may run out of numbers, or find that 90% of the assigned numbers are no longer needed as the recipients are dead, so you want to reassign the numbers. |
| 12:04:06 | abcoates | ID churn can become a serious issue. |
| 12:04:09 | gra | ok - let's say we use guids |
| 12:04:26 | gra | is the issue here just the size of these things? |
| 12:04:35 | abcoates | You can, but they are slow to create, which is why DBAs avoid them. They are a theoretical solution, but impact performance. |
| 12:04:47 | abcoates | (i.e. GUIDs are slow to create) |
| 12:04:51 | gra | sure |
| 12:05:14 | gra | but there are engineering solns to this |
| 12:05:29 | abcoates | Well, Oracle hasn't got that solution yet. |
| 12:05:29 | gra | such as create them in advance etc |
| 12:05:32 | abcoates | If you have, great. |
| 12:05:50 | abcoates | However, DB performance has to be the target, to my mind. |
| 12:08:09 | larsbot | abcoates: the bit I don't understand is why they can't be persistent (and why this is not a usage issue) |
| 12:09:12 | abcoates | If it were easy and didn't have serious performance overheads, DBAs would do it all the time, because it is conceptually the most obvious thing to do. |
| 12:09:45 | abcoates | They don't do it, because it doesn't work in systems with lots of fast changes. |
| 12:09:53 | abcoates | In a system with few changes, it isn't a problem. |
| 12:10:06 | gra | fast changes to what though? |
| 12:10:25 | gra | the values of the properties of the objects? |
| 12:10:26 | abcoates | The thing is, I don't like to ignore what DBAs do now, because they've tried a lot of things, and they know which ones do and do not work. |
| 12:10:31 | mariyo | mariyo has joined #topicmaps |
| 12:10:49 | abcoates | No, not changes to simple property values. |
| 12:11:01 | abcoates | Changes to objects, or effectively rows in database tables. |
| 12:11:30 | abcoates | In a transaction processing system, these are created and deleted all of the time. |
| 12:12:09 | abcoates | If you wanted to do a topic map containing real-time financial data (and why not, after all), you would have continual creation of topics. |
| 12:12:58 | gra | how are the values in a row not simple properties of that object? |
| 12:13:39 | larsbot | evening, mariyo :-) |
| 12:13:46 | larsbot | * larsbot goes afk for 10 minutes |
| 12:13:52 | gra | hi mariyo |
| 12:14:01 | mariyo | good day. trying to catch up on this conversation :) |
| 12:14:01 | abcoates | The values are, but you need to create/delete rows, not just change values. |
| 12:14:12 | mariyo | hi gra and abcoates. |
| 12:14:19 | abcoates | Hi. |
| 12:14:49 | gra | ok, so in this case you'd say that the creation of each row and a unique id for each row is too heavy an op |
| 12:15:31 | mariyo | abcoates: why do you say that? |
| 12:15:40 | abcoates | DBA experience. |
| 12:15:44 | abcoates | (not mine) |
| 12:15:58 | abcoates | The smaller the index range, the smaller the problem. |
| 12:16:03 | gra | i thought that was what you were implying |
| 12:16:08 | abcoates | Hence the use of separate tables wherever possible. |
| 12:16:25 | gra | i.e. if the function nextID() were always lightning fast, everything would have ids |
| 12:16:35 | abcoates | Until you reach the maximum integer. |
| 12:16:42 | abcoates | And that *can* happen in real systems. |
| 12:16:54 | gra | ok, let's assume that it is a GUID |
| 12:16:56 | abcoates | Enterprise systems can't just use the Access "autonumber" approach to IDs. |
| 12:17:05 | gra | indeed |
| 12:17:21 | abcoates | You can always theoretically assume a GUID. They simply are slow to create, hence DBAs avoid them as much as possible. |
| 12:17:32 | abcoates | I suggested such a thing once, and was told not to even think about it. |
| 12:17:50 | gra | that's why I said if the function nextID() was lightning fast, even on GUIDs |
| 12:17:56 | gra | then things would have ids |
| 12:18:09 | abcoates | Yes, if you could do things better than Oracle do, it might work. But that is a big call. |
| 12:18:25 | gra | agreed |
| 12:19:08 | abcoates | Really, if you just have the SAM API so that system IDs come with an expiry date/time, it will help long term management. |
| 12:19:22 | abcoates | That's all you need. |
| 12:20:04 | abcoates | Of course, it forces you to have an alternative way of specifying which topic you want, but I think that can only lead to better design elsewhere. |
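abcoates' proposal, that the SAM API hand out system IDs together with an expiry date/time, can be sketched roughly as below. This is a minimal illustration under stated assumptions: `SystemId`, `ExpiringIdIssuer`, `issue()`, `resolve()` and the one-day default lifetime are hypothetical names and values chosen for the example. They are not part of SAM, XTM or any real topic map engine.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
import uuid


@dataclass(frozen=True)
class SystemId:
    """A system ID handed to a client; only valid until `expires`."""
    value: str
    expires: datetime


class ExpiringIdIssuer:
    """Hypothetical issuer of expiring system IDs (not a real SAM API)."""

    def __init__(self, lifetime: timedelta = timedelta(days=1)):
        self.lifetime = lifetime
        # Maps an issued ID to (internal storage key, expiry time).
        self._live: dict[str, tuple[str, datetime]] = {}

    def issue(self, internal_key: str, now: datetime) -> SystemId:
        # Every ID carries an explicit expiry, so the engine knows how long
        # it must keep old storage structures resolvable.
        sid = SystemId(uuid.uuid4().hex, now + self.lifetime)
        self._live[sid.value] = (internal_key, sid.expires)
        return sid

    def resolve(self, sid: SystemId, now: datetime) -> str:
        entry = self._live.get(sid.value)
        if entry is None or now >= entry[1]:
            # An expired ID is dropped; the client must find the topic
            # some other way and request a fresh ID.
            self._live.pop(sid.value, None)
            raise KeyError("system ID expired; request a new one")
        return entry[0]


# Usage: an ID issued at noon still resolves an hour later.
issuer = ExpiringIdIssuer()
t0 = datetime(2003, 2, 14, 12, 0, tzinfo=timezone.utc)
sid = issuer.issue("topic-row-42", t0)
print(issuer.resolve(sid, t0 + timedelta(hours=1)))  # topic-row-42
```

With a short lifetime, such as the one day abcoates mentions for a high-volume system, clients refresh their IDs once per lifetime and resolve them directly the rest of the time. A longer lifetime means fewer refreshes, but old storage structures must then stay resolvable for longer.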
| 12:23:07 | gra | out of interest i just checked with the K42 dev team and we can store ~8 trillion uniquely ID'd objects, where each ID requires 128 bits in the index, and a topicIDRef is 1-8 bytes |
| 12:24:34 | abcoates | Yes, that sounds like a lot, but you run out after 1 million create/delete cycles on each of 8 million objects, and that can happen. |
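abcoates' figure is the product of the two numbers in his message. It assumes that every create/delete cycle consumes a new ID and that IDs are never reused:

$$
10^{6}\ \text{cycles} \times 8 \times 10^{6}\ \text{objects} = 8 \times 10^{12}\ \text{IDs} \approx 8\ \text{trillion}
$$

This equals the ~8 trillion objects that gra quotes as the engine's capacity.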
| 12:24:35 | gra | now ok - we aren't Oracle, and maybe we haven't tested with the volume of data at Reuters but i'd be impressed to see that TM run out of unique ids |
| 12:25:24 | mariyo | * mariyo looking over the logs. |
| 12:25:47 | gra | ok - we can recycle ids |
| 12:26:03 | mariyo | abcoates: you said this: "From a scaling perspective, the solution is at the API level rather than the model level, that when you request a system ID, you get one, but you also get an expiry date/time after which you need to request a new system ID." |
| 12:26:30 | mariyo | this would be the way you would handle this, but you don't want this specified in a standard, do you? |
| 12:27:18 | abcoates | What I didn't want specified in the standard is that you can request a persistent system ID for any topic. |
| 12:27:38 | abcoates | And if you recycle IDs, they are no longer persistent. |
| 12:27:42 | mariyo | ok, got you. I agree with this. |
| 12:32:15 | mariyo | (1) subj ident *must* be retained |
| 12:32:28 | mariyo | (2) source locator *may* be retained |
| 12:32:53 | mariyo | this is what we have agreed to so far? |
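mariyo's two rules state that subject identifiers *must* survive a merge and source locators *may* survive it. A minimal sketch can show the difference. The `Topic` class, the `merge_topics()` helper and its `keep_source_locators` flag are hypothetical names chosen for illustration. The flag models the still-open "may" in (2). It is not an agreed SAM rule.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    subject_identifiers: set[str] = field(default_factory=set)
    source_locators: set[str] = field(default_factory=set)


def merge_topics(a: Topic, b: Topic, keep_source_locators: bool = True) -> Topic:
    """Merge two topics (hypothetical helper, not a SAM-defined operation)."""
    merged = Topic()
    # Rule (1): subject identifiers *must* be retained, so take the union.
    merged.subject_identifiers = a.subject_identifiers | b.subject_identifiers
    # Rule (2): source locators *may* be retained; the flag makes that
    # choice explicit because it was still under discussion.
    if keep_source_locators:
        merged.source_locators = a.source_locators | b.source_locators
    return merged
```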
| 12:33:43 | mariyo | gra: you want (2) to be *must*? |
| 12:33:59 | abcoates | Well, (2) is what we have been discussing. I don't know if you can call it agreed or not. Also (3) is whether the TM model should include a system ID for each topic, and if so, should it be persistent. |
| 12:36:55 | mariyo | * mariyo goes back to read logs again. |
| 12:39:03 | mariyo | (3) seems to be implementation-specific, but i guess you have been discussing whether it should be or not. |
| 12:39:55 | abcoates | Well, it was seen as an alternative to source locators. |
| 12:40:10 | abcoates | The issue is how you specify a particular topic in an engine. |
| 12:40:27 | abcoates | Do you always have some kind of persistent ID, or do you have to locate it by some query on values. |
| 12:40:49 | abcoates | I favour the query approach as being more robust and having fewer key management issues. |
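abcoates prefers to locate a topic with a query on its values instead of a stored persistent ID. A minimal sketch of that lookup style follows. It assumes topics are plain dictionaries of property values and uses a hypothetical `find_topics()` helper. No real topic map query language or engine API is implied, and the sample data is invented.

```python
from typing import Iterable


def find_topics(topics: Iterable[dict], **criteria) -> list[dict]:
    """Return every topic whose properties match all the given criteria.

    The caller names a topic by what it is (its property values), so the
    engine may renumber or restructure its internal IDs without breaking
    the caller.
    """
    return [t for t in topics if all(t.get(k) == v for k, v in criteria.items())]


# Usage with made-up sample data.
topics = [
    {"name": "Reuters", "type": "company"},
    {"name": "Oracle", "type": "company"},
    {"name": "Oslo", "type": "city"},
]
print(find_topics(topics, type="city"))  # [{'name': 'Oslo', 'type': 'city'}]
```

This approach trades lookup speed for robustness. Because nothing the client holds depends on internal keys, there are no IDs to expire, recycle or keep in sync.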
| 12:41:24 | mariyo | ok that's what i thought you were getting at, so if you use these as persistent ids you don't want to throw out the source locators then. |
| 12:42:01 | abcoates | Well, we were discussing, in part, whether system IDs were a suitable replacement for locators. |
| 12:42:10 | abcoates | My argument was that both had similar problems. |
| 12:42:41 | mariyo | so really no solution yet :( |
| 12:43:24 | abcoates | Well, no complete agreement yet (not that I noticed). |
| 12:43:54 | abcoates | As I said, I think system IDs can work, as long as they expire after a period. |
| 12:43:58 | abcoates | Forever is a *very* long time. |
| 12:44:58 | abcoates | That way, Gra can allow his IDs to live for 100 years if he wants. If I were running a system, I would be more likely to expire IDs after a day. |
| 12:47:31 | mariyo | why do you need these daily expirations for financial data? |
| 12:50:08 | abcoates | Well, it's partly a management thing. |
| 12:50:28 | abcoates | The Reuters databases are *huge*. It publishes something like a terabyte of financial data each day. |
| 12:51:35 | abcoates | If you need to make changes to the database structure, to improve performance or implement new features, you need to know how long old structures need to be maintained in parallel with new structures for changeover. |
| 12:51:59 | abcoates | The longer IDs persist, the longer you are forced to leave old structures in place, and update them in parallel with new structures. |
| 12:52:31 | abcoates | For many applications, asking them to get a new ID once per day is not onerous. For the rest of the day, access is fast. |
| 12:53:42 | abcoates | If users believe IDs are persistent, they code up fragile applications that will fail if the IDs ever need to be changed. |
| 12:55:33 | mariyo | i get your point. it really all depends on the scale then. I am not working at this scale and in a very different context. |
| 12:55:54 | abcoates | Sure, almost any approach will work if the problem is small enough. |
| 12:56:20 | abcoates | I just don't want the TM spec to be limited because TMs now are smaller than enterprise databases. |
| 12:58:18 | mariyo | i am just thinking about the various systems i work with, how ids are assigned now and what that would mean in the future in the context of TMS. that's why i am interested in this discussion. it is very close to home. |
| 12:58:29 | abcoates | OK. |
| 12:59:00 | abcoates | It turned out to be a vastly more complicated topic with Reuters than I would have expected, coming from my naive Java development background. |
| 13:10:12 | abcoates | I have to go now. Bye! |
| 13:10:14 | abcoates | abcoates has left #topicmaps () |
| 13:20:46 | mariyo | * mariyo away for a while. |
| 13:57:31 | mariyo | well, have a good day. bye! |
| 13:58:14 | mariyo | mariyo has quit None ("time to call it a night. bye everyone and see you tomorrow!") |
| 14:37:41 | larsbot | * larsbot got tied up with lots of discussions |
| 14:37:41 | larsbot | sorry |
| 14:38:18 | larsbot | gra: I'm collecting lots of SAM comments now. will consolidate and send to you |
| 14:38:32 | larsbot | we also need to identify XTM issues |
| 14:41:43 | gra | gra has quit None (Read error: 54 (Connection reset by peer)) |
| 17:30:25 | larsbot | * larsbot -> home |
| 17:30:28 | larsbot | larsbot has quit None ("[x]chat") |
| 18:43:10 | larsbot | larsbot has joined #topicmaps |
| 19:37:52 | SeeTemp | SeeTemp has quit None ("Client Exiting") |
| 21:00:33 | larsbot | hmmmm. xchat 2.0 is out |
| 22:27:08 | larsbot | larsbot has quit None (leguin.freenode.net irc.freenode.net) |
| 22:27:08 | xover | xover has quit None (leguin.freenode.net irc.freenode.net) |
| 22:27:09 | em | em has quit None (leguin.freenode.net irc.freenode.net) |
| 22:27:10 | grove | grove has quit None (leguin.freenode.net irc.freenode.net) |
| 22:27:10 | arnarl | arnarl has quit None (leguin.freenode.net irc.freenode.net) |
| 22:28:36 | larsbot | larsbot has joined #topicmaps |
| 22:28:36 | xover | xover has joined #topicmaps |
| 22:28:36 | em | em has joined #topicmaps |
| 22:28:37 | grove | grove has joined #topicmaps |
| 22:29:00 | em | em has quit None (leguin.freenode.net irc.freenode.net) |
| 22:29:00 | xover | xover has quit None (leguin.freenode.net irc.freenode.net) |
| 22:29:00 | larsbot | larsbot has quit None (leguin.freenode.net irc.freenode.net) |
| 22:29:01 | grove | grove has quit None (leguin.freenode.net irc.freenode.net) |
| 22:29:09 | arnarl | arnarl has joined #topicmaps |
| 22:29:33 | larsbot | larsbot has joined #topicmaps |
| 22:29:33 | xover | xover has joined #topicmaps |
| 22:29:34 | em | em has joined #topicmaps |
| 22:29:35 | grove | grove has joined #topicmaps |
| 22:30:14 | GabeW | GabeW has joined #topicmaps |
| 23:17:41 | larsbot | hi there, GabeW |
| 23:18:08 | GabeW | hi there larsbot |
| 23:19:22 | larsbot | is this the latest version of the XRI motivations/model doc: http://lists.oasis-open.org/archives/xri/200301/msg00045.html? |
| 23:31:15 | larsbot | * larsbot ploughing into document in the hope that it may be the latest one |
| 23:32:14 | GabeW | yeah |
| 23:32:16 | GabeW | it is |
| 23:32:19 | GabeW | its the only one |
| 23:32:25 | GabeW | thx much |
| 23:32:31 | GabeW | i haven't gotten much feedback at all on it |
| 23:33:08 | larsbot | you'll get some now :) |
| 23:35:09 | GabeW | uh oh |
| 23:36:57 | GabeW | one request: be nice to me ;-) |
| 23:37:23 | larsbot | I'll keep the tone nice, don't worry :) |
| 23:37:53 | larsbot | to communicate with me it will need more work, though |
| 23:38:02 | GabeW | ok, that's fine |
| 23:38:07 | GabeW | in fact that's very good to hera |
| 23:38:08 | GabeW | hear |
| 23:38:49 | larsbot | the first section is good, but after that it gets very confusing |
| 23:38:57 | GabeW | hehehh |
| 23:39:07 | larsbot | there's too much about directories in general, and too little about XRI's intentions towards them |
| 23:39:11 | GabeW | hmm |
| 23:39:12 | larsbot | I think shortening the document is a good idea |
| 23:39:45 | GabeW | this is literally the only substantive feedback I've gotten so far, so it counts a lot |
| 23:39:51 | GabeW | I'm trying to cover a lot of ground in it |
| 23:40:01 | GabeW | not all of it (maybe not most of it) in response to your concerns |
| 23:40:25 | larsbot | I think the document needs what I think of as "combing" |
| 23:40:30 | GabeW | heh |
| 23:40:41 | larsbot | that is, pulling the important stuff towards the front, and pushing the less important towards the back |
| 23:40:45 | GabeW | right |
| 23:40:47 | GabeW | i'm all for that |
| 23:41:07 | GabeW | in fact, i'm happy to hear that because that's what my gut feeling was |
| 23:41:13 | GabeW | it definitely rambles |
| 23:41:54 | larsbot | trying to give some constructive advice in my reply |
| 23:42:15 | GabeW | well, anything beyond "it sucks" I can usually take constructively |
| 23:42:46 | GabeW | this is a very rough first draft, so focusing on higher level issues (rather than grammar, etc) would be most helpful - substantive stuff, you might say |
| 23:44:11 | larsbot | I'm trying :) |
| 23:46:21 | larsbot | GabeW: is the document trying to tell us what the XRI TC is here to accomplish? |
| 23:46:46 | GabeW | well, ok, so you caught me a little - it started out as a "model" document - the way we see the world |
| 23:46:50 | GabeW | and sorta migrated in purpose |
| 23:47:00 | GabeW | why do you ask? |
| 23:47:03 | larsbot | right. in that context it makes more sense |
| 23:47:12 | larsbot | it just doesn't seem very concerned with talking specifically about the XRI TC |
| 23:47:16 | GabeW | ah |
| 23:47:18 | GabeW | got it |
| 23:47:28 | larsbot | it seems more like a "General notes on the tech context in which the TC works" kind of thing |
| 23:47:32 | GabeW | yeah |
| 23:47:41 | GabeW | so in that sense, it doesn't address your concerns as directly |
| 23:47:55 | larsbot | aha. is it then inappropriate for me to criticise it for failing to do that? |
| 23:48:29 | GabeW | uh, well, it's probably appropriate to make that criticism of the effort in general, and i can note that this document doesn't help in that direction... |
| 23:48:36 | GabeW | how about that for diplomacy-speak? |
| 23:48:48 | GabeW | * GabeW has been boning up on that with all the UN machinations going on recently |
| 23:49:27 | larsbot | we probably shouldn't talk about real-life politics here (keep the temperature down :) |
| 23:49:59 | larsbot | yeah, but I'm accumulating notes now on how it fails to tell me about the TC |
| 23:50:06 | larsbot | I'm wondering if there's any point in sending that in |
| 23:50:18 | larsbot | I mean, if you're not trying to do that... |
| 23:51:00 | GabeW | hmm |
| 23:51:59 | GabeW | i'd like to hear all of your thoughts, but if it'd save you time, I can stipulate that the document fails miserably at the task of telling you what the TC is planning on doing in any detail |
| 23:52:20 | larsbot | I don't mind it failing to do that, since I can then at least tell you *how* it fails |
| 23:52:29 | larsbot | what's more troublesome is if it wasn't meant to do that at all |
| 23:52:52 | larsbot | if I tell you how it fails to do something it was never meant to do, I'm probably wasting time for everybody concerned |
| 23:52:55 | larsbot | that's why I'm asking |
| 23:53:21 | GabeW | i don't think I intended it while I was writing it |
| 23:53:27 | GabeW | so ignore that facet |
| 23:53:38 | GabeW | but clearly there is still a gap there |
| 23:53:46 | GabeW | whether in this document or elsewhere |
| 23:54:03 | GabeW | i'm worried that you don't have other feedback.. |
| 23:54:35 | larsbot | well, I have no other feedback because before I know what you are trying to do there's really nothing I can say |
| 23:54:43 | GabeW | hmm |
| 23:54:56 | larsbot | at this stage I don't care why you are trying to do X or how you are trying to do X |
| 23:55:00 | larsbot | I want to know what X is :-) |
| 23:55:08 | GabeW | ok |
| 23:55:09 | larsbot | *then* we can get to why and how |
| 23:55:13 | GabeW | well |
| 23:56:33 | GabeW | we apparently have the challenging task of understanding what's not obvious to others because I *think* what we are doing is very straightforward - clearly we (participants in the XRI TC) have some unstated assumptions here that I'm trying to uncover so that folks like you understand what we are trying to do. |
| 23:56:59 | larsbot | it seems that way, yes |
| 23:57:00 | GabeW | you asked what we were going to deliver... |
| 23:57:09 | larsbot | should I send this directly to you first so you can look at it? |
| 23:57:14 | GabeW | sure... |
| 23:57:57 | GabeW | i was hoping by writing this document that I'd help to describe our thinking so that others could see some discrepancies or disconnects and by that process we'd begin to uncover the hidden assumptions |
| 23:58:09 | larsbot | maybe it will work |
| 23:58:19 | GabeW | didn't work with you *yet* ;-) |
| 23:58:20 | larsbot | I sent a rough comments draft to you now so you can look it over |
| 23:58:27 | larsbot | then we'll see :) |
| 23:58:29 | GabeW | thanks a million! |
| 23:58:50 | GabeW | I really do want to answer your questions - I'm not trying to be obtuse - I'm actually not really good at being obtuse, even when it's called for... |
| 23:59:37 | larsbot | :) |
| 23:59:38 | GabeW | hey, your comments are really useful! |
| 23:59:46 | larsbot | ah, excellent! |
| 23:59:52 | larsbot | I was wondering if I was way off the page or what |
| 23:59:52 | GabeW | at least the half of them I've read so far |