[osis-users] Topic Maps

Troy A. Griffitts scribe at crosswire.org
Fri Jan 15 21:26:05 MST 2010


Thanks for all the useful info Patrick.  I hope you've gone to sleep and are reading this in the morning. Writing from my phone so must be rude and top-post...

Well, I've been asked to produce some statistical data about authors' use of place names. I'd like to report which letters, authors, genres, etc. have higher concentrations of place names.  This is a simple computation if the data is marked up correctly.
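
A minimal sketch of that computation, assuming markup like the <w subjectIdentifier="geo/city/jerusalem1"> form discussed in the quoted thread below. The sample document, the osisID values on <div>, the "geo/" prefix test, and the function name are illustrative assumptions, and real OSIS files use an XML namespace that this sketch ignores:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: one <div> per letter, one <w> per word (not real OSIS data).
SAMPLE = """
<text>
  <div osisID="Gal">
    <w>Paul</w> <w>went</w> <w>to</w>
    <w subjectIdentifier="geo/city/jerusalem1">Jerusalem</w>
  </div>
  <div osisID="Phlm">
    <w>Grace</w> <w>to</w> <w>you</w> <w>and</w> <w>peace</w>
  </div>
</text>
"""

def place_name_density(xml_text, prefix="geo/"):
    """Return {division id: place-name words / all words} for each <div>."""
    root = ET.fromstring(xml_text)
    density = {}
    for div in root.iter("div"):
        words = list(div.iter("w"))
        # A word counts as a place name if its identifier starts with the prefix.
        places = [w for w in words
                  if w.get("subjectIdentifier", "").startswith(prefix)]
        density[div.get("osisID")] = len(places) / len(words) if words else 0.0
    return density

print(place_name_density(SAMPLE))
```

Grouping by author or genre would only need a different key than the <div> id, provided that information is recorded somewhere in the markup or header.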

Would a topic map retain all referrer information from a base text?

Thanks again Patrick!

Troy 

Patrick Durusau <patrick at durusau.net> wrote:

>Troy,
>
>Just quickly because it is way past my bedtime!
>
>Troy A. Griffitts wrote:
>> Thanks Patrick.  So had we planned a subjectIdentifier attribute on 
>> either <w> or <name> (as Peter pointed out we added likely for proper 
>> name indication)?
>>
>> Steve, do you remember our discussion when we added the marker 
>> attribute to <q>, when we talked about a generalized defaulting 
>> mechanism which would allow the header to contain things like:
>>
>>  <default>//q[@level="1"]/@marker='"'</default>
>>  <default>//q[@level="2"]/@marker="'"</default>
>>  <default>//w[@lemma="([^:]*)"]/@lemma="strong:\1"</default>
>>
>> Anyway, I was just wondering what happened to this idea?  I'm not sure 
>> I'd want to implement a full-blown XQuery parser like what would be 
>> required in my example above, but some basic defaulting mechanism 
>> would still be nice.
>>
>> Patrick, in your example, I'd like to be able to say something like:
>>
>> <default>//w[@subjectIdentifier="(.*)"]/@subjectIdentifier="http://crosswire.org/names/\1"</default> 
>>
>>
>> so I could simply use in my doc:
>>
>> <w subjectIdentifier="jerusalem1">Jerusalem</w>
>>
>>
>> But this is merely to clean up my markup in the event our docs are 
>> ever opened in an editor by a human, and to potentially prevent errors 
>> when hand editing.  Sorry, I just like to factor stuff out when possible.
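
A rough sketch of how such a default could be applied without a full XQuery or XPath engine, assuming each rule is reduced to an element name, an attribute name, a regular expression, and a replacement template. The DEFAULTS table and apply_defaults are hypothetical names, and skipping values that already contain "://" is an assumption added so that expanded identifiers are left alone:

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical rule table: (element, attribute, pattern, replacement template).
# It mirrors the <default> line above in a simplified, non-XPath form.
DEFAULTS = [
    ("w", "subjectIdentifier", r"(.*)", r"http://crosswire.org/names/\1"),
]

def apply_defaults(root, defaults=DEFAULTS):
    """Expand short attribute values in place according to the rule table."""
    for tag, attr, pattern, template in defaults:
        for el in root.iter(tag):
            value = el.get(attr)
            # Assumption: values that already look like full URIs are untouched.
            if value is None or "://" in value:
                continue
            match = re.fullmatch(pattern, value)
            if match:
                el.set(attr, match.expand(template))
    return root

doc = ET.fromstring('<p><w subjectIdentifier="jerusalem1">Jerusalem</w></p>')
apply_defaults(doc)
print(doc.find("w").get("subjectIdentifier"))
```

Running the expansion once at load time would keep the hand-edited document short while downstream tools see only full identifiers.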
>>
>>
>> Patrick Durusau wrote:
>>> The question is one of how much information do you want to store in 
>>> the identifier that appears when you mark a reference to a subject?
>>
>> Yes, having this level of indirection that a subjectIdentifier 
>> provides serves a great purpose and is perfect if I'm 'at' an element 
>> I want to dig deeper into.  But my current objective is to find all 
>> place names in a document, which would require me to dereference each 
>> identifier, querying the referent for the 'type' of each subject, 
>> e.g., "geo-city".
>>
>> Hence my poorly applied lemma/morph scheme:
>>
>> <w lemma="placenames:jerusalem1" 
>> morph="placenamestype:geo-city">Jerusalem</w>
>>
>> makes processing for my immediate objective easier.  You mentioned 
>> above that the question is 'how much information' to store in the 
>> identifier itself... So is this suggesting a solution like?:
>>
>> <w subjectIdentifier="geo/city/jerusalem1">Jerusalem</w>
>>
>> This would give me what I need to easily process the data (even if we 
>> had to specify the full:
>> subjectIdentifier="http://crosswire.org/names/geo/city/jerusalem1")
>>
>Sorry, why would you be parsing the text to find an entry of a 
>particular type? Why not query the topic map, which was built by parsing 
>the text? That is what information overlays bring to the table. I was 
>using that syntax just to illustrate how a user could mark up a text 
>for later use in building a topic map to run over it.
>>
>> Thanks for the discussion on this!
>>
>>
>> I feel your pain.  My primary laptop died in December and I purchased 
>> a netbooky hp dm3 thingy to hold me over until I could order a 
>> replacement.  I just finished MOVING all of my data over to this new 
>> little thing's large (by comparison to my old system) 320Gig drive and 
>> days later the new drive crashed.  Now I'm booting Ubuntu on the new 
>> computer with my old 100Gig drive plugged into the USB port (old drive 
>> is PATA, new computer is SATA) until my real laptop replacement gets 
>> here.  And all my data on the 320Gig new drive is lost!  I was picking 
>> and choosing folders from my old drive and did moves instead of copies 
>> so I could remember what I had already grabbed.  Stupid me.  Did you 
>> find an affordable data recovery service?
>>
>No, I have talked to them but never actually used one of them.
>
>I was running mirrored drives so that helped avoid data loss but not the 
>down time.
>
>I have an external backup system that should arrive tomorrow that claims 
>you can have a constant backup and should your primaries fail, you can 
>plug the backup solution into another computer and boot from the 
>external usb drive. I won't trust that until I see it done but that 
>would be neat.
>
>Still running mirrored drives with that on top of it. Data loss is 
>always possible but with that plus copies to another drive I have of the 
>critical stuff, the chances should be remote.
>
>Sorry to hear about the new drive! There are a lot of things they can do 
>at the data recovery services. Not cheap but doable.
>
>Hope you get good news on your drive real soon now!
>
>Patrick
>>
>> Troy
>>
>>
>>
>>
>>
>>>
>>> Take your example:
>>>
>>> <w 
>>> subjectIdentifier="http://www.crosswire.org/names/jerusalem">Jerusalem</w> 
>>>
>>>
>>> Elsewhere, there is a topic in a topic map that has that same 
>>> subjectIdentifier property, and it records that the subject it 
>>> represents is an instance of type place, along with names for it in 
>>> other languages and any other information you want to record about 
>>> that subject.
>>>
>>> The key is the use of a subjectIdentifier to identify the subject. Why?
>>>
>>> Because someone else, in another Bible project may have:
>>>
>>> <w 
>>> subjectIdentifier="http://www.otherproject.org/geonames/israel/jerusalem">Jerusalem</w> 
>>>
>>>
>>> Now what?
>>>
>>> Well, any topic can have a *set* of subjectIdentifier properties 
>>> which signals that both subjectIdentifiers identify the same subject.
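
A small sketch of that merging idea, assuming topics are plain dictionaries holding a set of subject identifiers and a set of names. This is an illustration only, not the XTM data model or any topic map engine's API, and merge_topics is a hypothetical name:

```python
def merge_topics(topics):
    """Merge topics whose subject-identifier sets share at least one member."""
    merged = []  # invariant: identifier sets in this list are pairwise disjoint
    for topic in topics:
        ids = set(topic["subjectIdentifiers"])
        names = set(topic["names"])
        remaining = []
        for other in merged:
            if other["subjectIdentifiers"] & ids:
                # Same subject: absorb the other topic's identifiers and names.
                ids |= other["subjectIdentifiers"]
                names |= other["names"]
            else:
                remaining.append(other)
        remaining.append({"subjectIdentifiers": ids, "names": names})
        merged = remaining
    return merged

topics = [
    {"subjectIdentifiers": {"http://www.crosswire.org/names/jerusalem"},
     "names": {"Jerusalem"}},
    {"subjectIdentifiers": {"http://www.otherproject.org/geonames/israel/jerusalem"},
     "names": {"Yerushalayim"}},
    # A topic carrying both identifiers declares that they name the same subject.
    {"subjectIdentifiers": {"http://www.crosswire.org/names/jerusalem",
                            "http://www.otherproject.org/geonames/israel/jerusalem"},
     "names": set()},
]
result = merge_topics(topics)
```

With the third topic present, all three collapse into a single topic that keeps both identifiers and both names; without it, the first two stay separate.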
>>>
>>> (Note I have used the XTM syntax for the attributes but it would be 
>>> possible to declare equivalent subject identifiers even if they were 
>>> in different formats or structures. I am working on an example using 
>>> XQuery to make that point. Probably won't be ready for a week or so. 
>>> My main system died last night but due to disk mirroring and paying a 
>>> lot of money, I got it back late this afternoon.)
>>>
>>> That will allow you to disambiguate all the names as well as to add 
>>> far more information than you could possibly put in an attribute. 
>>> Such as marking the morphology of a lemma and displaying for a user 
>>> the distribution of that lemma over a book or range of books. 
>>> (Assuming you represented all of those as occurrences or even 
>>> associations with explicit roles if you liked.)
>>>
>>> Yes, I have been thinking about topic maps and biblical texts a lot. ;-)
>>>
>>> Hope you are having a great day!
>>>
>>> Patrick
>>>
>>
>>
>> _______________________________________________
>> osis-users mailing list
>> osis-users at crosswire.org
>> http://www.crosswire.org/mailman/listinfo/osis-users
>>
>
>-- 
>Patrick Durusau
>patrick at durusau.net
>Chair, V1 - US TAG to JTC 1/SC 34
>Convener, JTC 1/SC 34/WG 3 (Topic Maps)
>Editor, OpenDocument Format TC (OASIS), Project Editor ISO/IEC 26300
>Co-Editor, ISO/IEC 13250-1, 13250-5 (Topic Maps) 
>
