[osis-users] Topic Maps

Sat Jan 16 07:09:47 MST 2010

Troy,

Troy A. Griffitts wrote:
> Thanks for all the useful info Patrick.  I hope you've gone to sleep and are reading this in the morning. Writing from my phone so must be rude and top-post...
>
> Well, I've been asked to produce some statistical data about authors' use of place names. I'd like to report which letters, authors, genre, etc. have higher concentrations of place names.  This is a simple computation if the data is marked up correctly.
>
>   
True, but if by "marked up" you mean "in line" with the content, then:

1) you are limited to the information that was put in-line at some point 
(or you must open the text up to more information, which may make it 
incompatible with software designed to work with the prior version of 
the syntax/information);

2) you cannot allow others to add information, because they might befoul 
the syntax or text;

3) you limit the processing of the text (assuming large annotations) to 
software/hardware that can handle extensive annotation. With a topic 
map, a cell phone could use the same reference as a multicore machine 
and retrieve only the information for that reference that it can 
handle. Same text, same identifier, just a different result;

4) to be useful, you are going to have to enforce one way to name 
places, for example; you can't mix place names in multiple ancient or 
modern languages.
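
To make the contrast concrete, here is a minimal sketch in Python of the 
overlay approach, not a real topic map engine. The Topic class, the 
TOPICS and REFERENCES data, the "paul" identifier and the helper names 
are all assumptions made up for illustration; only the two Jerusalem 
identifiers come from this thread. The text carries nothing but 
identifiers, and the type and the names in other languages live in the 
map, so the place-name counts come from querying the map.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Topic:
    # Every identifier in this set is declared to name the same subject.
    subject_identifiers: set
    types: set = field(default_factory=set)
    names: dict = field(default_factory=dict)  # language code -> name


# Hypothetical topic map, kept outside the text it describes.
TOPICS = [
    Topic(
        subject_identifiers={
            "http://www.crosswire.org/names/jerusalem",
            "http://www.otherproject.org/geonames/israel/jerusalem",
        },
        types={"place"},
        names={"en": "Jerusalem", "la": "Hierosolyma"},
    ),
    Topic(
        subject_identifiers={"http://www.crosswire.org/names/paul"},
        types={"person"},
        names={"en": "Paul"},
    ),
]

# Hypothetical references harvested from marked-up texts: (book, identifier).
REFERENCES = [
    ("Gal", "http://www.crosswire.org/names/jerusalem"),
    ("Gal", "http://www.otherproject.org/geonames/israel/jerusalem"),
    ("Gal", "http://www.crosswire.org/names/paul"),
    ("Rom", "http://www.crosswire.org/names/jerusalem"),
]


def find_topic(identifier, topics=TOPICS):
    """Return the topic that carries this subject identifier, or None."""
    for topic in topics:
        if identifier in topic.subject_identifiers:
            return topic
    return None


def count_by_type(references, wanted_type, topics=TOPICS):
    """Count, per book, the references whose topic has the wanted type."""
    counts = Counter()
    for book, identifier in references:
        topic = find_topic(identifier, topics)
        if topic is not None and wanted_type in topic.types:
            counts[book] += 1
    return counts


place_counts = count_by_type(REFERENCES, "place")
print(place_counts)
```

Because both Jerusalem identifiers sit on one topic, references from 
either project are counted as the same place, and nobody has to agree 
on a single spelling or language for the name in the text itself.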

> Would a topic map retain all referrer information from a base text?
>
>   
I am not sure what you mean by "referrer information" but if you mean 
the attribute values, yes.

As a matter of fact, we could treat things like "morph" as subjects as 
well so we could map from other systems that use different terminology 
for their infrastructure.
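
A rough sketch of that idea in Python, under stated assumptions: the 
subject URLs under example.org, the "othersystem:" attribute names and 
the sample values are all hypothetical. Each piece of infrastructure 
("morph", "lemma") is treated as a subject with a set of identifiers, so 
records from systems that use different terminology can be rewritten to 
one shared vocabulary.

```python
# Hypothetical subjects for markup infrastructure. Each one lists the
# attribute names under which different systems refer to the same notion.
SUBJECTS = [
    {
        "canonical": "http://example.org/subjects/morphology",
        "identifiers": {"osis:morph", "othersystem:parsing"},
    },
    {
        "canonical": "http://example.org/subjects/lemma",
        "identifiers": {"osis:lemma", "othersystem:headword"},
    },
]


def normalize(record, subjects=SUBJECTS):
    """Rewrite each key to the canonical identifier of its subject.

    Keys that match no known subject are kept unchanged.
    """
    out = {}
    for key, value in record.items():
        for subject in subjects:
            if key in subject["identifiers"]:
                out[subject["canonical"]] = value
                break
        else:
            out[key] = value
    return out


# Made-up sample records for the same word in two systems.
osis_word = {"osis:lemma": "jerusalem1", "osis:morph": "geo-city"}
other_word = {"othersystem:headword": "jerusalem1",
              "othersystem:parsing": "geo-city"}

print(normalize(osis_word) == normalize(other_word))
```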

Hope this helps!

Patrick

> Thanks again Patrick!
>
> Troy 
>
> Patrick Durusau <patrick at durusau.net> wrote:
>
>   
>> Troy,
>>
>> Just quickly because it is way past my bedtime!
>>
>> Troy A. Griffitts wrote:
>>     
>>> Thanks Patrick.  So had we planned a subjectIdentifier attribute on 
>>> either <w> or <name> (as Peter pointed out we added likely for proper 
>>> name indication)?
>>>
>>> Steve, do you remember our discussion when we added the marker 
>>> attribute to <q>, when we talked about a generalized defaulting 
>>> mechanism which would allow the header to contain things like:
>>>
>>>  <default>//q[@level="1"]/@marker='"'</default>
>>>  <default>//q[@level="2"]/@marker="'"</default>
>>>  <default>//w[@lemma="([^:]*)"]/@lemma="strong:\1"</default>
>>>
>>> Anyway, I was just wondering what happened to this idea?  I'm not sure 
>>> I'd want to implement a fullblown xquery parser like what would be 
>>> required in my example above, but some basic defaulting mechanism 
>>> would still be nice.
>>>
>>> Patrick, in your example, I'd like to be able to say something like:
>>>
>>> <default>//w[@subjectIdentifier="(.*)"]/@subjectIdentifier="http://crosswire.org/names/\1"</default> 
>>>
>>>
>>> so I could simply use in my doc:
>>>
>>> <w subjectIdentifier="jerusalem1">Jerusalem</w>
>>>
>>>
>>> But this is merely to clean up my markup in the event our docs are 
>>> ever opened in an editor by a human, and to potentially prevent errors 
>>> when hand editing.  Sorry, I just like to factor stuff out when possible.
>>>
>>>
>>> Patrick Durusau wrote:
>>>       
>>>> The question is one of how much information do you want to store in 
>>>> the identifier that appears when you mark a reference to a subject?
>>>>         
>>> Yes, having this level of indirection that a subjectIdentifier 
>>> provides serves a great purpose and is perfect if I'm 'at' an element 
>>> I want to dig deeper into.  But my current objective is to find all 
>>> place names in a document, which would require me to dereference each 
>>> identifier, querying the referent for the 'type' of each subject, 
>>> e.g., "geo-city".
>>>
>>> Hence my poorly applied lemma/morph scheme:
>>>
>>> <w lemma="placenames:jerusalem1" 
>>> morph="placenamestype:geo-city">Jerusalem</w>
>>>
>>> makes processing for my immediate objective easier.  You mentioned 
>>> above that the question is 'how much information' to store in the 
>>> identifier itself... So is this suggesting a solution like?:
>>>
>>> <w subjectIdentifier="geo/city/jerusalem1">Jerusalem</w>
>>>
>>> This would give me what I need to easily process the data (even if we 
>>> had to specify the full:
>>> subjectIdentifier="http://crosswire.org/names/geo/city/jerusalem1")
>>>
>>>       
>> Sorry, why would you be parsing the text to find an entry of a 
>> particular type? Why not query the topic map, which was built by parsing 
>> the text? That is what information overlays bring to the table. I was 
>> using that syntax just to illustrate how a user could mark up a text 
>> for later use in building a topic map to run over it.
>>     
>>> Thanks for the discussion on this!
>>>
>>>
>>> I feel your pain.  My primary laptop died in December and I purchased 
>>> a netbooky hp dm3 thingy to hold me over until I could order a 
>>> replacement.  I just finished MOVING all of my data over to this new 
>>> little thing's large (by comparison to my old system) 320Gig drive and 
>>> days later the new drive crashed.  Now I'm booting Ubuntu on the new 
>>> computer with my old 100Gig drive plugged into the USB port (old drive 
>>> is PATA, new computer is SATA) until my real laptop replacement gets 
>>> here.  And all my data on the 320Gig new drive is lost!  I was picking 
>>> and choosing folders from my old drive and did moves instead of copies 
>>> so I could remember what I had already grabbed.  Stupid me.  Did you 
>>> find an affordable data recovery service?
>>>
>>>       
>> No, I have talked to them but never actually used one of them.
>>
>> I was running mirrored drives so that helped avoid data loss but not the 
>> down time.
>>
>> I have an external backup system that should arrive tomorrow that claims 
>> you can have a constant backup and should your primaries fail, you can 
>> plug the backup solution into another computer and boot from the 
>> external usb drive. I won't trust that until I see it done but that 
>> would be neat.
>>
>> Still running mirrored drives with that on top of it. Data loss is 
>> always possible but with that plus copies to another drive I have of the 
>> critical stuff, the chances should be remote.
>>
>> Sorry to hear about the new drive! There are a lot of things they can do 
>> at the data recovery services. Not cheap but doable.
>>
>> Hope you get good news on your drive real soon now!
>>
>> Patrick
>>     
>>> Troy
>>>
>>>
>>>
>>>
>>>
>>>       
>>>> Take your example:
>>>>
>>>> <w 
>>>> subjectIdentifier="http://www.crosswire.org/names/jerusalem">Jerusalem</w> 
>>>>
>>>>
>>>> Elsewhere, there is a topic in a topic map that has that same 
>>>> subjectIdentifier property, and it records that the subject it 
>>>> represents is an instance of type place, along with names for it in 
>>>> other languages and any other information you want to record about 
>>>> that subject.
>>>>
>>>> The key is the use of a subjectIdentifier to identify the subject. Why?
>>>>
>>>> Because someone else, in another Bible project may have:
>>>>
>>>> <w 
>>>> subjectIdentifier="http://www.otherproject.org/geonames/israel/jerusalem">Jerusalem</w> 
>>>>
>>>>
>>>> Now what?
>>>>
>>>> Well, any topic can have a *set* of subjectIdentifier properties 
>>>> which signals that both subjectIdentifiers identify the same subject.
>>>>
>>>> (Note I have used the XTM syntax for the attributes but it would be 
>>>> possible to declare equivalent subject identifiers even if they were 
>>>> in different formats or structures. I am working on an example using 
>>>> XQuery to make that point. Probably won't be ready for a week or so. 
>>>> My main system died last night but due to disk mirroring and paying a 
>>>> lot of money, I got it back late this afternoon.)
>>>>
>>>> That will allow you to disambiguate all the names as well as to add 
>>>> far more information than you could possibly put in an attribute, 
>>>> such as marking the morphology of a lemma and displaying for a user 
>>>> the distribution of that lemma over a book or range of books 
>>>> (assuming you represented all of those as occurrences, or even as 
>>>> associations with explicit roles if you liked).
>>>>
>>>> Yes, I have been thinking about topic maps and biblical texts a lot. ;-)
>>>>
>>>> Hope you are having a great day!
>>>>
>>>> Patrick
>>>>
>>>>         
>>> _______________________________________________
>>> osis-users mailing list
>>> osis-users at crosswire.org
>>> http://www.crosswire.org/mailman/listinfo/osis-users
>>>
>>>       
>> -- 
>> Patrick Durusau
>> patrick at durusau.net
>> Chair, V1 - US TAG to JTC 1/SC 34
>> Convener, JTC 1/SC 34/WG 3 (Topic Maps)
>> Editor, OpenDocument Format TC (OASIS), Project Editor ISO/IEC 26300
>> Co-Editor, ISO/IEC 13250-1, 13250-5 (Topic Maps) 
>>
>>

-- 
Patrick Durusau
patrick at durusau.net
Chair, V1 - US TAG to JTC 1/SC 34
Convener, JTC 1/SC 34/WG 3 (Topic Maps)
Editor, OpenDocument Format TC (OASIS), Project Editor ISO/IEC 26300
Co-Editor, ISO/IEC 13250-1, 13250-5 (Topic Maps)