[osis-core] 4 issues

Chris Little osis-core@bibletechnologieswg.org
Sat, 31 Jan 2004 11:10:41 -0600


Troy A. Griffitts wrote:

>     o    discuss best practices for saving VALUES.  My case:  I have a 
> Greek lexicon with OCCURRENCE information.  I want to mark the occurrence 
> text so I can optionally show or hide it, but would also like to save 
> the OCCURRENCE DATA in an attribute.  Example:
> 
> logos - word
> 
>     <seg type="x-occurance:157">This word occurs one hundred and fifty 
> seven times in the New Testament</seg>
> 
> 
>     The above example is my BEST PRACTICE idea, but I would like to know 
> your thoughts on placing the 157 VALUE where I have.

How about <seg type="x-occurenceCount" n="157">This word...</seg>.
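A rough sketch of why the value reads more cleanly from n than from type, assuming Python's standard ElementTree and bare fragments parsed without the OSIS namespace (the helper names are made up for illustration):

```python
import xml.etree.ElementTree as ET

TEXT = "This word occurs one hundred and fifty seven times in the New Testament"
# Troy's form: the count is packed into the type value after a colon.
TROY = f'<seg type="x-occurance:157">{TEXT}</seg>'
# Suggested form: type names the kind of segment, n carries the value.
CHRIS = f'<seg type="x-occurenceCount" n="157">{TEXT}</seg>'

def count_from_type(xml_text):
    """Read the count from a type value of the form kind:count."""
    seg = ET.fromstring(xml_text)
    kind, _, value = seg.get("type").rpartition(":")
    return kind, int(value)

def count_from_n(xml_text):
    """Read the count from the n attribute; type stays a fixed string."""
    seg = ET.fromstring(xml_text)
    return seg.get("type"), int(seg.get("n"))

def is_occurrence_seg(xml_text, seg_type="x-occurenceCount"):
    """Exact match on type, e.g. to decide whether to show or hide the text."""
    return ET.fromstring(xml_text).get("type") == seg_type
```

With the n form, a show/hide filter matches one fixed type string. With the colon form, every count produces a different type value, so the filter has to parse type before it can decide.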


>     o    I was assured <w> ATTRIBUTES would NOT be forced to be 
> osisIDs.  This was the purpose of osisGenType.  But in forcing the prefix 
> (to which I DID WILLINGLY concede), we seem to have added restrictions 
> which make my documents invalid.  Here's what we did A LONG 
> TIME AGO:
> 
> <w
>     lemma="x-Strongs:1234|x-Strongs:2345"
>     morph="x-Robinsons:V-AAI1P|x-Robinsons:N-ASM"
>  >
> eternity
> </w>
> 
> 
> From the discussed change, I think I need to write:
> 
> <w
>     lemma="strongs:1234 strongs:2345"
>     morph="robinsons:V-AAI1P robinsons:N-ASM"
>  >
> eternity
> </w>
> 
> But this is not valid.
> 
> Problem 1:  WE AGREED NOT TO LIMIT THE TEXT BEYOND NOT INCLUDING A SPACE. 
>  My last example follows that rule and is still invalid (I think 
> because of the '-').  I think this is just an oversight.
> 
> Problem 2:  I really liked the '|' better than the space.  I remember 
> discussing this with Patrick and I think we decided that we knew of 
> codes that included spaces.
> 
> MY ARGUMENT:
> 
> ' ' is a script character in many languages.
> '|' is NOT.  It is a computer symbol used expressly for delimiting 
> purposes.
> 
> 
> I can:
> 
> PREFER: morph="robinsons:V-AAI1P|robinsons:N-ASM"
> LIVE WITH: morph="robinsons:V-AAI1P robinsons:N-ASM"
> REALLY DON'T WANT: to transform lemma/morph values to fit osisID 
> restrictions using an escape character
> CANNOT LIVE WITH: forcing transformation of the lemma/morph values with 
> no common escape character
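To make the delimiter question concrete, here is a small sketch of how a consumer might split a lemma/morph value into (prefix, code) pairs. It assumes Python and that each item has the form prefix:code; parse_refs is a hypothetical name, not anything in the schema:

```python
def parse_refs(attr_value, delimiter=" "):
    """Split a lemma/morph value into (prefix, code) pairs.

    Assumes each item is prefix:code and that the delimiter never occurs
    inside a code, which is exactly what the space vs. '|' debate is about.
    """
    pairs = []
    for item in attr_value.split(delimiter):
        if not item:
            continue  # tolerate doubled delimiters
        # Split on the first colon only, so a code may itself contain ':'.
        prefix, sep, code = item.partition(":")
        if not sep:
            raise ValueError(f"missing prefix in {item!r}")
        pairs.append((prefix, code))
    return pairs
```

Both of Troy's morph examples yield the same pairs once the right delimiter is passed. A code that contains the delimiter (say, a space under the space rule) splits into a bare fragment with no prefix, which is the failure an escape mechanism would have to prevent.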

In light of our motion towards supporting PSIs and info: URIs, perhaps we 
should adopt something like the URI syntax for our regex, with %HEX 
escapes for anything else and spaces to divide elements.  Unescaped, that 
allows alphanumerics plus the RFC 2396 "mark" characters: 
"-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")". 
(http://www.rfc-editor.org/rfc/rfc2396.txt for details.)
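A minimal sketch of that scheme, assuming Python's urllib.parse and assuming only the code part after the prefix gets escaped (the prefix is taken to be a plain token). The function names are hypothetical:

```python
from urllib.parse import quote, unquote

# RFC 2396 marks that quote() would otherwise escape. quote() always leaves
# letters, digits and "_.-~" alone, which covers the rest of the set.
EXTRA_SAFE = "!*'()"

def encode_refs(pairs):
    """Join (prefix, code) pairs as prefix:code items separated by spaces.

    Only the code is escaped; spaces, '|' and ':' inside it become %HEX.
    """
    return " ".join(f"{prefix}:{quote(code, safe=EXTRA_SAFE)}"
                    for prefix, code in pairs)

def decode_refs(attr_value):
    """Split on spaces, then undo the %HEX escapes in each code."""
    pairs = []
    for item in attr_value.split(" "):
        if item:
            prefix, _, code = item.partition(":")
            pairs.append((prefix, unquote(code)))
    return pairs
```

Troy's values such as robinsons:V-AAI1P pass through unchanged, since they use only alphanumerics and "-". A hypothetical code like "a b|c" is encoded as a%20b%7Cc and decodes back exactly, so no information is lost.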

HOWEVER, that said... it is still MY opinion that we should make these 
valid osisIDs (Troy's "REALLY DON'T WANT" case).  That gives us a place 
to look them up.  Without it, we have no way to find these values in 
other documents and learn what they mean (unless we define a mapping 
mechanism such as "reserved" -> "_").
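A quick sketch of what a "reserved" -> "_" mapping might look like, and why it is one-way. It assumes Python and a made-up rule that an osisID segment may hold only letters, digits and "_"; the actual OSIS regex may differ:

```python
import re

# Hypothetical rule for illustration only: letters, digits and "_" are the
# characters an osisID segment may hold. The real OSIS pattern may differ.
DISALLOWED = re.compile(r"[^A-Za-z0-9_]")

def to_id_segment(code):
    """Map every disallowed ("reserved") character to "_"."""
    return DISALLOWED.sub("_", code)
```

Here V-AAI1P becomes V_AAI1P, but so would a code that was already spelled V_AAI1P. A lookup then cannot tell which original code the document meant, which is the gap an escape character (or %HEX, as above) would close.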

--Chris