[osis-core] annotateRef: Question on whitespace

Patrick Durusau osis-core@bibletechnologieswg.org
Mon, 15 Sep 2003 10:27:45 -0400


Greetings!

Note the whitespace issue that Todd raised recently has NOT been 
resolved. Need comments, suggestions, etc.

From the minutes:

Commentaries: special issues

Need mechanism to indicate the text that is being commented upon.

Decision: new attribute, annotateRef, global, use Ref regex, refers to work in
header (annotateRef a list of osisRefType (which will be global), space
delimited by their nature)

Example:

<p annotateRef="bible.kjv:1Tim.1.1-1Tim.1.5" 
annotateType="commentary"><catchWord osisRef="1Tim.1.1@s[Paul an
apostle]">Paul an apostle</catchWord> - Familiarity is to be set aside where
the things of God are concerned.  According to the commandment of God
- The authoritative appointment of God the Father. <catchWord
osisRef="1Tim.1.1@s[Our Saviour]">Our Savior</catchWord> - So
styled in many other places likewise, as being the grand orderer of
the whole scheme of our salvation.  And Christ our hope - That is, the
author, object, and ground, of all our hope.</p>

<snip>discussion of fix of catchWord moved to separate post</snip>

Decision: In user's manual, deprecate annotateWork.

Todd notes in a subsequent post:

> annotateRef="Esth.4.14@s[It could] John.3.16@s:[gave his] Gen.1.1@s:[and
> the earth]"
> 
> Would yield the following whitespace separated tokens 
> Esth.4.14@s[It
> could]
> John.3.16@s:[gave
> his]
> Gen.1.1@s:[and
> the
> earth]
> 
> rather than what is expected as follows:
> Esth.4.14@s[It could] 
> John.3.16@s:[gave his] 
> Gen.1.1@s:[and the earth]

I responded:

> Are you saying this is a problem with XML Schema regexes or with the regexes you are using in your application?
> 
> Seems to me, without checking, in the middle of something at the moment, that a regex should match
> 
> [chars + whitespace]
> 
> differently from
> 
> [chars + whitespace] [chars + whitespace] [chars + whitespace]
> 
> Note that I am not matching whitespace but each entire expression.
> 
> Requires better regex handling than simply splitting on whitespace. 
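
A minimal Python sketch of that distinction, assuming a simplified stand-in
for the osisRef regex (the pattern REF below is hypothetical, not the
schema's actual expression). It compares splitting on whitespace with
matching each entire reference:

```python
import re

# Hypothetical, simplified pattern for one reference: a run of characters
# that are neither whitespace nor brackets, optionally followed by a
# bracketed string that may contain whitespace.
REF = re.compile(r"[^\s\[\]]+(?:\[[^\]]*\])?")

value = ("Esth.4.14@s[It could] John.3.16@s:[gave his] "
         "Gen.1.1@s:[and the earth]")

# Splitting on whitespace breaks the bracketed strings apart.
naive = value.split()

# Matching whole expressions keeps each reference in one piece.
matched = REF.findall(value)

print(len(naive), naive)
print(len(matched), matched)
```

This only shows that an application can recover the intended tokens with its
own regex handling; it says nothing about how a schema validator treats the
same value.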

Harry says:

> It seems to me that the strings are well enough defined,
> but processing them with standard tools may be harder. For
> example, if XSLT had a function contains-token, you couldn't 
> use such a thing if some of the tokens contain whitespace.

Todd's most recent post (and the last traffic on this issue):

> The issue is not what software will do or what we say the rules are but
> the fact that with XML Schema a list is defined to be whitespace
> separated.  
> 
> It is possible to express in XML Schema that a simple type is a list of
> other simple types that allow whitespace.  But a list is defined as
> being whitespace separated.  So in practice you must not allow
> whitespace in a simple type you use to make a simple type that is a
> list.  
> 
> My suggestion is to not allow whitespace in the string portion of the
> @s: grain structure.
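
Todd's point can be sketched in Python by mimicking how a list type is
checked: the value is first split on whitespace, and only then is each piece
tested against the item type. The ITEM pattern and the function name
validate_as_list are assumptions made for illustration, not the OSIS schema
or a real validator API.

```python
import re

# Hypothetical item pattern that permits whitespace inside the brackets.
ITEM = re.compile(r"[^\s\[\]]+(?:\[[^\]]*\])?")

def validate_as_list(value: str) -> bool:
    """Mimic a whitespace-separated list: split first, then test each item."""
    # Items are maximal runs of characters other than space, tab, CR, LF.
    items = re.findall(r"[^ \t\r\n]+", value)
    return all(ITEM.fullmatch(item) is not None for item in items)

single = "Esth.4.14@s[It could]"

# The item pattern on its own accepts the reference...
print(ITEM.fullmatch(single) is not None)

# ...but as a list the same value is split into two items that both fail.
print(validate_as_list(single))

# A list whose items contain no whitespace passes.
print(validate_as_list("Esth.4.14@s[It] John.3.16"))
```

Under these assumptions, whitespace permitted by the item pattern never
reaches it, because the split happens first.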

Todd: Is your proposal to not allow whitespace in the string portion of 
the @s: grain structure only for annotateRef?

Hmmm, I think having whitespace in the string portion of the @s: grain 
structure is fairly important in other uses of the osisRef regex.

General feeling on whether it will be confusing to allow whitespace in 
some uses of the regex but not in others?

We could define a separate regex for annotateRef that does not allow 
whitespace.
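
As a rough Python sketch of what the two-regex option would mean in practice
(both patterns are illustrative stand-ins that I am assuming here, not the
actual schema regexes), the general pattern accepts whitespace in the grain
string while the annotateRef pattern rejects it:

```python
import re

# Illustrative stand-in for the general reference pattern:
# whitespace is allowed inside the brackets.
GENERAL_REF = re.compile(r"[^\s\[\]]+(?:\[[^\]]*\])?")

# Illustrative stand-in for a stricter annotateRef item pattern:
# no whitespace is allowed inside the brackets.
ANNOTATE_REF = re.compile(r"[^\s\[\]]+(?:\[[^\]\s]*\])?")

candidates = ["1Tim.1.1@s[Paul]", "1Tim.1.1@s[Paul an apostle]"]

# For each candidate, record (accepted by general, accepted by annotateRef).
results = {
    ref: (GENERAL_REF.fullmatch(ref) is not None,
          ANNOTATE_REF.fullmatch(ref) is not None)
    for ref in candidates
}

for ref, (general_ok, annotate_ok) in results.items():
    print(ref, general_ok, annotate_ok)
```

The second candidate is the case in question: valid as an osisRef elsewhere,
but not usable as an item in annotateRef.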

Suggestions, comments?

Hope everyone is having a great day!

Patrick

-- 
Patrick Durusau
Director of Research and Development
Society of Biblical Literature
Patrick.Durusau@sbl-site.org
Chair, V1 - Text Processing: Office and Publishing Systems Interface
Co-Editor, ISO 13250, Topic Maps -- Reference Model

Topic Maps: Human, not artificial, intelligence at work!