[osis-core] annotateRef: Question on whitespace

Mon, 15 Sep 2003 12:52:37 -0600

See below.

Todd
> 
> Greetings!
> 
> Note the whitespace issue that Todd raised recently has NOT been
> resolved. Need comments, suggestions, etc.
> 
>  From the minutes:
> 
> Commentaries: special issues
> 
> Need mechanism to indicate the text that is being commented upon.
> 
> Decision: new attribute, annotateRef, global, use Ref regex, refers to
> work in header (annotateRef a list of osisRefType (which will be
> global), space delimited by their nature)
> 
> Example:
> 
> <p annotateRef="bible.kjv:1Tim.1.1-1Tim.1.5"
> annotateType="commentary"><catchWord osisRef="1Tim.1.1@s[Paul an
> apostle]">Paul an apostle</catchWord> - Familiarity is to be set aside
> where the things of God are concerned.  According to the commandment
> of God - The authoritative appointment of God the Father. <catchWord
> osisRef="1Tim.1.1@s[Our Saviour]">Our Savior</catchWord> - So
> styled in many other places likewise, as being the grand orderer of
> the whole scheme of our salvation.  And Christ our hope - That is, the
> author, object, and ground, of all our hope.</p>
> 
> <snip>discussion of fix of catchWord moved to separate post</snip>
> 
> Decision: In user's manual, deprecate annotateWork.

Does this mean leave it in the OSIS 2.0 schema and schedule its outright
removal in the next release (OSIS 2.5 or OSIS 3.0)?

> 
> Todd notes in a subsequent post:
> 
> > annotateRef="Esth.4.14@s[It could] John.3.16@s:[gave his]
> > Gen.1.1@s:[and the earth]"
> >
> > Would yield the following whitespace separated tokens
> > Esth.4.14@s[It
> > could]
> > John.3.16@s:[gave
> > his]
> > Gen.1.1@s:[and
> > the
> > earth]
> >
> > rather than what is expected as follows:
> > Esth.4.14@s[It could]
> > John.3.16@s:[gave his]
> > Gen.1.1@s:[and the earth]
> 
> I responded:
> 
> > Are you saying this is a problem with XML Schema regexes or with the
> > regexes you are using in your application?
> >
> > Seems to me, without checking, in the middle of something at the
> > moment, that a regex should match
> >
> > [chars + whitespace]
> >
> > differently from
> >
> > [chars + whitespace] [chars + whitespace] [chars + whitespace]
> >
> > Note that I am not matching whitespace but each entire expression.
> >
> > Requires better regex handling than simply splitting on whitespace.
> 
> Harry says:
> 
> > It seems to me that the strings are well enough defined,
> > but processing them with standard tools may be harder. For
> > example, if XSLT had a function contains-token, you couldn't
> > use such a thing if some of the tokens contain whitespace.
> 
> Todd's most recent post (and the last traffic on this issue):
> 
> > The issue is not what software will do or what we say the rules are
> > but the fact that with XML Schema a list is defined to be whitespace
> > separated.
> >
> > It is possible to express in XML Schema that a simple type is a list
> > of other simple types that allow whitespace.  But a list is defined
> > as being whitespace separated.  So in practice you must not allow
> > whitespace in a simple type you use to make a simple type that is a
> > list.
> >
> > My suggestion is to not allow whitespace in the string portion of
> > the @s: grain structure.
> 
> Todd: Is your proposal to not allow whitespace in the string portion
> of the @s: grain structure only for annotateRef?
> 
> Hmmm, I think having whitespace in the string portion of the @s: grain
> structure is fairly important in other uses of the osisRef regex.
> 
I agree that it is useful to allow space.
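
To make the two tokenizations quoted above concrete, here is a minimal
Python sketch. It is only an illustration: the TOKEN pattern is a
hypothetical stand-in, not the actual osisRef regex, and it simply
treats anything between [ and ] as part of the current token.

```python
import re

# Hypothetical token pattern (NOT the real osisRef regex): a token is a run of
# non-whitespace characters in which a bracketed grain string may contain spaces.
TOKEN = re.compile(r"(?:[^\s\[\]]|\[[^\]]*\])+")


def split_naive(value):
    """Split the way an XML Schema list does: on whitespace only."""
    return value.split()


def split_bracket_aware(value):
    """Split on whitespace, except for whitespace inside [...]."""
    return TOKEN.findall(value)


value = "Esth.4.14@s[It could] John.3.16@s:[gave his] Gen.1.1@s:[and the earth]"

for token in split_naive(value):
    print("naive:        ", token)
for token in split_bracket_aware(value):
    print("bracket-aware:", token)
```

The naive split gives the seven fragments listed above and the
bracket-aware split gives the three expected references. An application
can do the second, but a schema list type only ever does the first.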

> General feeling on whether this will be confusing to allow whitespace
> sometimes but not others?
> 
> Could do a separate regex for annotateRef that does not allow the
> whitespace.
> 

It would be better to be consistent.

> Suggestions, comments?

It is possible to require the use of whitespace entities rather than the
characters.  Does anyone know if a whitespace entity is considered
equivalent to a whitespace character from an XML perspective?
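
One way to probe this is to look at what a parser hands back. As far as
I understand the XML specification, a character reference such as &#x20;
is resolved by the parser before the attribute value reaches the
application, so the result would be an ordinary space. The sketch below
checks that with Python's standard xml.etree.ElementTree. It shows
parser behaviour only, not XML Schema validation, and the attribute
value is made up for the test.

```python
import re
import xml.etree.ElementTree as ET

# Illustrative document: spaces inside the grain strings are written as
# character references, the space between the two references is literal.
doc = '<p annotateRef="Esth.4.14@s[It&#x20;could] John.3.16@s:[gave&#x20;his]"/>'

# The parser resolves &#x20; while reading the attribute value.
value = ET.fromstring(doc).get("annotateRef")
print(repr(value))

# Split on the whitespace characters XML defines (space, tab, LF, CR),
# as a whitespace-separated list would be split.
XML_WS = re.compile(r"[ \t\n\r]+")
tokens = [t for t in XML_WS.split(value) if t]
print(tokens)
```

If that holds for schema processors as well, writing the space as a
character reference would not keep a list item together.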

> 
<snip>