[osis-core] OSIS_0105:19 Regexs

Todd Tillinghast osis-core@bibletechnologieswg.org
Mon, 8 Apr 2002 01:09:16 -0500


The trouble I have with this is that it allows me to say reference=".....".
This should not be allowed.  There should always be a NON "." character
between the delimiting "." literals.  I believe the syntax needs to be
updated to reflect that.

I don't have the syntax for the regex tonight.
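
As a rough sketch of the rule only: the pattern below is written for Python's `re` module, which is an assumption made for illustration, and it is not the XML Schema pattern the schema itself needs. The names `SEGMENTED` and `is_valid_reference` are hypothetical.

```python
import re

# Hypothetical pattern: one or more non-"." characters, then zero or more
# groups of a literal "." followed by one or more non-"." characters.
SEGMENTED = re.compile(r"[^.]+(?:\.[^.]+)*")


def is_valid_reference(value: str) -> bool:
    """Return True if value is non-empty segments joined by single "." literals."""
    # fullmatch stands in for schema patterns, which must match the whole value.
    return SEGMENTED.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ["Gen.1.1", "Matt", ".....", "Gen..1", "Gen.1."]:
        print(candidate, is_valid_reference(candidate))
```

With a pattern of this shape, "....." is rejected because no segment may be empty.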

Todd

> -----Original Message-----
> From: owner-osis-core@bibletechnologieswg.org
> [mailto:owner-osis-core@bibletechnologieswg.org] On Behalf Of Patrick Durusau
> Sent: Saturday, April 06, 2002 12:56 PM
> To: osis-core@bibletechnologieswg.org
> Subject: Re: [osis-core] OSIS_0105:19 Regexs
> 
> Guys,
> 
> Just to be consistent,
> 
> The regex stuff from yesterday, nothing new:
> 
> 1. Regexs:
> 
> Generally see: http://www.w3.org/TR/xmlschema-2/#regexs
> 
> ReferenceType
> 
> Now reads: ([^.]+)((.[^.]+){0,})?
> 
> Note that "^" begins a negative character group.
> 
> Note that the "." character in XML Schema is the equivalent of:
> [^\n\r] : any character except newline
> 
> So, [^.] means only newline (excludes all other characters)
> 
> Or more formally from the standard:
> 
> [Definition:]   A *negative character group* is a positive character
> group <http://www.w3.org/TR/xmlschema-2/#dt-poschargroup> preceded by
> the ^ character. For all positive character groups P, ^P is a
> valid *negative character group*, and C(^P) contains all XML
> characters that are not in C(P).
> 
> *Negative Character Group*
> [15]   negCharGroup   ::=   '^' posCharGroup
> <http://www.w3.org/TR/xmlschema-2/#nt-posCharGroup>
> 
> 
> I assume the intent of the expression is:
> 
> 1. Any legal namestart character, followed by,
> 2. Any legal name character, followed by,
> 3. literal "." character, followed by
> 4. one or more groups of legal name characters separated by a literal
> "."
> 
> If that is the case, I would suggest that we re-write ReferenceType to
> read:
> 
> ([\i]([\c])*\.((\c)*\.)?
> 
> Note that \i = any legal initial name character, \c = any legal name
> character, \. = literal "." or full stop
> 
> Additionally, since we have compScriptureReferenceType (I treat that
> regex below) not sure what ReferenceType is getting us in terms of
> validation? Structure of the references? Perhaps, would welcome some
> discussion on this and WorkType (next).
> 
> (BTW, schema regexs always match from the beginning of the line so no
> need to anchor.)
> 
> WorkType:
> 
> Now reads: ([^.]+(.[^.]+)
> 
> Same problems as above with "^" and invoking of literal full stop.
> 
> Is the intent of this expression the same as ReferenceType?
> 
> In other words to:
> 
> 1. Any legal namestart character, followed by,
> 2. Any legal name character, followed by,
> 3. literal "." character, followed by
> 4. one or more groups of legal name characters separated by a literal
> "."
> 
> If so, why would I want both of them? For that matter, the more I
> think about it, I am not sure what function either one would serve,
> at least in light of our not declaring a set of references to other
> works.
> 
> Suggestion: Why not settle on an outside reference pointer that
> subclasses xs:string the way we have for enumerated values on
> attributes. You can at this point declare whatever other pointers you
> like, but prepend "x-" to them? That would allow us later (probably
> by the Fall release of translator and publisher modules) to declare
> references like compScriptureReferenceType that provide validation of
> at least part of the reference?
> 
> compScriptureReferenceType:
> 
> Now reads (in part) ((...All Book Names...))((.[^.]+){0,}))?
> 
> Same problems as above with "^" and invoking of literal full stop.
> 
> In other words to:
> 
> 1. Book Name, followed by
> 2. literal "." character, followed by
> 3. any digit or letter (one or more) (question, do we need letter for
> some Bible references?), followed by
> 4. literal "." character, followed by
> 5. any digit or letter (one or more) (question, do we need letter for
> some Bible references?), followed by (optional)
> 
> If that is the case, would the following work?
> 
> ((...All Book Names...))\.[A-Za-z0-9]*(\.[A-Za-z0-9]*)?
> 
> Note that this expression requires book name plus chapter, could
> someone want to just refer to Matthew?
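
To make the question concrete, here is a small sketch of how the proposed expression behaves. It uses Python's `re` module rather than a schema validator, and the four-entry `BOOKS` list is a made-up stand-in for the elided "All Book Names" alternation, so treat the results as illustrative only. `matches_proposed` is likewise a hypothetical helper name.

```python
import re

# Hypothetical stand-in for the full "All Book Names" alternation.
BOOKS = ["Gen", "Exod", "Matt", "John"]

# The proposed expression, with the stand-in book list substituted in.
PROPOSED = re.compile(
    r"(" + "|".join(BOOKS) + r")\.[A-Za-z0-9]*(\.[A-Za-z0-9]*)?"
)


def matches_proposed(value: str) -> bool:
    """Return True if the whole value matches the proposed expression."""
    # fullmatch stands in for schema patterns, which must match the whole value.
    return PROPOSED.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ["John.1.1", "Gen.1", "Matt", "Matt.", "Gen.."]:
        print(candidate, matches_proposed(candidate))
```

Under these assumptions a bare "Matt" is rejected because the first literal "." is mandatory, while "Matt." and "Gen.." are accepted because `*` permits empty segments; using `+` instead would require at least one character after each ".".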
> 
> Proposal:
> 
> The Argument (don't you just love Milton!):
> 
> For elements themselves, we want to allow them to have IDs to which
> other things can point, either by IDREF (milestones) or by linking
> from within or from without. This is the "who am i" function of an
> identifier. Restricted by the XML Name requirements, if it functions
> as an ID.
> 
> The other thing we want (I think) is the ability of notes, milestones,
> and other objects to refer to other elements (usually a
> containment-like relationship) that they refer to or contain. This is
> an "i start at" and "i end at" type function. Obviously must use the
> IDs found on other elements.
> 
> We can partially validate scripture references since we are declaring
> a known set of names and format for the references to the materials
> referenced by those names.
> 
> The Suggestion:
> 
> Separate the notion of Bible references from other references more
> generally. For non-Bible references, simply defer by declaring
> non-Bible references to be "x-" and to be treated at some later point
> with a validation mechanism like we have for Bible references.
> 
> For verse milestones, restrict the IDs to the
> compScriptureReferenceType so that we get validation for the
> "who am i" function here.
> 
> (Agreeing with Troy here that StartVerse and StartVerse as attribute
> is confusing. Just use ID, datatype ID.)
> 
> Books, divs, paragraphs, IDs can use compScriptureReferenceType but
> not required.
> 
> Books, divs, paragraphs, notes, etc., the "where I point" function
> (not "who am i") should be IDREF and have the names like: startNote =
> "John.1.1", startDiv = "Gen.1.1", endNote = "John.1.2", etc. Note
> that making these IDREFs makes us certain that the IDs appear in the
> work (XML validation process) and enforces as well the use of
> compScriptureReferenceType in the encoding.
> 
> Patrick
> 
> --
> Patrick Durusau
> Director of Research and Development
> Society of Biblical Literature
> pdurusau@emory.edu
> 
>