[osis-core] OSIS work regex

Harry Plantinga osis-core@bibletechnologieswg.org
Wed, 14 Aug 2002 16:48:01 -0400


I was speaking hypothetically -- if we are going to try to
conform to XML name character usage, that is what we would have
to do.  

But we've already decided not to conform -- we're allowing a digit
to start osisIDs. So, I suggest we allow letters, digits, and _
as start characters.

Also, we seem to be ignoring the use of ideographs and accented 
characters in names. It's OK with me, but I want to make sure it's
intentional.
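
A minimal sketch of the start-character rule suggested above, written in
Python rather than schema regex syntax. It assumes the \L and \N shorthand
used in this thread stands for the Unicode letter and number classes, and
approximates those with Python's \w (letters, digits, and underscore). The
names WORK_UNICODE, WORK_ASCII, and is_work_name are made up for
illustration. Comparing the two flags shows what is at stake with
ideographs and accented characters:

```python
import re

# Start character: letter, digit, or "_"; later characters may also be ".".
# Assumption: Python's \w stands in for the thread's [\L\N_] class.
WORK_UNICODE = re.compile(r"\w[\w.]*")
WORK_ASCII = re.compile(r"\w[\w.]*", re.ASCII)


def is_work_name(name, ascii_only=False):
    """Return True if the whole string matches the sketched pattern."""
    pattern = WORK_ASCII if ascii_only else WORK_UNICODE
    return pattern.fullmatch(name) is not None
```

Under these assumptions, is_work_name("1Cor") and is_work_name("_Bible")
are both accepted, while a name with an accented letter such as "Français"
passes only when ascii_only is left off.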

-Harry

> -----Original Message-----
> From: owner-osis-core@bibletechnologieswg.org 
> [mailto:owner-osis-core@bibletechnologieswg.org] On Behalf Of 
> Todd Tillinghast
> Sent: Wednesday, August 14, 2002 4:35 PM
> To: osis-core@bibletechnologieswg.org
> Subject: RE: [osis-core] OSIS work regex
> 
> 
> The statement below does not make sense to me.  It seems you 
> are saying two conflicting things.  In any case, it seems 
> that you are saying that we should conform to the XML standard.  
> 
> I guess what I am suggesting is that we have references that 
> can be XML IDs.  I am not sure what all of the precluded and 
> allowed characters are.  I know that Patrick was much better 
> versed in this when we talked several months ago on this very topic.
> 
> The trouble with this whole line of discussion is that ":", 
> "[", and "]" are not allowed in XML IDs!
> 
> Also the test I did with an "_" leading was ok, it was the 
> leading number that was the problem we had before.  
> 
> SORRY FOR THE BOGUS DETOUR RELATED TO "_"!
> 
> The issue still remains related to OSIS references and 
> identifiers as XML IDs.  I think that is why I was using ".." 
> rather than ":" long ago. If we trade the ":" for ".." and do 
> away with the "[" and "]" then we would be back with a valid 
> XML ID.  (Of course, ALLOW "_" and preclude a numeral as the 
> leading character.)
> 
> Todd
> 
> > My "XML in a Nutshell" reference book says that XML name start 
> > characters are letters, ideographs, and the underscore, _. If we
> > want to conform to XML usage, we should allow ideographs, underscore,
> > but no _ or digits in osisIDs, I guess.
> > 
> > -Harry
> > 
> > > -----Original Message-----
> > > From: owner-osis-core@bibletechnologieswg.org
> > > [mailto:owner-osis-core@bibletechnologieswg.org] On Behalf Of
> > > Todd Tillinghast
> > > Sent: Wednesday, August 14, 2002 3:34 PM
> > > To: osis-core@bibletechnologieswg.org
> > > Subject: RE: [osis-core] OSIS work regex
> > >
> > >
> > > I think I am clear now on the proposal.
> > >
> > > Although we don't intend to use our ids as XML IDs, by allowing a 
> > > leading "_" we preclude others from using the same syntax/form and 
> > > set of identifiers in other implementations. This weakens our 
> > > standard.
> > >
> > > I hope that encoders other than those encoding OSIS documents would 
> > > use identifiers that are of the same "currency" as our references 
> > > and identifiers.  By eliminating the option for those identifiers to 
> > > be XML IDs, we limit the possibility for wider adoption, influence, 
> > > and interoperability with OSIS documents.
> > >
> > > Todd
> > >
> > > >
> > > > Todd,
> > > >
> > > > I don't think Harry meant "_" as an extra delimiter (in the same
> > > > sense as "." is a delimiter in our syntax) but more as a name
> > > > character in writing customary citations of names. It is in a sense
> > > > a delimiter, but as part of the name to be matched as a string and
> > > > not a delimiter. (Does that make any sense at all? Perhaps Harry can
> > > > state what he meant more clearly. ;-)
> > > >
> > > > Patrick
> > > >
> > > > Todd Tillinghast wrote:
> > > >
> > > > >What extra value does the "_" give us?
> > > > >
> > > > >Are you proposing Bible_.TEV_ ?
> > > > >
> > > > >Or just that "_" would be an option as in 
> > > > >Bible.Todd_New_And_Different_Reference_System ?
> > > > >
> > > > >I can see "_" as an allowable character as long as it is not the
> > > > >leading character but don't see any value in having it as an
> > > > >additional delimiter to ".".
> > > > >
> > > > >Todd
> > > > >
> > > > >>-----Original Message-----
> > > > >>From: owner-osis-core@bibletechnologieswg.org
> > > > >>[mailto:owner-osis-core@bibletechnologieswg.org] On Behalf Of
> > > > >>Harry Plantinga
> > > > >>Sent: Wednesday, August 14, 2002 7:26 AM
> > > > >>To: osis-core@bibletechnologieswg.org
> > > > >>Subject: RE: [osis-core] OSIS work regex
> > > > >>
> > > > >>If schema RegExps behave as they do in Perl, the ? is
> > > > >>superfluous. Perhaps
> > > > >>
> > > > >>  [\L\N][\.\L\N]*
> > > > >>
> > > > >>The underscore character (_) is pretty commonly used in names and
> > > > >>may be present in documents converted to OSIS. I can't see that it
> > > > >>would do any harm. Could it be included?  Perhaps
> > > > >>
> > > > >> [\L\N_][\.\L\N_]*
> > > > >>
> > > > >>-Harry
> > > > >>
> > > > >>----------------------------------
> > > > >>For the work portion:
> > > > >>
> > > > >><xs:pattern value = "([\L\N\.]([\L\N\.]*)?)" />
> > > > >>
> > > > >>By which I am trying to say: any letter or number combination
> > > > >>followed by a period is compulsory, followed by any number of
> > > > >>optional letter/number combinations that also end in a period
> > > > >>(periods, hyphens, etc., being excluded from the work name).
> > > > >>
> > > > >
> > > >
> > > > --
> > > > Patrick Durusau
> > > > Director of Research and Development
> > > > Society of Biblical Literature
> > > > pdurusau@emory.edu
> > > >
> > > >
> > >
> > >
> 
>
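
As a rough check of the XML ID point raised in this thread, here is a small
Python sketch. It approximates the NCName rules that xs:ID follows: the
first character is a letter or "_", and later characters are letters,
digits, ".", "-", or "_". Real XML names also admit combining characters
and extenders, which this ignores. The function names and the sample
reference Bible.TEV:Matt.1.1 are hypothetical, chosen only to illustrate
trading ":" for "..".

```python
import unicodedata


def _is_letter(ch):
    # Unicode general categories beginning with "L" are letters,
    # which includes ideographs and accented characters.
    return unicodedata.category(ch).startswith("L")


def _is_digit(ch):
    # "Nd" is the decimal digit category.
    return unicodedata.category(ch) == "Nd"


def looks_like_xml_id(text):
    """Approximate NCName check; not a full XML name validator."""
    if not text:
        return False
    first, rest = text[0], text[1:]
    if not (_is_letter(first) or first == "_"):
        return False
    return all(_is_letter(c) or _is_digit(c) or c in "._-" for c in rest)


def colon_to_dots(reference):
    # The trade suggested in the thread: ".." in place of ":".
    return reference.replace(":", "..")
```

With these definitions, looks_like_xml_id("Bible.TEV:Matt.1.1") is False
because of the ":", and the result of colon_to_dots on the same string
passes the check. A leading digit, as in "1Cor.1.1", fails either way.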