[osis-core] OSIS work regex

Patrick Durusau osis-core@bibletechnologieswg.org
Wed, 14 Aug 2002 19:22:32 -0400


Guys,

I am very near exhaustion and Steve will be taking over tonight to 
shepherd the final hours of discussion.

My laptop died and I am having problems trying to hurry through 
installation of validation services on the Linux box. (Will have another 
go in the morning.)

Todd, we may have to fall back on you for not only good advice but 
validation as well! Sorry 'bout that!

On the leading numeral question, we addressed that in Rome and may not 
have communicated it very well but it was one of the reasons we moved 
away from osisIDs being XML IDs. The leading number constraint just did 
not get us anything but conformance with unnecessary pain. I don't 
honestly remember why it was the rule in ISO 8879 but XML requiring it 
was just backwards compatibility, no real justification for it.

Users are familiar with 1Cor, etc. and there is no gain from forcing a 
change.
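
A minimal sketch of the leading-numeral constraint, in Python. It assumes a
rough approximation of the colon-free XML name form used for XML IDs
(xs:NCName); the function names are hypothetical, and the character tests
use Python's str.isalpha/str.isalnum rather than the exact code point ranges
in the XML specification.

```python
def is_name_char(ch: str) -> bool:
    """Approximate 'name character': letter, digit, '.', '_' or '-'."""
    return ch.isalnum() or ch in "._-"


def xml_id_like(s: str) -> bool:
    """Approximate XML ID rule: must start with a letter or underscore."""
    if not s:
        return False
    if not (s[0].isalpha() or s[0] == "_"):
        return False
    return all(is_name_char(ch) for ch in s[1:])


def relaxed_id_like(s: str) -> bool:
    """Relaxed rule: a leading digit is accepted as well (e.g. 1Cor)."""
    if not s:
        return False
    if not (s[0].isalnum() or s[0] == "_"):
        return False
    return all(is_name_char(ch) for ch in s[1:])


if __name__ == "__main__":
    for candidate in ["1Cor", "Cor1", "_1Cor", "Bible:TEV"]:
        print(candidate, xml_id_like(candidate), relaxed_id_like(candidate))
```

Under these assumptions "1Cor" fails the XML-ID-style check only because of
its first character, while "Bible:TEV" fails both checks because of the ":".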

Signing off now to get some rest. Back at it (hopefully with greater 
clarity) in the morning.

Patrick

Todd Tillinghast wrote:

>We had decided NOT to allow a leading numeral.  I believe that only in
>the recent incarnations of the schema, where we started from scratch with
>the reg exp, have we dropped that no leading numeral requirement.
>
>The bigger question is will we keep the ":".  That will be much more
>painful to remove than disallowing a leading numeral.  
>
>Also I don't see a reason to not allow ideographs, etc..
>
>Todd
>
>>I was speaking hypothetically -- if we are going to try to
>>conform to XML name character usage, that is what we would have
>>to do.
>>
>>But, we've already decided not to conform -- we're allowing a number
>>to start out osisIDs. So, I suggest we allow letters, digits, and _
>>as start characters.
>>
>>Also, we seem to be ignoring the use of ideographs and accented
>>characters in names. It's OK with me, but I want to make sure it's
>>intentional.
>>
>>-Harry
>>
>>>-----Original Message-----
>>>From: owner-osis-core@bibletechnologieswg.org
>>>[mailto:owner-osis-core@bibletechnologieswg.org] On Behalf Of
>>>Todd Tillinghast
>>>Sent: Wednesday, August 14, 2002 4:35 PM
>>>To: osis-core@bibletechnologieswg.org
>>>Subject: RE: [osis-core] OSIS work regex
>>>
>>>
>>>The statement below does not make sense to me.  It seems you
>>>are saying two conflicting things.  In any case, it seems
>>>that you are saying that we should conform to the XML standard.
>>>
>>>I guess what I am suggesting is that we have references that
>>>can be XML IDs.  I am not sure what all of the precluded and
>>>allowed characters are.  I know that Patrick was much better
>>>versed in this when we talked several months ago on this very topic.
>>>
>>>The trouble with this whole line of discussion is that ":",
>>>"[", and "]" are not allowed in XML IDs!
>>>
>>>Also the test I did with an "_" leading was ok, it was the
>>>leading number that was the problem we had before.
>>>
>>>SORRY FOR THE BOGUS DETOUR RELATED TO "_"!
>>>
>>>The issue still remains related to OSIS references and
>>>identifiers as XML IDs.  I think that is why I was using ".."
>>>rather than ":" long ago. If we trade the ":" for ".." and do
>>>away with the "[" and "]" then we would be back with a valid
>>>XML ID.  (Of course ALLOW "_" and preclude numeral as the
>>>leading character.)
>>>
>>>Todd
>>>
>>>>My "XML in a Nutshell" reference book says that XML name start
>>>>characters are letters, ideographs, and the underscore, _.  If we want
>>>>to conform to XML usage, we should allow ideographs, underscore,
>>>>but no _ or digits in osisIDs, I guess.
>>>>
>>>>-Harry
>>>>
>>>>>-----Original Message-----
>>>>>From: owner-osis-core@bibletechnologieswg.org
>>>>>[mailto:owner-osis-core@bibletechnologieswg.org] On Behalf Of Todd
>>>>>Tillinghast
>>>>>Sent: Wednesday, August 14, 2002 3:34 PM
>>>>>To: osis-core@bibletechnologieswg.org
>>>>>Subject: RE: [osis-core] OSIS work regex
>>>>>
>>>>>
>>>>>I think I am clear now on the proposal.
>>>>>
>>>>>Although we don't intend to use our ids as XML IDs, by allowing a
>>>>>leading "_" we preclude others from using the same syntax/form and
>>>>>set of identifiers in other implementations. This weakens our
>>>>>standard.
>>>>>
>>>>>I hope that encoders other than those encoding OSIS documents would
>>>>>use identifiers that are of the same "currency" as our references
>>>>>and identifiers.  By eliminating the option for those identifiers to
>>>>>be XML IDs we limit the possibility for wider adoption, influence and
>>>>>interoperability with OSIS documents.
>>>>>
>>>>>Todd
>>>>>
>>>>>>Todd,
>>>>>>
>>>>>>I don't think Harry meant "_" as an extra delimiter (in the same sense
>>>>>>as "." is a delimiter in our syntax) but more as a name character in
>>>>>>writing customary citations of names. It is in a sense a delimiter but
>>>>>>as part of the name to be matched as a string and not a delimiter. (Does
>>>>>>that make any sense at all? Perhaps Harry can state what he meant more
>>>>>>clearly. ;-)
>>>>>>
>>>>>>Patrick
>>>>>>
>>>>>>Todd Tillinghast wrote:
>>>>>>
>>>>>>>What extra value does the "_" give us?
>>>>>>>
>>>>>>>Are you proposing Bible_.TEV_ ?
>>>>>>>
>>>>>>>Or just that "_" would be an option as in
>>>>>>>Bible.Todd_New_And_Different_Reference_System ?
>>>>>>>
>>>>>>>I can see "_" as an allowable character as long as it is not the leading
>>>>>>>character but don't see any value in having it as an additional
>>>>>>>delimiter to ".".
>>>>>>>
>>>>>>>Todd
>>>>>>>
>>>>>>>>-----Original Message-----
>>>>>>>>From: owner-osis-core@bibletechnologieswg.org
>>>>>>>>[mailto:owner-osis-core@bibletechnologieswg.org] On Behalf Of Harry Plantinga
>>>>>>>>Sent: Wednesday, August 14, 2002 7:26 AM
>>>>>>>>To: osis-core@bibletechnologieswg.org
>>>>>>>>Subject: RE: [osis-core] OSIS work regex
>>>>>>>>
>>>>>>>>If schema RegExps behave as they do in Perl, the ? is superfluous.
>>>>>>>>Perhaps
>>>>>>>>
>>>>>>>> [\L\N][\.\L\N]*
>>>>>>>>
>>>>>>>>The underscore character (_) is pretty commonly used in names and may be
>>>>>>>>present in documents converted to OSIS. I can't see that it would do any
>>>>>>>>harm. Could it be included?  Perhaps
>>>>>>>>
>>>>>>>>[\L\N_][\.\L\N_]*
>>>>>>>>
>>>>>>>>-Harry
>>>>>>>>
>>>>>>>>----------------------------------
>>>>>>>>For the work portion:
>>>>>>>>
>>>>>>>><xs:pattern value = "([\L\N\.]([\L\N\.]*)?)" />
>>>>>>>>
>>>>>>>>By which I am trying to say, any letter or number combination, followed
>>>>>>>>by a period is compulsory, followed by any number of optional
>>>>>>>>letter/number combinations that also end in a period (periods, hyphens,
>>>>>>>>etc., being excluded from the work name).
>>>>>>>>
>>>>>>--
>>>>>>Patrick Durusau
>>>>>>Director of Research and Development
>>>>>>Society of Biblical Literature
>>>>>>pdurusau@emory.edu

-- 
Patrick Durusau
Director of Research and Development
Society of Biblical Literature
pdurusau@emory.edu
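
The two candidate patterns quoted above can be tried out with a small Python
sketch. It assumes that \L and \N in the thread stand for "any Unicode
letter" and "any Unicode number" (written \p{L} and \p{N} in XML Schema
regular expressions) and approximates them with Python's \w class, which is
close but not identical. The names below are hypothetical, and fullmatch is
used because schema patterns must match the whole value.

```python
import re

# Approximation of [\L\N]: Python's \w minus the underscore.
LETTER_OR_NUMBER = r"[^\W_]"

# Approximates [\L\N][\.\L\N]*  (no underscore allowed anywhere)
WITHOUT_UNDERSCORE = re.compile(
    rf"{LETTER_OR_NUMBER}(?:{LETTER_OR_NUMBER}|\.)*"
)

# Approximates [\L\N_][\.\L\N_]*  (underscore allowed, even as first character)
WITH_UNDERSCORE = re.compile(r"\w[\w.]*")


def matches(pattern: re.Pattern, candidate: str) -> bool:
    """Return True when the whole candidate matches the pattern."""
    return pattern.fullmatch(candidate) is not None


if __name__ == "__main__":
    samples = [
        "Bible.TEV",
        "1Cor",
        "Bible.Todd_New_And_Different_Reference_System",
        "Bible:TEV",
    ]
    for s in samples:
        print(s, matches(WITHOUT_UNDERSCORE, s), matches(WITH_UNDERSCORE, s))
```

Under these assumptions both patterns accept "Bible.TEV" and "1Cor", only
the second accepts names containing "_", and neither accepts a ":".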