Spaces in IDs: was Re: [osis-core] Harry on osisID's

Patrick Durusau osis-core@bibletechnologieswg.org
Tue, 02 Jul 2002 07:04:34 -0400


Troy,

I think there are two separate questions in your post:

1. Should identifiers be allowed to have spaces?

2. Can a reference system for a particular work be declared in the OSIS
schema, and if so, how?

Troy A. Griffitts wrote:

> Does the current regex for osisID allow spaces in the segments of the 
> identifier?  If there is no functional reason for disallowing spaces, 
> I would like them for exactly the things Harry has mentioned below.

The current regex (which follows the rule for XML IDs) does not allow
spaces. Since we are no longer using datatyping to declare this an
xs:ID, I suppose space characters could in theory be permitted, but I
wonder whether that would create problems for both data entry and
parsing (Steve had some comments on the latter that I don't recall
clearly enough to relate). On data entry, at least, I can see that
allowing space characters could lead to errors that would be hard to
find by visual inspection. (Remind me to regale all of you with the
story of Panorama Pro and the added white space in the SGML declaration
at some point.)

If we follow the XML ID rule (which relies on the XML Name production),
then spaces are not allowed. I see problems with data entry and
correction, but I hope Steve will enlighten us about any other problems
spaces would cause.
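To make the rule concrete, here is a rough Python check against a
simplified, ASCII-only approximation of the XML Name production. The
real production also admits many non-ASCII characters, and the helper
name is_ascii_xml_name is hypothetical. The trailing-space case
illustrates the data-entry worry: "Gen.1.1 " looks the same as
"Gen.1.1" on screen but is rejected.

```python
import re

# Simplified, ASCII-only approximation of the XML 1.0 Name production:
# a Name starts with a letter, "_" or ":" and continues with letters,
# digits, ".", "-", "_" or ":". The real production also allows many
# non-ASCII characters, which this sketch deliberately ignores.
ASCII_NAME = re.compile(r"[A-Za-z_:][A-Za-z0-9._:\-]*")


def is_ascii_xml_name(value: str) -> bool:
    """Return True if value matches the simplified Name pattern."""
    return ASCII_NAME.fullmatch(value) is not None


for candidate in ["Gen.1.1", "Prologue.i", "Book I", "Gen.1.1 "]:
    # repr() makes the invisible trailing space visible in the output
    print(repr(candidate), is_ascii_xml_name(candidate))
```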

>
>
>> What I really meant to ask is this: is there a way to write some XML
>> that declares that Augustine's Confessions will have osisIDs of the
>> following sort:
>>
>> Books I .. XIII
>> Each book has chapters i, ii, iii, iv, ...
>> Each chapter has sections 1, 2, ...
>>
>> In addition there is a Prologue with sections identified as 
>> Prologue.i, Prologue.ii, etc
>
>
What I hear Harry asking here is: "How do I declare what a valid osisID 
is for this document?" as opposed to simply using syntactically valid 
osisIDs.
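As a thought experiment, Harry's scheme can be written down as data and
turned into a validator. The sketch below is Python, not proposed OSIS
syntax; the level names, the DECLARED_SCHEMES structure, and the loose
lowercase-Roman check for chapters are all assumptions made for
illustration.

```python
import re

# Hypothetical declaration of Harry's reference scheme for Augustine's
# Confessions. Each level pairs a name with a regex for one osisID segment.
# Illustrative data only, not proposed OSIS syntax.
UPPER_ROMAN = r"(?:XIII|XII|XI|X|IX|VIII|VII|VI|V|IV|III|II|I)"  # books I..XIII
LOWER_ROMAN = r"[ivxlc]+"  # chapters: loose check, any lowercase Roman letters
ARABIC = r"[1-9][0-9]*"    # sections 1, 2, ...

DECLARED_SCHEMES = [
    [("book", UPPER_ROMAN), ("chapter", LOWER_ROMAN), ("section", ARABIC)],
    [("prologue", "Prologue"), ("section", LOWER_ROMAN)],
]


def scheme_pattern(levels):
    """Build a regex accepting any leading part of the hierarchy,
    e.g. A, A.B or A.B.C for three levels A, B, C."""
    pattern = ""
    for _, segment in reversed(levels):
        pattern = segment + (r"(?:\." + pattern + r")?" if pattern else "")
    return pattern


DECLARED = re.compile("|".join(f"(?:{scheme_pattern(s)})" for s in DECLARED_SCHEMES))


def is_declared_id(osis_id: str) -> bool:
    """True if osis_id fits one of the declared schemes as a whole."""
    return DECLARED.fullmatch(osis_id) is not None


for candidate in ["IV", "IV.iii", "IV.iii.12", "Prologue.ii", "XIV", "IV.3", "Prologue.1"]:
    print(candidate, is_declared_id(candidate))
```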

In theory, at any rate, that should be handled by a <refsDecl>-style
element in the header. Just creating syntax off the top of my head, I
would imagine something along the following lines:

Note that osisGeneralRef does NOT exist but would be a renaming of the 
current osisRef simpleType.

<xs:simpleType name="osisRef">
  <xs:annotation>
    <xs:documentation>
      <p>Book IDs are constructed as "Book." + two-digit number</p>
      <p>Chapter IDs are constructed as "Book." + two-digit number
         + "." + "Chapter." + two-digit number</p>
      <p>Section IDs are constructed as "Book." + two-digit number
         + "." + "Chapter." + two-digit number + "." + "Section."
         + two-digit number</p>
      <p>The legitimate ID values are written as "or" patterns
         within a single pattern expression.</p>
    </xs:documentation>
  </xs:annotation>
  <xs:restriction base="osisGeneralRef">
    <xs:pattern value="Book\.[0-9]{2}|Book\.[0-9]{2}\.Chapter\.[0-9]{2}|Book\.[0-9]{2}\.Chapter\.[0-9]{2}\.Section\.[0-9]{2}"/>
  </xs:restriction>
</xs:simpleType>
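For quick experiments outside a schema validator, the pattern in the
simpleType can be checked in Python. This is a sketch assuming the
pattern uses only syntax that XML Schema regexes and Python's re module
share, which holds for this one. XML Schema anchors patterns to the
whole value implicitly, hence re.fullmatch; matches_osis_ref is a
hypothetical helper name.

```python
import re

# The xs:pattern value from the simpleType. XML Schema patterns are
# implicitly anchored to the whole value, so fullmatch is the closest
# Python equivalent (re.match would also accept a matching prefix).
OSIS_REF_PATTERN = re.compile(
    r"Book\.[0-9]{2}"
    r"|Book\.[0-9]{2}\.Chapter\.[0-9]{2}"
    r"|Book\.[0-9]{2}\.Chapter\.[0-9]{2}\.Section\.[0-9]{2}"
)


def matches_osis_ref(value: str) -> bool:
    return OSIS_REF_PATTERN.fullmatch(value) is not None


for value in ["Book.01", "Book.01.Chapter.02", "Book.01.Chapter.02.Section.03",
              "Book.1", "Book.01.Section.03"]:
    print(value, matches_osis_ref(value))
```

"Book.1" fails because the pattern requires exactly two digits per
number, and "Book.01.Section.03" fails because it skips the chapter.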

A couple of problems:

Note that W3C XML Schema has no way (at least none that I can find) to
express ranges of Roman numerals, so the pattern above uses Arabic
numbers. (I take that as a display issue, but others will probably
differ.)
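One possible workaround, sketched here as an assumption rather than
anything OSIS does, is to spell out the permitted numerals and paste the
resulting alternation into an xs:pattern. The helper names to_roman and
roman_range_pattern are hypothetical.

```python
def to_roman(n: int) -> str:
    """Convert 1..3999 to an uppercase Roman numeral (greedy subtraction)."""
    values = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
              (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
              (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]
    out = []
    for value, numeral in values:
        while n >= value:
            out.append(numeral)
            n -= value
    return "".join(out)


def roman_range_pattern(first: int, last: int, lower: bool = False) -> str:
    """Spell out first..last as an alternation usable in an xs:pattern."""
    numerals = [to_roman(n) for n in range(first, last + 1)]
    if lower:
        numerals = [r.lower() for r in numerals]
    return "|".join(numerals)


print(roman_range_pattern(1, 13))  # pattern text for books I..XIII
```

When embedding the result inside a larger pattern, wrap it in
parentheses so the alternation does not absorb the neighbouring parts.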

The pattern approach does not let me validate that a book ID follows
the book ID pattern specifically, only that it matches one of these
patterns. Short of declaring a separate ID, with its own pattern, for
each kind of division, I don't see a way around this. If you do, please
advise!
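The difference between one union pattern and per-level patterns can be
shown in a few lines of Python. LEVEL_PATTERNS and the level names are
hypothetical; in XML Schema each entry would need its own simpleType
bound to a distinct attribute or element, which is the "separate ID per
kind of division" route.

```python
import re

# Hypothetical per-level patterns: one pattern for each kind of division,
# instead of a single union pattern shared by every ID.
LEVEL_PATTERNS = {
    "book": r"Book\.[0-9]{2}",
    "chapter": r"Book\.[0-9]{2}\.Chapter\.[0-9]{2}",
    "section": r"Book\.[0-9]{2}\.Chapter\.[0-9]{2}\.Section\.[0-9]{2}",
}
# The union corresponds to the single xs:pattern in the simpleType.
UNION = "|".join(LEVEL_PATTERNS.values())


def valid_by_union(osis_id: str) -> bool:
    return re.fullmatch(UNION, osis_id) is not None


def valid_for_level(level: str, osis_id: str) -> bool:
    return re.fullmatch(LEVEL_PATTERNS[level], osis_id) is not None


# A book division mistakenly carrying a chapter-shaped ID:
print(valid_by_union("Book.01.Chapter.02"))           # True
print(valid_for_level("book", "Book.01.Chapter.02"))  # False
```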

I suspect that validating a particular pattern for an ID, at least
where we want the ID's form to vary with some other property (such as
being a book, chapter, or section), is beyond the scope of XML Schema
unless we declare a distinct ID for each of those items. That is
certainly possible and, given the rather flat view of documents in W3C
Schema, probably anticipated. The problem is that we are trying to be
more general, covering a whole class of documents.
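Context-dependent checks of this kind can be done outside XML Schema,
in application code or in a rule-based language such as Schematron. A
minimal Python sketch, assuming hypothetical <div type="..."> markup
nested book > chapter > section: each osisID is checked against the
segment pattern for its type and must extend its parent's osisID. The
SEGMENT table and check function are illustrative names only.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical markup; the second section carries a mismatched book number.
SAMPLE = """<work>
  <div type="book" osisID="Book.01">
    <div type="chapter" osisID="Book.01.Chapter.01">
      <div type="section" osisID="Book.01.Chapter.01.Section.01"/>
      <div type="section" osisID="Book.02.Chapter.01.Section.02"/>
    </div>
  </div>
</work>"""

# Hypothetical pattern for the segment each division type contributes.
SEGMENT = {"book": r"Book\.[0-9]{2}",
           "chapter": r"Chapter\.[0-9]{2}",
           "section": r"Section\.[0-9]{2}"}


def check(elem, parent_id=None, problems=None):
    """Recursively check each osisID against its type and its parent's ID.
    Returns the list of offending osisIDs in document order."""
    if problems is None:
        problems = []
    osis_id = elem.get("osisID")
    if osis_id is not None:
        kind = elem.get("type")
        # The ID must be the parent's ID, a dot, then this level's segment.
        prefix = re.escape(parent_id) + r"\." if parent_id else ""
        if kind not in SEGMENT or not re.fullmatch(prefix + SEGMENT[kind], osis_id):
            problems.append(osis_id)
        parent_id = osis_id
    for child in elem:
        check(child, parent_id, problems)
    return problems


print(check(ET.fromstring(SAMPLE)))  # ['Book.02.Chapter.01.Section.02']
```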

Comments, suggestions?

Patrick

-- 
Patrick Durusau
Director of Research and Development
Society of Biblical Literature
pdurusau@emory.edu