[osis-core] delimiters for identifiers in an osisID

Todd Tillinghast osis-core@bibletechnologieswg.org
Fri, 16 Aug 2002 01:33:58 -0600


The heart of the problem is that we are not provided a containsToken
function where we can say */*[containsToken(@osisID, "Matt.1.1")] and
have the function return ONLY elements whose osisID attribute contains
the white-space-delimited TOKEN "Matt.1.1", that is, a token that is
((the first token OR preceded by a space) AND (the last token OR
followed by a space)) and that EXACTLY matches "Matt.1.1".
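
To make the wished-for behavior concrete, here is a rough sketch in
Python rather than XPath.  The helper name contains_token simply mirrors
the hypothetical function above, and the sample attribute values are
invented for illustration.

```python
def contains_token(value: str, token: str) -> bool:
    """Return True only if `token` is one of the white-space-delimited
    tokens of `value` (hypothetical helper, not part of XPath 1.0)."""
    return token in value.split()


# Invented sample osisID values, for illustration only.
print(contains_token("Matt.1.1 Matt.1.2", "Matt.1.1"))    # exact token is present
print(contains_token("Matt.1.10 Matt.1.11", "Matt.1.1"))  # only a prefix of other tokens
print("Matt.1.1" in "Matt.1.10 Matt.1.11")                # plain substring test, like contains()
```

The second call rejects the value while the plain substring test on the
same value accepts it, which is exactly the gap described above.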

Since we are only given a contains function, we are left with a problem.
Harry's suggestion that we add extra spaces and/or rely on the ones that
are already present, rather than [ and ], is an intriguing one!

I have done a little research related to using spaces rather than [ and
] to bound individual identifiers in an osisID.

I have created three test cases, each with its own schema; the only
difference between the schemas is the pattern statement.  (To simplify
the pattern I have only allowed the numerals 1-9 in the normal
structure.)

Test case 1 demonstrates that an "extra" space can be placed before the
first identifier and after the last identifier and the value is still
valid, even though the pattern for the "primitive" identifiers makes no
mention of such spaces.  However, it also demonstrates that an encoder
can just as easily omit the NECESSARY space before the first identifier
and after the last identifier with no error.
<xs:pattern value="((([1-9]){1,}((\.([1-9]){1,}){0,})?):)?([1-9]*((\.([1-9]){1,}){0,})?)"/>

For this test case consider the results of the following:
//*[contains(@osisID, "1")]
//*[contains(@osisID, " 1 ")]
//*[contains(@osisID, " 1.1 ")]
//*[contains(@osisID, "1.1")]
//*[contains(@osisID, "1.2")]
//*[contains(@osisID, "1.3")]
//*[contains(@osisID, "1.4")]
//*[contains(@osisID, " 1.5 ")]
//*[contains(@osisID, " 1.6 ")]
//*[contains(@osisID, " 1.7 ")]
//*[contains(@osisID, "4.5")]
//*[contains(@osisID, " 4.5 ")]
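
As a rough illustration of what these queries can and cannot
distinguish, the following Python sketch mimics the substring behavior
of contains().  The helper name xpath_contains and the three osisID
values are my own inventions for the example; they are not taken from
the test document.

```python
def xpath_contains(haystack: str, needle: str) -> bool:
    """Mimic the substring semantics of the XPath 1.0 contains() function."""
    return needle in haystack


# Invented osisID values (assumptions, not taken from the test document).
padded = " 1.1 4.5 "       # encoder added the leading and trailing space
unpadded = "1.1 4.5"       # encoder omitted them
lookalike = " 11.1 4.55 "  # different identifiers that merely contain the digits

# One row per needle: result for padded, unpadded, lookalike.
for needle in ("1.1", " 1.1 ", "4.5", " 4.5 "):
    print(repr(needle),
          xpath_contains(padded, needle),
          xpath_contains(unpadded, needle),
          xpath_contains(lookalike, needle))
```

The bare needles also match the lookalike value, while the space-padded
needles miss the unpadded value even though it holds the right
identifiers.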

Test case 2 demonstrates that if a space is added to the pattern for
"primitive" identifiers, then NO MATTER WHAT you do, when there is more
than one primitive identifier in an osisID the whole value is not
valid.  This is because the validator first parses the white space
between the individual tokens, recognizing the three spaces between
individual identifiers as a SINGLE white space, and then finds that the
individual identifier does not have a leading space and is thus invalid.
(Interestingly, I did not get the same problem when there was only a
single identifier.  I consider this a bug in XMLSpy.  In any case the
XMLSpy behavior is correct for the multiple case.)
<xs:pattern value="[\s]((([1-9]){1,}((\.([1-9]){1,}){0,})?):)?([1-9]*((\.([1-9]){1,}){0,})?)[\s]"/>
(I know that \s matches any white space, but for the test it makes no
difference because I only used spaces in the test document.)
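
The failure can be sketched in Python under the assumption that the
validator splits the attribute value on white space first and only then
checks each item against the pattern.  The names ITEM_WITH_SPACES and
validate_list are hypothetical, the pattern is the test case 2 pattern
carried over unchanged, and this is a model of the behavior, not of
XMLSpy itself.

```python
import re

# Test case 2 pattern, used with fullmatch() because schema patterns
# must match the whole item.
ITEM_WITH_SPACES = re.compile(
    r"[\s]((([1-9]){1,}((\.([1-9]){1,}){0,})?):)?"
    r"([1-9]*((\.([1-9]){1,}){0,})?)[\s]"
)


def validate_list(value: str) -> bool:
    """Split on white space first, then check every item (assumed model)."""
    items = value.split()  # runs of spaces collapse; items carry no spaces
    return all(ITEM_WITH_SPACES.fullmatch(item) for item in items)


# The raw string with its spaces satisfies the pattern...
print(ITEM_WITH_SPACES.fullmatch(" 1.1 ") is not None)
# ...but after splitting, no item has a leading or trailing space left.
print(validate_list(" 1.1   1.2 "))
```

Under this model no amount of extra spacing in the attribute value can
help, because the spaces are consumed before the pattern is ever
applied.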

Test cases 1 and 2 taken together demonstrate that it is NOT POSSIBLE to
create a schema that enforces the requirement that individual
identifiers be preceded by at least one space and followed by at least
one space.  Of course we could choose not to enforce the requirement,
but I think that would make the requirement meaningless: if the presence
of the necessary spaces cannot be guaranteed, then ALL reliable uses of
OSIS documents will have to assume that the spaces were not correctly
placed.

Test case 3 demonstrates the use of [ and ].
<xs:pattern value="[\[]((([1-9]){1,}((\.([1-9]){1,}){0,})?):)?([1-9]*((\.([1-9]){1,}){0,})?)[\]]"/>
Consider the following test expressions:
//*[contains(@osisID, "[1]")]
//*[contains(@osisID, "[1.1]")]
//*[contains(@osisID, "[1.2]")]
//*[contains(@osisID, "[1.3]")]
//*[contains(@osisID, "[1.4]")]
//*[contains(@osisID, "[1.5]")]
//*[contains(@osisID, "[1.6]")]
//*[contains(@osisID, "[1.7]")]
//*[contains(@osisID, "[4.5]")]
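
A small Python sketch of why the brackets help, again assuming the
validator checks each white-space-separated item against the pattern.
The names BRACKETED_ITEM and validate_list and the sample values are
invented for the example.

```python
import re

# Test case 3 pattern, applied to each item with fullmatch().
BRACKETED_ITEM = re.compile(
    r"[\[]((([1-9]){1,}((\.([1-9]){1,}){0,})?):)?"
    r"([1-9]*((\.([1-9]){1,}){0,})?)[\]]"
)


def validate_list(value: str) -> bool:
    """Check every white-space-separated item against the pattern (assumed model)."""
    return all(BRACKETED_ITEM.fullmatch(item) for item in value.split())


# The brackets survive tokenization, so the schema can insist on them.
print(validate_list("[1.1] [4.5]"))  # every item is bracketed
print(validate_list("1.1 [4.5]"))    # first item lacks its brackets

# With brackets guaranteed, a substring test behaves like a token test.
print("[1.1]" in "[1.1] [4.5]")
print("[1.1]" in "[11.1] [1.11]")
```

Because the delimiters are part of each item, they can be both enforced
by the schema and relied upon in the contains() expression.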

For the necessity of a delimiter of some kind, see the reply to Harry's
post.

Todd