[osis-core] Getting elements

Todd Tillinghast osis-core@bibletechnologieswg.org
Fri, 16 Aug 2002 03:31:32 -0600


SUMMARY:
Using Harry's proposal, we can use the following expression to
reliably find an element based on an identifier, AND NO special
patterns would be needed to ensure that white space exists.

An XPath expression would look something like the following, with each
piece shown below REPEATED for each of the different characters allowed
as white space:
//*[@osisID="Z.1"] |   <!-- equal to -->
//*[substring(@osisID, 1, string-length("Z.1 "))="Z.1 "] |   <!-- begins with -->
//*[substring(@osisID, (string-length(@osisID)-string-length(" Z.1")+1), string-length(" Z.1"))=" Z.1"] |   <!-- ends with -->
//*[contains(@osisID, " Z.1 ")]   <!-- in the middle, or one of the first
three cases with extra white space before the first identifier and/or
after the last identifier -->

If N is the number of white space characters, there would be
1 + N + N + (N*N) sub-expressions (for "equal to", "begins with",
"ends with", and "in the middle", respectively), all OR'ed together and
evaluated against every element for each lookup.

Unfortunately, for every element with an osisID that does NOT match,
ALL of the sub-expressions have to be evaluated just to confirm that
there is no match.

XML recognizes four white space characters (space, tab, carriage return,
and line feed). With just those four there would be 1 + 4 + 4 + 16 = 25
sub-expressions evaluated for each element. To complicate matters
further, a VERY long and error-prone expression would be required
wherever an XPath involving an osisID is used.
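To make the growth concrete, here is a minimal Python sketch that generates the no-brackets union for one identifier and a given set of white space characters, then counts the branches. The helper name `build_lookup_xpath` is hypothetical, and it assumes the four XML white space characters; it simply reproduces the pattern above (one "equal to" test, N "begins with" tests, N "ends with" tests, N*N "in the middle" tests).

```python
# Hypothetical helper that builds the no-brackets XPath union described above.
# Assumption: the white space characters are the four XML ones.
XML_WHITESPACE = (" ", "\t", "\r", "\n")


def build_lookup_xpath(osis_id, whitespace=XML_WHITESPACE):
    """Return the XPath branches that must be OR'ed together with '|'."""
    branches = ['//*[@osisID="%s"]' % osis_id]  # equal to
    for ws in whitespace:
        # begins with: the identifier followed by one white space character
        needle = osis_id + ws
        branches.append(
            '//*[substring(@osisID, 1, string-length("%s"))="%s"]' % (needle, needle)
        )
    for ws in whitespace:
        # ends with: one white space character followed by the identifier
        needle = ws + osis_id
        branches.append(
            '//*[substring(@osisID, string-length(@osisID) - string-length("%s") + 1, '
            'string-length("%s"))="%s"]' % (needle, needle, needle)
        )
    for before in whitespace:
        for after in whitespace:
            # in the middle: any white space character on either side
            branches.append('//*[contains(@osisID, "%s")]' % (before + osis_id + after))
    return branches


branches = build_lookup_xpath("Z.1")
print(len(branches))  # 25, i.e. 1 + 4 + 4 + 4*4
# The full union, as it would have to appear wherever a lookup is needed.
expression = " |\n".join(branches)
```

With only a plain space allowed, the same helper produces the four branches Harry describes; every extra white space character adds 2N + 1 more.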


************************************************************************
************************************************************************

THE OTHER OPTION IS TO USE 
//*[contains(@osisID, "[Z.1]")]
and force the use of [ and ] around each identifier.
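By contrast, with brackets a lookup is one substring test per element, no matter which white space separates the identifiers. This Python sketch uses hypothetical helper names (not part of any OSIS tool): it builds that single-branch XPath and applies the equivalent test in plain Python, showing that the closing `]` keeps `[Z.1]` from matching `[Z.1.1]`, and the opening `[` keeps it from matching `[T:Z.1]`.

```python
def bracket_xpath(osis_id):
    # One branch, regardless of how identifiers are separated.
    return '//*[contains(@osisID, "[%s]")]' % osis_id


def bracket_match(osis_id_attr, osis_id):
    # Same test as XPath contains(): a plain substring check.
    return "[" + osis_id + "]" in osis_id_attr


print(bracket_xpath("Z.1"))                       # //*[contains(@osisID, "[Z.1]")]
print(bracket_match("[T:Z.A] [Z.1]", "Z.1"))      # True
print(bracket_match("[T:Z.1.1] [Z.1.1]", "Z.1"))  # False: "]" must follow "Z.1"
print(bracket_match("[T:Z.1]", "Z.1"))            # False: "[" must precede "Z.1"
print(bracket_match("[Z.1]\t[Z.2]", "Z.1"))       # True: separators do not matter
```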

In either case, you may not know whether the reference system was
defaulted, so when looking for an identifier in the default reference
system you may be compelled to look for both the defaulted form and the
"fully qualified" form.
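With brackets, checking both forms only doubles the work: two `contains()` tests OR'ed together. In this Python sketch, `Default` stands in for the name of whatever the default reference system is; both that name and the helpers are hypothetical. Without brackets, each form would need its own full set of sub-expressions instead.

```python
def either_form_xpath(osis_id, default_system):
    # Match the defaulted form [Z.1] or the fully qualified form [System:Z.1].
    return ('//*[contains(@osisID, "[{ident}]") or '
            'contains(@osisID, "[{sys}:{ident}]")]').format(ident=osis_id, sys=default_system)


def either_form_match(osis_id_attr, osis_id, default_system):
    # Plain-Python equivalent of the XPath predicate above.
    return ("[" + osis_id + "]" in osis_id_attr
            or "[" + default_system + ":" + osis_id + "]" in osis_id_attr)


print(either_form_xpath("Z.1", "Default"))
print(either_form_match("[Default:Z.1]", "Z.1", "Default"))  # True
print(either_form_match("[T:Z.1]", "Z.1", "Default"))        # False: another system
```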
 
************************************************************************
************************************************************************

I prefer the bracket option, with its simple XPath and better search
performance, over the no-brackets option!

************************************************************************
************************************************************************
> I think Todd is right that there can be problems with looking
> for "Z.1" with XPath in the example below. However, I think there
> IS an XPath solution, without using the [] syntax.  The XPath
> code required is a bit messy, but that might be preferred by some
> to a messy XML document.  Also, I suspect the need won't arise
> very frequently, since the problem doesn't arise if you only have
> one osisID in an osisID attribute, and it's less likely to arise
> if you don't do much mixing of different reference schemes.
> 
> So, here's how to search for Z.1 in the standard reference system
> using basic XPath.
> 
> If you have a list of osisIDs like that below, but space separated
> instead of using the [] notation, you can find Z.1 in the default
> reference system by looking for osisID attributes
>   - equal to Z.1, or
>   - starting with "Z.1 ", or
>   - containing " Z.1 ", or
>   - ending with " Z.1".
> 
> There are starts-with() and contains() string functions. There
> is no "ends-with()" function, but you can use substring() and
> string-length() to get the same behavior.
> 
> -Harry
> 
> > I think that the following solves ALL of the problems posed
> > below from the earlier post.
> >
> > <?xml version="1.0" encoding="UTF-8"?>
> > <x osisID="[T:Z.A] [Z.1]">
> > 	<y osisID="[T:Z.1.1] [Z.1.1]">Z.1.1</y>
> > 	<y osisID="[T:Z.1.2] [Z.1.2]">Z.1.2</y>
> > 	<y osisID="[T:Z.1.2] [T:Z.1.3] [T:Z.1.4] [Z.1.3] [Z.1.4.A]">Z.1.3 and Z.1.4</y>
> > 	<y osisID="[T:Z.1.4] [T:Z.1.5] [Z.1.4.B] [Z.1.5]">Z.1.4 and Z.1.5</y>
> > </x>
> >
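As a rough check that the bracketed example quoted above behaves as intended, this Python sketch parses it with the standard library's `xml.etree.ElementTree`. That module's XPath subset has no `contains()`, so the sketch applies the same bracketed substring test to every element itself; the helper `find_by_osis_id` is hypothetical.

```python
import xml.etree.ElementTree as ET

# The bracketed example quoted above (XML declaration omitted).
DOC = """<x osisID="[T:Z.A] [Z.1]">
    <y osisID="[T:Z.1.1] [Z.1.1]">Z.1.1</y>
    <y osisID="[T:Z.1.2] [Z.1.2]">Z.1.2</y>
    <y osisID="[T:Z.1.2] [T:Z.1.3] [T:Z.1.4] [Z.1.3] [Z.1.4.A]">Z.1.3 and Z.1.4</y>
    <y osisID="[T:Z.1.4] [T:Z.1.5] [Z.1.4.B] [Z.1.5]">Z.1.4 and Z.1.5</y>
</x>"""


def find_by_osis_id(root, osis_id):
    # Equivalent of //*[contains(@osisID, "[...]")] done in Python,
    # because ElementTree's XPath support lacks contains().
    needle = "[" + osis_id + "]"
    return [el for el in root.iter() if needle in el.get("osisID", "")]


root = ET.fromstring(DOC)
for osis_id in ["Z.1", "T:Z.1.2", "T:Z.1.4", "Z.1.4"]:
    hits = find_by_osis_id(root, osis_id)
    print(osis_id, [(el.tag, (el.text or "").strip()) for el in hits])
```

Note that `Z.1.4` finds nothing in the default system, since there it exists only as `Z.1.4.A` and `Z.1.4.B`; the brackets prevent a false prefix match on either.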


DETAILED CASES ASSUMING ONLY ONE REFERENCE SYSTEM (OR NO DEFAULT REFERENCE SYSTEM)

With the following case
<div osisID="Z.1"> <!-- line A -->
	<verse osisID="Z.1.1"/> <!-- line B -->
	<verse osisID=" Z.1.2"/><!-- line C -->
	<verse osisID="Z.1.3 Z.1.4"/><!-- line D -->
</div>

equal to "Z.1" gives us line A
starting with "Z.1 " gives us no lines
containing " Z.1 " gives us no lines
ending with " Z.1" gives us no lines

With this slight adjustment
<div osisID="Z.1 "> <!-- line A -->
	<verse osisID="Z.1.1"/> <!-- line B -->
	<verse osisID=" Z.1.2"/><!-- line C -->
	<verse osisID="Z.1.3 Z.1.4"/><!-- line D -->
</div>
equal to "Z.1" gives us no lines
starting with "Z.1 " gives us line A
containing " Z.1 " gives us no lines
ending with " Z.1" gives us no lines

With this slight adjustment
<div osisID=" Z.1"> <!-- line A -->
	<verse osisID="Z.1.1"/> <!-- line B -->
	<verse osisID=" Z.1.2"/><!-- line C -->
	<verse osisID="Z.1.3 Z.1.4"/><!-- line D -->
</div>
equal to "Z.1" gives us no lines
starting with "Z.1 " gives us no lines
containing " Z.1 " gives us no lines
ending with " Z.1" gives us line A

With this slight adjustment
<div osisID=" Z.1 "> <!-- line A -->
	<verse osisID="Z.1.1"/> <!-- line B -->
	<verse osisID=" Z.1.2"/><!-- line C -->
	<verse osisID="Z.1.3 Z.1.4"/><!-- line D -->
</div>
equal to "Z.1" gives us no lines
starting with "Z.1 " gives us no lines
containing " Z.1 " gives us line A
ending with " Z.1" gives us no lines
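The four cases above can be reproduced mechanically. This Python sketch applies the equal / starting with / containing / ending with tests, for a plain space only, to each variant of line A, and checks that lines B to D never match `Z.1`. The helper name is hypothetical, and `str.endswith` stands in for the substring()/string-length() construction used in XPath.

```python
def which_tests_match(osis_id_attr, osis_id="Z.1"):
    """Return the names of the space-only tests that match osis_id_attr."""
    tests = {
        "equal to": osis_id_attr == osis_id,
        "starting with": osis_id_attr.startswith(osis_id + " "),
        "containing": (" " + osis_id + " ") in osis_id_attr,
        "ending with": osis_id_attr.endswith(" " + osis_id),
    }
    return [name for name, hit in tests.items() if hit]


# The four variants of line A: each is caught by exactly one test.
for line_a in ["Z.1", "Z.1 ", " Z.1", " Z.1 "]:
    print(repr(line_a), which_tests_match(line_a))
# prints:
# 'Z.1' ['equal to']
# 'Z.1 ' ['starting with']
# ' Z.1' ['ending with']
# ' Z.1 ' ['containing']

# Lines B, C and D never match Z.1 under any of the four tests.
for other in ["Z.1.1", " Z.1.2", "Z.1.3 Z.1.4"]:
    assert which_tests_match(other) == []
```

Since each variant is caught by a different test, dropping any one of the four would silently miss some placements of white space.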