[sword-devel] Setting canonical="true" ?
Troy A. Griffitts
scribe at crosswire.org
Thu Mar 1 13:03:18 MST 2012
On 03/01/2012 03:23 PM, DM Smith wrote:
>> In most cases use of the canonical attribute is straightforward, and
>> the default values will almost always produce the intended result.
>> However, there will arise truly difficult cases: for example, one may
>> be encoding an ancient text with annotations of its own. In that case
>> those notes would be canonical, while any added by the current editor
>> would not be. In such cases, the practice chosen and its rationale
>> should be described in the work's documentation.
> So, I take this that if I were creating an accurate representation of
> the 1611 KJV from scans, everything in that "ancient" text would be
> canonical, including introductions, notes, titles, cross-references, and
> so forth.
If you were producing a critical edition of the KJV titled, say:
KJV Through the Years
then yes, you would be correct, all KJV material with notes would be
canonical (from a purist point of view), and your modern notes about the
'canonical notes' would not be canonical :)
If you simply want to digitize the 1611 KJV, the fact that the work is
old doesn't mean everything in the work is 'canonical'. What defines
'old'? Even a purist must decide on the base work. If the base work is
the KJV and we're adding modern notes, then you'd be correct. If the
base work is the New Testament, and we're marking up the KJV notes,
merely encoding an old work, then I would disagree.
> If it is not that way and it is to reflect the underlying publication
> then I think there is a problem with the usage of the <transChange
> type="added"> element. In this case these should be marked
> canonical="false" as they are not part of the "base" text.
A transChange relates to translation methodology against an original
text (not what we're calling a 'base text' above).
> I took out the example about notes in a Bible translation. Its intent is
> that canonical is to distinguish what was in the text the translation
> was based on from what was not in that base.
> The confusion is that it is not at all clear what current editor means.
> There are many who take the KJV, notes and all, make changes to it, say
> modernizing the spelling, translate it into another language, .... So,
> since their base is not the Hebrew and Greek, but a particular KJV text,
> then according to this definition, the imported notes are now canonical.
But not for us. Our base text is always the study of the Bible, not the
study of a study Bible. Does that make sense?
We would never give our users results from ancient notes when they asked
for results only from canonical text.
Now, certainly-- especially where I work-- I can conceive of users who
might mean 'include ancient notes' when they say they only want
canonical material. But these are not the 99.999% of our users.
> But as a module encoder, I'd do it the way the OSIS defaults are
:) good. You are a purist, but you are also practical, DM! That's one of
the many things I like about you.
> , with one exception:
>> The <div> element.
OK, I think what you say below, in summary, is:
Trojan milestones don't allow schema validators to preserve XML inheritance.
Yes. They don't preserve XML hierarchy, enforce restrictions on which children
an element may contain, or do most anything else a schema defines.
But that doesn't mean that the specification is wrong because it
can't be represented purely in schema.
The OSIS documentation speaks about the use of trojan milestones and the
deficiencies that go along with them, but also the overlapping hierarchy
problem they attempt to solve.
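To make the milestone trade-off concrete, here is a small sketch (an illustration, not SWORD or osis2mod code) that recovers verse text from sID/eID milestones. Because the verse elements are empty, the text has to be gathered by walking the document in order and tracking which verse is open. The markup is un-namespaced for brevity and verse_texts is a hypothetical name.

```python
import xml.etree.ElementTree as ET

def verse_texts(root):
    """Map each verse sID to the text found between its sID and eID milestones."""
    texts = {}
    current = None  # sID of the verse currently open, if any

    def add(text):
        if current is not None and text:
            texts[current] = texts.get(current, "") + text

    def walk(elem):
        nonlocal current
        if elem.tag == "verse" and elem.get("sID"):
            current = elem.get("sID")
        elif elem.tag == "verse" and elem.get("eID"):
            current = None
        add(elem.text)
        for child in elem:
            walk(child)
            add(child.tail)  # text following a milestone lives in its tail

    walk(root)
    return texts

doc = ET.fromstring(
    '<div type="book">'
    '<p><verse sID="Gen.1.1" osisID="Gen.1.1"/>In the beginning'
    '<verse eID="Gen.1.1"/>'
    '<verse sID="Gen.1.2" osisID="Gen.1.2"/>And the earth </p>'
    '<p>was without form.<verse eID="Gen.1.2"/></p>'
    '</div>'
)
print(verse_texts(doc))
```

The second verse spans two paragraphs, which is the overlapping-hierarchy case that a container <verse> cannot express.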
Wanna thumb wrestle for it?
>> The canonical attribute is available on all elements.
> The following elements without canonical:
>> It has a ‘default’ value so it does not have to be entered by the
>> encoder if the default value is acceptable.
> A bit misleading. Only a few (8) elements actually have a default. Note,
> chapter is not there. And having it on osisText is silly (see below).
> Default: true <xs:attribute name="canonical" type="xs:boolean"
> use="optional" default="true"/>
> Default: false <xs:attribute name="canonical" type="xs:boolean"
> use="optional" default="false"/>
>> The value of this attribute is "inherited," that is once it is set,
>> any subelement of that element inherits the same setting.
> Default: inherited <xs:attribute name="canonical" type="xs:boolean"
> The rest of the elements.
> The examples on the same page are confusing, as they don't fit with the
> XML inheritance mechanism. They have an explicit value on a parent
> element forcing the inclusion of the attribute on an element with that
> as a default. Having a default value means that that element never
> inherits the value.
> With inheritance, it should be possible at any point in the document,
> using an XML parser to ask what the value of canonical is.
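As an illustration of "asking what the value of canonical is" at any point, the sketch below walks up from an element until it finds an explicit attribute or an element with a declared default. The DEFAULTS table only lists the three defaults mentioned in this thread and is an assumption, not something read from the OSIS schema; effective_canonical is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Defaults as described in this thread (assumed here, not parsed from the schema).
DEFAULTS = {"osisText": True, "div": False, "verse": True}

def effective_canonical(root, target):
    """Resolve canonical for target from explicit values, defaults, then ancestors."""
    # ElementTree has no parent pointers, so build a child-to-parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    node = target
    while node is not None:
        explicit = node.get("canonical")
        if explicit is not None:
            return explicit == "true"
        if node.tag in DEFAULTS:
            # An element with a default never inherits from its parent.
            return DEFAULTS[node.tag]
        node = parents.get(node)
    return None  # nothing in the ancestor chain decided the value

doc = ET.fromstring(
    '<osisText><div type="book"><title>Genesis</title>'
    '<chapter osisID="Gen.1">'
    '<verse osisID="Gen.1.1">In the beginning</verse>'
    '</chapter></div></osisText>'
)
print(effective_canonical(doc, doc.find(".//title")))
print(effective_canonical(doc, doc.find(".//verse")))
```

Under these assumptions the title resolves through the <div> default, while the verse resolves through its own default.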
> However, the attribute "canonical" is not actually inheritable,
> according to:
>> 3.3.5.6 Inherited Attributes
>> *Schema Information Set Contribution: Inherited Attributes*
>> [Definition:] An attribute information item A, whether explicitly
>> specified in the input information set or defaulted as described in
>> Attribute Default Value (§3.4.5.1)
>> <http://www.w3.org/TR/2009/WD-xmlschema11-1-20090130/#sic-attrDefault>, is
>> *potentially inherited* by an element information item E if and only
>> if *all* of the following are true:
>> 1 A is among the [attributes]
>> <http://www.w3.org/TR/xml-infoset/#infoitem.element> of one of E's
>> ancestors.
>> 2 A and E have the same [validation context].
>> 3 *One* of the following is true:
>> 3.1 A is ·attributed to·
>> <http://www.w3.org/TR/2009/WD-xmlschema11-1-20090130/#key-att-to> an
>> Attribute Use
>> <http://www.w3.org/TR/2009/WD-xmlschema11-1-20090130/#au> whose
>> {inheritable} = */true/*.
>> 3.2 A is /not/ ·attributed to·
>> <http://www.w3.org/TR/2009/WD-xmlschema11-1-20090130/#key-att-to> any
>> Attribute Use
>> <http://www.w3.org/TR/2009/WD-xmlschema11-1-20090130/#au> but A has a
>> ·governing attribute declaration·
>> <http://www.w3.org/TR/2009/WD-xmlschema11-1-20090130/#key-governing-ad> whose
>> {inheritable} = */true/*.
>> If and only if an element information item P is not ·skipped·
>> (that is, it is either ·strictly·
>> <http://www.w3.org/TR/2009/WD-xmlschema11-1-20090130/#key-sva> or
>> ·laxly· <http://www.w3.org/TR/2009/WD-xmlschema11-1-20090130/#key-lva>
>> assessed), in the ·post-schema-validation infoset·
>> <http://www.w3.org/TR/2009/WD-xmlschema11-1-20090130/#key-psvi> each
>> of P's element information item [children]
>> <http://www.w3.org/TR/xml-infoset/#infoitem.element> E which is not
>> ·attributed to·
>> <http://www.w3.org/TR/2009/WD-xmlschema11-1-20090130/#key-att-to> a
>> */skip/* Wildcard
>> <http://www.w3.org/TR/2009/WD-xmlschema11-1-20090130/#w>, has a property:
>> PSVI Contributions for element information items
>> [inherited attributes]
>> A list of attribute information items. An attribute information
>> item A is included if and only if *all* of the following are true:
>> 1 A is ·potentially inherited·
>> by E.
>> 2 Let O be A's [owner element]
>> <http://www.w3.org/TR/xml-infoset/#infoitem.attribute>. A does not
>> have the same expanded name
>> as another attribute which is also ·potentially inherited·
>> by E and whose [owner element]
>> <http://www.w3.org/TR/xml-infoset/#infoitem.attribute> is a
>> descendant of O.
> I presume this is a bug in the OSIS Schema.
> From a practical perspective in encoding a whole document, there are
> two scenarios to consider:
> 1) Milestoning structural elements. (BCV: Book, Chapter and Verse encoding)
> 2) Milestoning verses. (BSP: Book, Section and Paragraph encoding,
> First the text of the work has to be within (using my notation)
> (Note: osis2mod expects only one osisText)
> The significant part is the <div>, it cannot be a milestoned form and
> pass validation. The default value of canonical on this element is
> "false". Therefore, all descendants not contained in elements whose
> default is "true" or that explicitly declare canonical="true" inherit
> the value "false".
> Because divs can be nested, each div resets the state of canonical,
> either to its default of false or to the declared canonical value.
> The fact that <osisText> defaults canonical to true is meaningless. All
> of its children have a default of false. So practically speaking, the
> only element with canonical="true" is a verse and its contents that
> don't have
> The other implication of using the non-milestoned form of <div> is that
> by OSIS semantics, all other <div>s have to be container elements not
> milestoned. (I can quote the OSIS 2.1.1 manual, if needed). Personally,
> I think this is too broad a semantic for <div> and should take into
> consideration the type attribute.
> In case 1), where the document uses the container form for Books (<div
> type="book">), <chapter> and <verse>, and uses, as needed or semantically
> required, the milestoned form of other containers, the intention of the
> OSIS manual is preserved. The defaults work as intended.
> However, in case 2), where the verse is milestoned, the text and other
> elements of the verse are not children of the verse element but rather of the
> container they are in, typically a paragraph or a div. By the rules of
> XML (if inheritance were properly specified), the parent container would
> need to explicitly give or inherit canonical="true".
> With regard to SWORD and JSword, they always work on a fragment of the
> whole document and might not have the parent on which to determine
> whether canonical is true or false. Practically, they assume true.
> If the OSIS schema had the default of canonical on <div> to be true or
> if it were optional (making the default on osisText meaningful), there
> would be no issue.
> This is to say, I think the OSIS Schema has it wrong for a <div>. Until
> or unless it is changed, one nearly always has to have canonical="true"
> on a div.
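One way a module encoder might check for the situation described above is a small lint pass: flag any <div> that is the nearest enclosing div of a milestoned verse but does not declare canonical="true". This is only a sketch under the assumptions in this thread (un-namespaced markup, <div> defaulting to false); divs_needing_canonical is a hypothetical name, and the check ignores explicit values on intermediate elements such as <p>.

```python
import xml.etree.ElementTree as ET

def divs_needing_canonical(root):
    """Return divs that directly enclose milestoned verses without canonical="true"."""
    flagged = []

    def walk(elem, nearest_div):
        if elem.tag == "div":
            nearest_div = elem  # each nested div resets the state
        if (
            elem.tag == "verse"
            and elem.get("sID")
            and nearest_div is not None
            and nearest_div.get("canonical") != "true"
            and nearest_div not in flagged
        ):
            flagged.append(nearest_div)
        for child in elem:
            walk(child, nearest_div)

    walk(root, None)
    return flagged

doc = ET.fromstring(
    '<osisText>'
    '<div type="book" osisID="Gen"><p>'
    '<verse sID="Gen.1.1" osisID="Gen.1.1"/>In the beginning'
    '<verse eID="Gen.1.1"/></p></div>'
    '<div type="book" osisID="Exod" canonical="true"><p>'
    '<verse sID="Exod.1.1" osisID="Exod.1.1"/>Now these are the names'
    '<verse eID="Exod.1.1"/></p></div>'
    '</osisText>'
)
for div in divs_needing_canonical(doc):
    print(div.get("osisID"))
```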
> In Him,
> On Feb 29, 2012, at 2:46 PM, Troy A. Griffitts wrote:
>> Sorry to only jump in on problems, but...
>> I don't believe the preceding explanation of 'canonical' is correct.
>> OSIS defaults many attributes to canonical, including <verse> and
>> I believe we defined canonical as text belonging to the base work.
>> For us, this is mostly Bibles.
>> For a study Bible, it would exclude all commentary and notes, and only
>> include Biblical text.
>> Basically, canonical for the Open Scripture Information Standard
>> refers to Biblical text, and you'd be hard-pressed to use it for
>> anything else practically, though I could see a purist trying to make
>> an argument for it.
>> For example, Josephus would only include the text of Josephus.
>> And while technically true, the practical uses for 'canonical' are
>> things like:
>> Showing Psalm titles even when the user has asked not to show 'titles'
>> Searching typically is only over 'canonical' text
>> -- but we usually work the opposite way: we take out notes, xrefs,
>> headings, and index what is left, so the Josephus example isn't
>> practically a problem for us right now (plus I think our Josephus
>> module only contains Josephus text). And this is simply for indexed
>> searching. Our full text searching allows you to search any of
>> these other fields: notes, xrefs, headings, just about anything in an
>> entry attribute. We have talked about providing indexed searching for
>> some of these things, but really? how often do you search the notes?
>> Just wait the 4 seconds to do the unindexed search. But we have lots
>> of future ideas of how to modularize the search framework so a
>> frontend could supply a filter which outputs what to include in a
>> named lucene index. Anyway, tangent...
>> <verse> already indicates canonical material by default
>> Psalm titles, being canonical and usually not within a verse (unless
>> it's a v11n which includes them in a verse), need to be marked
>> specifically as canonical.
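The Psalm-title case can be sketched as a display filter: when the user turns headings off, only titles explicitly marked canonical="true" are kept. This is an illustration, not the SWORD filter code; the markup is un-namespaced, the editorial heading text is invented for the example, and titles_to_show is a hypothetical name.

```python
import xml.etree.ElementTree as ET

def titles_to_show(root, show_headings):
    """Return title texts to display; canonical titles survive headings being off."""
    return [
        "".join(title.itertext())
        for title in root.iter("title")
        if show_headings or title.get("canonical") == "true"
    ]

psalm = ET.fromstring(
    '<chapter osisID="Ps.3">'
    '<title>A Morning Prayer</title>'
    '<title type="psalm" canonical="true">'
    'A Psalm of David, when he fled from Absalom his son.</title>'
    '<verse osisID="Ps.3.1">LORD, how are they increased that trouble me!</verse>'
    '</chapter>'
)
print(titles_to_show(psalm, show_headings=False))
```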
>> If the OSIS docs say different, let me know and I'll poke the editor.
>> On 02/29/2012 07:11 PM, David Haslam wrote:
>>> Thanks DM,
>>> Someone like to volunteer to enhance usfm2osis.pl to ensure that
>>> canonical="true" is set as it should be?
>>> sword-devel mailing list: sword-devel at crosswire.org
>>> <mailto:sword-devel at crosswire.org>
>>> Instructions to unsubscribe/change your settings at above page