[sword-devel] XML attribute delimiters in OSIS files?

Troy A. Griffitts scribe at crosswire.org
Wed Oct 26 11:50:23 MST 2011

Hey guys.  Just did some testing.  If you have a look at 
sword/tests/xmltest and try the problem case:

./xmltest "<title type='nested \"quotation\" '/>"

(xmltest already tries to add an attribute to your input which tests for 
embedded quotes, so you'll see an addedAttribute in your output)

You get:

[scribe@charis tests]$ ./xmltest "<title type='nested \"quotation\" '/>"
<title type='nested "quotation" '/>
<title type='nested "quotation" '/>
<title addedAttribute='with a " quote' type='nested "quotation" '/>
Tag name: [title]
  - attribute: [addedAttribute] = [with a " quote]
     4 parts:
  - attribute: [type] = [nested "quotation" ]
     3 parts:

  isEmpty: 1
  isEndTag: 0

It is a little odd that the second attribute has "3 parts", but looking 
at the example given, it has a space at the end, so I suppose this 
might be correct.
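
For anyone who wants to check the expected behaviour outside of SWORD, here is a minimal Python sketch of a delimiter-aware attribute scan. It is not the XMLTag code that xmltest exercises; parse_attributes and ATTR_RE are hypothetical names made up for illustration. The only idea it shows is that a value ends at the same quote character that opened it, so the other kind of quote can sit inside it untouched.

```python
import re

# Hypothetical helper, NOT SWORD's parser: a value is closed only by the
# same delimiter that opened it, so the other quote character may be nested.
ATTR_RE = re.compile(r"""([^\s=<>/'"]+)\s*=\s*(?:"([^"]*)"|'([^']*)')""")


def parse_attributes(tag: str) -> dict:
    """Return {name: value} for every quoted attribute found in one tag."""
    attrs = {}
    for match in ATTR_RE.finditer(tag):
        name, double_quoted, single_quoted = match.groups()
        # Exactly one of the two value groups is set for any given match.
        attrs[name] = double_quoted if double_quoted is not None else single_quoted
    return attrs


if __name__ == "__main__":
    # The problem case from above; the trailing space in the value is kept.
    print(parse_attributes("""<title type='nested "quotation" '/>"""))
```

Under these assumptions the problem case yields a single type attribute whose value still contains both double quotes and the trailing space.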

Hope this is helpful in tracking this down,


On 10/26/2011 06:38 PM, DM Smith wrote:
> On 10/26/2011 09:47 AM, Peter von Kaehne wrote:
>> Is there any actual credible reason for having quotation marks in 
>> attributes? I agree that it may be grammatically correct for XML as 
>> such, but OSIS's attributes are defined and do not contain quotation 
>> marks. And x-marked attributes are largely thrown out during the 
>> osis2mod run, no? Or at least ignored - apart from our own - like 
>> x-preverse.
>> Peter
> I had never spent the time to look at the allowable attribute values 
> in an OSIS document. Now, having looked at the schema, it is allowed 
> to nest quotes. See below for details.
> I think there are many good reasons that a single quote will be found 
> in an attribute value. Many languages use it for other things than 
> quoting.
> I can only think of a few, probably obscure, reasons for a double 
> quote to be there, e.g. chapterTitle='xxx aka "yyy"', who='James 
> "Jimmy" Smith', ...
> Osis2mod *should* allow for all well-formed, valid (both syntactically 
> and semantically) OSIS documents. Regarding quoting attribute values, 
> the recommendation still stands, use double quotes if at all possible, 
> but also avoid &quot; and &apos; too. (Note that these entities are 
> only needed within attribute values and never elsewhere in the text.)
> (Below I'm using x@y to mean element x with attribute y.)
> In looking at this, I think there are some bugs in the definition of 
> l@type, lg@type, and rdg@type.
> In Him,
>     DM
> Here are the attributes that allow for arbitrary text:
> actor@who
> <xs:attribute name="who" type="xs:string" use="optional"/>
> contributor@file-as
> <xs:attribute name="file-as" type="xs:string" use="optional"/>
> a@href
> <xs:attribute name="href" type="xs:string" use="required"/>
> abbr@expansion
> <xs:attribute name="expansion" type="xs:string" use="optional"/>
> chapter@chapterTitle
> <xs:attribute name="chapterTitle" type="xs:string" use="optional"/>
> figure@alt, @catalog, @location, @rights, @size, @src
> <xs:attribute name="alt" type="xs:string" use="optional"/>
> <xs:attribute name="catalog" type="xs:string" use="optional"/>
> <xs:attribute name="location" type="xs:string" use="optional"/>
> <xs:attribute name="rights" type="xs:string" use="optional"/>
> <xs:attribute name="size" type="xs:string" use="optional"/>
> <xs:attribute name="src" type="xs:string"/>
> index@index, @level1, @level2, @level3, @level4, @see
> <xs:attribute name="index" type="xs:string" use="required"/>
> <xs:attribute name="level1" type="xs:string" use="required"/>
> <xs:attribute name="level2" type="xs:string" use="optional"/>
> <xs:attribute name="level3" type="xs:string" use="optional"/>
> <xs:attribute name="level4" type="xs:string" use="optional"/>
> <xs:attribute name="see" type="xs:string" use="optional"/>
> item@role
> <xs:attribute name="role" type="xs:string" use="optional"/>
> label@role
> <xs:attribute name="role" type="xs:string" use="optional"/>
> milestone@marker
> <xs:attribute name="marker" type="xs:string" default="DEFAULT" 
> use="optional"/>
> milestoneEnd@start
> <xs:attribute name="start" type="xs:string" use="required"/>
> milestoneStart@end
> <xs:attribute name="end" type="xs:string" use="required"/>
> name@regular
> <xs:attribute name="regular" type="xs:string" use="optional"/>
> q@level, @marker, @who
> <xs:attribute name="level" type="xs:string" use="optional"/>
> <xs:attribute name="marker" type="xs:string" default="DEFAULT" 
> use="optional"/>
> <xs:attribute name="who" type="xs:string" use="optional"/>
> speaker@who
> <xs:attribute name="who" type="xs:string" use="optional"/>
> speech@marker
> <xs:attribute name="marker" type="xs:string" default="DEFAULT" 
> use="optional"/>
> title@short
> <xs:attribute name="short" type="xs:string" use="optional"/>
> w@gloss, @src, @xlit
> <xs:attribute name="gloss" type="xs:string" use="optional"/>
> <xs:attribute name="src" type="xs:string" use="optional"/>
> <xs:attribute name="xlit" type="xs:string" use="optional"/>
> Globally (globalWithType, globalWithoutType)
> @annotateWork, @resp, @n
> <xs:attribute name="annotateWork" type="xs:string" use="optional"/>
> <xs:attribute name="resp" type="xs:string" use="optional"/>
> <xs:attribute name="n" type="xs:string" use="optional"/>
> Milestone attributes
> @sID, @eID
> <xs:attribute name="sID" type="xs:string" use="optional"/>
> <xs:attribute name="eID" type="xs:string" use="optional"/>
> osisID, osisRef, osisAnnotateType regexes allowing quotation marks: 
> (look for [^...] constructs)
> <xs:pattern 
> value="((((\p{L}|\p{N}|_)+)(\.(\p{L}|\p{N}|_))*:)?([^:\s])+)"/>
> <xs:pattern 
> value="(((\p{L}|\p{N}|_)+)((\.(\p{L}|\p{N}|_)+)*)?:)?((\p{L}|\p{N}|_|(\\[^\s]))+)((\.(\p{L}|\p{N}|_|(\\[^\s]))+)*)?(!((\p{L}|\p{N}|_|(\\[^\s]))+)((\.(\p{L}|\p{N}|_|(\\[^\s]))+)*)?)?"/>
> <xs:pattern 
> value="(((\p{L}|\p{N}|_)+)((\.(\p{L}|\p{N}|_)+)*)?:)?((\p{L}|\p{N}|_|(\\[^\s]))+)(\.(\p{L}|\p{N}|_|(\\[^\s]))*)*(!((\p{L}|\p{N}|_|(\\[^\s]))+)((\.(\p{L}|\p{N}|_|(\\[^\s]))+)*)?)?(@(cp\[(\p{Nd})*\]|s\[(\p{L}|\p{N})+\](\[(\p{N})+\])?))?(\-((((\p{L}|\p{N}|_|(\\[^\s]))+)(\.(\p{L}|\p{N}|_|(\\[^\s]))*)*)+)(!((\p{L}|\p{N}|_|(\\[^\s]))+)((\.(\p{L}|\p{N}|_|(\\[^\s]))+)*)?)?(@(cp\[(\p{Nd})*\]|s\[(\p{L}|\p{N})+\](\[(\p{N})+\])?))?)?"/>
> Attribute extension regex:
> <xs:pattern value="x-([^\s])+"/>
> l@type
> <xs:union memberTypes="osisLine attributeExtension xs:string"/>
> lg@type
> <xs:union memberTypes="osisLineGroup attributeExtension xs:string"/>
> <xs:simpleType name="osisLineGroup">
> <xs:restriction base="xs:string">
> <!-- <xs:enumeration value="doxology"/> -->
> </xs:restriction>
> </xs:simpleType>
> rdg@type
> <xs:union memberTypes="osisRdg attributeExtension xs:string"/>
>> -------- Original Message --------
>>> Date: Wed, 26 Oct 2011 08:59:14 -0400
>>> From: DM Smith <dmsmith at crosswire.org>
>>> To: SWORD Developers' Collaboration Forum <sword-devel at crosswire.org>
>>> Subject: Re: [sword-devel] XML attribute delimiters in OSIS files?
>>> Ah, now I understand. This is a bug and should be fixed. (BTW, not having
>>> the entire thread reproduced in each email makes it harder to understand
>>> the context of the email. I don't like having to go digging for the
>>> context. Having looked, I see that the first email in the thread defines
>>> delimiters.)
>>> But I'm not sure where it should be fixed. I haven't looked at the code,
>>> but as I recall, we use the SWORD parser to obtain the attribute value. My
>>> guess is that it is returning it with the quotes. If the problem is there
>>> and we fix it there, it may break a whole host of other things. (This
>>> parser is not a true XML parser, but one that is highly optimized for
>>> speed, and thus we work with its definition.)
>>> It should be easy to change osis2mod to work. I'll look into doing this
>>> soon.
>>> That said, it is and has been the recommendation that double quotes be
>>> used to wrap attribute values. It is valid to use single quotes, but it
>>> may (does) expose bugs. Fixing this bug does not change this
>>> recommendation.
>>> Until osis2mod has been changed and it is available, it is advisable to
>>> change the input so that the quoting of sID/eID pairs is identical.
>>> In Him,
>>>     DM
>>> On Oct 26, 2011, at 6:38 AM, David Haslam wrote:
>>>> Mixing double and single quotes, as per earlier messages in this thread.
>>>> Example (minus the chaff):
>>>> sID="reference"
>>>> .....
>>>> eID='reference'
>>>> But this time for the same verse, just as Chris replied, rather than in
>>>> completely separate OSIS elements.
>>>> As this is just an observation, I see no immediate need to give a
>>>> detailed example of what happens to the module.
>>>> To locate the places where I spotted it yesterday would take some time.
>>>> Perhaps the most interesting thing is that there was no error message
>>>> from osis2mod.
>>>> And I agree with Chris, the OSIS needs fixing first, before using as
>>>> input for osis2mod.
>>>> David
>>>> -- 
>>>> View this message in context:
>>>> http://sword-dev.350566.n4.nabble.com/XML-attribute-delimiters-in-OSIS-files-tp3907261p3940110.html
>>>> Sent from the SWORD Dev mailing list archive at Nabble.com.
>>>> _______________________________________________
>>>> sword-devel mailing list: sword-devel at crosswire.org
>>>> http://www.crosswire.org/mailman/listinfo/sword-devel
>>>> Instructions to unsubscribe/change your settings at above page
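
To make the sID/eID mismatch from the quoted thread concrete, here is a small Python sketch. It assumes, as guessed in the quoted message and not verified against the code, that the raw attribute tokens reach the comparison with their delimiters still attached. unquote and ids_match are hypothetical helpers for illustration, not osis2mod functions.

```python
def unquote(raw: str) -> str:
    """Strip one pair of matching delimiters (" or ') from a raw token."""
    if len(raw) >= 2 and raw[0] == raw[-1] and raw[0] in "\"'":
        return raw[1:-1]
    # Leave anything that is not wrapped in a matching pair untouched.
    return raw


def ids_match(raw_sid: str, raw_eid: str) -> bool:
    """Compare an sID/eID pair by value, ignoring which delimiter was used."""
    return unquote(raw_sid) == unquote(raw_eid)


if __name__ == "__main__":
    raw_sid = '"reference"'  # as written in sID="reference"
    raw_eid = "'reference'"  # as written in eID='reference'
    print(raw_sid == raw_eid)           # naive comparison of the raw tokens
    print(ids_match(raw_sid, raw_eid))  # comparison after stripping delimiters
```

Only the outermost pair is stripped, so a value such as 'nested "quotation" ' keeps its inner double quotes.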
